Mellanox WinOF VPI User Manual
Contents
1. [Mellanox Technologies, Rev 5.10, p. 163]
   Step 3. Configure the Provider Address and Route records on Hyper-V Host 1 (Host 1 only: mtlae14):
      PS> $NIC = Get-NetAdapter "Port1"
      PS> New-NetVirtualizationProviderAddress -InterfaceIndex $NIC.InterfaceIndex -ProviderAddress 192.168.20.114 -PrefixLength 24
      PS> New-NetVirtualizationProviderRoute -InterfaceIndex $NIC.InterfaceIndex -DestinationPrefix "0.0.0.0/0" -NextHop 192.168.20.1
   Step 5. Configure the Virtual Subnet ID on the Hyper-V network switch ports for each virtual machine on each Hyper-V host (Host 1 and Host 2). Run the commands below for each VM on the host the VM is running on, i.e., for mtlae14-005 and mtlae14-006 on host 192.168.20.114, and for VMs mtlae15-005 and mtlae15-006 on host 192.168.20.115.
   mtlae14 only:
      PS> Get-VMNetworkAdapter -VMName mtlae14-005 | where {$_.MacAddress -eq "00155D720100"} | Set-VMNetworkAdapter -VirtualSubnetID 5001
      PS> Get-VMNetworkAdapter -VMName mtlae14-006 | where {$_.MacAddress -eq "00155D720101"} | Set-VMNetworkAdapter -VirtualSubnetID 5001
   A.2 Adding NVGRE Configuration to Host 15 Example
   The following is an example of adding NVGRE to Host 15.
   On both sides, create the vSwitch. Note that the vSwitch configuration is persistent, so there is no need to configure it again after each reboot.
      PS> New-VMSwitch VSwMLNX -NetAdapterName "Port1" -AllowManagementOS $true
   Shut down the VMs:
      PS> Stop-VM -Name mtlae15-005 -Force -Co…
2. For example, in a Hyper-V configuration with VMQ, no action is taken when an error is detected.
   3.10 NIC Resiliency
   A NIC may unexpectedly hang due to a failure in the hardware, firmware, or software. In such cases, the problematic device should be isolated to prevent the non-responsive NIC from back-pressuring the entire cluster. In addition to isolating the device, this feature helps maintain the ability to recover when the device exits the hang state. For the relevant registry keys, see Section 3.6.9.3, "NIC Resiliency Registry Keys", on page 115.
   4 Utilities
   4.1 Snapshot Tool
   The snapshot tool scans the machine and provides information on the current settings of the operating system, networking, and hardware. It is highly recommended to attach this report when you contact the support team.
   4.1.4 Snapshot Usage
   The snapshot tool is located at <installation_directory>\tools\MLNX_System_Snapshot.exe. The user can set the report location.
   To generate the snapshot report:
   Step 1. (Optional) Change the location of the generated file by setting the full path of the file to be generated, or by pressing "Set target file" and choosing the directory that will hold the generated file and its file name.
   Step 2. Click the "Generate HTML" button.
3. [Figure: Device Manager showing "Mellanox Virtual Miniport Driver Team Team1" under Network adapters, alongside the Broadcom NetXtreme Gigabit Ethernet and Mellanox ConnectX-3 Ethernet adapters; device status: "This device is working properly".]
   Step 4. In the VLAN tab, click New and fill in the details.
   [Figure: VLAN tab of the team adapter's Properties dialog, with the VLAN Name, VLAN ID, and VLAN Priority fields and the list of VLANs associated with this adapter.]
   This dialog allows you to enter or modify the following VLAN settings: the name can be any unique alphanumeric string; the ID is a numbe…
4. Table 14: Ethernet Registry Keys (continued)
   - UseRSSForUDP (default: 1): Used to execute RSS on UDP and raw IP packets. In forwarding scenarios, you can improve performance by disabling RSS on UDP or raw packets. In that case, all receive processing of these packets is done on the processor defined in the DefaultRecvRingProcessor registry key. Valid values: 0 = disabled, 1 = enabled. Note: This registry value is not exposed via the UI.
   - SingleStream (default: 0): Used to obtain the maximum bandwidth with single-stream traffic. When enabled, the driver forwards outgoing packets to another CPU. This decreases the CPU utilization of the sender and allows sending at a higher rate. Valid values: 0 = disabled, 1 = enabled. Note: Only relevant for Ethernet and IPoIB.
   - IgnoreFCS (default: 0): Valid values: 0 = disabled, 1 = enabled. When enabled, the device is configured to (1) pass packets with FCS errors to the driver (the default is to drop FCS-corrupted packets), and (2) pass the 4 bytes of the FCS to the driver (the default is to strip them).
   3.6.6.1 Flow Control Options
   This group of registry keys allows the administrator to control the TCP/IP traffic by pausing frame transmitting and/or receiving operations. By enabling the Flow Control mecha…
5. …single-port traffic each time.
   - Single stream traffic: Improving performance for running single-stream traffic each time.
   - Dual port traffic: Improving performance for running traffic on both ports simultaneously.
   Clicking the Run Tuning button activates the general tuning as explained above. It also changes several driver registry entries for the current adapter and for its sibling device, provided that the sibling is also an Ethernet device. In addition, it generates a log of the applied changes, which users can view to restore the previous values. The log path is:
      %HOMEDRIVE%\Windows\System32\LogFiles\PerformanceTunning.log
   This tuning needs to be performed only once after the installation is completed, and on one adapter only. This holds as long as these entries are not changed directly in the registry or by another installation or script.
   3.8.1.5.1 Performance Tuning Tool Application
   You can also activate the performance tuning through a script called perf_tuning.exe. This script has 4 options: the 3 scenarios described above, and an additional manual tuning through which you can set the RSS base and number of processors for each Ethernet adapter. The adapters you wish to tune are supplied to the script by their names as shown in Network Connections.
   Synopsis:
      perf_tuning.exe -s -c1 <first connection name> -c2 <s…
6. …vea_man -q
   4.3.4 Help Message
   To view the help message, run the following command:
      > vea_man -h
   If your adapter name contains spaces, you need to surround it with quotes.
   Examples:
      > vea_man -a "Ethernet 9"
        Adds a new adapter as a virtual duplicate of "Ethernet 9".
      > vea_man -r "Ethernet 13"
        Removes the virtual Ethernet adapter "Ethernet 13".
   4.4 InfiniBand Fabric Diagnostic Utilities
   The diagnostic utilities described in this chapter provide means for debugging the connectivity and status of InfiniBand (IB) devices in a fabric.
   4.4.1 Utilities Usage
   Common Configuration, Interface and Addressing
   This section first describes the common configuration, interface, and addressing for all the tools in the package. It then provides detailed descriptions of the tools themselves, including operation, synopsis and option descriptions, error codes, and examples.
   Topology File (Optional)
   An InfiniBand fabric is composed of switches and channel adapter (HCA/TCA) devices. To identify devices in a fabric (or even in one switch system), each device is given a GUID (a MAC equivalent). Since a GUID is a non-user-friendly string of characters, it is better to alias it to a meaningful, user-given name. For this purpose, the IB Diagnostic Tools can be provided with a topology file, which is an optional configuration file specifying the IB fabric topology in…
7. Adapter Information
   - Mellanox Driver Version: 4.61.9938.0
   - Firmware Version: 2.33.5000
   - Port Number: 1
   - Bus Type: PCI-E 8.0 Gbps x8
   - Link Speed: (not shown)
   - Part Number: MCX354A-FCCT
   - Device Id: 4103
   - Revision Id: 0
   - Current MAC Address: 02:02:C9:E9:C1:91
   - Permanent MAC Address: 02:02:C9:E9:C1:91
   - Network Status: Disconnected
   - Adapter Friendly Name: Ethernet 4
   - IPv4 Address: 169.254.1.40
   - Adapter User Name: Qdfff IPoIB
   Save To File: To save this information for debug purposes, click Save to File and provide the output file name.
   3.2.5 Assigning Port IP After Installation
   For more information on port configuration, please refer to Section 3.1.2, "Assigning Port IP After Installation", on page 30 under the Ethernet Network.
   3.2.6 Receive Side Scaling (RSS)
   For more information on RSS configuration, please refer to Section 3.1.12, "Receive Side Scaling (RSS)", on page 58 under the Ethernet Network.
   3.2.7 Multiple Interfaces over non-default PKeys Support
   3.2.7.1 System Requirements
   Operating Systems: Windows Server 2008 R2, Windows Server 2012, and Windows Server 2012 R2.
   3.2.7.2 Using Multiple Interfaces over non-default PKeys
   OpenSM enables the configuration of partitions (PKeys) in an InfiniBand fabric. IPoIB supports the creation of multiple interfaces via the part_man tool. Each of those interfaces can be configured to use a different partition from the ones that were confi…
8. [Figure residue: the end of a Device Manager listing ("Microsoft ACPI-Compliant System", "Microsoft Generic IPMI Compliant Device").]
   2.5 Extracting Files Without Running Installation
   To extract the files without running installation, perform the following steps:
   Step 1. Open a CMD console (Windows Server 2012/2012 R2: click Start > Task Manager > File > Run new task, and enter CMD).
   Step 2. Extract the driver and the tools:
      > MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /a
   To extract only the driver files:
      > MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /a /vMT_DRIVERS_ONLY=1
   Step 3. Click Next to create a server image.
   [Dialog: "Welcome to the InstallShield Wizard for MLNX_VPI. The InstallShield(R) Wizard will install MLNX_VPI on your computer. To continue, click Next."]
   Step 4. Click Change and specify the location to which the files are extracted.
   [Dialog: "Network Location: Specify a network location for the server image of the product. Enter the network location or click Change to browse to a location. Click Install to create a server image of MLNX_VPI at the specified network location or click Cancel to exit the wizard."]
   Step 5. Click Install to extract this folder, or click Change to install to a different folder.
   [Dialog: Network Location (truncated in the source).]
9. The fabric must use the same protocol stack in order for nodes to communicate.
   Note: The default RoCE mode in Windows is MAC based, while the default RoCE mode in Linux is IP based. To communicate between Windows and Linux over RoCE, change the RoCE mode in Windows to IP based.
   3.1.4.3 RoCEv2 UDP Port
   In RoCEv2, the RDMA payload is encapsulated as a UDP payload with a specific UDP destination port number indicating that the payload is RDMA. Prior to WinOF Rev 5.10, the destination port number indicating RoCEv2 traffic was 1021. Starting with WinOF Rev 5.10, the default destination port number is 4791, in compliance with Internet Assigned Numbers Authority (IANA) guidance. The UDP destination port is a configurable driver parameter. For its registry key, see Table 20, "RoCE Options", on page 114.
   3.1.4.3.1 Driver Upgrade Considerations
   The default RoCEv2 port changed in WinOF 5.10.50000. As a result, upgrading from an older version that uses RoCEv2 with the default port will effectively change the port used for RoCEv2. Therefore, on a system running an older version with RoCEv2 and the default port, it is advised to upgrade the entire group of computers to Rev 5.10.50000 or newer at the same time in order to maintain RoCEv2 connectivity.
   To allow a gradual upgrade without affecting RoCEv2 connectivity, it is possible to over…
10. …includes additional layer-two protocol overhead.
   Table 27: RDMA Activity (continued)
   - RDMA Inbound Frames/sec: The number of layer-two frames that carry incoming RDMA traffic.
   - RDMA Initiated Connections: The number of outbound connections established.
   - RDMA Outbound Bytes/sec: The number of bytes for all outgoing RDMA traffic. This includes additional layer-two protocol overhead.
   - RDMA Outbound Frames/sec: The number of layer-two frames that carry outgoing RDMA traffic.
   (a) These counters are only implemented in NDK and are not implemented in NDSPI.
   3.9 System Recovery upon Error Detection
   Upon error detection, WinOF can initiate a reset in order to recover from the error automatically. WinOF differentiates between two types of resets:
   - Software reset: Upon error detection, WinOF automatically closes and re-opens all NDIS resources. No HCA reset is performed.
   - Hardware reset: The HCA is reset, and all driver resources (NDK and NDIS) are automatically closed and re-opened.
   WinOF handles the reset flow as follows:
   Table 28: RDMA Activity. The column headings are garbled in the source. They appear to cover IPoIB, Ethernet, and RoCE in native, Hyper-V with VMQ, SR-IOV host (PF), VF over KVM/ESX, and VF over Hyper-V configurations. The recovered row reads: Software reset | Software reset | No operation (silent success) | Software reset | Software reset | No operation (silent success).
11. 3.1.5.2 Adapter Teaming
   Adapter teaming can group a set of ports inside a network adapter, or a number of physical network adapters, into virtual adapters that provide fault tolerance and load balancing. Depending on the teaming mode, one or more interfaces can be active. The non-active interfaces in a team are in standby mode and take over the network traffic in the event of a link failure in the active interfaces. All of the active interfaces in a team participate in load-balancing operations by sending and receiving a portion of the total network traffic.
   Teaming Types
      1. Fault Tolerance: Provides automatic redundancy for the server's network connection. If the primary adapter fails, the secondary adapter (currently in standby mode) takes over. Fault Tolerance is the basis for each of the following teaming types and is inherent in all teaming modes.
      2. Switch Fault Tolerance: Provides a failover relationship between two adapters when each adapter is connected to a separate switch.
      3. Send Load Balancing: Provides load balancing of transmit traffic, plus fault tolerance. Load balancing is performed only on the send port.
      4. Load Balancing (Send & Receive): Provides load balancing of transmit and receive traffic, plus fault tolerance. Load balancing splits the transmit and receive traffic statically among the team adapters, without changing the base of the traffic loading, based on the source/destination MAC and…
12. …TxForwardingProcessor
   - Operating systems that support NDIS 6.2: RssBaseProcNumber, MaxRssProcessors
   - Operating systems that support NDIS 6.3: NumRSSQueues, RssMaxProcNumber
   Examples
   For example, if the adapter is represented by "Local Area Connection 6" and "Local Area Connection 7":
   - For single-port stream tuning, type:
        perf_tuning.exe -s -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
     or, to set one adapter only:
        perf_tuning.exe -s -c1 "Local Area Connection 6"
   - For single-stream tuning, type:
        perf_tuning.exe -st -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
     or, to set one adapter only:
        perf_tuning.exe -st -c1 "Local Area Connection 6"
   - For dual-port stream tuning, type:
        perf_tuning.exe -d -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
   - For forwarding stream tuning, type:
        perf_tuning.exe -f -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
   - For manual tuning of the first adapter to use RSS on CPUs 0-3, type:
        perf_tuning.exe -m -c1 "Local Area Connection 6" -b 0 -n 4
   - To restore the defaults, type:
        perf_tuning.exe -r -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
   3.8.1.6 SR-IOV Tuning
   To achieve the best performance on an SR-IOV VF, run the following PowerShell commands on the host:
      PS> Set-VMNetworkAdapter -Name "Network Adapter" -VMName vm1 -IovQueuePairsRequested 4
   OR
      PS> Set-VMNetworkAdapter -Na…
13. perfquery: Queries InfiniBand ports' performance and error counters. Optionally, it displays aggregated counters for all ports of a node. It can also reset counters after reading them, or simply reset them.
   ibping: Uses vendor MADs to validate connectivity between IB nodes. On exit, (IP) ping-like output is shown. ibping is run as client/server; however, the default is to run it as a client. Note also that, in addition to ibping, a default server is implemented within the kernel.
   ibnetdiscover: Performs IB subnet discovery and outputs a human-readable topology file. GUIDs, node types, and port numbers are displayed, as well as port LIDs and NodeDescriptions. All nodes (and links) are displayed (full topology). Optionally, this utility can be used to list the currently connected nodes by node type. The output is printed to standard output unless a topology file is specified.
   ibtracert: Uses SMPs to trace the path from a source GID/LID to a destination GID/LID. Each hop along the path is displayed until the destination is reached or a hop does not respond. By using the -m option, multicast path tracing can be performed between source and destination nodes.
   sminfo: Optionally sets, and displays the output of, a sminfo query in a readable format. The target SM is the one listed in the local port info, or the SM specified by the optional SM LID or by the SM direct-routed path.
   ibclearerrors: Clears the PMA error counters in PortCounters by either…
14. …uration on page 69
   - Section 5.3, "Ethernet Related Troubleshooting", on page 153
   Rev 4.70 (May 4, 2014)
   Added the following sections:
   - Section 2.3, "Installing Mellanox WinOF Driver", on page 14
   - Section 2.5, "Extracting Files Without Running Installation", on page 21
   - Section 3.5.3.5, "Removing NVGRE configuration", on page 74
   - Section 3.5.4, "Single Root I/O Virtualization (SR-IOV)", on page 75
   - Section 3.5.1, "Virtual Ethernet Adapter", on page 70
   - Section 3.5.4.2, "SR-IOV InfiniBand over KVM", on page 76
   - Section 3.1.11, "Lossless TCP", on page 54
   - Section 2.9, "Booting Windows from an iSCSI Target", on page 24
   - Section 3.6, "Configuration Using Registry Keys", on page 92
   Removed the following sections:
   - Documentation
   Table 1: Document Revision History (continued)
   Rev 4.60 (February 13, 2014)
   Updated the following sections:
   - Section 3.5.2, "Hyper-V with VMQ", on page 71
   - Section 3.5.3.3, "Enabling/Disabling NVGRE Offloading", on page 73
   Added the following sections:
   - Section 3.5.3.4, "Verifying the Encapsulation of the Traffic", on page 74
   - Section 3.5.1, "Virtual Ethernet Adapter", on page 70
   December 30, 2013
   Updated the following sections:
   - Section 3.1.4.1.2, "Configuring Windows Host", on page 34 (updated the example in Step 5)
   - Section 3.8.1.5.1, "Performa…
15. 3.4.1.2.2 Verifying SMB Configuration
   Use the following PowerShell cmdlets to verify that SMB Multichannel is enabled, that the adapters are recognized by SMB, and that their RDMA capability is properly identified.
   On the SMB client, run the following PowerShell cmdlets:
      PS> Get-SmbClientConfiguration | Select EnableMultichannel
      PS> Get-SmbClientNetworkInterface
   On the SMB server, run the following PowerShell cmdlets:
      PS> Get-SmbServerConfiguration | Select EnableMultichannel
      PS> Get-SmbServerNetworkInterface
      PS> netstat.exe -xan | ? {$_ -match "445"}
   3.4.1.2.3 Verifying SMB Connection
   To verify the SMB connection on the SMB client:
   Step 1. Copy a large file to create a new session with the SMB server.
   Step 2. Open a PowerShell window while the copy is ongoing.
   Step 3. Verify that SMB Direct is working properly and that the correct SMB dialect is used:
      PS> Get-SmbConnection
      PS> Get-SmbMultichannelConnection
      PS> netstat.exe -xan | ? {$_ -match "445"}
   Note: If there is no activity while you run the commands above, you might get an empty list because the session has expired and there are no current connections.
   3.4.1.3 Verifying SMB Events that Confirm RDMA Connection
   To confirm the RDMA connection, verify the SMB events:
   Step 1. Open a PowerShell window on the SMB client.
   Step 2. Run the following cmdlets. NOTE: Any RDMA-related connection errors will be displayed as well.
      PS> Get-WinEvent -LogName Microsoft-Windows-SMBClient/Opera…
16. 3.8.1.5 Tuning the Ethernet Network Adapter
   General tuning of the Ethernet network adapter can be performed during installation by modifying some of the Windows registry entries, as explained in section "Registry Tuning" on page 32. Tuning for specific scenarios can be set manually after installation.
   To improve the network adapter performance, activate the performance tuning tool as follows:
   Step 1. Start the Device Manager: open a command-line window and enter devmgmt.msc.
   Step 2. Open Network Adapters.
   Step 3. Right-click the Mellanox Ethernet adapter and select Properties.
   Step 4. Select the Performance tab.
   Step 5. Choose one of the tuning scenarios:
   - Single port traffic: Improves performance for running single-port traffic each time.
   - Single stream traffic: Optimizes tuning for applications with a single connection.
   - Dual port traffic: Improves performance for running traffic on both ports simultaneously.
   - Forwarding traffic: Improves performance for running scenarios that involve both ports (for example, via IXIA).
   - Multicast traffic: Improves performance when the main traffic runs on multicast.
   Step 6. Click the Run Tuning button.
   [Figure: Performance tab of the Mellanox adapter's Properties dialog, showing the Performance Tuning Tool with the tuning scenarios and the Restore Default Settings and Run Tuning buttons.]
17. [Figure: NVGRE-encapsulated frame layout: Outer ETH Header | Outer IP | GRE Header | Inner ETH Header | Original Ethernet Payload.]
   3.5.3.5 Removing NVGRE Configuration
   Step 1. Set the VSID back to 0 (on each Hyper-V host, for each virtual machine where a VSID was set):
      PS> Get-VMNetworkAdapter <VMName> | where {$_.MacAddress -eq <VMMacAddress>} | Set-VMNetworkAdapter -VirtualSubnetID 0
   where:
   - VMName is the name of the virtual machine.
   - VMMacAddress is the MAC address of the VM's network interface associated with the vSwitch that was connected to the Mellanox device.
   Step 2. Remove all lookup records (same command on all Hyper-V hosts):
      PS> Remove-NetVirtualizationLookupRecord
   Step 3. Remove the customer route (same command on all Hyper-V hosts):
      PS> Remove-NetVirtualizationCustomerRoute
   Step 4. Remove the provider address (same command on all Hyper-V hosts):
      PS> Remove-NetVirtualizationProviderAddress
   Step 5. Remove the provider route for a Hyper-V host:
      PS> Remove-NetVirtualizationProviderRoute
   Step 6. For Hyper-V running Windows Server 2012 only, disable the network adapter binding to the ms_netwnv service:
      PS> Disable-NetAdapterBinding <EthInterfaceName> -ComponentID ms_netwnv
   where EthInterfaceName is the physical NIC name.
   3.5.4 Single Root I/O Virtualization (SR-IOV)
   Single Root I/O Virtualization (SR-IOV) is a technology that allows a physical PCIe device to present itself multiple times through the PCIe bus. This technology enables multiple virtual in…
18. [Figure: Port protocol configuration dialog.]
   Step 3. In this step, you can perform the following functions:
   - If you choose the HW Defaults option, the port protocols will be determined according to the NIC's hardware default values.
   - Choose the desired port protocol for the available port(s). If you choose IB or ETH, both ends of the connection must be of the same type (IB or ETH).
   - Enable Auto Sensing by checking the AUTO checkbox. If the NIC does not support Auto Sensing, the AUTO option will be grayed out.
   If you choose AUTO, the current setting will indicate the actual port settings: IB or ETH.
   Note: For firmware 2.32.5000 and above, there is an option to set the port personality using the mlxconfig tool. For further details, please refer to the MFT User Manual.
   3.1.2 Assigning Port IP After Installation
   By default, your machine is configured to obtain an automatic IP address via a DHCP server. In some cases, the DHCP server may require the MAC address of the network adapter installed in your machine.
   To obtain the MAC address:
   Step 1. Open a CMD console (Windows Server 2012/2012 R2: click Start > Task Manager > File > Run new task, and enter CMD).
   Step 2. Display the MAC address as "Physical Address":
      > ipconfig /all
   Configuring a static IP is the same for both IPoIB and Ethernet adapters.
   To assign a static IP a…
19. [Figure: Windows System Snapshot utility ("Mellanox Technologies, Windows System Information Snapshot Utility") with the "Set target file" field (C:\Users\Administrator\Desktop\system_sna…) and the "Generate HTML" button.]
   Once the report is ready, the folder that contains the report will be opened automatically.
   4.2 part_man: Virtual IPoIB Port Creation Utility
   part_man is used to add, remove, and show virtual IPoIB ports. Each Mellanox IPoIB port can have multiple virtual IPoIB ports, which can use the default PKey value (0xffff) or a non-default value supplied by the user.
   Usage:
      part_man.exe [-v] <add|rem> <network connection name> <iname> <pkey>
      part_man.exe [-v] <show|remall>
   Options:
   - add: Add a virtual adapter.
   - rem: Remove a virtual adapter. When using the rem command, provide the connection name of the newly created virtual adapter. You may also specify the iname and pkey if needed to disambiguate. All of these values are provided by part_man show.
   - remall: Remove all virtual adapters.
   - show: Show the existing virtual adapters.
   - help: Provide help text.
   - -v: Increases the verbosity level.
   - -h: Provides help text.
   - network connection name: The name of a local area connection as shown in Network Connections in Control Panel, for example "Local Area Connection 2". Quotes are necessary around the name only if it contains a space in a…
20. …second buffer holds the data. This method reduces the cache hits and improves performance. Valid values: 0 = disable, 1 = enable. Note: This registry value is not exposed via the UI.
   - VlanId (default: Ethernet 0, IPoIB 0): Enables packets with VlanId. It is used when no team intermediate driver is used. Valid values: 0 = disable (no VLAN ID is passed); 1-4095 = a valid VLAN ID that will be passed. Note: This registry value is only valid for Ethernet.
   - TxForwardingProcessor (default: automatically selected based on the RSS configuration): The processor that will be used to forward the packets sent by the forwarding thread. The default is based on the number of rings and the number of cores on the machine. Note: This registry value is not exposed via the UI.
   - DefaultRecvRingProcessor (default: automatically selected based on the RSS configuration): The processor that will be used for the default receive ring, which handles packets that are not handled by RSS. These can be non-TCP/UDP packets, or even UDP packets if they are configured to use the default ring. Note: This registry value is not exposed via the UI.
   - TxInterruptProcessor (default: automatically selected based on the RSS configuration): The processor that will be used to handle TX completions. The default is based on the number of rings and the number of cores on the machine. Note: This registry value is not exposed via the UI.
   - NumRSSQueues (default: Ethernet 8): The maximum number of RSS queues that the IPo…
21. …port number.
   - roce_mode (DWORD): Sets the RoCE mode. The possible RoCE modes are RoCE MAC Based (v1), RoCE IP Based (v1), RoCE over IP (v1.5), RoCE over UDP (v2), and No RoCE. Allowed values: RoCE MAC Based = 0, RoCE IP Based = 5, RoCE over IP = 1, RoCE over UDP = 2, No RoCE = 4. Default: No RoCE. NOTE: The default value depends on the WinOF package used.
   - roce_udp_dport (DWORD): Sets the RoCE v2 UDP destination port. Allowed values: 1-65535. Default: IANA port 4791.
   3.6.9.3 NIC Resiliency Registry Keys
   Table 21: NIC Resiliency Registry Keys
   - DeviceRxStallWatermark (DWORD; values 0-8000; default 0): Time period for a single receive packet's processing that indicates that the packet is about to become stalled. The value is given in milliseconds. 0x0 indicates that processing time is not monitored.
   - DeviceRxStallTimeout (DWORD; values 0-8000; default 1000): Time period for a single receive packet's processing that indicates that the device is not responsive. The value is given in milliseconds. 0x0 indicates that processing time is not monitored.
   3.6.9.4 General Registry Keys
   Registry key location for machine configuration:
      HKLM\SYSTEM\CurrentControlSet\Services\mlx4_bus\Parameters
   Table 22: General Registry Keys
   - AllowResetOnError (DWORD): 0 = disabl…
22. [Figure residue: Device Manager and the adapter's Advanced properties (for example, "Number of Polls on Receive" and "Preferred NUMA node").]
   3.1.4.2 RoCEv2
   RoCE has two addressing modes: MAC-based GIDs and IP-address-based GIDs. If the IP address changes while the system is running, the GID for the port is automatically updated with the new IP address, using either IPv4 or IPv6. RoCE IP-based mode allows RoCE traffic between Windows and Linux systems, which use IP-based GIDs by default.
   A straightforward extension of the RoCE protocol enables traffic to operate in layer 3 environments. This capability is obtained via a simple modification of the RoCE packet format. Instead of the GRH used in RoCE, routable RoCE packets carry an IP header, which allows traversal of IP L3 routers, and a UDP header, which serves as a stateless encapsulation layer for the RDMA transport protocol packets over IP.
   Figure 2: RoCE and RoCEv2 Frame Format Differences
   [Figure: In RoCE, the EtherType indicates that the packet is RoCE (i.e., the next header is the IB GRH). In RoCEv2, the EtherType indicates that the packet is IP, the IP protocol number indicates that the packet is UDP, and the UDP destination port number indicates that the next header is the IB BTH.]
   The proposed RoCE…
23. [Dialog residue: …LogFiles\PerformanceTuning.log; "Show release notes" checkbox.]
   If the firmware upgrade and the restore of the network configuration fail, the following message will be displayed:
   [Dialog: "InstallShield Wizard Completed. The InstallShield Wizard has successfully installed MLNX_VPI. Click Finish to exit the wizard. You chose to run performance tuning. The log file can be found at: C:\windows\System32\LogFiles\PerformanceTuning.log. Firmware upgrade failed with error 8. Error Description: We could not burn the new version. Please refer to the UM for instructions on how to manually burn firmware. We failed to restore the network configuration with error code 3."]
   2.3.2 Unattended Installation
   Note: If no reboot options are specified, the installer restarts the computer whenever necessary without displaying any prompt or warning to the user. Use the /norestart or /forcerestart standard command-line options to control reboots.
   The following is an example of an MLNX_WinOF win2012 x64 unattended installation session.
   Step 1. Open a CMD console (Windows Server 2012/2012 R2: click Start > Task Manager > File > Run new task, and enter CMD).
   Step 2. Install the driver. Run:
      > MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /S /v/qn
   Step 3. (Optional) Manually configure your setup to contain the logs option:
      > MLNX_VPI_WinOF-5_10_All…
24. …Mellanox\MLNX_VPI\ETH
      %ProgramFiles%\Mellanox\MLNX_VPI\HW\mlx4_bus
      %ProgramFiles%\Mellanox\MLNX_VPI\IB\IPoIB
   To see the Mellanox network adapter device, and the Ethernet or IPoIB network device (depending on the card used) for each port, display the Device Manager and expand "System devices" or "Network adapters".
   - MT_WMI: default value is True
   - MT_RESTORECONF: default value is True
   - PERFCHECK: default value is True
   - MT_NDPROPERTY: default value is True
   Figure 1: Installation Results
   [Figure: Device Manager showing "Mellanox ConnectX-3 Ethernet Adapter" and "Mellanox ConnectX-3 Ethernet Adapter 2" under Network adapters, and "Mellanox ConnectX-3 VPI (MT04099) Network Adapter" under System devices.]
25. (Figure: Device Manager, Network adapters list showing the Mellanox ConnectX-3 Ethernet Adapter entries)
Step 2. Right-click one of the Mellanox ConnectX Ethernet adapters under the "Network adapters" list and left-click Properties. Select the Teaming tab from the Properties window.
It is not recommended to open the Properties window of more than one adapter at the same time.
The Teaming dialog enables creating, modifying, or removing a team. Note that only Mellanox Technologies adapters can be part of the team.
To create a new team, perform the following:
Step 1. Click Create.
Step 2. Enter a unique team name.
Step 3. Select a team type.
Step 4. Select the adapters to be included in the team that have not been associated with a VLAN.
Step 5. [Optional] Select Primary Adapter.
A failover team type implements an active-passive scenario where only one interface is active at any given time. When the active one is disconnected, one of the other interfaces becomes active. When the primary link comes up, the team interface returns to transferring data using the primary interface. If the primary adapter is not selected, the primary interface is selected randomly.
Step 6. [Optional] Failback to Primary.
The Failback to Primary option (checked box) specifies that the team will swi
26. Primary checked: when the primary adapter becomes available, the team will switch to the primary, even though the current active adapter can continue functioning as the active one.
• Failback to Primary unchecked: when the primary adapter becomes available, the active adapter will remain active, even though the primary can function as the active one.
(Figure: Properties window, Teaming tab, Fail Over Settings, listing Mellanox ConnectX-3 IPoIB Adapter, Mellanox ConnectX-3 IPoIB Adapter 2, Mellanox ConnectX-3 Pro IPoIB Adapter, and Mellanox ConnectX-3 Pro IPoIB Adapter 2)
The administrator can configure a fail-over team of adapters and associate up to 8 Mellanox ConnectX adapters with this team.
Fail-over should be used to increase system reliability in the event of a link failure. A team provides redundancy through automatic fail-over from an ...
The newly created virtual Mellanox adapter representing the team will be displayed by the Device Manager under "Network adapters" in the following format (see the figure below):
Mellanox Virtual Miniport Driver - Team <team_name>
(Figure: Device Manager device tree)
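The fail-over and Failback to Primary behavior described above can be modeled as a small state machine. The Python sketch below is hypothetical: `FailoverTeam` and its methods are not part of WinOF. It assumes that on fail-over the first member in list order with a live link becomes active. The manual does not say how the driver chooses among the remaining adapters.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FailoverTeam:
    """Toy model of an active-passive fail-over team (hypothetical, not a WinOF API)."""

    members: list[str]
    primary: str
    failback_to_primary: bool
    link_up: dict[str, bool] = field(default_factory=dict)
    active: str | None = None

    def __post_init__(self) -> None:
        if len(self.members) > 8:
            raise ValueError("a fail-over team holds at most 8 adapters")
        if self.primary not in self.members:
            raise ValueError("primary must be a team member")
        for name in self.members:
            self.link_up.setdefault(name, True)
        # Start on the primary if its link is up, otherwise on any live member
        self.active = self.primary if self.link_up[self.primary] else self._first_up()

    def _first_up(self) -> str | None:
        # Assumption: fail over to the first member (in list order) whose link is up
        return next((m for m in self.members if self.link_up[m]), None)

    def set_link(self, adapter: str, up: bool) -> None:
        self.link_up[adapter] = up
        if not up and adapter == self.active:
            # Active link lost: another interface becomes active
            self.active = self._first_up()
        elif up and adapter == self.primary and self.failback_to_primary:
            # Failback to Primary checked: switch back as soon as the primary returns
            self.active = self.primary
        elif up and self.active is None:
            # No interface was active, so the returning one takes over
            self.active = adapter
        # Failback unchecked: the current active adapter simply stays active


team = FailoverTeam(["Port1", "Port2"], primary="Port1", failback_to_primary=False)
team.set_link("Port1", False)   # fail over
team.set_link("Port1", True)    # primary is back, but failback is unchecked
print(team.active)  # prints: Port2
```

With failback checked, the same sequence of events would end with `Port1` active again.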
27. (Figure: Device Manager, System devices list, including Intel(R) Xeon(R) Processor E5 Product Family/Core i7 IIO PCI Express Root Port entries, Mellanox ConnectX-3 PRO VPI (MT04103) Network Adapter, and Mellanox ConnectX-3 VPI (MT04099) Network Adapter)
Step 2. Select the Information tab from the Properties sheet.
(Figure: Properties sheet with the General, Advanced, Information, Performance, Driver, Details, Events, and Power Management tabs)
28. ...Results (page 21)
Figure 2: RoCE and RoCE Frame Format Differences (page 35)
Figure 3: RoCE Protocol Stack (page 36)
Figure 4: Lossless TCP (page 55)
Figure 5: NVGRE Packet Structure (page 72)
Figure 6: Operating System Supports SR-IOV (page 80)
Figure 7: SR-IOV Support (page 81)
Figure 8: Hyper-V Manager (page 81)
Figure 9: Connect Virtual Hard Disk (page 82)
Figure 10: System Event Log (page 88)
Figure 11: Virtual Switch with SR-IOV (page 89)
Figure 12: Adding a VMNIC to a Mellanox V-switch (page 90)
Figure 13: Enable SR-IOV on VMNIC (page 91)
Figure 14: Virtual Function in the VM (page 92)
Document Revision History
Table 1: Document Revision History
Rev 5.10
• December 2015: Added an entry to the bottom of Table 38 "Virtualization Related Issues" on page 157. Updated Section 2.9 "Booting Windows from an iSCSI Target" on page 24.
• November 2015: Minor edits. User Manual revision number changed to 5.10 instead of 5.10.50000. Updated references to other documents.
• September 2015: Updated the following sect
29. To enable SR-IOV using flint:
Step 1. Download MFT for Windows: www.mellanox.com > Products > Software > Firmware Tools.
Step 2. Get the device ID (look for the "pciconf" string in the output):
> mst status
Example:
MST devices:
mt4103_pci_cr0
mt4103_pciconf0
Step 3. Verify that the HCA is configured for SR-IOV by dumping the device configuration file to a user-chosen location, <ini device file>.ini:
> flint -d <device> dc > <ini device file>.ini
Step 4. Verify in the [HCA] section of the .ini file that the following fields appear:
[HCA]
num_pfs = 1
total_vfs = 16
sriov_en = true
Warning: Take care when increasing the number of VFs. All servers are guaranteed to support 16 VFs. More VFs can lead to exceeding the BIOS limit of available MMIO address space.
Step 5. If the fields do not appear, edit the .ini file and add them manually:
Parameter: Recommended Value
num_pfs: 1 (Note: This field is optional and might not always appear.)
total_vfs: <0-126> (The chosen value should be within the BIOS limit of available MMIO address space.)
sriov_en: true
Step 6. Create a binary image using the modified .ini file:
> mlxburn -fw <fw name>.mlx -conf <ini device file>.ini -wrimage <file name>.bin
Step 7. Burn the firmware. The file <file name>.bin is a firmware binary file with SR-IOV enabled that has 16 VFs.
> flin
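Steps 4 and 5 amount to checking three fields in the dumped .ini file. The Python sketch below automates that check. `check_hca_section` is a hypothetical helper, not part of MFT. It assumes the `flint ... dc` dump is plain `key = value` INI text that Python's `configparser` can read. A real dump may contain content that `configparser` rejects, in which case a line-based parser would be needed.

```python
import configparser

MAX_TOTAL_VFS = 126      # upper bound of the recommended <0-126> range
GUARANTEED_VFS = 16      # all servers are guaranteed to support 16 VFs


def check_hca_section(ini_text: str) -> list:
    """Return problems found in the [HCA] section of a dumped .ini (empty list = OK)."""
    # strict=False tolerates duplicate keys/sections; interpolation=None keeps '%' literal
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("HCA"):
        return ["missing [HCA] section"]
    hca = parser["HCA"]
    problems = []

    # num_pfs is optional and might not appear; if it does, it should be 1
    num_pfs = hca.get("num_pfs")
    if num_pfs is not None and num_pfs != "1":
        problems.append(f"num_pfs = {num_pfs}, expected 1")

    if hca.get("sriov_en", "").lower() != "true":
        problems.append("sriov_en must be true")

    total = hca.get("total_vfs")
    if total is None:
        problems.append("total_vfs not present")
    elif not total.isdigit() or int(total) > MAX_TOTAL_VFS:
        problems.append(f"total_vfs = {total} is outside 0-{MAX_TOTAL_VFS}")
    elif int(total) > GUARANTEED_VFS:
        problems.append(f"warning: total_vfs = {total} may exceed the BIOS MMIO limit")
    return problems


sample = "[HCA]\nnum_pfs = 1\ntotal_vfs = 16\nsriov_en = true\n"
print(check_hca_section(sample))  # prints: []
```

An empty result means the file already matches the recommended values, so Step 5 can be skipped.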
30. Work with VLAN in Windows Server 2012
In this procedure you DO NOT create a VLAN; rather, you use an existing VLAN ID.
To configure a port to work with VLAN using the Device Manager:
Step 1. Open the Device Manager.
Step 2. Go to Network adapters.
Step 3. Right-click the Mellanox ConnectX-3 Ethernet Adapter card and select Properties.
Step 4. Go to the Advanced tab.
Step 5. Choose VLAN ID in the Property window.
Step 6. Set its value in the Value window.
(Figure: Device Manager and the Mellanox ConnectX-3 Ethernet Adapter Properties, Advanced tab, showing the Property list (e.g., RSS load balancing profile, Rx Interrupt Moderation Profile, Send Completion Method, TCP/UDP Checksum Offload) and the Value field)
31. ...affect IPoIB performance.
For the complete list of registry entries that may be added or changed by the performance tuning procedure, see MLNX_VPI_WinOF Registry Keys at the following path:
http://www.mellanox.com/page/products_dyn?product_family=32&mtag=windows_sw_drivers
To improve performance, activate the performance tuning tool as follows:
Step 1. Start the Device Manager (open a command line window and enter: devmgmt.msc).
Step 2. Open Network Adapters.
Step 3. Right-click the relevant IPoIB adapter and select Properties.
Step 4. Select the Advanced tab.
Step 5. Modify the performance parameters (properties) as desired.
3.8.3 Tunable Performance Parameters
The following is a list of key parameters for performance tuning.
Jumbo Packet: The maximum available size of the transfer unit, also known as the Maximum Transmission Unit (MTU). For IPoIB, the MTU should not include the size of the IPoIB header (4B). For example, if the network adapter card supports a 4K MTU, the upper threshold for the payload MTU is 4092B and not 4096B.
The MTU of a network can have a substantial impact on performance. A 4K MTU size improves performance for short messages, since it allows the OS to coalesce many small messages into a large one.
• The valid MTU range for an Ethernet driver is 614 to 9614.
• The valid MTU range for an IPoIB driver is 1500 to 4092.
All devices on t
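The IPoIB payload rule and the valid Jumbo Packet ranges above reduce to simple arithmetic. The Python sketch below is illustrative only. The function names are hypothetical, and the constants are the values stated in the text.

```python
IPOIB_HEADER_BYTES = 4  # IPoIB header size stated above

# Valid Jumbo Packet (MTU) ranges stated above, in bytes (inclusive)
VALID_MTU = {"ethernet": (614, 9614), "ipoib": (1500, 4092)}


def ipoib_payload_mtu(link_mtu: int) -> int:
    """Upper threshold for the IPoIB payload MTU given the card's link MTU."""
    return link_mtu - IPOIB_HEADER_BYTES


def is_valid_mtu(driver: str, mtu: int) -> bool:
    low, high = VALID_MTU[driver]
    return low <= mtu <= high


payload = ipoib_payload_mtu(4096)
print(payload, is_valid_mtu("ipoib", payload), is_valid_mtu("ipoib", 4096))
# prints: 4092 True False
```

This shows why 4096 is rejected for IPoIB while 4092 is accepted.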
32. ...and obtain other information about the InfiniBand node. ibsysstat is run as client/server. The default is to run as client.
saquery: Issues the selected SA query. Node records are queried by default.
smpdump: Gets SM attributes from a specified SMA. The result is dumped in hex by default.
4.5 Fabric Performance Utilities
The performance utilities described in this chapter are intended to be used as a performance micro-benchmark. They support both InfiniBand and RoCE.
For further information on the following tools, refer to the help text of each tool by running it with the help command-line parameter.
Table 30: Fabric Performance Utilities
nd_write_bw: This test is used for performance measurement of RDMA-Write requests in Microsoft Windows operating systems. nd_write_bw is performance-oriented for RDMA-Write with maximum throughput, and runs over Microsoft's NetworkDirect standard. The level of customization available to the user is relatively high: the user may choose to run with a customized message size, a customized number of iterations, or, alternatively, a customized test duration. nd_write_bw runs with all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation.
nd_write_lat: This test is used for performance measurement of RDMA-Write requests in Microsoft Windows operating systems. nd_write_lat is performance-oriented for RDMA-Write
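The default nd_write_bw sweep covers "all message sizes from 1B to 4MB (powers of 2)". The short Python sketch below enumerates those sizes. It takes 4MB as 4 × 1048576 bytes, which gives 2^0 through 2^22, or 23 sizes. The helper name is hypothetical and is not an nd_write_bw option.

```python
def nd_write_bw_message_sizes(max_bytes: int = 4 * 1024 * 1024) -> list:
    """Powers of two from 1 B up to and including max_bytes."""
    sizes = []
    size = 1
    while size <= max_bytes:
        sizes.append(size)
        size *= 2  # next power of two
    return sizes


sizes = nd_write_bw_message_sizes()
print(len(sizes), sizes[0], sizes[-1])  # prints: 23 1 4194304
```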
33. ...be transferred on multiple interfaces.
Solution: Close the network capture tool on the physical adapter card, and set it on the team interface instead.
Issue: No Ethernet connectivity on 10Gb adapters after activating Performance Tuning (part of the installation).
Cause: A TcpWindowSize registry value might have been added.
Solution: Remove the value key under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize, or set its value to 0xFFFF.
Issue: Packets are being lost.
Cause: The port MTU might have been set to a value higher than the maximum MTU supported by the switch.
Solution: Change the MTU according to the maximum MTU supported by the switch.
Issue: NVGRE changes made on a running VM are not propagated to the VM.
Cause: The configuration changes might not take effect until the OS is restarted.
Solution: Stop the VM, and afterwards perform any NVGRE configuration changes on the VM connected to the SR-IOV enabled virtual switch.
5.4 Performance Related Troubleshooting
Table 37: Performance Related Issues
Issue: Low performance issues.
Cause: The OS profile might not be configured for maximum performance.
Solution: 1. Go to Power Options in the Control Panel. Make sure Maximum Performance is set as the power scheme. 2. Reboot the machine.
Issue: Flow Control is disabled when k
Cause: When a kernel debugger is
Solution: Set the registry key as following
34. ...incorrect OpenSM behavior. Do not run more than two instances of OpenSM in the subnet.
3.2.3 Modifying IPoIB Configuration
To modify the IPoIB configuration after installation, perform the following steps:
Step 1. Open Device Manager and expand Network Adapters in the device display pane.
Step 2. Right-click the Mellanox IPoIB Adapter entry and left-click Properties.
Step 3. Click the Advanced tab and modify the desired properties.
The IPoIB network interface is automatically restarted once you finish modifying IPoIB parameters. Consequently, it might affect any running traffic.
3.2.4 Displaying Adapter Related Information
To display a summary of network adapter software-, firmware-, and hardware-related information, such as driver version, firmware version, bus interface, adapter identity, and network port link information, perform the following steps:
Step 1. Display the Device Manager.
(Figure: Device Manager, System devices list with Intel(R) Xeon(R) Processor E5 Product Family/Core i7 IIO PCI Express Root Port entries)
35. ...issues described in this section. If a problem persists and you are unable to resolve it, contact your Mellanox representative or Mellanox Support at support@mellanox.com.
5.1 Installation Related Troubleshooting
Table 31: Installation Related Issues
Issue: The machine may become unresponsive during a driver upgrade from WinOF v4.70 or earlier.
Cause: The upgrade requires unloading the old driver first, and this is when the machine may become unresponsive.
Solution: There are two solutions for this issue:
• If possible, load an OS image with the new driver installed.
• Reboot the machine prior to the upgrade operation to reduce the probability of hitting the machine freeze issue.
Issue: The installation of WinOF fails with the following error message: "This installation package is not supported by this processor type. Contact your product vendor."
Cause: An incorrect driver version might have been installed, e.g., you are trying to install a 64-bit driver on a 32-bit machine (or vice versa).
Solution: Use the correct driver package according to the CPU architecture.
Issue: The installation of WinOF fails and reads as follows: "The installation cannot be done while the RDSH service is enabled, please dis..."
Cause: A known issue in Windows Installer when using the chain MSI feature, as described in the following link: http://remtech.word...
Solution: Follow the recommendation in the article.
36. (Figure: packet capture of an IPv4 packet from 11.7.33.148 carrying a User Datagram Protocol header (Src Port: 49153) and 1040 bytes of data)
3.1.11 Lossless TCP
3.1.11.1 System Requirements
Operating Systems: Windows Server 2008 R2, Windows Server 2012, Windows Server 2012 R2, Windows 7 Client, and Windows 8.1 Client
3.1.11.2 Using Lossless TCP
Inbound packets are stored in the data buffers. They are split into Lossy and Lossless according to the priority field in the 802.1Q VLAN tag. In DSCP-based PFC, all traffic is directed to the Lossless buffer.
Packets are taken out of the packet buffer in the same order they were stored, and moved into processing, where a destination descriptor ring is selected. The packet is then scattered into the appropriate memory buffer, pointed to by the first free descriptor.
Figure 4: Lossless TCP
XOFF threshold
When the Lossless packet buffer crosses the XOFF threshold, the adapter sends pause frames according to the port configuration: 802.3x global pause, or per-priority 802.1Qbb pause (PFC), where only the priorities configured as Lossless are noted in the pause frame. Packets arriving while the buffer is full are dropped immediately.
During packet processing, if the
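The buffer behavior described above can be illustrated with a toy model. Packets are split by priority, the Lossless buffer is drained in FIFO order, pause frames are emitted once the XOFF threshold is crossed, and packets are dropped when the buffer is full. This Python sketch is hypothetical, not driver code. It assumes buffer sizes are counted in packets, leaves the Lossy buffer unbounded, and does not model the DSCP-based mode in which all traffic goes to the Lossless buffer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LosslessBufferModel:
    """Toy model of the Lossy/Lossless receive split (hypothetical, not driver code)."""

    lossless_priorities: set[int]  # 802.1Q priorities configured as Lossless
    capacity: int                  # Lossless buffer size, in packets (assumed unit)
    xoff_threshold: int            # occupancy at which pause frames are sent
    pfc: bool = True               # True: per-priority 802.1Qbb; False: global 802.3x
    lossless: list[int] = field(default_factory=list)
    lossy: list[int] = field(default_factory=list)
    pause_frames: list[list[int]] = field(default_factory=list)
    dropped: int = 0

    def receive(self, packet_id: int, priority: int) -> None:
        # Split by the priority field of the 802.1Q VLAN tag
        if priority not in self.lossless_priorities:
            self.lossy.append(packet_id)
            return
        if len(self.lossless) >= self.capacity:
            self.dropped += 1  # buffer full: the packet is dropped immediately
            return
        self.lossless.append(packet_id)
        if len(self.lossless) >= self.xoff_threshold:
            # PFC pause names only the Lossless priorities; a global pause names none
            self.pause_frames.append(sorted(self.lossless_priorities) if self.pfc else [])

    def process_one(self) -> int | None:
        # Packets leave the buffer in the same order they were stored (FIFO)
        return self.lossless.pop(0) if self.lossless else None


buf = LosslessBufferModel({3}, capacity=4, xoff_threshold=3)
for pkt in range(6):
    buf.receive(pkt, priority=3)
buf.receive(99, priority=0)
print(buf.lossless, buf.lossy, buf.dropped, len(buf.pause_frames))
# prints: [0, 1, 2, 3] [99] 2 2
```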
37. ...might have been an RSS configuration mismatch between the TCP stack and the Mellanox adapter.
Solution:
1. Open the event log and look under "System" for the "mlx4ethX" source.
2. If found, enable RSS by running: netsh int tcp set global rss = enabled
Alternatively (less recommended, as it will cause low performance), disable RSS on the adapter by running: netsh int tcp set global rss = no dynamic balancing
Table 36: Ethernet Related Issues
Issue: The driver fails to start, and a yellow sign appears near the "Mellanox ConnectX 10Gb Ethernet Adapter" in the Device Manager display (Code 10).
Cause: A hardware error might have occurred.
Solution: Disable and re-enable the "Mellanox ConnectX Adapter" from the Device Manager display. If this does not work, refer to support.
Issue: The driver fails to start, and in the Event log, under the mlx4_bus source, the following error message appears: "RUN_FW command failed with error 22".
Cause: A wrong firmware image might have been programmed on the adapter card.
Solution: See Section 2.7 "Firmware Upgrade" on page 24.
Issue: No connectivity to a Fault Tolerance team while using network capture tools (e.g., Wireshark).
Cause: The network capture tool might have captured the network traffic of the non-active adapter in the team. This is not allowed, since the tool sets the packet filter to "promiscuous", thus causing traffic to
38. ...of VFs burnt in firmware.
Step 6. Verify that the new values were set correctly:
PS $ Get-MlnxPCIDeviceSriovSetting
Example:
    Caption          : MLNX_PCIDeviceSriovSettingData Mellanox ConnectX-3 PRO VPI (MT04103) Network Adapter
    Description      : Mellanox ConnectX-3 PRO VPI (MT04103) Network Adapter
    ElementName      : HCA 0
    InstanceID       : PCI\VEN_15B3&DEV_1007&SUBSYS_22F5103C&REV_00\24BE05FFFFB09E2E000
    Name             : HCA 0
    SriovEnable      : True
    SriovPort1NumVFs : 8
    SriovPort2NumVFs : 8
    SriovPortMode    : 2
    PSComputerName   :
Step 7. Check in the System Event Log that SR-IOV is enabled.
Step a. Open View Event Logs/Event Viewer: go to Start > Control Panel > System and Security > Administrative Tools > View Event Logs/Event Viewer.
Step b. Open the System logs: Event Viewer (Local) > Windows Logs > System.
Figure 10: System Event Log (Event Viewer, Windows Logs > System, showing events from the Hyper-V-VmSwitch, mlx4eth63, and mlx4_bus sources)
39. (Figure: PXE boot console: WDSNBP downloaded from 11.0.0.83, "Press F12 for network service boot")
3. Choose the relevant boot image from the list of all available boot images presented.
(Figure: Windows Boot Manager (Server IP: 11.0.0.83) listing Microsoft Windows Setup 2012 x64 4.60RC10 Eth, Microsoft Windows Setup 2012 x64 4.60RC10 IB, Microsoft Windows PE x64 2012 4.60RC10 VPI, and Microsoft Windows PE x64 2012 4.60RC10 Eth)
4. Choose the operating system you wish to install.
(Figure: Windows Setup operating system selection: Windows Server 2012 SERVERDATACENTER 4.60RC10 Eth and Windows Server 2012 SERVERDATACENTER 4.60RC10 IB, en-US, x64)
5. Run the Windows Setup Wizard.
6. Choose the iSCSI target drive on which to install Windows, and follow the instructions presented by the installation wizard.
(Figure: Windows Setup "Where do you want to install Windows?" page)
40. ...packets use a well-known UDP destination port value that unequivocally distinguishes the datagram. Similar to other protocols that use UDP encapsulation, the UDP source port field is used to carry an opaque flow identifier that allows network devices to implement packet forwarding optimizations (e.g., ECMP) while staying agnostic to the specifics of the protocol header format.
The UDP source port is calculated as follows:
UDP.SrcPort = (SrcPort XOR DstPort) OR 0xC000
where SrcPort and DstPort are the ports used to establish the connection.
For example, in a Network Direct application, when connecting to a remote peer, the destination IP address and the destination port must be provided, as they are used in the calculation above. Providing the source port is optional.
Furthermore, since this change exclusively affects the packet format on the wire, and because with RDMA semantics packets are generated and consumed below the API (1), applications can seamlessly operate over any form of RDMA service, including the routable version of RoCE as shown in Figure 2 "RoCE and RoCE Frame Format Differences", in a completely transparent way.
(1) Standard RDMA APIs are already IP-based for all existing RDMA technologies.
Figure 3: RoCE Protocol Stack (RDMA Application; OFA (Open Fabric Alliance) Stack; RDMA API (Verbs); RoCE v1; RoCE v2)
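The source-port formula above can be written directly as a function. The Python sketch below is illustrative, and the function name is hypothetical. ORing with 0xC000 forces the two top bits, so the result always falls in 49152-65535. The XOR also makes the value identical whichever side's ports are listed first.

```python
def rocev2_udp_src_port(src_port: int, dst_port: int) -> int:
    """UDP.SrcPort = (SrcPort XOR DstPort) OR 0xC000, per the formula above."""
    if not (0 <= src_port <= 0xFFFF and 0 <= dst_port <= 0xFFFF):
        raise ValueError("ports must be 16-bit values")
    return (src_port ^ dst_port) | 0xC000


port = rocev2_udp_src_port(0x1234, 0x0F0F)
print(hex(port))  # prints: 0xdd3b
```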
41. ...set. For example: crspace, fwtrace, eq_dump, and eq_print.
• file_index: The file number of this type in the set.
Example:
Name: SingleFunc_4_0_0_p000_eth-down-1_eq_dump_0.log
The default number of sets of files for each event is 20. It can be changed by adding the DumpEventsNum DWORD32 parameter under HKLM\System\CurrentControlSet\Services\mlx4_bus\Parameters and setting it to another value.
Appendix A: NVGRE Configuration Scripts Examples
The setup is as follows for both examples below:
Hypervisor mtlae14 = "Port1", 192.168.20.114/24
VM on mtlae14 = mtlae14-005, 172.16.14.5/16, MAC 00155D720100
VM on mtlae14 = mtlae14-006, 172.16.14.6/16, MAC 00155D720101
Hypervisor mtlae15 = "Port1", 192.168.20.115/24
VM on mtlae15 = mtlae15-005, 172.16.15.5/16, MAC 00155D730100
VM on mtlae15 = mtlae15-006, 172.16.15.6/16, MAC 00155D730101
A.1 Adding NVGRE Configuration to Host 14 Example
The following is an example of adding NVGRE to Host 14.
# On both sides
# vSwitch create command
# Note: the vSwitch configuration is persistent; there is no need to configure it after each reboot
New-VMSwitch VSwMLNX -NetAdapterName Port1 -AllowManagementOS $true
# Shut down VMs
Stop-VM -Name mtlae14-005 -Force -Confirm
Stop-VM -Name mtlae14-006 -Force -Confirm
# Connect VM to vSwitch (you may have to switch off the VM before); doing it manually also wor
42. ...should be lossless all the way in the switches.
Step 2. Set the SMB policy to a desired priority, only if SMB traffic is running.
Step 3. [Recommended] Direct ALL TCP/UDP traffic to a lossy priority by using the "-IPProtocolMatchCondition" parameter.
TCP is used for the MPI control channel (smpd), while UDP is used for other services such as remote desktop.
Arista switches forward the PCP bits (e.g., the 802.1p priority within the VLAN tag) from ingress to egress, so that any two end nodes in the fabric maintain the priority along the route. In this case, the packet from the sender goes out with priority X and reaches the far end node with the same priority X. The priority should be lossless in the switches.
To force MS-MPI to work over ND and not over sockets, add the following to the mpiexec command:
-env MPICH_DISABLE_ND 0 -env MPICH_DISABLE_SOCK 1
B.5 Configuring MPI
Step 1. Configure all the hosts in the cluster with identical PFC (see the PFC example below).
Step 2. Run the WHCK ND-based traffic tests to check PFC (ndrping, ndping, ndrpingpong, ndpingpong).
Step 3. Validate the PFC counters during the run time of the ND tests with "Mellanox Adapter QoS Counters" in perfmon.
Step 4. Install the same version of HPC Pack in the entire cluster.
NOTE: A version mismatch in HPC Pack 2012 can cause MPI to hang.
Step 5. Validate the MPI base infrastructure with simple comm
43. ...than by the operating system.
Enabling offloading services increases transmission performance, because offload tasks (such as checksum calculations) are performed by the adapter hardware rather than by the operating system, and therefore with lower latency. In addition, CPU resources become more available for other tasks.
Table 12: Offload Registry Keys
Value Name | Default Value | Description
LsoV1IPv4 | 1 | Large Send Offload Version 1 (IPv4). The valid values are: 0: disable; 1: enable
LsoV2IPv4 | 1 | Large Send Offload Version 2 (IPv4). The valid values are: 0: disable; 1: enable
LsoV2IPv6 | 1 | Large Send Offload Version 2 (IPv6). The valid values are: 0: disable; 1: enable
LSOSize | eth: 64000; IPoIB: 64000 | The maximum number of bytes that the TCP/IP stack can pass to an adapter in a single packet. This value affects the memory consumption and the NIC performance. The valid values are MTU+1024 up to 64000. Note: This registry key is not exposed to the user via the UI. If LSOSize is smaller than MTU+1024, LSO will be disabled.
LSOMinSegment | eth: 2; IPoIB: 2 | The minimum number of segments that a large TCP packet must be divisible by before the transport can offload it to a NIC for segmentation. The valid values are 2 up to 32. Note: This registry key is not exposed to the user via the UI.
LSOTcpOptions | eth: 1; IPoIB: 1 | Enables that the miniport driver to segment a large TCP packet whose TCP header c
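The LSOSize rules in Table 12 can be checked mechanically. The Python sketch below is hypothetical. Values below MTU+1024 disable LSO, as the table states. The table does not say what the driver does with values above 64000, so the sketch simply reports them as "invalid".

```python
LSO_SIZE_MAX = 64000   # upper bound of the valid LSOSize range
LSO_MTU_MARGIN = 1024  # LSOSize must be at least MTU + 1024


def lso_size_status(lso_size: int, mtu: int) -> str:
    """Classify an LSOSize registry value against the rules in Table 12."""
    if lso_size > LSO_SIZE_MAX:
        return "invalid"   # outside the documented valid range
    if lso_size < mtu + LSO_MTU_MARGIN:
        return "disabled"  # LSO is disabled when LSOSize < MTU + 1024
    return "enabled"


print(lso_size_status(64000, 1500), lso_size_status(8000, 9000))
# prints: enabled disabled
```

For example, with a 9000-byte MTU the smallest LSOSize that keeps LSO enabled is 10024.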
44. (Figure: Add Roles and Features Wizard with the Hyper-V role and its management tools: Remote Server Administration Tools > Role Administration Tools > Hyper-V Management Tools (Hyper-V Module for Windows PowerShell, Hyper-V GUI Management Tools), and the "Include management tools (if applicable)" option)
Step 3. Install the Hyper-V Management Tools: Features > Remote Server Administration Tools > Role Administration Tools > Hyper-V Administration Tool.
(Figure: Add Roles and Features Wizard, Select features page, with Hyper-V Management Tools (GUI and command-line tools for managing Hyper-V) selected under Remote Server Administration Tools > Role Administration Tools)
45. ...values:
• Enabled (default): Set RSS Mode.
• Disabled: The hardware is configured once to use the Toeplitz hash function, and the indirection table is never changed.
Note: IOAT is not used while in RSS mode.
Receive Completion Method: Sets the completion method for received packets, and can affect network throughput and CPU utilization.
• Polling Method: Increases CPU utilization, as the system polls the receive rings for incoming packets. However, it may increase network performance, as incoming packets are handled faster.
• Interrupt Method: Optimizes the CPU, as it uses interrupts for handling incoming messages. However, in certain scenarios it can decrease network throughput.
• Adaptive (Default Settings): Dynamically combines the interrupt and polling methods, depending on traffic type and network usage. Choosing a different setting may improve network and/or system performance in certain configurations.
Interrupt Moderation RX Packet Count: Number of packets that need to be received before an interrupt is generated on the receive side (default 5).
Interrupt Moderation RX Packet Time: Maximum elapsed time (in usec) between the receipt of a packet and the generation of an interrupt, even if the moderation count has not been reached (default 10).
Rx Interrupt Moderation Type: Sets the rate at which the controller moderates or delay
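RX Packet Count and RX Packet Time combine as "whichever comes first". The Python simulation below illustrates this using the defaults stated above (5 packets, 10 usec). It is a hypothetical sketch, not the adapter's moderation logic. It assumes the timer starts when the first not-yet-signaled packet arrives.

```python
def moderated_interrupts(arrivals_us, packet_count=5, packet_time_us=10):
    """Return the times (usec) at which receive interrupts would fire."""
    interrupts = []
    pending = 0
    deadline = None
    for t in sorted(arrivals_us):
        # Timer expiry: fire if the oldest pending packet has waited packet_time_us
        if pending and t >= deadline:
            interrupts.append(deadline)
            pending, deadline = 0, None
        if pending == 0:
            deadline = t + packet_time_us  # timer starts with the first pending packet
        pending += 1
        # Count threshold reached: fire immediately
        if pending == packet_count:
            interrupts.append(t)
            pending, deadline = 0, None
    if pending:
        interrupts.append(deadline)  # leftover packets are flushed by the timer
    return interrupts


print(moderated_interrupts([0, 1, 2, 3, 4, 5, 6]))  # prints: [4, 15]
print(moderated_interrupts([0, 20]))                # prints: [10, 30]
```

Raising either value means fewer interrupts but higher per-packet latency, which is the trade-off these parameters tune.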
46. ...walking the InfiniBand subnet topology or using an already saved topology file.
ibstat: Displays basic information obtained from the local IB driver. Output includes LID, SMLID, port state, link width active, and port physical state.
vstat: Displays information on the HCA attributes.
Table 29: Diagnostic Utilities
osmtest: Validates the InfiniBand subnet manager and administration (SM/SA). The default is to run all flows with the exception of the QoS flow. osmtest provides a test suite for opensm.
ibaddr: Displays the LID (and range) as well as the GID address of the port specified (by DR path, LID, or GUID), or of the local port by default.
ibcacheedit: Allows users to edit an ibnetdiscover cache created through the --cache option in ibnetdiscover(8).
iblinkinfo: Reports link info for each port in an IB fabric, node by node. Optionally, iblinkinfo can do partial scans and limit its output to parts of a fabric.
ibqueryerrors: Reports the port error counters that exceed a threshold for each port in the fabric. The default threshold is zero (0). Error fields can also be suppressed entirely. In addition to reporting errors on every port, ibqueryerrors can report the port transmit and receive data, as well as full link information to the remote port if available.
ibsysstat: Uses vendor MADs to validate connectivity between InfiniBand nodes
47. win2012_x64.exe /S /v/qn /v"/l*vx [LogFile]"
Starting from MLNX_WinOF v4.55, the log option is enabled automatically. The default path of the log is: %LOCALAPPDATA%\MLNX_WinOF.log
Step 4. [Optional] If you do not wish to upgrade your firmware version:
> MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /v"MT_SKIPFWUPGRD=1"
Step 5. [Optional] If you wish to control the installation of the WMI/CIM provider:
> MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /v"MT_WMI=1"
MT_SKIPFWUPGRD default value is False.
Step 6. [Optional] If you wish to control whether to restore the network configuration:
> MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /v"MT_RESTORECONF=1"
For further help, run:
> MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /v"/h"
Step 7. [Optional] If you wish to control whether to execute performance tuning:
> MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /v"PERFCHECK=0"
Step 8. [Optional] If you wish to control whether to install the ND provider:
> MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /v"MT_NDPROPERTY=1"
Applications that hold the driver files (such as ND applications) will be closed during the unattended installation.
2.4 Installation Results
Upon installation completion, you can verify the successful addition of the network card(s) through the Device Manager.
Upon installation completion, the inf files can be located at:
%ProgramFiles%
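When scripting the unattended installation, the command line can be assembled from the switches shown in Steps 2-8. The Python sketch below is a hypothetical helper, not a Mellanox tool. Step 3 itself passes two `/v` switches together. Passing several property switches (Steps 4-8) in one invocation the same way is an assumption the manual does not demonstrate, so verify it on a test machine first.

```python
INSTALLER = "MLNX_VPI_WinOF-5_10_All_win2012_x64.exe"


def unattended_command(properties=None, log_file=None):
    """Build a silent-install command line (hypothetical helper, illustration only)."""
    parts = [INSTALLER, "/S", "/v/qn"]                 # Step 2: silent install
    if log_file:
        parts.append(f'/v"/l*vx {log_file}"')          # Step 3: verbose MSI log
    for name, value in (properties or {}).items():
        # Steps 4-8: each MSI property passed through its own /v switch (assumed combinable)
        parts.append(f'/v"{name}={value}"')
    return " ".join(parts)


print(unattended_command({"MT_SKIPFWUPGRD": 1, "PERFCHECK": 0}))
# prints: MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /S /v/qn /v"MT_SKIPFWUPGRD=1" /v"PERFCHECK=0"
```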
48. ...0)# switchport trunk allowed vlan 11
(config-if-Et20)# switchport mode trunk
(config-if-Et20)# dcbx mode ieee
(config-if-Et20)# priority-flow-control mode on
(config-if-Et20)# priority-flow-control priority 3 no-drop
3.1.4.5.1 Using Global Pause (Flow Control)
To enable Global Pause on ports that face the hosts, perform the following:
(config)# interface et10
(config-if-Et10)# flowcontrol receive on
(config-if-Et10)# flowcontrol send on
3.1.4.5.2 Using Priority Flow Control (PFC)
To enable PFC on ports that face the hosts, perform the following:
(config)# interface et10
(config-if-Et10)# dcbx mode ieee
(config-if-Et10)# priority-flow-control mode on
(config-if-Et10)# priority-flow-control priority 3 no-drop
3.1.4.6 Configuring Router (PFC only)
The router uses the L3 DSCP value to mark the L2 PCP of egress traffic. The required mapping maps the three most significant bits of the DSCP into the PCP. This is the default behavior, and no additional configuration is required.
3.1.4.6.1 Copying Priority Code Point (PCP) between Subnets
The captured PCP option from the Ethernet header of the incoming packet can be used to set the PCP bits on the outgoing Ethernet header.
3.1.4.7 Configuring the RoCE Mode
Configuring the RoCE mode requires the following:
• The RoCE mode is configured per driver and is enforced on all the devices in the system. Th
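The router's default DSCP-to-PCP mapping keeps the three most significant bits of the 6-bit DSCP field, which is a right shift by three. The Python sketch below illustrates this with a hypothetical function name. For example, DSCP 26 (binary 011010) maps to PCP 3, and DSCP 46 (101110) maps to PCP 5.

```python
def dscp_to_pcp(dscp: int) -> int:
    """Map a 6-bit DSCP value to a 3-bit PCP by taking its three most significant bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit field (0-63)")
    return dscp >> 3


print(dscp_to_pcp(26), dscp_to_pcp(46))  # prints: 3 5
```

With this mapping, hosts that mark lossless traffic with DSCP values 24-31 land on priority 3, the no-drop priority configured in the switch examples above.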
49. …p2 | p3: Optional mutually exclusive parameters
p1, p2, p3: Variables for which users supply specific values (italic font, e.g., enable)
Emphasized words: Italic font (e.g., These are emphasized words)
Note <text>: This is a note
Warning <text>: May result in system instability
Common Abbreviations and Acronyms
Table 3: Abbreviations and Acronyms (Abbreviation/Acronym: Whole Word/Description)
B: (Capital) "B" is used to indicate size in bytes or multiples of bytes (e.g., 1KB = 1024 bytes, and 1MB = 1048576 bytes)
b: (Small) "b" is used to indicate size in bits or multiples of bits (e.g., 1Kb = 1024 bits)
FW: Firmware
HCA: Host Channel Adapter
HW: Hardware
IB: InfiniBand
LSB: Least significant byte
lsb: Least significant bit
MSB: Most significant byte
msb: Most significant bit
NIC: Network Interface Card
NVGRE: Network Virtualization using Generic Routing Encapsulation
SW: Software
VPI: Virtual Protocol Interconnect
IPoIB: IP over InfiniBand
PFC: Priority Flow Control
PR: Path Record
RDS: Reliable Datagram Sockets
RoCE: RDMA over Converged Ethernet
SL: Service Level
MPI: Message Passing Interface
EoIB: Ethernet over InfiniBand
QoS: Quality of Service
ULP: Upper Level Protocol
VL: Virtual Lane
TC: Traffic Class
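The B/b convention in Table 3 is easy to mix up. This short worked check assumes 8 bits per byte and the binary multiples given in the table (1K = 1024). The helper names are illustrative.

```python
BITS_PER_BYTE = 8
K = 1024          # binary multiple used in the table (1KB = 1024 bytes)
M = K * K         # 1MB = 1048576 bytes


def bytes_in(kilobytes=0, megabytes=0):
    """Size in bytes for values written with a capital B (KB, MB)."""
    return kilobytes * K + megabytes * M


def bits_to_bytes(kilobits):
    """Convert a size written with a small b (Kb) into bytes."""
    return kilobits * K / BITS_PER_BYTE
```

So 1Kb (1024 bits) is only 128 bytes, one eighth of 1KB.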
50. …2 only)
Step 1. Enable the Windows Network Virtualization binding on the physical NIC of each Hyper-V Host (Host 1 and Host 2):
PS $ Enable-NetAdapterBinding <EthInterfaceName> -ComponentID ms_netwnv
<EthInterfaceName> - Physical NIC name
Step 2. Create a vSwitch:
PS $ New-VMSwitch <vSwitchName> -NetAdapterName <EthInterfaceName> -AllowManagementOS $true
Step 3. Shut down the VMs:
PS $ Stop-VM -Name <VM Name> -Force -Confirm
Step 4. Configure the Virtual Subnet ID on the Hyper-V Network Switch Ports for each Virtual Machine on each Hyper-V Host (Host 1 and Host 2):
PS $ Add-VMNetworkAdapter -VMName <VMName> -SwitchName <vSwitchName> -StaticMacAddress <StaticMAC Address>
Step 5. Configure a Subnet Locator and Route records on all Hyper-V Hosts (the same command on all Hyper-V hosts):
PS $ New-NetVirtualizationLookupRecord -CustomerAddress <VMInterfaceIPAddress 1/n> -ProviderAddress <HypervisorInterfaceIPAddress1> -VirtualSubnetID <virtualsubnetID> -MACAddress <VMmacaddress1>(a) -Rule "TranslationMethodEncap"
PS $ New-NetVirtualizationLookupRecord -CustomerAddress <VMInterfaceIPAddress 2/n> -ProviderAddress <HypervisorInterfaceIPAddress2> -VirtualSubnetID <virtualsubnetID> -MACAddress <VMmacaddress2>(a) -Rule "TranslationMethodEncap"
(a) This is the VM's MAC address associated with the vSwitch connected to the Mellanox device.
Step 6. Add customer route o…
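Step 5 runs the same lookup-record commands on every host. With NVGRE encapsulation, each host needs to know which provider (host) address to tunnel toward for every VM address in a virtual subnet. This Python sketch is a toy model of such a lookup table, keyed by (virtual subnet ID, customer address). It is not the Windows implementation. All addresses, MACs, and the VSID in the usage example are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class LookupRecord:
    customer_address: str    # VM IP address (CA)
    provider_address: str    # Hyper-V host IP address (PA)
    virtual_subnet_id: int   # VSID
    mac_address: str         # VM MAC address on the vSwitch


class LookupTable:
    """Toy model of the per-host NVGRE lookup table."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[int, str], LookupRecord] = {}

    def add(self, record: LookupRecord) -> None:
        key = (record.virtual_subnet_id, record.customer_address)
        self._records[key] = record

    def provider_for(self, vsid: int, customer_address: str) -> Optional[str]:
        """PA to encapsulate towards, or None if no record exists."""
        record = self._records.get((vsid, customer_address))
        return record.provider_address if record else None


if __name__ == "__main__":
    table = LookupTable()
    table.add(LookupRecord("10.0.0.5", "192.168.20.114", 5001, "00155D720100"))
    table.add(LookupRecord("10.0.0.6", "192.168.20.115", 5001, "00155D720200"))
    print(table.provider_for(5001, "10.0.0.6"))
```

Because the VSID is part of the key, the same customer address in a different virtual subnet does not match. This is how tenants with overlapping address spaces stay isolated.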
51. …2013
Updated the following sections:
• Section 5, "Troubleshooting", on page 150
• Section 1.2, "WinOF Set of Documentation", on page 12
Added the following sections:
• Section 3.8.4, "Adapter Proprietary Performance Counters", on page 129
Rev 4.2 (October 20, 2012)
Added the following sections:
• Section 3.4.1, "Deploying Windows Server 2012 and Above with SMB Direct", on page 68 and its subsections
• Section 3.1.6, "Header Data Split", on page 45
• Section 4.2, "part_man - Virtual IPoIB Port Creation Utility", on page 138
Updated Section 3.8, "Performance Tuning and Counters", on page 118
Rev 3.2.0 (July 23, 2012): No changes
Rev 3.1.0 (May 21, 2012):
• Added section Tuning the IPoIB Network Adapter
• Added section Tuning the Ethernet Network Adapter
• Added section Performance tuning tool application
• Removed section Tuning the Network Adapter
• Removed section part_man
• Removed section ibdiagnet
Table 1: Document Revision History (continued)
Rev 3.0.0 (February 08, 2012):
• Added section RDMA over Converged Ethernet (RoCE) and its subsections
• Added section Hyper-V with VMQ
• Added section Network Driver Interface Specification (NDIS)
• Added section Header Data Split
• Added section Auto Sensing
• Added section Adapter Teaming
• Added section Port Protocol Configuration
• Added section Advanced Configuration for Infin…
52. Table 21: NIC Resiliency Registry Keys (p. 115)
Table 22: General Registry Keys (p. 115)
Table 23: Performance Tuning Tool Application Options (p. 122)
Table 24: Mellanox Adapter Traffic Counters (p. 130)
Table 25: Mellanox Adapter Diagnostics Counters (p. 131)
Table 26: Mellanox QoS Counters (p. 134)
Table 27: RDMA Activity (p. 135)
Table 28: RDMA Activity (p. 136)
Table 29: Diagnostic Utilities (p. 143)
Table 30: Fabric Performance Utilities (p. 146)
Table 31: Installation Related Issues (p. 151)
Table 32: Setup Return Codes (p. 151)
Table 33: Firmware Burning Warning Codes (p. 152)
Table 34: Restore Configuration Warnings (p. 152)
Table 35: InfiniBand Related Issues (p. 153)
Table 36: Ethernet Related Issues (p. 153)
Table 37: Performance Related Issues (p. 155)
Table 38: Virtualization Related Issues (p. 157)
Table 39: Events Causing Automatic State Dumps (p. 159)
List of Figures
Figure 1: Installation…
53. …6.2 Show Tool (p. 149)
Chapter 5 Troubleshooting (p. 150)
5.1 Installation Related Troubleshooting (p. 151)
5.1.1 Installation Error Codes and Troubleshooting (p. 151)
5.2 InfiniBand Related Troubleshooting (p. 153)
5.3 Ethernet Related Troubleshooting (p. 153)
5.4 Performance Related Troubleshooting (p. 155)
5.4.1 General Diagnostic (p. 156)
5.5 Virtualization Related Troubleshooting (p. 157)
5.6 Reported Driver Events (p. 158)
5.7 Extracting WPP Traces (p. 159)
5.8 State Dumping (p. 159)
Appendix A NVGRE Configuration Scripts Examples (p. 163)
A.1 Adding NVGRE Configuration to Host 14 Example (p. 163)
A.2 Adding NVGRE Configuration to Host 15 Example (p. 164)
Appendix B Windows MPI (MS-MPI) (p. 166)
B.1 Overview (p. 166)
B.2 System Requirements (p. 166)
B.3 Running MPI (p. 166)
B.4 Directing MSMPI Traffic (p. 166)
B.5 Running MSMPI on the Desired Priority (p. 166)
B.6 Configuring MPI (p. 167)
B.7 …Examples…
54. …AGE.
Mellanox Technologies
350 Oakmead Parkway Suite 100
Sunnyvale, CA 94085
U.S.A.
www.mellanox.com
Tel: (408) 970-3400
Fax: (408) 970-3403
© Copyright 2015. Mellanox Technologies. All Rights Reserved.
Mellanox, Mellanox logo, BridgeX, CloudX logo, Connect-IB, ConnectX, CoolBox, CORE-Direct, GPUDirect, InfiniHost, InfiniScale, Kotura, Kotura logo, Mellanox Federal Systems, Mellanox Open Ethernet, Mellanox ScalableHPC, Mellanox Connect Accelerate Outperform logo, Mellanox Virtual Modular Switch, MetroDX, MetroX, MLNX-OS, Open Ethernet logo, PhyX, SwitchX, TestX, The Generation of Open Ethernet logo, UFM, Virtual Protocol Interconnect, Voltaire and Voltaire logo are registered trademarks of Mellanox Technologies, Ltd.
Accelio, CyPU, FPGADirect, HPC-X, InfiniBridge, LinkX, Mellanox Care, Mellanox CloudX, Mellanox Multi-Host, Mellanox NEO, Mellanox PeerDirect, Mellanox Socket Direct, Mellanox Spectrum, NVMeDirect, StPU, Spectrum logo, Switch-IB, Unbreakable-Link are trademarks of Mellanox Technologies, Ltd.
All other trademarks are property of their respective owners.
Document Number: MLNX-15-3280
Table of Contents
Document Revision History (p. 2)
About this Manual (p. 8)
Intend…
55. …as using the vea_man tool. For further details on usage, please refer to "vea_man - Virtual Ethernet" on page 140.
Virtual Ethernet interfaces created by vea_man are not tuned by the automatic performance tuning script. For optimal performance, please follow the performance tuning guide and apply the relevant changes to the VEA interface.
3.5.1.1 System Requirements
• Operating Systems: Windows Server 2012 and Windows Server 2012 R2
• Firmware version: 2.31.5050 and above
3.5.1.2 VEA Feature Limitations
• RoCE (RDMA) is supported only on the physical VEA.
• MTU (JumboFrame registry key), QoS and Flow Control are only configured from the physical VEA.
• There is no bandwidth allocation between the two interfaces; both interfaces share the same link speed.
• SR-IOV and VEA are not supported simultaneously. Only one of the features can be used at any given time.
3.5.2 Hyper-V with VMQ
3.5.2.1 System Requirements
• Operating Systems: Windows Server 2008 R2, Windows Server 2012 and Windows Server 2012 R2
3.5.2.2 Using Hyper-V with VMQ
Mellanox WinOF Rev 5.10 includes a Virtual Machine Queue (VMQ) interface to support Microsoft Hyper-V network performance improvements and security enhancement.
The VMQ interface supports:
• Classification of received packets by using the destination MAC address to route the packets to different receive queues
• NIC ability to use DMA to transfer packets directl…
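The first VMQ capability, routing received packets to per-VM receive queues by destination MAC address, can be pictured with this toy Python model. The NIC does this in hardware. The class, its queue numbering, and the default queue of 0 for unmatched traffic are illustrative assumptions, not the driver's interface.

```python
class VmqClassifier:
    """Toy model of VMQ receive classification by destination MAC address."""

    def __init__(self, default_queue: int = 0) -> None:
        self.default_queue = default_queue       # queue for unmatched traffic
        self.mac_to_queue: dict = {}

    @staticmethod
    def _normalize(mac: str) -> str:
        # Accept "00155D720100", "00-15-5D-72-01-00" or "00:15:5d:72:01:00".
        digits = mac.replace("-", "").replace(":", "").lower()
        if len(digits) != 12:
            raise ValueError(f"not a MAC address: {mac!r}")
        return ":".join(digits[i:i + 2] for i in range(0, 12, 2))

    def assign(self, mac: str, queue: int) -> None:
        self.mac_to_queue[self._normalize(mac)] = queue

    def classify(self, frame: bytes) -> int:
        """Return the receive queue for an Ethernet frame."""
        dst_mac = frame[:6].hex(":")             # first 6 bytes = destination MAC
        return self.mac_to_queue.get(dst_mac, self.default_queue)
```

A lookup keyed on the destination MAC is enough here because each VM's virtual NIC has its own MAC address on the vSwitch.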
56. …[Screenshot: VM Settings, Network Adapter (Mellanox SRIOV Virtual Switch) > Hardware Acceleration, showing the "Enable virtual machine queue", "Enable IPsec task offloading" and "Enable SR-IOV" options]
Step 7. Start and connect to the Virtual Machine: Select the newly created Virtual Machine and go to the Actions panel > Connect. In the virtual machine window, go to Actions > Start.
Step 8. Copy the WinOF driver package to the…
57. …DSCP Registry Keys Settings (Registry Key: Description)
TxUntagPriorityTag: If 0x1, do not add an 802.1Q tag to transmitted packets that are assigned an 802.1p priority but are not assigned a non-zero VLAN ID (i.e., priority-tagged packets). Default: 0x0. For DSCP-based PFC, set to 0x1.
RxUntaggedMapToLossless: If 0x1, all untagged traffic is mapped to the lossless receive queue. Default: 0x0. For DSCP-based PFC, set to 0x1.
RroceDscpMarkPriorityFlowControl_<ID>: A value to mark DSCP for RoCE packets assigned to CoS=ID when priority flow control is enabled. The valid values range from 0 to 63. The default is the ID value (e.g., PriorityToDscpMappingTable_3 is 3). ID values range from 0 to 7.
DscpBasedEtsEnabled: If 0x1, the DSCP-based ETS feature is enabled; if 0x0, it is disabled. Default: 0x0.
DscpForGlobalFlowControl: Default DSCP value for flow control. Default: 0x1a.
Note: For changes to take effect, please restart the network adapter after changing this registry key.
3.1.10.7.1 Default Settings
When the DSCP configuration registry keys are missing from the miniport registry, the following defaults are assigned:
Table 8: DSCP Default Registry Keys Settings (Registry Key: Default Value)
TxUntagPriorityTag: 0
RxUntaggedMapToLossless: 0
PriorityToDscpMappingTable_0: 0
PriorityToDscpMappingTable_1: 1
PriorityToDscpMappingTable_2: 2
PriorityToDscpMappingTable_3: 3
…
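The fallback rule above says that a missing DSCP key takes its default. This Python sketch applies that rule to a plain dict that stands in for values read from the miniport registry. Reading the registry itself is not shown, and the helper name is hypothetical. Table 8 is cut off in this excerpt, so the defaults for entries 4-7 and for the last two keys are taken from the descriptions and Default values given in the registry table above it. The mapping entries follow Table 8's `PriorityToDscpMappingTable_<ID>` naming.

```python
DSCP_DEFAULTS = {
    "TxUntagPriorityTag": 0x0,
    "RxUntaggedMapToLossless": 0x0,
    "DscpBasedEtsEnabled": 0x0,
    "DscpForGlobalFlowControl": 0x1A,
}
# PriorityToDscpMappingTable_<ID> defaults to its own ID (ID = 0..7).
DSCP_DEFAULTS.update({f"PriorityToDscpMappingTable_{i}": i for i in range(8)})


def effective_dscp_settings(present: dict) -> dict:
    """Merge the keys found in the miniport registry over the defaults."""
    settings = {**DSCP_DEFAULTS, **present}
    for i in range(8):
        value = settings[f"PriorityToDscpMappingTable_{i}"]
        if not 0 <= value <= 63:
            raise ValueError(f"PriorityToDscpMappingTable_{i}={value} is outside 0..63")
    return settings
```

With no keys present, DscpForGlobalFlowControl resolves to 0x1a (26), and priority 5 maps to DSCP 5.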
58. …Release Notes
1.3 Windows MPI (MS-MPI)
Message Passing Interface (MPI) is meant to provide virtual topology, synchronization, and communication functionality between a set of processes. MPI enables running one process on several hosts.
Windows MPI runs over the following protocols:
• Sockets (Ethernet)
• Network Direct (ND)
For further details on MPI, please refer to Appendix B, "Windows MPI (MS-MPI)", on page 166.
2 Installation
2.1 Hardware and Software Requirements
Table 5: Hardware and Software Requirements (Description(a): Package)
Windows Server 2008 R2 (64 bit only), Windows 7 Client (64 bit only): MLNX_VPI_WinOF-5_10_All_win2008R2_x64.exe
Windows Server 2012 (64 bit only): MLNX_VPI_WinOF-5_10_All_win2012_x64.exe
Windows Server 2012 R2 (64 bit only), Windows 8.1 Client (64 bit only): MLNX_VPI_WinOF-5_10_All_win2012R2_x64.exe
(a) The Operating System listed above must run with administrator privileges.
(b) These servers are not signed by Microsoft yet (to be signed in a short period of time).
Required disk space for installation is 100MB.
2.2 Downloading Mellanox WinOF Driver
To download the .exe according to your Operating System, please follow the steps below:
Step 1. Obtain the machine architecture.
For Windows Server 2012/2012 R2:
1. To go to the Start menu, position your mouse in the bottom-right corner of the Remote De…
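Table 5 together with Step 1 (check the machine architecture) amounts to a lookup. This Python sketch encodes the table and refuses non-64-bit machines. The helper name is hypothetical. It relies on the standard `platform.machine()`, which typically reports "AMD64" on 64-bit Windows and "x86_64" on Linux. A caller may also pass the architecture explicitly.

```python
import platform

# Package per operating system, from Table 5 (all packages are 64-bit only).
PACKAGES = {
    "Windows Server 2008 R2": "MLNX_VPI_WinOF-5_10_All_win2008R2_x64.exe",
    "Windows 7 Client": "MLNX_VPI_WinOF-5_10_All_win2008R2_x64.exe",
    "Windows Server 2012": "MLNX_VPI_WinOF-5_10_All_win2012_x64.exe",
    "Windows Server 2012 R2": "MLNX_VPI_WinOF-5_10_All_win2012R2_x64.exe",
    "Windows 8.1 Client": "MLNX_VPI_WinOF-5_10_All_win2012R2_x64.exe",
}


def select_package(os_name: str, machine: str = "") -> str:
    """Pick the installer for os_name, refusing non-64-bit x64 machines."""
    arch = (machine or platform.machine()).upper()
    if arch not in ("AMD64", "X86_64"):
        raise ValueError(f"{arch!r} is not a 64-bit x64 machine")
    if os_name not in PACKAGES:
        raise ValueError(f"no package listed for {os_name!r}")
    return PACKAGES[os_name]
```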
59. …THRESHOLD -1 -env MPICH_DISABLE_ND 0 -env MPICH_DISABLE_SOCK 1 -affinity c:\test1.exe
Running MPI Pallas test over ETH:
> mpiexec.exe -p 19020 -hosts 4 11.11.146.101 11.21.147.101 11.21.147.51 11.11.145.101 -env MPICH_NETMASK 11.0.0.0/255.0.0.0 -env MPICH_ND_ZCOPY_THRESHOLD -1 -env MPICH_DISABLE_ND 1 -env MPICH_DISABLE_SOCK 0 -affinity c:\test1.exe
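The two example runs differ only in which transport they disable: the ND run sets MPICH_DISABLE_ND 0 and MPICH_DISABLE_SOCK 1, and the ETH run reverses both. This hypothetical Python helper builds either argument list. The port and netmask defaults are copied from the examples. The other `-env` settings are left out, so the sketch shows only the transport choice.

```python
def mpiexec_args(hosts, exe, transport, port=19020, netmask="11.0.0.0/255.0.0.0"):
    """Build mpiexec.exe arguments that restrict MS-MPI to one transport.

    transport: "ND" (Network Direct) or "ETH" (sockets).
    """
    # (MPICH_DISABLE_ND, MPICH_DISABLE_SOCK) for each transport
    toggles = {"ND": (0, 1), "ETH": (1, 0)}
    if transport not in toggles:
        raise ValueError(f"unknown transport {transport!r}")
    disable_nd, disable_sock = toggles[transport]
    return [
        "mpiexec.exe", "-p", str(port),
        "-hosts", str(len(hosts)), *hosts,
        "-env", "MPICH_NETMASK", netmask,
        "-env", "MPICH_DISABLE_ND", str(disable_nd),
        "-env", "MPICH_DISABLE_SOCK", str(disable_sock),
        "-affinity", exe,
    ]
```

On a machine with MS-MPI installed, the returned list could be passed to `subprocess.run`.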
60. …IPoIB: 8 … device should use. Note: This registry key is only available in Windows Server 2012 and above.
Table 13: Performance Registry Keys (continued) (Value Name (Default Value): Description)
BlueFlame (eth: 1, IPoIB: 1): Controls whether latency-critical Send WQEs are posted to the device using BlueFlame. When BlueFlame is used, the WQEs are written directly to the PCI BAR of the device (in addition to memory), so that the device may handle them without having to access memory, thus shortening the execution latency. For best performance, it is recommended to use BlueFlame when the HCA is lightly loaded. For high-bandwidth scenarios, it is recommended to use regular posting (without BlueFlame). The valid values are:
• 0 - disable
• 1 - enable
Note: This registry value is not exposed via the UI.
MaxRSSProcessors (eth: 8, IPoIB: 8): The maximum number of RSS processors. Note: This registry key is only available in Windows Server 2012 and above.
3.6.6 Ethernet Registry Keys
The following section describes the registry keys that are only relevant to the Ethernet driver.
Table 14: Ethernet Registry Keys (Value Name (Default Value): Description)
RoceMaxFrameSize (1024): The maximum size of a frame (or a packet) that can be sent by the RoCE protocol (a.k.a. Maximum Transmission Unit (MTU)). Using a larger RoCE MTU will improve performance; however, one must ensure that the entire system, including switches, supports the defined MTU.
Eth…
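The RoceMaxFrameSize warning means the chosen RoCE MTU must fit through every hop on the path. This Python sketch picks the largest candidate size that fits the smallest Ethernet MTU on the path. The candidate list (the InfiniBand-style sizes 256 to 4096) and the 100-byte header allowance are assumptions. Neither figure comes from this manual, so check the valid values listed for the key before relying on them.

```python
# Candidate RoCE MTU sizes: assumed to be the InfiniBand-style sizes.
ROCE_MTU_SIZES = (256, 512, 1024, 2048, 4096)


def largest_safe_roce_mtu(path_mtus, header_allowance=100):
    """Largest candidate RoCE MTU that fits every hop on the path.

    path_mtus:        Ethernet MTU (bytes) of each NIC/switch port on the path.
    header_allowance: bytes reserved for RoCE/Ethernet headers (assumed value).
    """
    limit = min(path_mtus) - header_allowance
    fitting = [size for size in ROCE_MTU_SIZES if size <= limit]
    if not fitting:
        raise ValueError(f"no RoCE MTU fits within {min(path_mtus)} bytes")
    return max(fitting)
```

For example, a single 1500-byte hop on an otherwise jumbo-frame path limits RoCE to 1024, which matches the key's default.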
61. …IP addresses.
5. Adaptive Load Balancing: The same functionality as Load Balancing Send & Receive. In case of traffic load on one of the adapters, load balancing channels the traffic to the other team adapters.
6. Dynamic Link Aggregation (802.3ad): Provides dynamic link aggregation, allowing the creation of one or more channel groups using same-speed or mixed-speed server adapters.
7. Static Link Aggregation (802.3ad): Provides increased transmission and reception throughput in a team comprised of two to eight adapter ports through static configuration.
If the switch connected to the HCA supports 802.3ad, the recommended setting is teaming mode 6.
3.1.5.2.1 Creating a Team
Teaming is used to balance the workload of packet transfers by distributing the workload over a team of network instances, and to set a secondary network instance to take over packet indications and information requests if the primary network instance fails.
How to Create a Team
The following steps describe the process of creating a team:
Step 1. Display the Device Manager.
[Screenshot: Device Manager with the Network adapters node expanded…]
62. …Interface (p. 116)
3.7.2 Win-Linux nd_rping Test (p. 116)
3.8 Performance Tuning and Counters (p. 118)
3.8.1 General Performance Optimization and Tuning (p. 118)
3.8.2 Application Specific Optimization and Tuning (p. 126)
3.8.3 Tunable Performance Parameters (p. 127)
3.8.4 Adapter Proprietary Performance Counters (p. 129)
3.9 System Recovery upon Error Detection (p. 136)
3.10 NIC Resiliency (p. 137)
Chapter 4 Utilities (p. 138)
4.1 Snapshot Tool (p. 138)
4.1.1 Snapshot Usage (p. 138)
4.2 part_man - Virtual IPoIB Port Creation Utility (p. 138)
4.3 vea_man - Virtual Ethernet (p. 140)
4.3.1 Adding a New Virtual Adapter (p. 140)
4.3.2 Removing a Virtual Ethernet Adapter (p. 141)
4.3.3 Querying the Virtual Ethernet Database (p. 141)
4.3.4 Help Message (p. 141)
4.4 InfiniBand Fabric Diagnostic Utilities (p. 142)
4.4.1 Utilities Usage: Common Configuration, Interface and Addressing (p. 142)
4.5 Fabric Performance Utilities (p. 145)
4.6 … (p. 148)
4.6.1 dbg Tool (p. 148)
63. Mellanox Technologies: Connect. Accelerate. Outperform.
Mellanox WinOF VPI User Manual
Rev 5.10
www.mellanox.com
NOTE: THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT ("PRODUCT(S)") AND ITS RELATED DOCUMENTATION ARE PROVIDED BY MELLANOX TECHNOLOGIES "AS-IS" WITH ALL FAULTS OF ANY KIND AND SOLELY FOR THE PURPOSE OF AIDING THE CUSTOMER IN TESTING APPLICATIONS THAT USE THE PRODUCTS IN DESIGNATED SOLUTIONS. THE CUSTOMER'S MANUFACTURING TEST ENVIRONMENT HAS NOT MET THE STANDARDS SET BY MELLANOX TECHNOLOGIES TO FULLY QUALIFY THE PRODUCT(S) AND/OR THE SYSTEM USING IT. THEREFORE, MELLANOX TECHNOLOGIES CANNOT AND DOES NOT GUARANTEE OR WARRANT THAT THE PRODUCTS WILL OPERATE WITH THE HIGHEST QUALITY. ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT ARE DISCLAIMED. IN NO EVENT SHALL MELLANOX BE LIABLE TO CUSTOMER OR ANY THIRD PARTIES FOR ANY DIRECT, INDIRECT, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES OF ANY KIND (INCLUDING, BUT NOT LIMITED TO, PAYMENT FOR PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY FROM THE USE OF THE PRODUCT(S) AND RELATED DOCUMENTATION EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAM…
64. …Keys are not supported, the new vIPoIB port will stay in a disconnected state until the configuration is fixed.
For further details about partition configurations for OpenSM, please refer to the section titled "Partitions" in the Mellanox OFED for Linux User Manual.
For further details about pre- and post-configuration for the new vIPoIB port, please refer to Section 3.2.7, "Multiple Interfaces over non-default PKeys Support", on page 61.
The part_man tool allows the creation of up to 64 vIPoIB interfaces (32 per port).
4.3 vea_man - Virtual Ethernet
vea_man is a set of commands that allows you to add or remove a VEA, or to query the existing Mellanox Ethernet adapters and see which are virtual and which are physical.
4.3.1 Adding a New Virtual Adapter
To add a new virtual adapter, run the following command:
> vea_man -a <adapter name>
<adapter name> is the name of the existing physical adapter, which will essentially be cloned. The new adapter will be named by system default rules.
4.3.2 Removing a Virtual Ethernet Adapter
To remove a virtual Ethernet adapter, run the following command:
> vea_man -r <adapter name>
4.3.3 Querying the Virtual Ethernet Database
Querying the virtual Ethernet database reports all physical and virtual Ethernet adapters on all Mellanox cards in the system.
To query the virtual Ethernet database, run the following command:…
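Before creating many vIPoIB interfaces, a planned allocation can be checked against the part_man limits stated above: 32 per port and 64 in total. This Python sketch performs only that arithmetic check. It does not call part_man, and the function name is hypothetical.

```python
MAX_VIPOIB_PER_PORT = 32
MAX_VIPOIB_TOTAL = 64


def check_vipoib_plan(plan: dict) -> int:
    """Validate {port: number of vIPoIB interfaces} against part_man limits.

    Returns the total number of interfaces requested.
    """
    for port, count in plan.items():
        if not 0 <= count <= MAX_VIPOIB_PER_PORT:
            raise ValueError(
                f"port {port}: {count} interfaces; must be 0..{MAX_VIPOIB_PER_PORT}")
    total = sum(plan.values())
    if total > MAX_VIPOIB_TOTAL:
        raise ValueError(f"{total} interfaces exceeds the {MAX_VIPOIB_TOTAL} total")
    return total
```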
65. …LT -PriorityValue8021Action 0 -DSCPAction 8 -PolicyStore activestore
Configure DSCP with value 16 for TCP/IP connections with a range of ports:
PS $ New-NetQosPolicy "TCP1" -DSCPAction 16 -IPDstPortStartMatchCondition 31000 -IPDstPortEndMatchCondition 31999 -IPProtocol TCP -PriorityValue8021Action 0 -PolicyStore activestore
Configure DSCP with value 24 for TCP/IP connections with another range of ports:
PS $ New-NetQosPolicy "TCP2" -DSCPAction 24 -IPDstPortStartMatchCondition 21000 -IPDstPortEndMatchCondition 31999 -IPProtocol TCP -PriorityValue8021Action 0 -PolicyStore activestore
Configure two Traffic Classes with bandwidths of 16% and 80%:
PS $ New-NetQosTrafficClass -name "TCP1" -priority 3 -bandwidthPercentage 16 -Algorithm ETS
PS $ New-NetQosTrafficClass -name "TCP2" -priority 5 -bandwidthPercentage 80 -Algorithm ETS
3.1.10.6 Configuring DSCP to Control PFC for RDMA Traffic
Create a QoS policy to tag the ND traffic for port 10000 with CoS value 3:
PS $ New-NetQosPolicy "ND10000" -NetDirectPortMatchCondition 10000 -PriorityValue8021Action 3
Related Commands:
• Get-NetAdapterQos - Gets the QoS properties of the network adapter
• Get-NetQosPolicy - Retrieves network QoS policies
• Get-NetQosFlowControl - Gets QoS status per priority
3.1.10.7 Registry Settings
The following attributes must be set manually and will be added to the miniport registry.
Table 7…
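When several port-range policies are defined, it helps to check which policies a given destination port matches and whether any ranges overlap. This Python sketch does both for the TCP1 and TCP2 ranges exactly as written in the commands above. As written, the TCP2 range (21000-31999) contains the whole TCP1 range (31000-31999), so a port such as 31500 matches both. Windows decides between overlapping policies with its own precedence rules, which this sketch does not model. The function names are hypothetical.

```python
from itertools import combinations

# (policy name, first port, last port, DSCP) as given in the commands above.
POLICIES = [
    ("TCP1", 31000, 31999, 16),
    ("TCP2", 21000, 31999, 24),
]


def matching_policies(policies, port):
    """Names of the policies whose destination-port range contains port."""
    return [name for name, first, last, _ in policies if first <= port <= last]


def overlapping_pairs(policies):
    """Pairs of policies whose port ranges share at least one port."""
    return [(a[0], b[0]) for a, b in combinations(policies, 2)
            if a[1] <= b[2] and b[1] <= a[2]]
```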
66. …NumQP in the mlx4_bus registry… load in a guest machine in SR-IOV environment and appears with a yellow bang in the Device Manager. …reserve enough QPs for the specific VF.
5.6 Reported Driver Events
The driver records events in the system log of the Windows Server event system, which can be used to identify, diagnose, and predict sources of system problems.
To see the log of events, open System Event Viewer as follows:
• Right-click on My Computer, click Manage, and then click Event Viewer.
OR
1. Click Start > Run and enter "eventvwr.exe".
2. In Event Viewer, select the system log.
The following events are recorded:
• Mellanox ConnectX EN 10Gbit Ethernet Adapter <X> has been successfully initialized and enabled.
• Failed to initialize Mellanox ConnectX EN 10Gbit Ethernet Adapter.
• Mellanox ConnectX EN 10Gbit Ethernet Adapter <X> has been successfully initialized and enabled. The port's network address is <MAC Address>.
• The Mellanox ConnectX EN 10Gbit Ethernet was reset.
• Failed to reset the Mellanox ConnectX EN 10Gbit Ethernet NIC. Try disabling then re-enabling the "Mellanox Ethernet Bus Driver" device via the Windows Device Manager.
• Mellanox ConnectX EN 10Gbit Ethernet Adapter <X> has been successfully stopped.
• Failed to initialize the Mellanox ConnectX EN 10Gbit Ethernet Adapter <X> because it uses an old firmware version (<old firmware version>). You need to burn firmware version <new fi…
67. …[Screenshots: BIOS Setup Utility pages showing the PCI-E slot OPROM settings and the CPU configuration page with the "Virtualization Tech" option]
For further details, please refer to the vendor's website.
3.5.4.3.2 Installing Hypervisor Operating System (SR-IOV Ethernet Only)
To install the Hypervisor Operating System:
Step 1. In…
68. …IP networks. It uses the 6-bit Differentiated Services Code Point (DSCP) in the Differentiated Services (DS) field of the IP header for packet classification purposes. Using Layer 3 classification enables you to maintain the same classification semantics beyond the local network, across routers.
Every transmitted packet holds the information that allows network devices to map the packet to the appropriate 802.1Qbb CoS. For DSCP-based PFC or ETS, the packet is marked with a DSCP value in the DS field of the IP header.
3.1.10.1 System Requirements
• Operating Systems: Windows Server 2008 R2, Windows Server 2012 and Windows Server 2012 R2
• Firmware version: 2.30.8000 or higher
3.1.10.2 Setting the DSCP in the IP Header
Marking the DSCP value in the IP header is done differently for IP packets constructed by the NIC (e.g., RDMA traffic) and for packets constructed by the IP stack (e.g., TCP traffic).
• For IP packets generated by the IP stack, the DSCP value is provided by the IP stack. The NIC does not validate the match between DSCP and Class of Service (CoS) values. CoS and DSCP values are expected to be set through standard tools, such as the PowerShell command New-NetQosPolicy using the PriorityValue8021Action and DSCPAction flags, respectively.
• For IP packets generated by the NIC (RDMA), the DSCP value is generated according to the CoS value programmed for the interface. The CoS value is set through standard tools, such as the PowerShell command New-NetQosPolicy using Priorit…
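The DSCP occupies the upper 6 bits of the 8-bit DS byte (the IPv4 TOS / IPv6 Traffic Class byte, per RFC 2474). The lower 2 bits carry ECN (RFC 3168). This small Python sketch shows the packing, so a marked value such as DSCP 26 can be recognized in a packet capture as the byte 0x68. The function names are illustrative.

```python
def ds_byte(dscp: int, ecn: int = 0) -> int:
    """Pack a DSCP (6 bits) and ECN (2 bits) into the IP DS/TOS byte."""
    if not 0 <= dscp <= 63 or not 0 <= ecn <= 3:
        raise ValueError("DSCP must be 0..63 and ECN 0..3")
    return (dscp << 2) | ecn


def dscp_from_ds_byte(ds: int) -> int:
    """Extract the DSCP from the upper 6 bits of the DS/TOS byte."""
    return (ds >> 2) & 0x3F
```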
69. …[Screenshot (continued): Device Manager, Network adapters list including the Mellanox ConnectX-3 Pro IPoIB Adapter entries]
Step 2. Right-click one of the Mellanox ConnectX IPoIB adapters under the Network adapters list and left-click Properties. Select the Teaming tab from the Properties window.
It is not recommended to open the Properties window of more than one adapter simultaneously.
The Teaming dialog enables creating, modifying or removing a team. Only Mellanox Technologies adapters can be part of the team.
To create a new team, perform the following:
Step 1. Click Create.
Step 2. Enter a unique team name.
Step 3. Select the adapters to be included in the team.
Step 4. [Optional] Select the Primary Adapter.
An InfiniBand team implements an active-passive scenario, where only one interface is active at any given time. When the active one is disconnected, one of the other interfaces becomes active. When the primary link comes up, the team interface returns to transferring data using the primary interface. If the primary adapter is not selected, the primary interface is selected randomly.
Step 5. [Optional] Failback to Primary: This checkbox specifies the team's behavior when the active adapter is not the primary one and the primary adapter becomes available (connected).
• Failback to…
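The active-passive behavior above can be summarized as a small state machine. This Python model is a sketch, not the driver's logic. The Failback description is cut off in this excerpt, so the model assumes that with failback enabled the team returns to the primary as soon as its link is restored, and with failback disabled it stays on the current active adapter. Choosing the first adapter with a live link on failover is also an assumption. The adapter names in the usage example are hypothetical.

```python
import random


class ActivePassiveTeam:
    """Toy model of an active-passive IPoIB team with optional failback."""

    def __init__(self, members, primary=None, failback=True):
        self.members = list(members)
        # If no primary adapter is selected, one is chosen randomly.
        self.primary = primary if primary is not None else random.choice(self.members)
        self.failback = failback
        self.link_up = {m: True for m in self.members}
        self.active = self.primary

    def link_down(self, member):
        self.link_up[member] = False
        if member == self.active:
            # Move traffic to the first member whose link is still up.
            self.active = next((m for m in self.members if self.link_up[m]), None)

    def link_restored(self, member):
        self.link_up[member] = True
        if self.active is None:
            self.active = member             # any link is better than none
        elif member == self.primary and self.failback:
            self.active = member             # fail back to the primary


if __name__ == "__main__":
    team = ActivePassiveTeam(["ib0", "ib1"], primary="ib0", failback=True)
    team.link_down("ib0")
    print(team.active)       # traffic moved to ib1
    team.link_restored("ib0")
    print(team.active)       # failed back to ib0
```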
70. …SR-IOV Virtual Functions.
• It is recommended that the feature be used only when the port is configured to maintain flow control.
• It is recommended not to exceed typical timeout values of management protocols (usually on the order of several seconds).
• In order for the feature to effectively prevent packet drops, the DPC load duration needs to be lower than the TCP retransmission timeout.
• The feature is only activated if neither of the ports is IB.
3.1.11.7 System Requirements
• Operating Systems: Windows Server 2012 or Windows Server 2012 R2
• Firmware: 2.31.5050
3.1.11.8 Enabling/Disabling Lossless TCP
This feature is controlled using the registry key DelayDropTimeout, which enables the Lossless TCP capability in hardware, and by Set OID OID_MLX_DROPLESS_MODE, which triggers the transition to/from Lossless poll mode.
3.1.11.8.1 Enabling Lossless TCP Using the Registry Key DelayDropTimeout
Registry key location:
HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>\DelayDropTimeout
For instructions on how to find the interface index in the registry (<nn>), please refer to Section 3.6.2, "Finding the Index Value of the Network Interface", on page 93.
Key Name: DelayDropTimeout
Key Type: REG_DWORD
Values: 0 - disabled (default) …
Description: Choosing values between 1-65534 enables the feature, but the chosen value limits the amount of…
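The registry location above is built from the network class GUID and the four-digit interface index `<nn>`. This Python sketch assembles that path for a given index and interprets a DelayDropTimeout value using the ranges in the table: 0 means disabled, and 1-65534 means enabled. Treating any other value as an error is an assumption, because the table is cut off in this excerpt. The function names are hypothetical.

```python
NET_CLASS_GUID = "{4d36e972-e325-11ce-bfc1-08002be10318}"


def delay_drop_value_path(index: int) -> str:
    """Full path of the DelayDropTimeout value for interface index <nn>."""
    parts = ["HKLM", "SYSTEM", "CurrentControlSet", "Control", "Class",
             NET_CLASS_GUID, f"{index:04d}", "DelayDropTimeout"]
    return "\\".join(parts)


def delay_drop_enabled(value: int) -> bool:
    """Interpret a DelayDropTimeout REG_DWORD value per the table above."""
    if value == 0:
        return False                     # 0: feature disabled (default)
    if 1 <= value <= 65534:
        return True                      # 1-65534: feature enabled
    raise ValueError(f"{value} is outside the range described for DelayDropTimeout")
```

On Windows, the value itself could be written with the standard-library `winreg` module (`winreg.OpenKey` on `HKEY_LOCAL_MACHINE`, then `winreg.SetValueEx` with `winreg.REG_DWORD`). That step is omitted so the sketch runs on any platform.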
71. …Startup Properties
4. Move to the PowerShell Scripts tab.
[Screenshot: Local Group Policy Editor > Computer Configuration > Windows Settings > Scripts (Startup/Shutdown) > Startup Properties, PowerShell Scripts tab ("Windows PowerShell Startup Scripts for Local Computer", with Add and Remove buttons; PowerShell scripts require at least Windows 7 or Windows Server 2008 R2)]
5. Click Add. The script should include only the following commands:
PS $ Remove-NetQosTrafficClass
PS $ Remove-NetQosPolicy -Confirm:$False
PS $ Set-NetQosDcbxSetting -Willing 0
PS $ New-NetQosPolicy "SMB" -Policystore Activestore -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
PS $ New-NetQosPolicy "DEFAULT" -Policystore Activestore -Default -PriorityValue8021Action 3
PS $ New-NetQosPolicy "TCP" -Policystore Activestore -IPProtocolM…
72. Step 1. Open Device Manager and go to Network Adapters.
Step 2. Right-click the Mellanox ConnectX Ethernet Adapter and select Properties.
Step 3. Go to the Details tab.
Step 4. Select the Driver key, and obtain the nn number.
In the example below, the index equals 0010.
[Screenshot: Mellanox ConnectX-3 Ethernet Adapter Properties > Details tab; Property: Driver key; Value: {4d36e972-e325-11ce-bfc1-08002be10318}\0010]
3.6.3 Basic Registry Keys
This group contains the registry keys that control the basic operations of the NIC.
Table 11 Basi…
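The Driver key shown in the Details tab has the form `{GUID}\nnnn`, and the interface index is the part after the backslash. This Python sketch extracts it, for example when the value has been copied from Device Manager into a script. The helper name and the strict four-digit pattern are assumptions based on the example above.

```python
import re

# "{GUID}\NNNN": a 36-character GUID in braces, a backslash, then 4 digits.
_DRIVER_KEY = re.compile(r"^\{[0-9A-Fa-f-]{36}\}\\(\d{4})$")


def interface_index(driver_key: str) -> str:
    """Return the <nn> index from a Device Manager 'Driver key' value."""
    match = _DRIVER_KEY.match(driver_key.strip())
    if match is None:
        raise ValueError(f"unexpected driver key format: {driver_key!r}")
    return match.group(1)
```

For the value in the example above, this returns "0010".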
73. Step 5. Choose the Tx Throughput Port Arbiter option.
Step 6. Set one of the following values:
• Best Effort (Default) - Default behavior. No precedence is given to this port over the other.
• Guaranteed - Give higher precedence to this port.
• Not Present - No configuration exists; defaults are used.
3.1.8 Configuring Quality of Service (QoS)
3.1.8.1 System Requirements
• Operating Systems: Windows Server 2008 R2, Windows Server 2012 and Windows Server 2012 R2
3.1.8.2 QoS Configuration
Prior to configuring Quality of Service, you must install Data Center Bridging using one of the following methods:
To Disable Flow Control Configuration:
Device Manager > Network adapters > Mellanox ConnectX-3 Ethernet Adapter > Properties > Advanced tab
[Screenshot: Mellanox ConnectX-3 Ethernet Adapter Properties, Advanced tab, with the selected property set to Disabled]
74. VM using Mellanox VMNIC IP address.
Step 9. Install the WinOF driver package on the VM.
Step 10. Reboot the VM at the end of the installation.
Step 11. Verify that the Mellanox Virtual Function appears in the Device Manager.
Figure 14: Virtual Function in the VM
[Screenshot: Device Manager in the VM. Network adapters lists Mellanox ConnectX-3 Ethernet Adapter and Microsoft Hyper-V Network Adapters; System devices lists Mellanox ConnectX-3 VPI (MT04100) PCIe 3.0 5GT/s, IB FDR / 40GigE Virtual Network Adapter]
To achieve the best performance on an SR-IOV VF, run the following PowerShell commands on the host:
• For 10GbE:
PS $ Set-VMNetworkAdapter -Name "Network Adapter" -VMName vm1 -IovQueuePairsRequested 4
• For 40GbE and 56GbE:
PS $ Set-VMNetworkAdapter -Name "Network Adapter" -VMName vm1 -IovQueuePairsRequested 8
3.6 Configuration Using Registry Keys
Mellanox IPoIB and Ethernet drivers use registry keys to control the NIC operations. The registry keys recei
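The recommendation above (4 queue pairs for 10GbE, 8 for 40GbE and 56GbE) can be captured in a small Python sketch that renders the matching Set-VMNetworkAdapter command line for a given VM. The function name is hypothetical. Rejecting other link speeds is also an assumption, since this section gives no value for them. The adapter and VM names are taken from the example.

```python
# Sketch: map the host link speed to the IovQueuePairsRequested value
# recommended above and render the PowerShell command to run on the host.
RECOMMENDED_QUEUE_PAIRS = {10: 4, 40: 8, 56: 8}  # link speed (Gb/s) -> queue pairs


def iov_queue_pairs_command(vm_name: str, link_speed_gbps: int,
                            adapter_name: str = "Network Adapter") -> str:
    try:
        pairs = RECOMMENDED_QUEUE_PAIRS[link_speed_gbps]
    except KeyError:
        # Assumption: no recommendation is documented for other speeds.
        raise ValueError(f"no recommendation for {link_speed_gbps}GbE") from None
    return (f'Set-VMNetworkAdapter -Name "{adapter_name}" -VMName {vm_name} '
            f"-IovQueuePairsRequested {pairs}")


print(iov_queue_pairs_command("vm1", 40))
# Set-VMNetworkAdapter -Name "Network Adapter" -VMName vm1 -IovQueuePairsRequested 8
```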
75. alid request errors: Number of remote invalid request errors when the local machine generates outbound traffic, i.e., a NAK was received indicating that the other end detected an invalid OpCode request.
Responder Invalid request errors: Number of remote invalid request errors when the local machine receives inbound traffic.
Requester Remote access errors: Number of remote access errors when the local machine generates outbound traffic, i.e., a NAK was received indicating that the other end detected a wrong rkey.
Responder Remote access errors: Number of remote access errors when the local machine receives inbound traffic, i.e., the local machine received an RDMA request with a wrong rkey.
Requester RNR NAK: Number of RNR (Receiver Not Ready) NAKs received when the local machine generates outbound traffic.
Responder RNR NAK: Number of RNR (Receiver Not Ready) NAKs sent when the local machine receives inbound traffic.
Requester out of order sequence NAK: Number of Out of Sequence NAKs received when the local machine generates outbound traffic, i.e., the number of times the local machine received NAKs indicating OOS on the receiving side.
Responder out of order sequence received: Number of Out of Sequence packets received when the local machine receives inbound traffic, i.e., the number of times the local machine received messages that are not consecutive.
Requester resync: Number of resync operations when the local machine generates outbound traffic.
Responder resync: N
76. ands such as hostname.
B.5.1 PFC Example
In the example below, ND and NDK go to priority 3, which is configured as no-drop in the switches. All TCP/UDP traffic is directed to priority 1.
# Install DCBX
Install-WindowsFeature Data-Center-Bridging
# Remove the entire previous settings
Remove-NetQosTrafficClass
Remove-NetQosPolicy -Confirm:$False
# Set the DCBX Willing parameter to false, as Mellanox drivers do not support this feature
Set-NetQosDcbxSetting -Willing 0
# Create a Quality of Service (QoS) policy and tag each type of traffic with the relevant priority.
# In this example we used TCP/UDP priority 1, ND/NDK priority 3.
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
New-NetQosPolicy "DEFAULT" -Default -PriorityValue8021Action 3
New-NetQosPolicy "TCP" -IPProtocolMatchCondition TCP -PriorityValue8021Action 1
New-NetQosPolicy "UDP" -IPProtocolMatchCondition UDP -PriorityValue8021Action 1
# Enable PFC on priority 3
Enable-NetQosFlowControl 3
# Disable Priority Flow Control (PFC) for all other priorities except for 3
Disable-NetQosFlowControl 0,1,2,4,5,6,7
# Enable QoS on the relevant interface
Enable-NetAdapterQos -Name
B.5.2 Running MPI Command Examples
Running MPI pallas test over ND:
> mpiexec.exe -p 19020 -hosts 4 11.11.146.101 11.21.147.101 [illegible] 11.11.145.101 -env MPICH_NETMASK 11.0.0.0/255.0.0.0 -env MPICH_ND_ZCOPY_T
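The Enable-NetQosFlowControl / Disable-NetQosFlowControl pair in the example always splits the eight IEEE 802.1p priorities (0 to 7) into a no-drop set and its complement. The purely illustrative Python sketch below derives the disable list from the chosen no-drop priorities. It also checks that every priority carrying ND/NDK traffic is lossless, which is the condition RoCE relies on. The function names are hypothetical.

```python
# Sketch: derive the PFC enable/disable priority lists from the set of
# no-drop priorities, and verify that RDMA (ND/NDK) traffic is protected.
ALL_PRIORITIES = range(8)  # IEEE 802.1p priorities 0..7


def pfc_lists(no_drop: set[int]) -> tuple[list[int], list[int]]:
    if not no_drop <= set(ALL_PRIORITIES):
        raise ValueError(f"priorities must be in 0..7: {sorted(no_drop)}")
    enabled = sorted(no_drop)
    disabled = [p for p in ALL_PRIORITIES if p not in no_drop]
    return enabled, disabled


def check_rdma_lossless(rdma_priorities: set[int], no_drop: set[int]) -> None:
    missing = rdma_priorities - no_drop
    if missing:
        raise ValueError(f"RDMA traffic on priorities without PFC: {sorted(missing)}")


enabled, disabled = pfc_lists({3})
check_rdma_lossless({3}, {3})
print("Enable-NetQosFlowControl " + ",".join(map(str, enabled)))
print("Disable-NetQosFlowControl " + ",".join(map(str, disabled)))
# Enable-NetQosFlowControl 3
# Disable-NetQosFlowControl 0,1,2,4,5,6,7
```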
77. ansport to provide the platform for deploying RDMA technology in mainstream data center applications at 10GigE, 40GigE and 56GigE link speeds. ConnectX EN, with its hardware offload support, takes advantage of this efficient RDMA transport (InfiniBand services over Ethernet) to deliver ultra-low latency for performance-critical and transaction-intensive applications such as financial, database, storage, and content delivery networks.
RoCE encapsulates IB transport and GRH headers in Ethernet packets bearing a dedicated EtherType. While the use of GRH is optional within InfiniBand subnets, it is mandatory when using RoCE. Applications written over IB verbs should work seamlessly, but they require provisioning of GRH information when creating address vectors. The library and driver are modified to provide the mapping from GIDs to the MAC addresses required by the hardware.
3.1.4.1 RoCE Configuration
In order to function reliably, RoCE requires a form of flow control. While it is possible to use global flow control, this is normally undesirable for performance reasons.
The normal and optimal way to use RoCE is to use Priority Flow Control (PFC). To use PFC, it must be enabled on all endpoints and switches in the flow path.
In the following section, we present instructions to configure PFC on Mellanox ConnectX™ cards. There are multiple configuration steps required, all of which may be performed via PowerShell. Therefore, although we present each step indivi
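The following Python sketch illustrates the GID-to-MAC mapping mentioned above. It converts between a MAC address and a link-local (fe80::/64) GID built with the modified EUI-64 rule: flip the universal/local bit and insert ff:fe in the middle. This assumes the MAC-based RoCE v1 GID format and ignores VLAN handling. IP-based RoCE and RoCEv2 GIDs are derived from the interface IP address instead. The function names are hypothetical.

```python
import ipaddress


def mac_to_gid(mac: str) -> str:
    """Build a link-local (fe80::/64) GID from a MAC using modified EUI-64."""
    octets = [int(b, 16) for b in mac.replace("-", ":").split(":")]
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC: {mac!r}")
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    gid = bytes([0xFE, 0x80] + [0] * 6 + eui64)  # 16 bytes
    return ipaddress.IPv6Address(gid).exploded


def gid_to_mac(gid: str) -> str:
    """Inverse mapping: recover the MAC from a modified EUI-64 GID."""
    raw = ipaddress.IPv6Address(gid).packed[8:]  # interface identifier
    if raw[3:5] != b"\xff\xfe":
        raise ValueError("interface ID is not EUI-64 derived")
    mac = [raw[0] ^ 0x02, raw[1], raw[2], raw[5], raw[6], raw[7]]
    return ":".join(f"{b:02x}" for b in mac)


print(mac_to_gid("00:02:c9:01:02:03"))  # fe80:0000:0000:0000:0202:c9ff:fe01:0203
```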
78. ant to perform this action?
Performing the operation SetValue on target MLNX_PCIDeviceSriovSettingData: MLNX_PCIDeviceSriovSettingData Mellanox ConnectX-3 PRO VPI (MT04103) Network Adapter InstanceID: PCI\VEN_15B3&DEV_1007&SUBSYS_22F5103C&R
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help (default is "Y"): Y
A Mellanox device is a dual-port, single PCI function device. The Virtual Functions pool belongs to both ports. To define how the pool is divided between the two ports, use the PowerShell "SriovPort1NumVFs" command, for example:
• SriovPortMode 2: Enables SR-IOV on both ports.
• SriovPort1NumVFs 8 & SriovPort2NumVFs 8: Enables 8 Virtual Functions for each port when working in manual mode.
By default, 16 virtual functions are assigned to the first port.
SR-IOV mode configuration parameters:
Parameter Name | Values | Description
SriovEnable | 0: RoCE (default); 1: SR-IOV | Configures the RDMA or SR-IOV mode. The default WinOF configuration mode is RoCE. To switch to SR-IOV, set the SriovEnable registry key value to 1. By default, in SR-IOV mode the entire VF pool belongs to Port 1. To change the VF pool distribution, change the PortMode to manual and choose how many VFs to assign to each port. Note: RDMA is not supported in SR-IOV mode.
SriovPortMode | 0: auto_port1 (default) | Configures the number of VFs to be enabled by the bus driver to each
79. arameters into HKLM\System\CurrentControlSet\Services\mlx4eth63\Parameters and setting the following bits:
• ENABLE_DUMP_ON_PORT_DOWN: 1 << 11 (i.e., 0x0800)
• ENABLE_DUMP_ON_PORT_UNKNOWN: 1 << 12 (i.e., 0x1000)
The EQ_STUCK and TXCQ_STUCK events can be disabled by setting the following bits in the Ethernet Mode Flags parameter:
• DISABLE_DUMP_ON_EQ_STUCK: 1 << 9 (i.e., 0x0200)
• DISABLE_DUMP_ON_TXCQ_STUCK: 1 << 10 (i.e., 0x0400)
The set consists of the following files:
• 3 consecutive mstdump files
• 2 EQ dump files
• FW trace file
These files are created in the %SystemRoot%\temp directory, and should be sent to Mellanox Support for analysis when debugging WinOF driver problems.
Their names have the following format:
<Driver_mode_of_work>_<card_location>_<event_tag_name>_<event_number>_<event_name>_<file_type>_<file_index>.log
where:
• Driver_mode_of_work: The mode of driver work. For example: SingleFunc
• card_location: In the form bus_device_function. For example: 4_0_0
• event_tag_name: A one-symbol tag. See Table 39 Events Causing Automatic State Dumps on page 159.
• event_number: The index of the dump file set created for this event. This number is restricted by the hidden registry parameter DumpEventsNum.
• event_name: A short string naming the event. For example: eth-down-1 (Ethernet port passed to DOWN state)
• file_type: Type of file in the
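The bit values above combine with a bitwise OR into a single Ethernet Mode Flags DWORD under HKLM\System\CurrentControlSet\Services\mlx4eth63\Parameters. The Python sketch below computes such a value. The constant names mirror the text. The helper name mode_flags is hypothetical. Check whether an existing value already has other bits set before overwriting it on a real system.

```python
# Bit flags for the "Ethernet Mode Flags" DWORD, as listed in this section.
ENABLE_DUMP_ON_PORT_DOWN = 1 << 11     # 0x0800
ENABLE_DUMP_ON_PORT_UNKNOWN = 1 << 12  # 0x1000
DISABLE_DUMP_ON_EQ_STUCK = 1 << 9      # 0x0200
DISABLE_DUMP_ON_TXCQ_STUCK = 1 << 10   # 0x0400


def mode_flags(current: int = 0, *, set_bits: int = 0, clear_bits: int = 0) -> int:
    """Return `current` with `set_bits` turned on and `clear_bits` turned off."""
    value = (current | set_bits) & ~clear_bits
    return value & 0xFFFFFFFF  # keep the result within a DWORD


# Enable state dumps on both PORT_STATE transitions, starting from no flags.
flags = mode_flags(0, set_bits=ENABLE_DUMP_ON_PORT_DOWN | ENABLE_DUMP_ON_PORT_UNKNOWN)
print(hex(flags))  # 0x1800
```

Starting from the existing registry value rather than 0 preserves any other bits already set.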
80. are Tx/Rx checksum calculation
• Large Send Offload (i.e., TCP Segmentation Offload)
• Hardware multicast filtering
• Adaptive interrupt moderation
• Support for MSI-X interrupts
• Support for Auto Sensing of Link level protocol
• NDK with SMB Direct
• NDv1 and v2 API support in user space
• VMQ for Hypervisor
• CIM and PowerShell
Ethernet Only:
• Hardware VLAN filtering
• Header Data Split
• RDMA over Converged Ethernet:
  • RoCE MAC Based (v1)
  • RoCE IP Based (v1)
  • RoCE over UDP (v2)
• DSCP over IPv4
• NVGRE hardware offload in ConnectX-3 Pro
• Ports TX arbitration/Bandwidth allocation per port
• Enhanced Transmission Selection (ETS)
• SR-IOV Ethernet on Windows Server 2012 R2 Hypervisor with Windows Server 2012 and above guests
InfiniBand Only:
• SR-IOV over KVM Hypervisor
• Diagnostic tools
For the complete list of Ethernet and InfiniBand Known Issues and Limitations, see the WinOF Release Notes (www.mellanox.com > Products > Software > InfiniBand/VPI Drivers > Windows SW/Drivers).
1.1 Supplied Packages
Mellanox WinOF driver Rev 5.10 includes the following package:
• MLNX_VPI_WinOF-<version>_All_<OS>_<arch>.exe
In this package, the port default is auto and RoCE is enabled.
1.2 WinOF Set of Documentation
Under <installation_directory>\Documentation:
• License file
• User Manual (this document)
• MLNX_VPI_WinO
81. at effect on optimizing network throughput and CPU utilization. The valid values are:
• 0: Low Latency. Implies a higher rate of interrupts to achieve better latency, or to handle scenarios where only a small number of streams are used.
• 1: Moderate. Interrupt moderation is set to midrange defaults to allow maximum throughput at minimum CPU utilization for common scenarios.
• 2: Aggressive. Interrupt moderation is set to maximal values to allow maximum throughput at minimum CPU utilization for more intensive, multi-stream scenarios.
TxIntModerationProfile (default: eth 1, IPoIB 1): Enables the assignment of different interrupt moderation profiles for send completions. Interrupt moderation can have a great effect on optimizing network throughput and CPU utilization. The valid values are:
• 0: Low Latency. Implies a higher rate of interrupts to achieve better latency, or to handle scenarios where only a small number of streams are used.
• 1: Moderate. Interrupt moderation is set to midrange defaults to allow maximum throughput at minimum CPU utilization for common scenarios.
• 2: Aggressive. Interrupt moderation is set to maximal values to allow maximum throughput at minimum CPU utilization for more intensive, multi-stream scenarios.
3.6.4 Offload Registry Keys
This group of registry keys allows the administrator to specify which TCP/IP offload settings are handled by the adapter rather
82. atchCondition TCP -PriorityValue8021Action 1
PS $ New-NetQosPolicy "UDP" -PolicyStore ActiveStore -IPProtocolMatchCondition UDP -PriorityValue8021Action 1
PS $ Disable-NetQosFlowControl 0,1,2,4,5,6,7
PS $ Enable-NetAdapterQos -InterfaceAlias "port1"
PS $ Enable-NetAdapterQos -InterfaceAlias "port2"
PS $ Enable-NetQosFlowControl -Priority 3
PS $ New-NetQosTrafficClass -Name "SMB class" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
6. Browse for the script's location.
7. Click OK.
8. To confirm the settings were applied after boot, run:
PS $ Get-NetQosPolicy -PolicyStore ActiveStore
3.1.8.3 Enhanced Transmission Selection
Enhanced Transmission Selection (ETS) provides a common management framework for the assignment of bandwidth to frame priorities, as described in the IEEE 802.1Qaz specification:
http://www.ieee802.org/1/files/public/docs2008/az-wadekar-ets-proposal-0608-v1.01.pdf
For further details on configuring ETS on Windows Server, please refer to http://technet.microsoft.com/en-us/library/hh967440.aspx
3.1.9 Configuring the Ethernet Driver
The following steps describe how to configure advanced features.
Step 1. Display the Device Manager.
[Screenshot: Device Manager with System devices expanded]
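With ETS, each traffic class is guaranteed a share of the link bandwidth. The Python sketch below shows the arithmetic for the "SMB class" example (50%). It assumes, as Windows DCB does for its built-in default class, that whatever is not assigned to a named class stays with the default class. It converts the shares to Gb/s for a 40GbE link, which is an assumed value for illustration only.

```python
# Sketch: check ETS bandwidth shares and show the guaranteed rate per class.
def ets_shares(classes: dict[str, int], link_gbps: float) -> dict[str, float]:
    total = sum(classes.values())
    if any(p <= 0 for p in classes.values()) or total > 100:
        raise ValueError(f"invalid ETS percentages (sum={total})")
    shares = dict(classes)
    shares["[Default]"] = 100 - total  # remainder left to the default class (assumption)
    return {name: link_gbps * pct / 100 for name, pct in shares.items()}


print(ets_shares({"SMB class": 50}, link_gbps=40))
# {'SMB class': 20.0, '[Default]': 20.0}
```

ETS shares are minimum guarantees. A class may use more bandwidth when other classes are idle.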
83. bw: This test is used for performance measuring of Send requests in Microsoft Windows Operating Systems. nd_send_bw is performance-oriented for Send with maximum throughput, and runs over Microsoft's NetworkDirect standard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternatively a customized test duration time. nd_send_bw runs with all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation.
nd_send_lat: This test is used for performance measuring of Send requests in Microsoft Windows Operating Systems. nd_send_lat is performance-oriented for Send with minimum latency, and runs over Microsoft's NetworkDirect standard. The level of customizing for the user is relatively high. The user may choose to run with a customized message size, a customized number of iterations, or alternatively a customized test duration time. nd_send_lat runs with all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation.
4.6 mlxtool
mlxtool is a general utility used for debugging and accessing the driver using a command line.
Usage: mlxtool.exe <tool name> <tool arguments>
4.6.1 dbg Tool
This tool is used to extract debug information.
Usage: mlxtool.exe dbg
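The nd_* tests sweep "all message sizes from 1B to 4MB (powers of 2)". Assuming 4MB here means 4 MiB (4,194,304 bytes), that is 23 sizes, 2^0 through 2^22 bytes. The short Python sketch below enumerates them, which can be handy when labeling result tables. The function name is hypothetical.

```python
# Message sizes swept by the nd_* tests: powers of two from 1 B to 4 MB (4 MiB assumed).
def sweep_sizes(max_bytes: int = 4 * 1024 * 1024) -> list[int]:
    sizes, size = [], 1
    while size <= max_bytes:
        sizes.append(size)
        size *= 2
    return sizes


sizes = sweep_sizes()
print(len(sizes), sizes[0], sizes[-1])  # 23 1 4194304
```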
84. c Registry Keys
Value Name | Default Value | Description
JumboPacket | eth: 1514; IPoIB: 4096 | The maximum size of a frame or a packet that can be sent over the wire. This is also known as the maximum transmission unit (MTU). The MTU may have a significant impact on the network's performance, as a large packet can cause high latency. However, it can also reduce the CPU utilization and improve the wire efficiency. The standard Ethernet frame size is 1514 bytes, but Mellanox drivers support a wide range of packet sizes.
The valid values are:
• Ethernet: 600 up to 9600
• IPoIB: 1500 up to 4092
Note: All the devices across the network (switches and routers) should support the same frame size. Be aware that different network devices calculate the frame size differently. Some devices include the header information in the frame size, while others do not. Mellanox adapters do not include Ethernet header information in the frame size, i.e., when setting JumboPacket to 1500, the actual frame size is 1514.
Table 11 Basic Registry Keys
ReceiveBuffers | eth: 512; IPoIB: 512 | The number of packets each ring receives. This parameter affects the memory consumption and the performance. Increasing this value can enhance receive performance, but also consumes more system memory. In case of lack of received buffers (dropped packets
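Mellanox adapters do not count the Ethernet header in JumboPacket. The frame size on the wire is therefore the configured value plus the 14-byte Ethernet header (two 6-byte MAC addresses and a 2-byte EtherType). The Python sketch below validates an Ethernet JumboPacket value against the 600 to 9600 range above and reports the resulting frame size. It assumes untagged frames and does not count the 4-byte FCS. The function name is hypothetical.

```python
ETH_HEADER_BYTES = 14          # dst MAC (6) + src MAC (6) + EtherType (2)
ETH_JUMBO_RANGE = (600, 9600)  # valid Ethernet JumboPacket values per this table


def ethernet_frame_size(jumbo_packet: int) -> int:
    """Frame size on the wire for a given Ethernet JumboPacket value.

    Assumes an untagged frame; the 4-byte FCS (and any VLAN tag) is not counted.
    """
    low, high = ETH_JUMBO_RANGE
    if not low <= jumbo_packet <= high:
        raise ValueError(f"JumboPacket must be {low}..{high}, got {jumbo_packet}")
    return jumbo_packet + ETH_HEADER_BYTES


print(ethernet_frame_size(1500))  # 1514
```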
85. cal port states of an InfiniBand port. It also allows adjusting the link speed that is enabled on any InfiniBand port.
If the queried port is a switch port, then ibportstate can be used to:
• Disable, enable or reset the port
• Validate the port's link width and speed against the peer port
ibroute: Uses SMPs to display the forwarding tables for unicast (LinearForwardingTable or LFT) or multicast (MulticastForwardingTable or MFT) for the specified switch LID and the optional lid (mlid) range. The default range is all valid entries in the range of 1 to FDBTop.
Table 29 Diagnostic Utilities
Utility | Description
ibdump: Dumps InfiniBand, Ethernet and all RoCE versions traffic that flows to and from Mellanox ConnectX-3/ConnectX-3 Pro NIC's ports. It provides similar functionality to the tcpdump tool on a standard Ethernet port. The ibdump tool generates a packet dump file in pcap format. This file can be loaded by the Wireshark tool (www.wireshark.org) for graphical traffic analysis. This provides the ability to analyze network behavior and performance, and to debug applications that send or receive RDMA network traffic. Run "ibdump -h" to display a help message which details the tool's options.
smpquery: Provides a basic subset of standard SMP queries to query Subnet Management attributes such as node info, node description, switch info, and port info.
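Because ibdump writes its capture in pcap format, standard tooling applies to the output. The minimal Python sketch below assumes a classic libpcap file (not pcapng) and inspects its 24-byte global header with the struct module before the file is loaded into Wireshark. The file name in the usage comment is hypothetical.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4     # classic pcap, microsecond timestamps
PCAP_MAGIC_NS = 0xA1B23C4D  # classic pcap, nanosecond timestamps


def read_pcap_header(data: bytes) -> dict:
    """Parse the 24-byte classic pcap global header."""
    if len(data) < 24:
        raise ValueError("file too short for a pcap global header")
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", data[:4])
        if magic in (PCAP_MAGIC, PCAP_MAGIC_NS):
            break
    else:
        raise ValueError("not a classic pcap file (pcapng or corrupt?)")
    # magic, version major/minor, thiszone, sigfigs, snaplen, link type
    _, major, minor, _, _, snaplen, linktype = struct.unpack(endian + "IHHiIII", data[:24])
    return {
        "byte_order": "little" if endian == "<" else "big",
        "version": f"{major}.{minor}",
        "snaplen": snaplen,
        "linktype": linktype,
        "nanosecond": magic == PCAP_MAGIC_NS,
    }


# Usage (hypothetical file name):
# with open("sniffer.pcap", "rb") as f:
#     print(read_pcap_header(f.read(24)))
```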
86. ceive instead of the CPU (default: Enabled).
TCP/UDP Checksum Offload for IPv6 packets: Enables the adapter to compute the TCP/UDP checksum over IPv6 packets upon transmit and/or receive instead of the CPU (default: Enabled).
Large Send Offload (LSO): Allows the TCP stack to build a TCP message up to 64KB long and send it in one call down the stack. The adapter then re-segments the message into multiple TCP packets for transmission on the wire, with each packet sized according to the MTU. This option offloads a large amount of kernel processing time from the host CPU to the adapter.
IB Options: Configures parameters related to InfiniBand functionality.
SA Query Retry Count: Sets the number of SA query retries once a query fails. The valid values are 1 to 64 (default: 10).
SA Query Timeout: Sets the waiting timeout (in milliseconds) of an SA query completion. The valid values are 500 to 60000 (default: 1000 ms).
3.8.4 Adapter Proprietary Performance Counters
Proprietary Performance Counters are used to provide information on Operating System, application, service or driver performance. Counters can be used for different system debugging purposes, to help determine system bottlenecks, and to fine-tune system and application performance. The Operating System, network and devices provide counter data that the application can consume to provide users with a graphical view of the system's performance quality. WinOF counters hold the standard Win
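The Python sketch below makes the LSO description concrete. The adapter cuts the large TCP payload into segments no larger than the MSS derived from the MTU. The sketch assumes IPv4 and TCP headers without options (20 bytes each). Under that assumption, a 1500-byte MTU gives a 1460-byte MSS, and a 64KB (65,536-byte) payload becomes 45 packets on the wire. The function name is hypothetical.

```python
import math

IPV4_HEADER = 20  # bytes, no IP options (assumption)
TCP_HEADER = 20   # bytes, no TCP options (assumption)


def lso_segments(payload_bytes: int, mtu: int = 1500) -> tuple[int, int]:
    """Return (mss, number_of_segments) for an LSO send of `payload_bytes`."""
    mss = mtu - IPV4_HEADER - TCP_HEADER
    if mss <= 0:
        raise ValueError("MTU too small for IPv4 + TCP headers")
    return mss, math.ceil(payload_bytes / mss)


print(lso_segments(64 * 1024))  # (1460, 45)
```

With a 9000-byte MTU, the same payload needs only 8 segments. This is one reason jumbo frames and LSO both reduce per-packet CPU work.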
87. [Screenshot (continued): Device Manager > System devices, listing Intel 5000 Series chipset devices and two Mellanox ConnectX-3 (MT04099) Network Adapter entries]
88. ddress to a network port after installation:
Step 1. Open the Network Connections window. Locate Local Area Connections with Mellanox devices.
[Screenshot: Network Connections window listing Ethernet, Ethernet 2, Ethernet 3, Ethernet 4 and Local Area Connection]
Step 2. Right-click a Mellanox Local Area Connection and left-click Properties.
[Screenshot: connection Properties, Networking tab, listing the items used by this connection, including Internet Protocol Version 6 (TCP/IPv6) and Internet Protocol Version 4 (TCP/IPv4)]
Step 3. Select Internet Protocol Version 4 (TCP/IPv4) from the scroll list and click Properties.
Step 4. Select the "Use the following IP address:" radio button and enter the desired IP information.
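Before entering the values in Step 4, it can help to check that the address, subnet mask and default gateway are consistent with each other. The following is a minimal sketch using Python's standard ipaddress module. The sample addresses, including the gateway 11.0.0.1, are hypothetical.

```python
import ipaddress
from typing import Optional


def check_static_ip(address: str, mask: str, gateway: Optional[str] = None) -> str:
    """Validate static IPv4 settings and return the network in CIDR notation."""
    iface = ipaddress.IPv4Interface(f"{address}/{mask}")
    net = iface.network
    # /31 and /32 networks have no separate network/broadcast addresses.
    if net.prefixlen < 31 and iface.ip in (net.network_address, net.broadcast_address):
        raise ValueError(f"{address} is the network or broadcast address of {net}")
    if gateway is not None and ipaddress.IPv4Address(gateway) not in net:
        raise ValueError(f"gateway {gateway} is outside {net}")
    return str(net)


print(check_static_ip("11.11.146.101", "255.0.0.0", "11.0.0.1"))  # 11.0.0.0/8
```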
89. dows CounterSet API that includes:
• Network Interface
• RDMA activity
• SMB Direct Connection
3.8.4.1 Supported Standard Performance Counters
3.8.4.1.1 Proprietary Mellanox Adapter Traffic Counters
The proprietary Mellanox adapter traffic counter set consists of global traffic statistics which gather information from ConnectX-3 and ConnectX-3 Pro network adapters, and includes traffic statistics and various types of errors and indications from both the Physical Function and the Virtual Function.
Table 24 Mellanox Adapter Traffic Counters
Mellanox Adapter Traffic Counters | Description
Bytes/Packets IN:
Bytes Received | Shows the number of bytes received by the adapter. The counted bytes include framing characters.
Bytes Received/Sec | Shows the rate at which bytes are received by the adapter. The counted bytes include framing characters.
Packets Received | Shows the number of packets received by the ConnectX-3 and ConnectX-3 Pro network interface.
Packets Received/Sec | Shows the rate at which packets are received by the ConnectX-3 and ConnectX-3 Pro network interface.
Bytes/Packets OUT:
Bytes Sent | Shows the number of bytes sent by the adapter. The counted bytes include framing characters.
Bytes Sent/Sec | Shows the rate at which bytes are sent by the adapter. The counted bytes include framing characters.
Packets Sent | Shows the number of packets sen
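The "/Sec" counters are rates, while counters such as "Bytes Received" are cumulative totals. The Python sketch below shows how an average rate can be derived from two hypothetical samples of a cumulative counter and converted to Gb/s. It assumes both samples come from the same counter instance and that the counter was not reset or wrapped between them. The function name is hypothetical.

```python
def rate_per_sec(sample1: tuple[float, int], sample2: tuple[float, int]) -> float:
    """(timestamp_seconds, cumulative_count) pairs -> average count per second."""
    (t1, c1), (t2, c2) = sample1, sample2
    if t2 <= t1:
        raise ValueError("samples must be in increasing time order")
    if c2 < c1:
        raise ValueError("counter decreased (reset or wrap); discard this interval")
    return (c2 - c1) / (t2 - t1)


bytes_per_sec = rate_per_sec((0.0, 0), (2.0, 2_500_000_000))
print(bytes_per_sec * 8 / 1e9)  # 10.0 (Gb/s)
```

Since the counted bytes include framing characters, a rate derived this way reflects bytes on the wire rather than application payload.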
90. dually, you may ultimately choose to write a PowerShell script to do them all in one step. Note that administrator privileges are required for these steps.
For further information about RoCE configuration, please refer to https://community.mellanox.com
3.1.4.1.1 System Requirements
The following are the driver's prerequisites in order to set or configure RoCE:
• RoCE: ConnectX-3 and ConnectX-3 Pro firmware version 2.30.3000 or higher
• RoCEv2: ConnectX-3 Pro firmware version 2.31.5050 or higher
• All InfiniBand verbs applications which run over InfiniBand verbs should work on RoCE links if they use GRH headers.
• Operating Systems: Windows Server 2008 R2, Windows Server 2012, Windows Server 2012 R2, Windows 7 Client, Windows 8.1 Client
• Set the HCA to use the Ethernet protocol: Display the Device Manager and expand System Devices. Please refer to Section 3.1.1.2 Port Protocol Configuration on page 28.
3.1.4.1.2 Configuring Windows Host
Since PFC is responsible for flow control at the granularity of traffic priority, it is necessary to assign different priorities to different types of network traffic.
As per RoCE configuration, all ND/NDK traffic is assigned to one or more chosen priorities, where PFC is enabled on those priorities.
Configuring a Windows host requires configuring QoS. To configure QoS, please follow the procedure described in Section 3.1.8 Configu
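The firmware prerequisites above can be checked mechanically by comparing dotted version strings numerically, field by field. Plain string comparison would misorder versions such as 2.9.0 and 2.10.0. The Python sketch below assumes the installed firmware version has been read elsewhere and is passed in as a string. The adapter names serve only as lookup labels, and the function names are hypothetical.

```python
def parse_version(v: str) -> tuple[int, ...]:
    return tuple(int(part) for part in v.strip().split("."))


# Minimum firmware versions from the System Requirements list.
ROCE_MIN = {"ConnectX-3": "2.30.3000", "ConnectX-3 Pro": "2.30.3000"}
ROCEV2_MIN = {"ConnectX-3 Pro": "2.31.5050"}


def supported_roce_modes(adapter: str, fw: str) -> set[str]:
    fw_t = parse_version(fw)
    modes = set()
    if adapter in ROCE_MIN and fw_t >= parse_version(ROCE_MIN[adapter]):
        modes.add("RoCE")
    if adapter in ROCEV2_MIN and fw_t >= parse_version(ROCEV2_MIN[adapter]):
        modes.add("RoCEv2")
    return modes


print(sorted(supported_roce_modes("ConnectX-3 Pro", "2.31.5050")))  # ['RoCE', 'RoCEv2']
print(sorted(supported_roce_modes("ConnectX-3", "2.31.5050")))      # ['RoCE']
```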
91. e > Run new task > and enter CMD.
Step 2. Uninstall the driver. Run:
MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /S /x /v"/qn"
2.7 Firmware Upgrade
If the machine has a standard Mellanox card with an older firmware version, the firmware will be updated automatically as part of the installation of the WinOF package.
For information on how to upgrade firmware manually, please refer to the MFT User Manual: www.mellanox.com > Products > InfiniBand/VPI Drivers > Firmware Tools
2.8 Upgrading Mellanox WinOF Driver
The upgrade process differs between Operating Systems.
Windows Server 2012 and above:
• When upgrading from WinOF version 4.2 to version 4.40 and above, the MLNX_WinOF driver does not completely uninstall the previous version, but rather upgrades only the components that require an upgrade. The network configuration is saved upon driver upgrade.
• When upgrading from Inbox or any other version, the network configuration is automatically saved upon driver upgrade.
2.9 Booting Windows from an iSCSI Target
2.9.1 Configuring the WDS, DHCP and iSCSI Servers
2.9.1.1 Configuring the WDS Server
To configure the WDS server:
1. Install the WDS server.
2. Extract the Mellanox drivers to a local directory using the "a" parameter.
For boot over Ethernet, when using adapter cards with a firmware version older than 2.30.8000, you need to extract the PXE package; otherwise
92. e Side Scaling (RSS)
3.2.7 Multiple Interfaces over non-default PKeys Support
3.2.8 Teaming
3.3 Management
3.3.1 PowerShell Configuration
3.4 Storage Protocols
3.4.1 Deploying Windows Server 2012 and Above with SMB Direct
3.5 Virtualization
3.5.1 Virtual Ethernet Adapter
3.5.2 Hyper-V with VMQ
3.5.3 Network Virtualization using Generic Routing Encapsulation (NVGRE)
3.5.4 Single Root I/O Virtualization (SR-IOV)
3.6 Configuration Using Registry Keys
3.6.1 Finding the Index Value of the HCA
3.6.2 Finding the Index Value of the Network Interface
3.6.3 Basic Registry Keys
3.6.4 Offload Registry Keys
3.6.5 Performance Registry Keys
3.6.6 Ethernet Registry Keys
3.6.7 IPoIB Registry Keys
3.6.8 General Registry Values
3.6.9 MLX BUS Registry Keys
3.7 Software Development Kit (SDK)
3.7.1 Network Direct
93. e | Values: ... (default), 1: enable | When enabled, this setting allows an SR-IOV IB guest VM driver to gracefully recover from a case where the hypervisor driver is stuck, by resetting the guest driver. Otherwise, when a hypervisor is stuck, the VM will require a restart to recover.
Caution: This setting cannot be enabled when user-space RDMA applications (such as MPI) are running in the VM.
UpdateGIDTimerFrequency | DWORD | 0 to 10000 (default: 3000) | Polling interval, in milliseconds, of local IP address changes for updating RDMA IP-based GIDs.
3.7 Software Development Kit (SDK)
The Software Development Kit (SDK) is a set of development tools that allows the creation of InfiniBand applications for the MLNX_VPI software package. The SDK package contains header files, libraries, and code examples. To compile the examples provided with the SDK, you must install Windows Driver Kit (WDK) version 8.1 or higher over Visual Studio 2013.
To open the SDK package, run the sdk.exe file and get the complete list of files. The SDK package can be found under <installation_directory>\SDK.
It is highly recommended to program applications over the ND API and not over the IBAL API.
In WinOF Rev 5.10, the interface version of the IBAL API was updated. Therefore, in order for applications that were compiled with previous SDKs to work with WinOF Rev 5.10, they must be recompiled with the ne
94. e embedded OpenSM in the WinOF package for testing purposes in a small cluster. Otherwise, we recommend using OpenSM from FabricIT EFM, UFM or MLNX-OS.
OpenSM can run as a Windows service, and can be started manually from the following directory: <installation_directory>\tools
OpenSM as a service will use the first active port unless it receives a specific GUID.
OpenSM can be registered as a service from either the Command Line Interface (CLI) or PowerShell.
The following commands are used from the CLI:
To register OpenSM as a service, execute:
> sc create OpenSM binPath= "c:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe --service" start= auto
To start OpenSM as a service:
> sc start OpenSM
To run OpenSM manually:
> opensm.exe
For additional run options, enter: opensm.exe -h
The following commands are used from PowerShell:
To register OpenSM as a service, execute:
PS $ New-Service -Name "OpenSM" -BinaryPathName "`"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe`" --service -L 128" -DisplayName "OpenSM" -Description "OpenSM for IB subnet" -StartupType Automatic
To start OpenSM as a service, run:
PS $ Start-Service OpenSM
Notes:
• For long-term running, please avoid using the -v (verbosity) option, to avoid exceeding the disk quota.
• Running OpenSM on multiple servers may lead to
95. e following license agreement carefully.
[Screenshot: License Agreement dialog showing "Copyright (c) 2005-2015 Mellanox Technologies. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer." with the options "I accept the terms in the license agreement" and "I do not accept the terms in the license agreement"]
Step 8. Select the target folder for the installation.
[Screenshot: Destination Folder dialog. Install MLNX_VPI to: C:\Program Files\Mellanox\MLNX_VPI\]
1. MT_WMI default value is True
2. MT_RESTORECONF default value is True
Step 9. The firmware upgrade screen will be displayed in the following cases:
• If the user has an OEM card. In this case, the firmware will not be updated.
• If the user has a standard Mellanox card with an older firmware version, the firmware will be updated accordingly. However, if the user has both an OEM card and a Mellanox card, only the Mellanox card will be updated.
[Screenshot: Firmware Upgrade dialog with the option "Upgrade the HCA's firmware version (Recommended)". Upgrading the firmware ver]
96. e session, run the following command:
logman stop Mellanox-Kernel -ets
When opening a support ticket, it is advised to attach the file to the ticket.
5.8 State Dumping
Upon several types of events, the drivers can produce a set of files reflecting the current state of the adapter.
Automatic state dumps are done upon the following events:
Table 39 Events Causing Automatic State Dumps
Event Type | Description | Provider | Default | Tag
FATAL_ERR | The driver detects an error that does not allow the device to function normally and requires a reset. | Mlx4_bus | On | f
CMD_TIMEOUT | Timeout on a command sent to the HCA. | Mlx4_bus | On | c
EQ_STUCK | The driver decides that an Event Queue is stuck. | Mlx4eth63 | On | e
TXCQ_STUCK | The driver decides that the transmit completion queue is stuck. | Mlx4eth63 | On | t
PORT_STATE | The adapter passes to port down state or port unknown state. | Mlx4eth63 | Off | p
ON_IOCTL | A user application asks to generate dump files. | Mlx4_bus | N/A | U
where:
• Provider: The driver creating the set of files.
• Default: Whether or not the state dumps are created by default upon this event.
• Tag: Part of the file name, used to identify the event that has triggered the state dump.
PORT_STATE events can be enabled by adding Ethernet Mode Flags DWORD32 p
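Table 39 can also be kept as a lookup table. This makes it possible to decode the one-letter tag found in a dump file name, or to list which events produce dumps without any registry change. The data in the Python sketch below is transcribed from the table. The class and helper names are hypothetical.

```python
from typing import NamedTuple, Optional


class DumpEvent(NamedTuple):
    event: str
    provider: str
    default_on: Optional[bool]  # None where the table says N/A


# Transcribed from Table 39, keyed by the file-name tag.
DUMP_EVENTS = {
    "f": DumpEvent("FATAL_ERR", "Mlx4_bus", True),
    "c": DumpEvent("CMD_TIMEOUT", "Mlx4_bus", True),
    "e": DumpEvent("EQ_STUCK", "Mlx4eth63", True),
    "t": DumpEvent("TXCQ_STUCK", "Mlx4eth63", True),
    "p": DumpEvent("PORT_STATE", "Mlx4eth63", False),
    "U": DumpEvent("ON_IOCTL", "Mlx4_bus", None),
}


def events_dumped_by_default() -> list[str]:
    return [e.event for e in DUMP_EVENTS.values() if e.default_on]


print(DUMP_EVENTS["e"].event)      # EQ_STUCK
print(events_dumped_by_default())  # ['FATAL_ERR', 'CMD_TIMEOUT', 'EQ_STUCK', 'TXCQ_STUCK']
```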
97. e startup Scripts.
Step 1. Change the Windows PowerShell execution policy. To change the execution policy, please refer to Step 1 in Section 3.3.1 PowerShell Configuration on page 67.
Step 2. Remove the entire previous QoS configuration.
PS $ Remove-NetQosTrafficClass
PS $ Remove-NetQosPolicy -Confirm:$False
Step 3. Set the DCBX Willing parameter to false, as Mellanox drivers do not support this feature.
PS $ Set-NetQosDcbxSetting -Willing 0
Step 4. Create a Quality of Service (QoS) policy and tag each type of traffic with the relevant priority. In this example, TCP/UDP use priority 1, and SMB over TCP uses priority 3.
PS $ New-NetQosPolicy "DEFAULT" -store Activestore -Default -PriorityValue8021Action 3
PS $ New-NetQosPolicy "TCP" -store Activestore -IPProtocolMatchCondition TCP -PriorityValue8021Action 1
PS $ New-NetQosPolicy "UDP" -store Activestore -IPProtocolMatchCondition UDP -PriorityValue8021Action 1
New-NetQosPolicy "SMB" -SMB -PriorityValue8021Action 3
Step 5. Create a QoS policy for SMB over SMB Direct traffic on Network Direct port 445.
PS $ New-NetQosPolicy "SMBDirect" -store Activestore -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
Step 6. [Optional] If VLANs are used, mark the egress traffic with the relevant VlanID. The NIC is referred to as "Ethernet 4" in the examples below.
PS $ Set-NetAdapterAdvancedProperty -Name "Ethernet 4" -RegistryKeyword "VlanID" -RegistryValue "55"
Step 7. [Optional] Con
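The Python sketch below shows how the policies in Steps 4 and 5 tag traffic, by modeling them as match rules and returning the 802.1p priority a flow would receive. It is a simplified model, not the Windows QoS engine. It checks the more specific rules first: SMB Direct on Network Direct port 445, then SMB over TCP port 445. The protocol-wide TCP/UDP rules come next, and the DEFAULT policy applies last. The flow dictionary keys and function name are hypothetical.

```python
from typing import Callable

Flow = dict  # e.g. {"netdirect_port": 445} or {"protocol": "TCP", "port": 445}

# (policy name, match predicate, priority), checked in this order.
POLICIES: list[tuple[str, Callable[[Flow], bool], int]] = [
    ("SMBDirect", lambda f: f.get("netdirect_port") == 445, 3),
    ("SMB", lambda f: f.get("protocol") == "TCP" and f.get("port") == 445, 3),
    ("TCP", lambda f: f.get("protocol") == "TCP", 1),
    ("UDP", lambda f: f.get("protocol") == "UDP", 1),
]
DEFAULT_PRIORITY = 3  # the "DEFAULT" policy


def classify(flow: Flow) -> tuple[str, int]:
    for name, matches, priority in POLICIES:
        if matches(flow):
            return name, priority
    return "DEFAULT", DEFAULT_PRIORITY


print(classify({"protocol": "TCP", "port": 445}))  # ('SMB', 3)
print(classify({"protocol": "UDP", "port": 53}))   # ('UDP', 1)
```

In this example, both SMB paths land on priority 3. That is the priority on which PFC must be enabled for the RDMA traffic to be lossless.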
98. e supported RoCE modes depend on the installed firmware. If the firmware does not support the needed mode, the fallback mode is the maximum RoCE mode supported by the installed NIC.
RoCE mode can be enabled and disabled via PowerShell.
To enable RoCEv1 using PowerShell:
- Open PowerShell and run:
PS Set-MlnxDriverCoreSetting -RoceMode 1
To enable RoCEv2 using PowerShell:
- Open PowerShell and run:
PS Set-MlnxDriverCoreSetting -RoceMode 2
To disable any version of RoCE using PowerShell:
- Open PowerShell and run:
PS Set-MlnxDriverCoreSetting -RoceMode 0
To check the current version of RoCE using PowerShell:
- Open PowerShell and run:
PS Get-MlnxDriverCoreSetting
Example output:
Caption : DriverCoreSettingData 'mlx4_bus'
Description : Mellanox Driver Option Settings
RoceMode : 0

3.1.5 Teaming and VLAN
Windows Server 2012 and above supports Teaming as part of the operating system. Please refer to the Microsoft guide "NIC Teaming in Windows Server 2012" at the link below:
http://www.microsoft.com/en-us/download/confirmation.aspx?id=40319
Mellanox WinOF drivers provide teaming solutions for the Windows Server 2008 R2 operating system and for client operating systems, namely Windows 7 and Windows 8.1.

3.1.5.1 System Requirements
Ethernet teaming is supported only in Windows 7 Client, Windows 8.1 Client and Windows Server 2008 R2
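The fallback rule above can be made concrete with a small model. The Python sketch below is an illustration only: it assumes that firmware supporting a given RoCE mode also supports every lower mode, so "fall back to the maximum supported mode" reduces to taking the minimum of the requested and the supported mode. The function name and the `max_supported` parameter are hypothetical, not part of the driver or its PowerShell module.

```python
# Assumption: a NIC whose firmware supports RoceMode N also supports every
# lower mode, so falling back to "the maximum supported mode" is a min().
ROCE_MODES = {0: "RoCE disabled", 1: "RoCEv1", 2: "RoCEv2"}


def effective_roce_mode(requested, max_supported):
    """Return the RoceMode the NIC would end up running (hypothetical model)."""
    if requested not in ROCE_MODES or max_supported not in ROCE_MODES:
        raise ValueError("RoceMode must be 0, 1 or 2")
    # Disabling RoCE (0) never needs a fallback; min() leaves it unchanged.
    return min(requested, max_supported)


if __name__ == "__main__":
    mode = effective_roce_mode(requested=2, max_supported=1)
    print(mode, ROCE_MODES[mode])
```

With firmware that only supports RoCEv1, requesting RoceMode 2 would therefore leave the adapter in RoCEv1 under this model; Get-MlnxDriverCoreSetting is the authoritative way to check.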
99. e with minimum latency, and runs over Microsoft's NetworkDirect standard. The level of customization available to the user is relatively high: the user may choose to run with a customized message size, a customized number of iterations or, alternatively, a customized test duration. nd_write_lat runs with all message sizes from 1B to 4MB (powers of 2), message inlining and CQ moderation.

nd_read_bw
This test is used for performance measuring of RDMA Read requests in Microsoft Windows operating systems. nd_read_bw is performance-oriented for RDMA Read with maximum throughput, and runs over Microsoft's NetworkDirect standard. The level of customization available to the user is relatively high: the user may choose to run with a customized message size, a customized number of iterations or, alternatively, a customized test duration. nd_read_bw runs with all message sizes from 1B to 4MB (powers of 2), message inlining and CQ moderation.

nd_read_lat
This test is used for performance measuring of RDMA Read requests in Microsoft Windows operating systems. nd_read_lat is performance-oriented for RDMA Read with minimum latency, and runs over Microsoft's NetworkDirect standard. The level of customization available to the user is relatively high: the user may choose to run with a customized message size, a customized number of iterations or, alternatively, a customized test duration. nd_read_lat runs with all message sizes from 1B to 4MB (powers of 2), message inlining and CQ moderation.

nd_send
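"All message sizes from 1B to 4MB (powers of 2)" is a fixed sweep, which is useful to know when reading or plotting the tests' output. The Python sketch below lists that sweep; it assumes 4MB means 4 MiB (4,194,304 bytes), which is an assumption rather than something stated above.

```python
def nd_message_sizes(max_bytes=4 * 1024 * 1024):
    """Powers of two from 1 byte up to and including max_bytes."""
    sizes = []
    size = 1
    while size <= max_bytes:
        sizes.append(size)
        size *= 2
    return sizes


if __name__ == "__main__":
    sizes = nd_message_sizes()
    print(len(sizes), "sizes:", sizes[0], "...", sizes[-1])
```

Under that assumption, each default run covers 23 message sizes, from 1 byte to 2^22 bytes.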
100. econd connection name>
perf_tuning.exe -d -c1 <first connection name> -c2 <second connection name>
perf_tuning.exe -f -c1 <first connection name> -c2 <second connection name>
perf_tuning.exe -m -c1 <first connection name> -b <base RSS processor number> -n <number of RSS processors>
perf_tuning.exe -st -c1 <first connection name> -c2 <second connection name>

Options
Table 23: Performance Tuning Tool Application Options
Flag | Description
-s | Single port traffic scenario. This option can be followed by one or two connection names. The tuning restores the default settings on the second connection and is performed on the first connection.
    This option automatically sets:
    - SendCompletionMethod = 0
    - RecvCompletionMethod = 2
    - ReceiveBuffers = 1024
    - In operating systems that support NDIS 6.3: RssProfile = 4
    Additionally, this option chooses the best processors to assign to:
    - DefaultRecvRingProcessor
    - TxInterruptProcessor
    - TxForwardingProcessor
    - In operating systems that support NDIS 6.2: RssBaseProcNumber, MaxRssProcessors
    - In operating systems that support NDIS 6.3: NumRSSQueues, RssMaxProcNumber
-d | Dual port traffic scenario. This option must be followed by two connection names. The tuning in this case is codependent.
    This option automatically sets:
    - SendCompleti
101. ed Audience ... 8
Documentation Conventions ... 8
Common Abbreviations and Acronyms ... 9
Related Documents ... 10
Chapter 1 Introduction ... 11
1.1 Supplied Packages ... 12
1.2 WinOF Set of Documentation ... 12
1.3 Windows MPI (MS-MPI) ... 12
Chapter 2 Installation ... 13
2.1 Hardware and Software Requirements ... 13
2.2 Downloading Mellanox WinOF Driver ... 13
2.3 Installing Mellanox WinOF Driver ... 14
2.3.1 Attended Installation ... 14
2.3.2 Unattended Installation ... 19
2.4 Installation Results ... 20
2.5 Extracting Files Without Running Installation ... 21
2.6 Uninstalling Mellanox WinOF Driver ... 23
2.6.1 Attended Uninstallation ... 23
2.6.2 Unattended Uninstallation ... 23
2.7 Firmware Upgrade ... 24
2.8 Upgrading Mellanox WinOF Driver ... 24
2.9 Booting Windows from an iSCSI Target ... 24
2.9.1 Configuring the WDS, DHCP and iSCSI Servers ... 24
2.9.2 Configuring the Client Machine ... 25
2.9.3 Installing iSCSI
102. B.8 Running MPI Command Examples ... 168

List of Tables
Table 1 Document Revision History ... 2
Table 2 Documentation Conventions ... 8
Table 3 Abbreviations and Acronyms ... 9
Table 4 Related Documents ... 10
Table 5 Hardware and Software Requirements ... 13
Table 6 Reserved IP Address Options ... 25
Table 7 DSCP Registry Keys Settings ... 53
Table 8 DSCP Default Registry Keys Settings ... 53
Table 9 Lossless TCP Associated Events ... 57
Table 10 Registry Keys Setting ... 58
Table 11 Basic Registry Keys ... 94
Table 12 Offload Registry Keys ... 97
Table 13 Performance Registry Keys ... 99
Table 14 Ethernet Registry Keys ... 104
Table 15 Flow Control Options ... 107
Table 16 VMQ Options ... 108
Table 17 IPoIB Registry Keys ... 109
Table 18 General Registry Values ... 111
Table 19 SR-IOV Registry Keys ... 112
Table 20 RoCE Options ... 11
103. efer to Section 3.1.1.2 "Port Protocol Configuration" on page 28.

3.1.1.2 Port Protocol Configuration
Step 1: Display the Device Manager and expand "System devices".
[Figure: Device Manager with the System devices node expanded]
104. [Figure: Event Viewer, System log, Event ID 53 from source mlx4_bus (Level: Warning): "SRIOV was successfully enabled. Running in master mode."]

3.5.4.5 Configuring Operating Systems
3.5.4.5.1 Configuring Virtual Machine Networking (InfiniBand SR-IOV Only)
For further details on enabling/configuring SR-IOV on KVM, please refer to the section titled "Single Root IO Virtualization (SR-IOV)" in Mellanox OFED for Linux User Manual.
3.5.4.5.2 Configuring Virtual Machine Networking (Ethernet SR-IOV Only)
To configure Virtual Machine networking:
Step 1: Create an SR-IOV-enabled Virtual Switch over the Mellanox Ethernet Adapter. Go to Start -> Server Manager -> Tools -> Hyper-V Manager. In the Hyper-V Manager: Actions -> Virtual Switch Manager -> External -> Create Virtual Switch.
Step 2: Set the following:
- Name
- External network
- Enable single-root I/O virtualization (SR-IOV)
Figure 11: Virtual Switch with SR-IOV
105. eout | The time a packet may wait for a free descriptor. The value is in units of 100 microseconds, with an inaccuracy of up to 2 units. The chosen time ranges between 100 microseconds and 6.5 seconds. For example, DelayDropTimeout=3000 limits the wait time to 300 milliseconds (±200 microseconds). Choosing the value of 65535 enables the feature, but the amount of time a packet may wait for a free descriptor is infinite.
Note: Changing the value of the DelayDropTimeout registry key requires a restart of the network interface.

3.1.11.8.2 Entering/Exiting Lossless Mode Using Set OID OID_MLX_DROPLESS_MODE
In order to enter poll mode, the registry value of DelayDropTimeout should be non-zero, and the OID_MLX_DROPLESS_MODE Set OID should be called with an Information Buffer containing 1.
- OID_MLX_DROPLESS_MODE value: 0xFFA0C932
- OID Information Buffer Size: 1 byte
- OID Information Buffer Contents: 0 - exit poll mode; 1 - enter poll mode

3.1.11.9 Monitoring Lossless TCP State
In order to allow state transition monitoring, events are written to the event log with mlx4_bus as the source. The associated events are listed in Table 9.

Table 9: Lossless TCP Associated Events
Event ID | Event Description
0x0057 | <Device Name>: Dropless mode entered on port X. Packets will not be dropped.
0x0058 | <Device Name>: Dropless mode exited on port X. Drop mode entered; packets may now be dropped.
0x0059 | De
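The unit arithmetic for DelayDropTimeout is easy to get wrong by a factor of ten, so the Python sketch below converts a registry value into the nominal wait time and its tolerance, following the rules stated above: 100-microsecond units, an inaccuracy of up to 2 units, and 65535 meaning an unlimited wait. The function name is hypothetical, and rejecting 0 simply reflects that the value must be non-zero for lossless mode; it is not a documented driver check.

```python
UNIT_US = 100          # one DelayDropTimeout unit = 100 microseconds
INACCURACY_UNITS = 2   # stated inaccuracy: up to 2 units
INFINITE_VALUE = 65535


def delay_drop_window_ms(value):
    """Return (nominal_ms, tolerance_ms), or None for an unlimited wait."""
    if not 1 <= value <= INFINITE_VALUE:
        raise ValueError("DelayDropTimeout must be between 1 and 65535 here")
    if value == INFINITE_VALUE:
        return None  # the packet may wait for a free descriptor indefinitely
    return value * UNIT_US / 1000, INACCURACY_UNITS * UNIT_US / 1000


if __name__ == "__main__":
    print(delay_drop_window_ms(3000))   # the example from the table above
```

For the documented example, 3000 yields a nominal 300 ms with a tolerance of 0.2 ms (200 microseconds).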
106. equent user intervention.
Unattended Installation: An automated installation procedure that requires no user intervention.
Both Attended and Unattended installations require administrator privileges.

2.3.1 Attended Installation
The following is an example of an MLNX_WinOF win2012 x64 installation session.
Step 1: Double-click the .exe and follow the GUI instructions to install MLNX_WinOF.
As of MLNX_WinOF v4.55, the log option is enabled automatically. The default path of the log is: %LOCALAPPDATA%\MLNX_WinOF.log
Step 2: (Optional) Manually configure your setup to contain the logs option:
> MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /v"/l*vx [LogFile]"
Step 3: (Optional) If you do not want to upgrade your firmware version:
> MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /v"MT_SKIPFWUPGRD=1"
MT_SKIPFWUPGRD default value is False.
Step 4: (Optional) If you want to control the installation of the WMI/CIM provider:
> MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /v"MT_WMI=1"
Step 5: (Optional) If you want to control whether to restore the network configuration or not:
> MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /v"MT_RESTORECONF=1"
For further help, please run:
> MLNX_VPI_WinOF-5_10_All_win2012_x64.exe /v"/h"
Step 6: Click Next in the Welcome screen.
Step 7: Read, then accept the license agreement and click Next.
[Figure: License Agreement screen]
107. er to Section 4.2 "part_man - Virtual IPoIB Port Creation Utility" on page 138.
Step 2: Assign port IPs to the new interfaces. For further details, please refer to Section 3.1.2 "Assigning Port IP After Installation" on page 30.
Make sure the OpenSM (using the partitions configuration), the physical-to-virtual PKey table mapping and the new interfaces were all configured over the same physical port.

To assign a non-default PKey to the physical IPoIB port on a Windows virtual machine over a Linux host:
On the Windows VM:
Step 1: Disable the driver on the port, or disable the bus driver with all the ports it carries, through the Device Manager.
On the Linux host:
Step 2: Configure the OpenSM to recognize the partition you would like to add. For further details, please refer to the section titled "Partitions" in Mellanox OFED for Linux User Manual.
Step 3: Map the physical PKey table to the virtual PKey table used by the VM in the following way:
- Map the physical PKey index you would like to use for the physical port to index 0 in the virtual PKey table.
- Map the physical PKey index of the default PKey (index 0) to any index (for example, index 1) in the virtual PKey table.
For further details, please refer to the section titled "Partitioning IPoIB Communication using PKeys" in Mellanox OFED for Linux User Manual.
On the Windows VM:
Step 4: Enable the drivers which were di
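Step 3 is a small remapping rule, and it can help to see it as data before editing the host configuration. The Python sketch below models it under stated assumptions: the physical PKey table is a list indexed by PKey index, the default PKey sits at physical index 0, and the result is a hypothetical `{virtual_index: pkey}` dictionary. It does not touch any Linux or Mellanox OFED interface, and the sample PKey values are illustrative only.

```python
DEFAULT_PKEY_INDEX = 0  # the default PKey sits at physical index 0


def map_virtual_pkey_table(physical_pkeys, chosen_index, default_slot=1):
    """Return {virtual_index: pkey} following Step 3 (hypothetical model)."""
    if not 0 <= chosen_index < len(physical_pkeys):
        raise IndexError("chosen_index is not in the physical PKey table")
    if default_slot == 0:
        raise ValueError("virtual index 0 is reserved for the chosen PKey")
    return {
        0: physical_pkeys[chosen_index],                     # PKey the VM port uses
        default_slot: physical_pkeys[DEFAULT_PKEY_INDEX],    # keep the default PKey
    }


if __name__ == "__main__":
    physical = [0xFFFF, 0x8001, 0x8002]  # illustrative PKey values
    table = map_virtual_pkey_table(physical, chosen_index=2)
    print({i: hex(p) for i, p in table.items()})
```

The key point the model captures is that the non-default PKey moves to virtual index 0, while the default PKey stays reachable at another virtual index.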
108. erShell window.

5.4.1 General Diagnostic
Issue 1: Go to Device Manager, locate the Mellanox adapter that you are debugging, right-click it, choose Properties and go to the Information tab:
- PCI Gen 1 should appear as PCI-E 2.5 GT/s
- PCI Gen 2 should appear as PCI-E 5.0 GT/s
- PCI Gen 3 should appear as PCI-E 8.0 GT/s
- Link Speed: 56.0 Gbps / 40.0 Gbps / 10.0 Gbps
Issue 2: To determine if the Mellanox NIC and PCI bus can achieve their maximum speed, it is best to run nd_send_bw in a loopback on the same machine:
1. Run start /b /affinity 0x1 nd_send_bw -S <IP_host>, where <IP_host> is the local IP.
2. Run start /b /affinity 0x2 nd_send_bw -C <IP_host>.
3. Repeat for port 2 with the appropriate IP.
4. On PCI Gen3, the expected result is around 5700 MB/s. On PCI Gen2, the expected result is around 3300 MB/s.
Any number lower than that points to a bad configuration or to installation on the wrong PCI slot. Malfunctioning QoS settings and Flow Control can be the cause as well.
Issue 3: To determine the maximum speed between the two sides with the most basic test:
1. Run nd_send_bw -S <IP_host1> on machine 1, where <IP_host1> is the local IP.
2. Run nd_send_bw -C <IP_host1> on machine 2.
3. Results appear in Gb/s (Gigabits, 2^30) and reflect the actual data that was transferred, excluding headers.
4. If these results are not as expected, the problem is most p
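The checks in Issues 1 and 2 compare what Windows reports against a few reference figures. The Python sketch below collects those figures in one place: the Information-tab strings per PCI generation and the approximate loopback bandwidth for Gen2 and Gen3. The 90% tolerance used to interpret "around" is an assumption made for illustration, not a Mellanox threshold, and the function names are hypothetical.

```python
PCIE_GEN_RATE_GT_S = {1: 2.5, 2: 5.0, 3: 8.0}   # from Issue 1
EXPECTED_LOOPBACK_MB_S = {2: 3300, 3: 5700}     # from Issue 2


def expected_information_tab(pci_gen):
    """Text the Information tab should show for a given PCI generation."""
    return f"PCI-E {PCIE_GEN_RATE_GT_S[pci_gen]:.1f} GT/s"


def loopback_result_ok(pci_gen, measured_mb_s, tolerance=0.9):
    """True if an nd_send_bw loopback result is close to the reference.

    Assumption: "around" is read as "at least 90% of" the reference value.
    """
    expected = EXPECTED_LOOPBACK_MB_S.get(pci_gen)
    if expected is None:
        raise ValueError(f"no reference figure for PCI Gen {pci_gen}")
    return measured_mb_s >= tolerance * expected


if __name__ == "__main__":
    print(expected_information_tab(3), loopback_result_ok(3, 5650))
```

A result that fails this check is the cue to look at slot placement, QoS and Flow Control, as described above.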
109. Issue: ...ernel debugger is configured, in Windows Server 2012 and above.
Cause: If a kernel debugger is configured (not necessarily physically connected), the flow control might be disabled.
Solution: Set the following registry value under HKLM\SYSTEM\CurrentControlSet\Services\NDIS\Parameters:
- Type: REG_DWORD
- Key name: AllowFlowControlUnderDebugger
- Value: 1

Issue: Packet drop or low performance on a specific traffic class.
Cause: Might be a lack of QoS and Flow Control settings configuration, or their misconfiguration.
Solution: Check the configured settings for all of the QoS options. Open a PowerShell prompt and use Get-NetAdapterQos. To achieve maximum performance, all of the following must exist:
- All of the hosts, switches and routers should use the same matching flow control settings. If Global Pause is used, all devices must be configured for it. If PFC (Priority Flow Control) is used, all devices must have matching settings for all priorities.
- ETS settings that limit the speed of some priorities will greatly affect the output results.
- Make sure Flow Control is enabled on the Mellanox interfaces (enabled by default). Go to the Device Manager, right-click the Mellanox interface, go to Advanced and make sure Flow Control is enabled for both TX and RX.
- To eliminate QoS and Flow Control as the performance-degrading factor, set all devices to run with Global Pause and rerun the tests:
  - Set Global Pause on the switches/routers.
  - Run Disable-NetAdapterQos on all of the hosts in a Pow
110. ernet packet uses the general MTU value, whereas the RoCE packet uses the RoCE MTU. The valid values are:
- 256
- 512
- 1024
- 2048
Note: This registry key is supported only in Ethernet drivers.

Table 14: Ethernet Registry Keys
Value Name | Default Value | Description
PriorityVLANTag | 3 (Packet Priority & VLAN Enabled) | Enables sending and receiving IEEE 802.3ac tagged frames, which include:
    - 802.1p QoS (Quality of Service) tags for priority-tagged packets
    - 802.1Q tags for VLANs
    When this feature is enabled, the Mellanox driver supports sending and receiving a packet with VLAN and QoS tag.
PromiscuousVlan | 0 | Specifies whether a promiscuous VLAN is enabled or not. When this parameter is set, all the packets with VLAN tags are passed to an upper level without executing any filtering. The valid values are:
    - 0: disable
    - 1: enable
    Note: This registry value is not exposed via the UI.
UseRSSForRawIP | 1 | The execution of RSS on UDP and Raw IP packets. In a forwarding scenario, one can improve the performance by disabling RSS on UDP or raw packets. In such a case, the entire receive processing of these packets is done on the processor that was defined in the DefaultRecvRingProcessor registry key. The valid values are:
    - 0: disable
    - 1: enable
    This is also relevant for IPoIB.
    Note: This registry value is not exposed via the UI.
111. ettings. This option can be followed by one or two connection names.
    This option automatically sets the driver registry values back to their default values:
    - SendCompletionMethod = 0 (IPoIB), 1 (ETH)
    - RecvCompletionMethod = 2
    - ReceiveBuffers = 1024
    - UseRSSForRawIP = 1
    - DefaultRecvRingProcessor = -1
    - TxInterruptProcessor = -1
    - TxForwardingProcessor = -1
    - UseRSSForUDP = 1
    - In operating systems that support NDIS 6.2: MaxRssProcessors = 8
    - In operating systems that support NDIS 6.3: NumRSSQueues = 8
-c1 | Specifies the first connection name. See examples.
-c2 | Specifies the second connection name. See examples.
-b | Specifies the base RSS processor number. See examples. Used for the manual option (-m) only.
-n | Specifies the number of RSS processors. See examples. Used for the manual option (-m) only.
-st | Single stream traffic scenario. This option must be followed by one or two connection names for an Ethernet adapter. The tuning restores the default settings on the second connection and is performed on the first connection.
    This option automatically sets:
    - SendCompletionMethod = 0
    - RecvCompletionMethod = 2
    - ReceiveBuffers = 1024
    - In operating systems that support NDIS 6.3: RssProfile = 4
    Additionally, this option chooses the best processors to assign to:
    - DefaultRecvRingProcessor
    - TxInterruptProcessor
112. evel and Network Interface level (i.e., using the Set-NetAdapterSriov PowerShell cmdlet). The number of VFs actually available to the Network Interface is the minimum value between the Mellanox bus driver configuration and the Network Interface configuration. For example, if support for 8 VFs was burnt in firmware, SriovPortMode is auto port and the Network Interface was allowed 32 VFs using the Set-NetAdapterSriov PowerShell cmdlet, the actual number of VFs available to the Network Interface will be 8.
MaxVFPort1, MaxVFPort2 | REG_DWORD | 16 (default) | MaxVFPort<i>: The maximum number of VFs that are allowed per port. This is the number of VFs the bus driver will open when working in manual mode.
    Note: If the total number of VFs requested is larger than the number of VFs burnt in firmware, each port X (1, 2) will have the number of VFs according to the following formula:
    (MaxVFPortX / (MaxVFPort1 + MaxVFPort2)) * number of VFs burnt in firmware

3.6.9.2 RoCE Options
The following registry configuration is available for RoCE under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\mlx4_bus\Parameters\Roce. This registry is per driver and applies to all available adapters.
Table 20: RoCE Options
Parameter | ...
...tion port. Note that in order to communicate with RoCE v2, all machines in a fabric must be configured with the same value for the UDP
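The two VF rules above (the minimum of the bus-driver and Network Interface limits, and the proportional split when the requested total exceeds what is burnt in firmware) can be checked with a few lines of arithmetic. The Python sketch below implements both as stated; the function names are hypothetical, and rounding the split down to whole VFs is an assumption, since the text does not say how fractional results are handled.

```python
def available_vfs(bus_driver_vfs, interface_vfs):
    """VFs usable by the Network Interface: the smaller of the two limits."""
    return min(bus_driver_vfs, interface_vfs)


def vfs_per_port(max_vf_port1, max_vf_port2, vfs_burnt_in_fw):
    """Per-port VF counts in manual mode, following the formula above."""
    requested = max_vf_port1 + max_vf_port2
    if requested <= vfs_burnt_in_fw:
        return max_vf_port1, max_vf_port2
    # (MaxVFPortX / (MaxVFPort1 + MaxVFPort2)) * VFs burnt in firmware.
    # Assumption: the result is rounded down to a whole number of VFs.
    return (max_vf_port1 * vfs_burnt_in_fw // requested,
            max_vf_port2 * vfs_burnt_in_fw // requested)


if __name__ == "__main__":
    print(available_vfs(8, 32))         # the example from the text
    print(vfs_per_port(16, 16, 16))     # default MaxVFPort values, 16 VFs burnt
```

With both MaxVFPort keys at their default of 16 and 16 VFs burnt in firmware, each port would get 8 VFs under this reading of the formula.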
113. eys that are used to control the NDIS Virtual Machine Queue (VMQ). The VMQ supports Microsoft Hyper-V network performance and is supported on Windows Server 2008 R2 and above.
For more details about VMQ, please refer to the Microsoft web site:
http://msdn.microsoft.com/en-us/library/windows/hardware/ff571034(v=vs.85).aspx

Table 16: VMQ Options
Value Name | Default Value | Description
VMQ | 1 | The support for the virtual machine queue (VMQ) features of the network adapter. The valid values are:
    - 1: enable
    - 0: disable
RssOrVmqPreference | 0 | Specifies whether VMQ capabilities should be enabled instead of receive-side scaling (RSS) capabilities. The valid values are:
    - 0: Report RSS capabilities
    - 1: Report VMQ capabilities
    Note: This registry value is not exposed via the UI.
VMQLookaheadSplit | 1 | Specifies whether the driver enables or disables the ability to split the receive buffers into lookahead and post-lookahead buffers. The valid values are:
    - 0: disable
    - 1: enable
VMQVlanFiltering | 1 | Specifies whether the device enables or disables the ability to filter network packets by using the VLAN identifier in the media access control (MAC) header. The valid values are:
    - 0: disable
    - 1: enable
MaxNumVmqs | 127 | The number of VMQs that the device supports in parallel. This parameter can affect the memory consumption of the interface, since for each VMQ the dri
114. f MFT tools, used to simplify firmware configuration. The tool is available with MFT tools 3.6.0 or higher.
Step 1: Download MFT for Windows:
www.mellanox.com -> Products -> Software -> Firmware Tools
Step 2: Get the device ID (look for the "pciconf" string in the output):
> mst status
Example:
MST devices:
  mt4103_pci_cr0
  mt4103_pciconf0
Step 3: Check the current SR-IOV configuration:
> mlxconfig -d mt4103_pciconf0 q
Example:
Device #1:
Device type: ConnectX3Pro
PCI device: mt4103_pciconf0
Configurations:        Current
  SRIOV_EN             N/A
  NUM_OF_VFS           N/A
  WOL_MAGIC_EN_P2      N/A
  LINK_TYPE_P1         N/A
  LINK_TYPE_P2         N/A
Step 4: Enable SR-IOV with 16 VFs:
> mlxconfig -d mt4103_pciconf0 s SRIOV_EN=1 NUM_OF_VFS=16
Warning: Care should be taken when increasing the number of VFs. All servers are guaranteed to support 16 VFs. More VFs can lead to exceeding the BIOS limit of available MMIO address space.
Example:
Device #1:
Device type: ConnectX3Pro
PCI device: mt4103_pciconf0
Configurations:        Current   New
  SRIOV_EN             N/A       1
  NUM_OF_VFS           N/A       16
  WOL_MAGIC_EN_P2      N/A       N/A
  LINK_TYPE_P1         N/A       N/A
  LINK_TYPE_P2         N/A       N/A
Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.
Step 5: Reboot the machine. After the reboot, continue to Section 3.5.4.4.2 "Enabling SR-IOV in Mellanox WinOF Package (Ethernet SR-IOV Only)" on page 84.
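When Step 4 is scripted across many servers, it is worth enforcing the 16-VF warning in code rather than relying on the operator. The Python sketch below is a hypothetical wrapper: it only builds the argument list for the `mlxconfig` command shown in Step 4 and emits a Python warning when more than 16 VFs are requested. Actually running it (for example with `subprocess.run`) assumes MFT is installed and the device name comes from `mst status`.

```python
import warnings

GUARANTEED_VFS = 16  # per the warning in Step 4


def mlxconfig_enable_sriov(device, num_vfs):
    """Build the Step 4 mlxconfig command as an argument list."""
    if num_vfs < 1:
        raise ValueError("num_vfs must be at least 1")
    if num_vfs > GUARANTEED_VFS:
        warnings.warn(
            f"{num_vfs} VFs exceeds the {GUARANTEED_VFS} VFs all servers are "
            "guaranteed to support; the BIOS MMIO limit may be exceeded")
    return ["mlxconfig", "-d", device, "s", "SRIOV_EN=1",
            f"NUM_OF_VFS={num_vfs}"]


if __name__ == "__main__":
    argv = mlxconfig_enable_sriov("mt4103_pciconf0", 16)
    print(" ".join(argv))
    # To execute on a host with MFT installed (mlxconfig prompts for y/n):
    # import subprocess; subprocess.run(argv, check=True)
```

Returning the argument list instead of running it keeps the check testable and lets the caller decide how to handle the confirmation prompt.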
115. figure the IP address for the NIC.
If DHCP is used, the IP address will be assigned automatically.
PS Set-NetIPInterface -InterfaceAlias "Ethernet 4" -DHCP Disabled
PS Remove-NetIPAddress -InterfaceAlias "Ethernet 4" -AddressFamily IPv4 -Confirm:$false
PS New-NetIPAddress -InterfaceAlias "Ethernet 4" -IPAddress 192.168.1.10 -PrefixLength 24 -Type Unicast
Step 8: (Optional) Set the DNS server (assuming its IP address is 192.168.1.2):
PS Set-DnsClientServerAddress -InterfaceAlias "Ethernet 4" -ServerAddresses 192.168.1.2
After establishing the priorities of ND/NDK traffic, the priorities must have PFC enabled on them.
Step 9: Disable Priority Flow Control (PFC) for all other priorities except for 3:
PS Disable-NetQosFlowControl 0,1,2,4,5,6,7
Step 10: Enable QoS on the relevant interface:
PS Enable-NetAdapterQos -InterfaceAlias "Ethernet 4"
Step 11: Enable PFC on priority 3:
PS Enable-NetQosFlowControl -Priority 3
Step 12: Configure priority 3 to use ETS:
PS New-NetQosTrafficClass -name "SMB class" -priority 3 -bandwidthPercentage 50 -Algorithm ETS

To add the script to the local machine startup scripts:
Step 1: From PowerShell, invoke:
gpedit.msc
Step 2: In the pop-up window, under the "Computer Configuration" section, perform the following:
1. Select Windows Settings.
2. Select Scripts (Startup/Shutdown).
3. Double-click Startup to open the
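Steps 9 and 11 are two halves of one rule: PFC stays on only for the lossless priority (3 in this example) and is turned off for every other 802.1p priority from 0 to 7. The Python sketch below is a hypothetical helper that derives both command lines from a single set of lossless priorities, so the two steps cannot drift apart if another priority is later made lossless.

```python
LOSSLESS_PRIORITIES = {3}  # the ND/NDK (SMB Direct) priority in this example


def disable_pfc_command(lossless=LOSSLESS_PRIORITIES):
    """Step 9: disable PFC on every 802.1p priority that is not lossless."""
    others = [p for p in range(8) if p not in lossless]
    return "Disable-NetQosFlowControl " + ",".join(str(p) for p in others)


def enable_pfc_commands(lossless=LOSSLESS_PRIORITIES):
    """Step 11: enable PFC on each lossless priority."""
    return [f"Enable-NetQosFlowControl -Priority {p}" for p in sorted(lossless)]


if __name__ == "__main__":
    print("PS " + disable_pfc_command())
    for line in enable_pfc_commands():
        print("PS " + line)
```

For the single lossless priority used here, the helper reproduces exactly the commands shown in Steps 9 and 11.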
116. [Figure: team settings dialog ("...g and Fail Over Settings") showing the team name, Team Type (Fault Tolerance), the primary adapter (Mellanox ConnectX-3 Ethernet Adapter), the "Failback to Primary" and "Use primary Mac Address" options, and the list of adapters to include in the team with their Status and Role]
Teaming provides Load Balancing and Fail Over. The administrator can configure a team of adapters and associate up to 8 Mellanox ConnectX adapters to this team. Teaming should be used to increase the system reliability upon a link failure and to balance the workload.
The newly created virtual Mellanox adapter representing the team will be displayed by the Device Manager under "Network adapters" in the following format (see the figure below):
Mellanox Virtual Miniport Driver - Team <team_name>
[Figure: Device Manager showing the Mellanox ConnectX adapters and the team adapter under Network adapters]
117. gured for OpenSM. This can allow partitioning of the IPoIB traffic between the different virtual IPoIB interfaces.

To create a new interface on a new PKey on a native Windows machine:
Step 1: Configure OpenSM to recognize the partition you would like to add. For further details, please refer to the section titled "Partitions" in Mellanox OFED for Linux User Manual.
Step 2: Create a new interface using the part_man tool. For further details, please refer to Section 4.2 "part_man - Virtual IPoIB Port Creation Utility" on page 138.
Step 3: Assign port IPs to the new interfaces. For further details, please refer to Section 3.1.2 "Assigning Port IP After Installation" on page 30.
Make sure the OpenSM (using the partitions configuration) and the new interfaces were configured to run over the same physical port.

To create a new interface on a new PKey on a Windows virtual machine over a Linux host:
On the Linux host:
Step 1: Configure the OpenSM to recognize the partition you would like to add. For further details, please refer to the section titled "Partitions" in Mellanox OFED for Linux User Manual.
Step 2: Map the physical PKey table to the virtual PKey table used by the VM. For further details, please refer to the section titled "Partitioning IPoIB Communication using PKeys" in Mellanox OFED for Linux User Manual.
On the Windows VM:
Step 3: Create a new interface using the part_man tool. For further details, please ref
118. he following columns:
CMD Name | CMD ID | Total times invoked | Min Time (ms) | Max Time (ms) | Last cmd (ms) | Average Time (ms)
The parameters used in this command are: <bus> <device> <function>
Example: The PCI information can be queried from the General properties tab, under Location. If the Location is "PCI Slot 3 (PCI bus 8, device 0, function 0)", run the following command:
mlxtool dbg cmd-stats 8 0 0

4.6.1.4 pkeys Tool
This tool displays the PKey indexes and values available for each IPoIB interface.
Example: If you wish to display the information of the "Ethernet 5" interface, run the following command:
mlxtool dbg pkeys "Ethernet 5"
This command can be invoked on a specific IPoIB interface. If no interface name is provided, the information will be shown for all the interfaces.
ConnectX IPoIB NIC: Ethernet 7
PKEY index | PKEY
1 | 123
2 | 9563

4.6.2 show Tool
This tool is used to show specific information.
Usage: mlxtool.exe show <tool name> [tool arguments]

4.6.2.1 show port list Tool
This tool is used to show the Ethernet and IPoIB port list.
Usage: mlxtool.exe show ports

4.6.2.2 show device list Tool
This tool is used to show the PCI list of devices.
Usage: mlxtool.exe show devices

5 Troubleshooting
You may be able to easily resolve the
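The only non-obvious part of running the cmd stats tool is turning the Device Manager "Location" text into the three numeric arguments. The Python sketch below extracts them, assuming the Location string follows the "PCI bus N, device N, function N" pattern shown in the example above; the function name is hypothetical and it does not invoke mlxtool itself.

```python
import re

# Assumption: the Location text contains "PCI bus N, device N, function N",
# as in the Device Manager example above.
_LOCATION_RE = re.compile(r"PCI bus (\d+), device (\d+), function (\d+)")


def bus_device_function(location):
    """Return (bus, device, function) parsed from a Location string."""
    match = _LOCATION_RE.search(location)
    if match is None:
        raise ValueError(f"unrecognized Location string: {location!r}")
    return tuple(int(group) for group in match.groups())


if __name__ == "__main__":
    bus, dev, fn = bus_device_function("PCI Slot 3 (PCI bus 8, device 0, function 0)")
    print("mlxtool dbg cmd-stats", bus, dev, fn)
```

Parsing with a regular expression rather than splitting on commas tolerates the leading "PCI Slot 3 (" prefix and the trailing parenthesis.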
119. he same physical network or on the same logical network must have the same MTU.
Receive Buffers: The number of receive buffers (default: 1024).
Send Buffers: The number of send buffers (default: 2048).
Performance Options: Configures parameters that can improve adapter performance.
Interrupt Moderation: Moderates or delays the generation of interrupts, and hence optimizes network throughput and CPU utilization (default: Enabled).
- When interrupt moderation is enabled, the system accumulates interrupts and sends a single interrupt rather than a series of interrupts. An interrupt is generated after receiving 5 packets or after 10 ms from the first packet received. This improves performance and reduces CPU load; however, it increases latency.
- When interrupt moderation is disabled, the system generates an interrupt each time a packet is received or sent. In this mode, CPU utilization and data rates increase, as the system handles a larger number of interrupts. However, latency decreases, as each packet is handled faster.
Receive Side Scaling (RSS Mode): Improves incoming packet processing performance. RSS enables the adapter port to utilize the multiple CPUs in a multi-core system for receiving incoming packets and steering them to the designated destination. RSS can significantly improve the number of transactions, the number of connections per second and the network throughput.
This parameter can be set to one of the following
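The "5 packets or 10 ms" rule explains the latency trade-off described above: a lone packet may wait up to 10 ms before the CPU is interrupted. The Python sketch below is a simplified model of that rule only, not the driver's actual implementation. It takes packet arrival times in milliseconds and returns when moderated interrupts would fire; how a packet arriving exactly at the 10 ms boundary is treated is an assumption of the model.

```python
def moderated_interrupt_times(arrivals_ms, max_packets=5, max_delay_ms=10.0):
    """Return the times (ms) at which a moderated interrupt would fire."""
    interrupts = []
    pending = 0        # packets accumulated since the last interrupt
    first_ms = None    # arrival time of the first pending packet
    for t in sorted(arrivals_ms):
        # The 10 ms timer of the current batch expires before this packet.
        if pending and t > first_ms + max_delay_ms:
            interrupts.append(first_ms + max_delay_ms)
            pending, first_ms = 0, None
        if pending == 0:
            first_ms = t
        pending += 1
        # Packet-count threshold reached: fire immediately.
        if pending == max_packets:
            interrupts.append(t)
            pending, first_ms = 0, None
    # A final partial batch is flushed when its timer expires.
    if pending:
        interrupts.append(first_ms + max_delay_ms)
    return interrupts


if __name__ == "__main__":
    arrivals = [0, 1, 2, 3, 4, 5, 20]
    print(moderated_interrupt_times(arrivals))
```

In this model, seven packets produce three interrupts instead of seven, while the packet at 5 ms is not signalled until 15 ms, which is the added latency the text warns about.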
120. [Figure: Device Manager showing the Mellanox ConnectX-3 and ConnectX-3 Pro IPoIB adapters and the "Mellanox Virtual Miniport Driver - Team A" adapter under Network adapters]
To modify an existing team, perform the following:
a. Select the desired team and click Modify.
b. Modify the team name and/or the participating adapters.
c. Click the Commit button.
To remove an existing team, select the desired team and click Remove. You will be prompted to approve this action.
Notes on this step:
a. Each adapter that participates in a team has two properties:
- Status: Connected/Disconnected/Disabled
- Role: Active or Backup
b. Each network adapter that is added to or removed from a team gets refreshed (i.e., disabled, then enabled). This may cause a temporary loss of connection to the adap
121. http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=2032981&sliceId=1&docTypeID=DT_KB_1_1&dialogID=408420191&stateId=1%200%20388456420

Issue: When enabling VMQ, in case NVGRE offload is enabled and a teaming of two virtual ports is performed, no ping is detected between the VMs, and/or a ping is detected but no TCP connection can be established.
Cause: Might be missing critical Microsoft updates.
Solution: Please refer to http://support.microsoft.com/kb/2975719 ("August 2014 update rollup for Windows RT 8.1, Windows 8.1, and Windows Server 2012 R2"), which specifically fixes this issue.

Table 38: Virtualization Related Issues
Issue: In a Hyper-V environment, the Enable-NetAdapterVmq PowerShell command can enable VMQ on a network adapter only if the virtual switch, which does not have SR-IOV enabled, is defined over the corresponding network adapter.
Cause: The PowerShell command might depend on two registry fields, VMQ and RssOrVmqPreference, where the former is controlled by PowerShell and the latter is controlled by the virtual switch.
Solution: For further information on these registry keys, please refer to:
http://msdn.microsoft.com/en-us/library/windows/hardware/hh451362(v=vs.85).aspx

Issue: Mellanox driver fails to
Cause: The host machine cannot
Solution: Increase the Log
122. ...iBand Driver
Added section: Advanced Configuration for Ethernet Driver
Added section
Updated section: Tunable Performance Parameters
Added section
Merged Ethernet and InfiniBand features sections
Removed section: Sockets Direct Protocol and its subsections
Removed section: Winsock Direct and Protocol and its subsections
Removed section
Added ConnectX-3 support
Removed section: IPoIB Drivers Overview
Removed section: Booting Windows from an iSCSI Target
About this Manual
Scope
The document describes WinOF Rev 5.10 features, performance, diagnostic tools, content and configuration. Additionally, this document provides information on various performance tools supplied with this version.
Intended Audience
This manual is intended for system administrators responsible for the installation, configuration, management and maintenance of the software and hardware of VPI (InfiniBand, Ethernet) ConnectX-3 and ConnectX-3 Pro adapter cards. It is also intended for application developers.
Documentation Conventions
Table 2 Documentation Conventions
Description | Convention | Example
File names | file.extension |
Directory names | directory |
Commands and their parameters | command param1 | mts3610-1 > show hosts
Required item | < > |
Optional item | |
Mutually exclusive parameters | p1, p2, p3 or p1, p...
123. ...ick on a Mellanox ConnectX card > Properties.
Step 3. Go to the Details tab.
Step 4. Select the Driver key, and obtain the nn number. In the example below, the index equals 0041.
[Figure: Device Manager with the Mellanox ConnectX-3 Network Adapter Properties dialog open on the Details tab; Property: Driver key; the value begins with the class GUID {4d36e97d-e325-11ce-bfc1-08002be10318}]
3.6.2 Finding the Index Value of the Network Interface
To find the index value of your Network Interface from the Device Manager, please perform the following steps:
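The Driver key value read in Step 4 can also be split programmatically into the class GUID and the index nn. The following is a minimal Python sketch. It assumes the value has the usual Windows form "{class GUID}\nnnn" and that the device-level Parameters key lives under HKLM\SYSTEM\CurrentControlSet\Control\Class\{GUID}\<nn>. The function name device_parameters_path is hypothetical.

```python
import re

# Assumed shape of the Device Manager "Driver key" value: "{<class GUID>}\<four-digit index>".
_DRIVER_KEY = re.compile(r"^(\{[0-9A-Fa-f-]{36}\})\\(\d{4})$")

CLASS_ROOT = r"HKLM\SYSTEM\CurrentControlSet\Control\Class"


def device_parameters_path(driver_key: str) -> str:
    """Split a Driver key value into class GUID and index nn, and return the
    (assumed) device-level Parameters registry path for that adapter."""
    match = _DRIVER_KEY.match(driver_key.strip())
    if match is None:
        raise ValueError(f"unexpected Driver key format: {driver_key!r}")
    class_guid, index = match.groups()
    return "\\".join([CLASS_ROOT, class_guid, index, "Parameters"])


if __name__ == "__main__":
    print(device_parameters_path(r"{4d36e97d-e325-11ce-bfc1-08002be10318}\0041"))
```

With the index 0041 from the example, the sketch yields a path ending in {4d36e97d-e325-11ce-bfc1-08002be10318}\0041\Parameters.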
124. [Figure residue: New Virtual Machine Wizard, Connect Virtual Hard Disk page, offering Create a virtual hard disk (VHDX, 127 GB, maximum 64 TB), Use an existing virtual hard disk (Location: Win8Srv_DC_x64_fre_9200_vm1.vhd), and Attach a virtual hard disk later]
3.5.4.4 Configuring Mellanox Network Adapter for SR-IOV
The following are the steps for configuring a Mellanox Network Adapter for SR-IOV.
3.5.4.4.1 Enabling SR-IOV in Firmware
For non-Mellanox (OEM) branded cards, you may need to download and install the new firmware. For the latest OEM firmware, please go to: http://www.mellanox.com/page/oem_firmware_download
As of firmware version 2.31.5000, SR-IOV can be enabled and managed by using the mlxconfig tool. For older firmware versions, use the flint tool.
> To enable SR-IOV using mlxconfig:
mlxconfig is part o...
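The version rule above (mlxconfig from firmware 2.31.5000 on, flint before that) can be expressed as a simple version comparison. This is a minimal sketch. It only decides which tool applies and does not run either tool or assume any of their options. The function names are hypothetical.

```python
from typing import Tuple

# Per the text: mlxconfig can enable SR-IOV as of firmware 2.31.5000; older firmware uses flint.
MLXCONFIG_MIN_FW: Tuple[int, int, int] = (2, 31, 5000)


def parse_fw_version(version: str) -> Tuple[int, int, int]:
    """Turn a dotted firmware version such as '2.30.8000' into a comparable tuple."""
    parts = version.strip().split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.SUBMINOR, got {version!r}")
    return int(parts[0]), int(parts[1]), int(parts[2])


def sriov_enable_tool(fw_version: str) -> str:
    """Name the tool this section points to for enabling SR-IOV on the given firmware."""
    return "mlxconfig" if parse_fw_version(fw_version) >= MLXCONFIG_MIN_FW else "flint"


if __name__ == "__main__":
    for fw in ("2.30.8000", "2.31.5000", "2.32.5100"):
        print(fw, "->", sriov_enable_tool(fw))
```

Comparing tuples of integers avoids the classic string-comparison mistake, where "2.4.0" would sort after "2.31.5000".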
125. ...ions
Section 4.6 "mlxtool" on page 148
Section 5.4.1 "General Diagnostic" on page 156
Section 3.6.9 "MLX BUS Registry Keys" on page 112
Added the following sections:
Section 3.6.9 "MLX BUS Registry Keys" on page 112
Section 5.8 "State Dumping" on page 159
Section 3.7.2 "Win-Linux nd_rping Test" on page 116
Section 3.1.4.3 "RoCE v2 UDP Port" on page 36
Rev 4.95.50000 (April 30, 2015): Updated the following sections:
Section 3.7.1 "Network Direct Interface" on page 116
Section 3.2.8.1 "System Requirements" on page 64
Added the following sections:
Section 3.1.13 "Ignore Frame Check Sequence (FCS) Errors" on page 59
Section 3.8.1.1 "Mellanox Specific Extensions to the ND Interface" on page 118
Section 5.7 "Extracting WPP Traces" on page 159
Moved IPoIB content under Section 3.2 "InfiniBand Network" on page 59
Table 1 Document Revision History
Rev 4.90.50000 (January 15, 2015): Restructured Section 5 "Troubleshooting" on page 150
Added the following sections:
Section 3.3.1 "PowerShell Configuration" on page 67
Section 3.9 "System Recovery upon Error Detection" on page 136
Updated the following sections:
Section 2.3.2 "Unattended Installation" on page 19
Section 2.6.2 "Unattended Uninstallation" on page 23
Section 5.1 "Installa...
126. ...irect
The Server Message Block (SMB) protocol is a network file sharing protocol implemented in Microsoft Windows. The set of message packets that defines a particular version of the protocol is called a dialect.
The Microsoft SMB protocol is a client-server implementation and consists of a set of data packets, each containing a request sent by the client or a response sent by the server.
The SMB protocol is used on top of the TCP/IP protocol or other network protocols. Using the SMB protocol allows applications to access files or other resources on a remote server, and to read, create and update them. In addition, it enables communication with any server program that is set up to receive an SMB client request.
3.4.1.1 System Requirements
The following are the hardware and software prerequisites:
Two or more machines running Windows Server 2012 and above
One or more Mellanox ConnectX-3 or ConnectX-3 Pro adapters for each server
One or more Mellanox InfiniBand switches
Two or more QSFP cables (required for InfiniBand)
3.4.1.2 SMB Configuration Verification
3.4.1.2.1 Verifying Network Adapter Configuration
Use the following PowerShell cmdlets to verify that Network Direct is globally enabled and that you have NICs with the RDMA capability.
Run on both the SMB server and the SMB client:
    PS $ Get-NetOffloadGlobalSetting | Select NetworkDirect
    PS $ Get-NetAdapterRDMA
    PS $ Get-NetAdapterHardwareInfo
127. ...k
    Connect-VMNetworkAdapter -VMName mtlae14-005 -SwitchName VSwMLNX
    Add-VMNetworkAdapter -VMName mtlae14-005 -SwitchName VSwMLNX -StaticMacAddress 00155D720100
    Add-VMNetworkAdapter -VMName mtlae14-006 -SwitchName VSwMLNX -StaticMacAddress 00155D720101
Note: The commands from Steps 2 to 4 are not persistent. It is recommended to create a script that runs after each OS reboot.
Step 2. Configure a Subnet Locator and Route records on each Hyper-V Host (Host 1 and Host 2), mtlae14 & mtlae15:
    New-NetVirtualizationLookupRecord -CustomerAddress 172.16.14.5 -ProviderAddress 192.168.20.114 -VirtualSubnetID 5001 -MACAddress "00155D720100" -Rule "TranslationMethodEncap"
    New-NetVirtualizationLookupRecord -CustomerAddress 172.16.14.6 -ProviderAddress 192.168.20.114 -VirtualSubnetID 5001 -MACAddress "00155D720101" -Rule "TranslationMethodEncap"
    New-NetVirtualizationLookupRecord -CustomerAddress 172.16.15.5 -ProviderAddress 192.168.20.115 -VirtualSubnetID 5001 -MACAddress "00155D730100" -Rule "TranslationMethodEncap"
    New-NetVirtualizationLookupRecord -CustomerAddress 172.16.15.6 -ProviderAddress 192.168.20.115 -VirtualSubnetID 5001 -MACAddress "00155D730101" -Rule "TranslationMethodEncap"
    # Add customer route
    New-NetVirtualizationCustomerRoute -RoutingDomainID "{11111111-2222-3333-4444-000000005001}" -VirtualSubnetID 5001 -DestinationPrefix "172.16.0.0/16" -NextHop 0.0.0.0 -Metric 255
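The four lookup records must be identical on both hosts and differ only in the customer address, provider address and MAC. Generating them from one table helps keep the hosts in sync. This is a minimal Python sketch that only prints the PowerShell lines and does not call PowerShell. The class and function names are hypothetical. The records mirror the example above.

```python
from typing import Iterable, List, NamedTuple


class LookupRecord(NamedTuple):
    customer_address: str   # VM IP address (CA)
    provider_address: str   # IP address of the Hyper-V host running the VM (PA)
    virtual_subnet_id: int
    mac_address: str        # VM MAC address without separators


def lookup_record_commands(records: Iterable[LookupRecord]) -> List[str]:
    """Render one New-NetVirtualizationLookupRecord line per VM; the same list is run on every host."""
    return [
        "New-NetVirtualizationLookupRecord"
        f" -CustomerAddress {r.customer_address}"
        f" -ProviderAddress {r.provider_address}"
        f" -VirtualSubnetID {r.virtual_subnet_id}"
        f' -MACAddress "{r.mac_address}"'
        ' -Rule "TranslationMethodEncap"'
        for r in records
    ]


# The four VMs of this example (two on each host).
EXAMPLE_RECORDS = [
    LookupRecord("172.16.14.5", "192.168.20.114", 5001, "00155D720100"),
    LookupRecord("172.16.14.6", "192.168.20.114", 5001, "00155D720101"),
    LookupRecord("172.16.15.5", "192.168.20.115", 5001, "00155D730100"),
    LookupRecord("172.16.15.6", "192.168.20.115", 5001, "00155D730101"),
]

if __name__ == "__main__":
    print("\n".join(lookup_record_commands(EXAMPLE_RECORDS)))
```

The printed lines can be pasted into the per-host script that the note above recommends running after each reboot.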
128. ...Packets Received/Sec: The number of packets received per second that are covered by this priority.
Bytes/Packets OUT
Bytes Sent: The number of bytes sent that are covered by this priority. The counted bytes include framing characters (modulo 2^64).
Bytes Sent/Sec: The number of bytes sent per second that are covered by this priority. The counted bytes include framing characters.
Packets Sent: The number of packets sent that are covered by this priority (modulo 2^64).
Packets Sent/Sec: The number of packets sent per second that are covered by this priority.
Bytes and Packets Total
Bytes Total: The total number of bytes that are covered by this priority. The counted bytes include framing characters (modulo 2^64).
Bytes Total/Sec: The total number of bytes per second that are covered by this priority. The counted bytes include framing characters.
Packets Total: The total number of packets that are covered by this priority (modulo 2^64).
Table 26 Mellanox QoS Counters
Packets Total/Sec: The total number of packets per second that are covered by this priority.
PAUSE INDICATION
Sent Pause Frames: The total number of pause frames sent from this priority to the far-end port. The untagged instance indicates the number of global pause frames that were sent.
Sent Pause Duration: The...
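Because the cumulative counters (Bytes Sent, Packets Sent, Bytes Total, Packets Total) are kept modulo 2^64, any tool that derives its own rate from two raw samples must allow for wraparound. This is a minimal sketch. It assumes the samples are plain integers and that at most one wrap occurs between them. The function names are hypothetical.

```python
COUNTER_MODULUS = 2 ** 64  # the cumulative byte/packet counters above wrap modulo 2^64


def counter_delta(previous: int, current: int) -> int:
    """Increase of a modulo-2^64 counter between two samples (assumes at most one wrap)."""
    return (current - previous) % COUNTER_MODULUS


def per_second(previous: int, current: int, interval_s: float) -> float:
    """Average rate, e.g. bytes sent per second, from two samples taken interval_s seconds apart."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(previous, current) / interval_s


if __name__ == "__main__":
    # A Bytes Sent sample that wrapped past 2^64 between readings: 4000 bytes in 2 s.
    print(per_second(COUNTER_MODULUS - 1_000, 3_000, 2.0))  # 2000.0
```

Taking the difference modulo 2^64 gives the correct increase after a wrap. A naive subtraction would instead return a huge negative number.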
129. ...lanox WinOF 4.61 or higher
Firmware version 2.30.8000 or higher
3.5.4.1.2 Feature Limitations
SR-IOV is supported only on Ethernet ports and can be enabled if all ports are set as Ethernet.
RDMA (i.e., RoCE) capability is not available in SR-IOV mode.
3.5.4.2 SR-IOV InfiniBand over KVM
3.5.4.2.1 System Requirements
A server and BIOS with SR-IOV support. BIOS settings might need to be updated to enable virtualization support and SR-IOV support.
Hypervisor OS: Linux KVM using SR-IOV enabled drivers
Virtual Machine (VM) OS: The VM OS can be Windows Server 2008 R2 and above. For further details about assigning a VF to the Windows VM, please refer to steps 1-5 in the section titled "Assigning the SR-IOV Virtual Function to the Red Hat KVM VM Server" in the Mellanox OFED for Linux User Manual.
Mellanox ConnectX-3/ConnectX-3 Pro VPI Adapter Card family with SR-IOV capability
Mellanox WinOF 4.80 or higher
Firmware version 2.30.8000 or higher
3.5.4.2.2 Feature Limitations Compared to Native InfiniBand
OpenSM and the InfiniBand Fabric Diagnostic Utilities listed in Table 29 "Diagnostic Utilities" on page 143 are not supported in the guest OS.
For a UD QP, only SGID index 0 is supported.
The GIDs per port are allocated to the VFs as follows:
16 GIDs are allocated to the PF
2 GIDs are allocated to every VF
The remaining GIDs, if such exist, wi...
130. ...lease refer to the MFT User Manual: http://www.mellanox.com > Products > Firmware Tools
5.1.1.3 Restore Configuration Warnings
Table 34 Restore Configuration Warnings
Error Code: 3
Description: Failed to restore the configuration
Troubleshooting: Please see the log for more details and contact the support team.
5.2 InfiniBand Related Troubleshooting
Table 35 InfiniBand Related Issues
Issue: The InfiniBand interfaces are not up after the first reboot after the installation process is completed.
Cause: Port status might be PORT_DOWN: the switch port state might be disabled, or the cable is disconnected.
Solution: Enable the switch port (admin state) or connect the cable.
Cause: Port status might be PORT_INITIALIZED: the SM might not be running on the fabric.
Solution: Run the SM on the fabric.
Cause: Port status might be PORT_ARMED: firmware issue.
Solution: Please contact Mellanox Support.
5.3 Ethernet Related Troubleshooting
For further performance related information, please refer to the Performance Tuning Guide and to Section 3.8 "Performance Tuning and Counters" on page 118.
Table 36 Ethernet Related Issues
Issue: Low performance.
Cause: A non-optimal system configuration might have occurred.
Solution: See section "Performance Tuning and Counters" on page 118 to take advantage of Mellanox 10/40/56 GBit NIC performance.
Issue: The driver fails to start.
Cause: There...
131. ...led
Step 1. Set the system profile to be eth-single-switch, and reset the system:
    switch (config) # system profile eth-single-switch
Step 2. Set the speed for the desired interface to 56GbE as follows. For example, for interface 1/1:
    switch (config) # interface ethernet 1/1
    switch (config interface ethernet 1/1) # speed 56000
    switch (config interface ethernet 1/1) #
Step 3. Verify the speed is 56GbE.
    switch (config) # show interface ethernet 1/1
    Eth1/1
    Admin state: Enabled
    Operational state: Down
    Description: N/A
    Mac address: 00:02:c9:5d:e0:26
    MTU: 1522 bytes
    Flow-control: receive off send off
    Actual speed: 56 Gbps
    Switchport mode: access
    Rx: frames, unicast frames, multicast frames, broadcast frames, octets, error frames, discard frames
    Tx: 0 frames, 0 unicast frames, 0 multicast frames, 0 broadcast frames, 0 octets, 0 discard frames
    switch (config) #
3.1.4 RDMA over Converged Ethernet (RoCE)
Remote Direct Memory Access (RDMA) is the remote memory management capability that allows server-to-server data movement directly between application memory without any CPU involvement. RDMA over Converged Ethernet (RoCE) is a mechanism that provides this efficient data transfer with very low latencies on lossless Ethernet networks. With advances in data center convergence over reliable Ethernet, ConnectX EN with RoCE uses the proven and efficient RDMA tr...
132. ...ll be assigned to the VFs, one GID to every VF, starting from the lowest VF.
Currently, Mellanox IB Adapter Diagnostic Counters and Mellanox IB Adapter Traffic Counters are not supported.
Only Administrator-assigned GUIDs are supported. Please refer to the Mellanox OFED for Linux User Manual for instructions on how to configure Administrator-assigned GUIDs.
3.5.4.3 Configuring SR-IOV Host Machines
The following are the necessary steps for configuring host machines:
3.5.4.3.1 Enabling SR-IOV in BIOS
Depending on your system, perform the steps below to set up your BIOS. The figures used in this section are for illustration purposes only. For further information, please refer to the appropriate BIOS User Manual.
> To enable SR-IOV in BIOS:
Step 1. Make sure the machine's BIOS supports SR-IOV. Please consult the BIOS vendor website for the list of BIOS versions that support SR-IOV. Update the BIOS version if necessary.
Step 2. Follow the BIOS vendor guidelines to enable SR-IOV according to the BIOS User Manual. For example:
a. Enable SR-IOV.
[Figure residue: BIOS SETUP UTILITY, Advanced > PCI/PnP Settings screen with options such as Clear NVRAM, Plug & Play O/S, PCI Latency Timer, PCI IDE BusMaster, and per-slot PCI-X/PCI-E OPROM settings]
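The GID allocation rule for InfiniBand SR-IOV works as follows: 16 GIDs go to the PF, 2 go to every VF, and any remainder is handed out one per VF starting from the lowest VF. The following is a minimal sketch of that arithmetic. The size of the port's GID table is not stated in the text, so it is an input here. The sketch also assumes a single pass over the VFs, so leftovers beyond one extra GID per VF stay unassigned. The function name allocate_gids is hypothetical.

```python
from typing import List

PF_GIDS = 16     # GIDs allocated to the PF on each port
GIDS_PER_VF = 2  # GIDs allocated to every VF up front


def allocate_gids(total_gids: int, num_vfs: int) -> List[int]:
    """Return per-function GID counts: index 0 is the PF, index i >= 1 is VF i-1.

    total_gids (the port's GID table size) is an assumed input. Remaining GIDs
    are given one per VF, lowest VF first, in a single pass (an assumption).
    """
    base = PF_GIDS + GIDS_PER_VF * num_vfs
    if base > total_gids:
        raise ValueError("GID table too small for the PF and the requested VFs")
    counts = [PF_GIDS] + [GIDS_PER_VF] * num_vfs
    remaining = total_gids - base
    for vf in range(min(remaining, num_vfs)):
        counts[1 + vf] += 1  # one extra GID to this VF
    return counts


if __name__ == "__main__":
    print(allocate_gids(26, 4))  # [16, 3, 3, 2, 2]
```

With a hypothetical 26-entry table and 4 VFs, the two leftover GIDs go to VF0 and VF1.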
133. Related Documents
Table 4 Related Documents
MFT User Manual: Describes the set of firmware management tools for a single InfiniBand node. MFT can be used for:
  Generating a standard or customized Mellanox firmware image
  Querying for firmware information
  Burning a firmware image to a single InfiniBand node
  Enabling/changing card configuration to support SR-IOV
WinOF Release Notes: For possible software issues, please refer to the WinOF Release Notes.
MLNX_OFED User Manual: For more information on SR-IOV over KVM, please refer to the OFED User Manual.
InfiniBand Architecture Specification, Volume 1, Release 1.2.1: The InfiniBand Specification by IBTA.
1 Introduction
This User Manual describes the installation, configuration and operation of the Mellanox WinOF driver Rev 5.10 package.
Mellanox WinOF is composed of several software modules that contain InfiniBand and Ethernet drivers for ConnectX-3 and ConnectX-3 Pro adapter cards. The Mellanox WinOF driver supports 10, 40 or 56 Gb/s Ethernet, and 40 or 56 Gb/s InfiniBand network ports. The port type is determined upon boot based on card capabilities and user settings.
The Mellanox VPI WinOF driver release introduces the following capabilities:
  Support for Single and Dual port Adapters
  Up to 16 Rx queues per port
  Rx steering mode (RSS)
  Hardw...
134. ...lly
Performance tools: install the performance tools that are used to measure InfiniBand performance in the user environment.
Analyze tools: install the tools that can be used either to diagnose or analyze the InfiniBand environment.
SDK: contains the libraries and DLLs for developing InfiniBand applications over IBAL.
Documentation: contains the User Manual and Installation Guide.
[Figure residue: Custom Setup page listing the features OpenSM, Performance Tools, Analysis Tools, SDK and Documentation; install location C:\Program Files\Mellanox\MLNX_VPI\IB\Tools]
b. Click Install to start the installation.
[Figure residue: Ready to Install the Program and InstallShield Wizard Completed pages; the completion page notes that performance tuning was run and that its log file can be found at C:\windows\System32...]
135. ...me "Network Adapter" -VMName vm1 -IovQueuePairsRequested 8 (for 40GbE)
3.8.1.7 Improving Live Migration
To improve the performance of live migration over SMB Direct, please set the following registry key to 0 and reboot the machine:
    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanServer\Parameters\RequireSecuritySignature
3.8.2 Application Specific Optimization and Tuning
3.8.2.1 Ethernet Performance Tuning
The user can configure the Ethernet adapter by setting some registry keys. The registry keys may affect Ethernet performance.
> To improve performance, activate the performance tuning tool as follows:
Step 1. Start the Device Manager (open a command line window and enter: devmgmt.msc).
Step 2. Open Network Adapters.
Step 3. Right-click the relevant Ethernet adapter and select Properties.
Step 4. Select the Advanced tab.
Step 5. Modify performance parameters (properties) as desired.
3.8.2.1.1 Performance Known Issues
On Intel I/OAT supported systems, it is highly recommended to install and enable the latest I/OAT driver (download from www.intel.com).
With I/OAT enabled, sending messages of 256 bytes or larger will activate I/OAT. This will cause a significant latency increase due to I/OAT algorithms. On the other hand, throughput will increase significantly when using I/OAT.
3.8.2.2 IPoIB Performance Tuning
The user can configure the IPoIB adapter by setting some registry keys. The registry keys may...
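The live-migration registry change can be packaged as a .reg file so that it is applied the same way on every host. Back up the registry first and reboot afterwards, as the text requires. This is a minimal sketch. It assumes RequireSecuritySignature is a REG_DWORD value. The helper name is hypothetical.

```python
LANMAN_PARAMETERS = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters"


def require_security_signature_reg(value: int = 0) -> str:
    """Return .reg file text that sets RequireSecuritySignature (assumed REG_DWORD) to value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in a DWORD")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{LANMAN_PARAMETERS}]\r\n"
        f'"RequireSecuritySignature"=dword:{value:08x}\r\n'
    )


if __name__ == "__main__":
    print(require_security_signature_reg(0))
```

The text could be saved, for example, with open("live_migration.reg", "w", encoding="utf-16", newline=""). That is the UTF-16 encoding regedit uses for version 5.00 exports, and newline="" keeps the explicit CRLF endings. The file can then be imported with "reg import live_migration.reg".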
136. ...me: Any printable name without ... and ..., starting with an "i". If no iname is specified for an add command, one will be auto-generated by the tool. This parameter, which was previously mandatory, is now optional for these commands.
pkey: a 4 hex digit value. It can be specified if a non-default pkey should be used.
Note: When using the add/rem commands, only one virtual adapter can be added or removed in a single operation.
Example:
Adding and removing a virtual adapter using defaults:
    > part_man add "Ethernet 4" ipoib_4_1
    Done.
    > part_man show
    "Ethernet 6" ipoib_4_1 FFFF
    > part_man rem "Ethernet 6" ipoib_4_1
    Done.
Adding and removing a virtual adapter using non-defaults:
    > part_man add "Ethernet 5" ipoib_5_1 F123
    Done.
    > part_man show
    "Ethernet 7" ipoib_5_1 F123
    > part_man rem "Ethernet 7" ipoib_5_1 F123
Or simply:
    > part_man rem "Ethernet 7"
Adding a partial membership PKey value (with the upper bit turned off):
    > part_man add "Ethernet 5" 7123
The new port will use the partial PKey only in the absence of a full membership PKey of the same value (0xf123 for the example above) in the OpenSM configuration. Otherwise, the full membership PKey will be chosen.
Note: Make sure that the PKeys used in the part_man commands are supported by the OpenSM running on this port, and that their membership type is consistent with the one defined by OpenSM. If the P...
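The partial/full membership distinction above comes down to the upper bit of the 16-bit PKey: 0x7123 is the partial form and 0xF123 the full form of the same value. This is a minimal sketch of that bit arithmetic. It does not talk to part_man or OpenSM, and the function names are hypothetical.

```python
FULL_MEMBERSHIP_BIT = 0x8000  # upper bit of the 16-bit PKey: set = full, clear = partial
_HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_pkey(text: str) -> int:
    """Parse a part_man-style PKey argument, i.e. exactly 4 hex digits such as 'F123'."""
    if len(text) != 4 or any(c not in _HEX_DIGITS for c in text):
        raise ValueError("a PKey is a 4 hex digit value")
    return int(text, 16)


def is_full_member(pkey: int) -> bool:
    """True when the upper (membership) bit is set."""
    return bool(pkey & FULL_MEMBERSHIP_BIT)


def full_member_counterpart(pkey: int) -> int:
    """The full-membership PKey with the same base value (e.g. 0x7123 -> 0xF123)."""
    return pkey | FULL_MEMBERSHIP_BIT


if __name__ == "__main__":
    partial = parse_pkey("7123")
    print(is_full_member(partial), hex(full_member_counterpart(partial)))  # False 0xf123
```

If OpenSM defines the counterpart returned by full_member_counterpart, the text says that full-membership PKey is the one chosen.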
137. ...n all Hyper-V hosts (the same command on all Hyper-V hosts):
    PS $ New-NetVirtualizationCustomerRoute -RoutingDomainID "{11111111-2222-3333-4444-000000005001}" -VirtualSubnetID <virtualsubnetID> -DestinationPrefix <VMInterfaceIPAddress/Mask> -NextHop 0.0.0.0 -Metric 255
Step 7. Configure the Provider Address and Route records on each Hyper-V Host using an appropriate interface name and IP address:
    PS $ $NIC = Get-NetAdapter <EthInterfaceName>
    PS $ New-NetVirtualizationProviderAddress -InterfaceIndex $NIC.InterfaceIndex -ProviderAddress <HypervisorInterfaceIPAddress> -PrefixLength 24
    PS $ New-NetVirtualizationProviderRoute -InterfaceIndex $NIC.InterfaceIndex -DestinationPrefix "0.0.0.0/0" -NextHop <HypervisorInterfaceIPAddress>
Step 8. Configure the Virtual Subnet ID on the Hyper-V Network Switch Ports for each Virtual Machine on each Hyper-V Host (Host 1 and Host 2):
    PS $ Get-VMNetworkAdapter -VMName <VMName> | where {$_.MacAddress -eq <VMmacaddress1>} | Set-VMNetworkAdapter -VirtualSubnetID <virtualsubnetID>
Note: Please repeat steps 5 to 8 on each Hyper-V host after rebooting the Hypervisor.
3.5.3.4 Verifying the Encapsulation of the Traffic
Once the PowerShell configuration is completed, any packet capturing utility can be used to verify that packets are indeed encapsulated as configured. If configured correctly, an encapsulated packet should appear as a packet consisting of the following headers:
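When inspecting such a capture, the virtual subnet ID can be read directly out of the GRE header and compared with the configured VirtualSubnetID (5001 in the examples). This is a hedged Python sketch. It decodes the header using the NVGRE layout from RFC 7637, not from this manual: Key Present bit set, Checksum bit clear, protocol type 0x6558 (Transparent Ethernet Bridging), and a 32-bit key holding the 24-bit VSID followed by an 8-bit FlowID. The function name parse_nvgre is hypothetical.

```python
import struct
from typing import Tuple

GRE_CHECKSUM_PRESENT = 0x8000  # C bit; NVGRE requires it to be 0
GRE_KEY_PRESENT = 0x2000       # K bit; NVGRE requires it to be 1
ETH_P_TEB = 0x6558             # Transparent Ethernet Bridging: an Ethernet frame follows


def parse_nvgre(gre_header: bytes) -> Tuple[int, int]:
    """Return (VSID, FlowID) from the first 8 bytes of an NVGRE (RFC 7637) GRE header."""
    if len(gre_header) < 8:
        raise ValueError("need at least 8 bytes of GRE header")
    flags, protocol = struct.unpack_from("!HH", gre_header, 0)
    if flags & GRE_CHECKSUM_PRESENT or not flags & GRE_KEY_PRESENT:
        raise ValueError("flags do not match NVGRE (C must be 0, K must be 1)")
    if protocol != ETH_P_TEB:
        raise ValueError(f"protocol type 0x{protocol:04x} is not 0x6558")
    (key,) = struct.unpack_from("!I", gre_header, 4)
    return key >> 8, key & 0xFF  # upper 24 bits: VSID, lower 8 bits: FlowID


if __name__ == "__main__":
    header = bytes.fromhex("20006558") + (5001 << 8).to_bytes(4, "big")
    print(parse_nvgre(header))  # (5001, 0)
```

The input is the GRE header that follows the outer IP header in the capture. Locating that offset is left to the capture tool.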
138. ...n describes how to modify Windows registry parameters in order to improve performance.
Please note that modifying the registry incorrectly might lead to serious problems, including loss of data or a system hang, and you may need to reinstall Windows. It is therefore recommended to back up the registry on your system before implementing the recommendations included in this section. If the modifications you apply lead to serious problems, you will then be able to restore the original registry state. For more details about backing up and restoring the registry, please visit www.microsoft.com.
3.8.1 General Performance Optimization and Tuning
To achieve the best performance for Windows, you may need to modify some of the Windows registries.
3.8.1.1 Mellanox Specific Extensions to the ND Interface
IND2QueuePairsPool
The interface is an extension to the Network Direct SPI version 2. It reduces the creation time of the IND2QueuePair and IND2CompletionQueue interfaces, and hence improves the client-server connection establishment time.
The interface exposes a pool of pre-allocated IND2QueuePair and IND2CompletionQueue interfaces associated with it. Pre-allocation is done using a background thread when a pre-configured threshold is reached.
The API for this interface is documented in the SDK header file ndspi_ext_mlx.h.
> Using IND2QueuePairsPool:
1. Create a pool using IND2Adapter::QueryInterface with ID IND2QueuePairsPool.
2. Set pool configuration u...
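The mechanism described for IND2QueuePairsPool is a pool of pre-built objects that a background thread tops up once a threshold is reached. The generic Python sketch below illustrates that idea only. It is not the IND2QueuePairsPool API, which is a Network Direct interface documented in ndspi_ext_mlx.h. The class, its capacity and threshold parameters, and the factory callable are all hypothetical stand-ins.

```python
import threading
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class PreallocatedPool(Generic[T]):
    """Keeps up to `capacity` ready-made objects; a background thread tops the pool
    back up whenever the number of ready objects falls to `threshold` or below."""

    def __init__(self, factory: Callable[[], T], capacity: int, threshold: int) -> None:
        if not 0 <= threshold < capacity:
            raise ValueError("threshold must satisfy 0 <= threshold < capacity")
        self._factory = factory
        self._capacity = capacity
        self._threshold = threshold
        self._items: List[T] = [factory() for _ in range(capacity)]
        self._cond = threading.Condition()
        self._closed = False
        self._worker = threading.Thread(target=self._refill_loop, daemon=True)
        self._worker.start()

    def take(self) -> T:
        """Hand out a pre-built object, or build one on demand if the pool is empty."""
        with self._cond:
            if self._items:
                item = self._items.pop()
                if len(self._items) <= self._threshold:
                    self._cond.notify_all()  # wake the refill thread
                return item
            self._cond.notify_all()
        # Pool exhausted: fall back to building the object on the caller's thread.
        return self._factory()

    def size(self) -> int:
        with self._cond:
            return len(self._items)

    def wait_until_full(self, timeout: float) -> bool:
        with self._cond:
            return self._cond.wait_for(lambda: len(self._items) >= self._capacity, timeout)

    def close(self) -> None:
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        self._worker.join()

    def _refill_loop(self) -> None:
        while True:
            with self._cond:
                self._cond.wait_for(lambda: self._closed or len(self._items) <= self._threshold)
                if self._closed:
                    return
                missing = self._capacity - len(self._items)
            # Build outside the lock so take() is never blocked by slow object creation.
            fresh = [self._factory() for _ in range(missing)]
            with self._cond:
                self._items.extend(fresh)
                self._cond.notify_all()  # wake wait_until_full() callers
```

For example, PreallocatedPool(object, capacity=4, threshold=2) hands out two objects from its stock. The refill thread is then woken and restores the pool to four entries. Callers never wait for object creation unless the pool is completely empty.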
139. ...nce Tuning Tool Application" on page 122: Updated the Options table
Section 3.8.2 "Application Specific Optimization and Tuning" on page 126: Removed the Bus-master DMA Operations
Section 3.2.2 "OpenSM - Subnet Manager" on page 59: Added an option of how to register OpenSM via PowerShell
Section 3.5.3.3.1 "Configuring the NVGRE using PowerShell" on page 73
Rev 4.60 (December 30, 2013): Added the following sections:
Section 3.1.8 "Configuring Quality of Service (QoS)" on page 46
Appendix A "NVGRE Configuration Scripts Examples" on page 163
Rev 4.55 (December 15, 2013): Updated the following sections:
Section 3.1.5 "Teaming and VLAN" on page 39
Section 3.5.3.3.1 "Configuring the NVGRE using PowerShell" on page 73
(November 07, 2013): Updated the following sections:
Section 3.1.4.1.2 "Configuring Windows Host" on page 34
(October 03, 2013): Added support for Windows Server 2012 R2
Table 1 Document Revision History
Rev 4.40 (July 17, 2013): Updated the following sections:
Section 3.1.4 "RDMA over Converged Ethernet (RoCE)" on page 33
Section 3.2.2 "OpenSM - Subnet Manager" on page 59
Section 5 "Troubleshooting" on page 150
Added the following sections:
Appendix A "NVGRE Configuration Scripts Examples" on page 163
June 10...
140. ...ne level must be used consistently to control the SR-IOV feature. If both levels are used, the driver enforces the per-machine level of configuration.
Registry keys location for machine configuration:
    HKLM\SYSTEM\CurrentControlSet\Services\mlx4_bus\Parameters
Registry keys location for device configuration:
    HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e97d-e325-11ce-bfc1-08002be10318}\<nn>\Parameters
For more information on how to find the device index nn, please refer to Section 3.6.1 "Finding the Index Value of the HCA" on page 93.
Table 19 SR-IOV Registry Keys
SriovEnable (REG_DWORD)
  Values: 0 = RoCE (default); 2 = SR-IOV
  Description: Configures the RDMA or SR-IOV mode. Note: RDMA is not supported in SR-IOV mode.
SriovPortMode (REG_DWORD)
  Values: 0 = auto_port1 (default); 1 = auto_port2; 2 = manual
  Description: Configures the number of VFs to be enabled by the bus driver on each port.
  Note: In auto_portX mode, port X will have the number of VFs according to the value burnt into the device, and the other port will have no SR-IOV and will support native Ethernet (i.e., no RoCE).
  Setting this parameter to "manual" will configure the number of VFs for each port according to the registry key MaxVFPortX.
  Note: The number of VFs can be configured both on a Mellanox bus driver l...
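Table 19 combines two keys to decide how many VFs each port gets. The minimal sketch below encodes that decision. Several parts are assumptions. The value 1 = auto_port2 is inferred from the table's ordering. Any SriovEnable value other than 2 is treated as "no SR-IOV". The burnt VF count and the MaxVFPort1/MaxVFPort2 values are plain inputs. The function name vfs_per_port is hypothetical.

```python
from typing import Tuple

# Values from Table 19; SriovPortMode 1 = auto_port2 is inferred from the table's ordering.
SRIOV_ENABLE_ROCE, SRIOV_ENABLE_SRIOV = 0, 2
AUTO_PORT1, AUTO_PORT2, MANUAL = 0, 1, 2


def vfs_per_port(sriov_enable: int, sriov_port_mode: int, burnt_num_vfs: int,
                 max_vf_port1: int = 0, max_vf_port2: int = 0) -> Tuple[int, int]:
    """Return (VFs on port 1, VFs on port 2) following the SriovPortMode description.

    burnt_num_vfs stands for the VF count burnt into the device; max_vf_port1/2
    stand for the MaxVFPortX registry values of ports 1 and 2.
    """
    if sriov_enable != SRIOV_ENABLE_SRIOV:
        return 0, 0  # not in SR-IOV mode (assumption for any value other than 2)
    if sriov_port_mode == AUTO_PORT1:
        return burnt_num_vfs, 0  # port 2 stays native Ethernet
    if sriov_port_mode == AUTO_PORT2:
        return 0, burnt_num_vfs  # port 1 stays native Ethernet
    if sriov_port_mode == MANUAL:
        return max_vf_port1, max_vf_port2
    raise ValueError(f"unknown SriovPortMode value: {sriov_port_mode}")


if __name__ == "__main__":
    print(vfs_per_port(SRIOV_ENABLE_SRIOV, MANUAL, 8, max_vf_port1=4, max_vf_port2=2))  # (4, 2)
```

This sketch covers a single configuration level. When both machine-level and device-level keys are set, the text states that the machine-level values win.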
141. ...nfirm
    Stop-VM -Name mtlae15-006 -Force -Confirm
Connect the VM to the vSwitch (you may have to switch off the VM first; doing this manually also works):
    Connect-VMNetworkAdapter -VMName mtlae15-005 -SwitchName VSwMLNX
    Add-VMNetworkAdapter -VMName mtlae15-005 -SwitchName VSwMLNX -StaticMacAddress 00155D730100
    Add-VMNetworkAdapter -VMName mtlae15-006 -SwitchName VSwMLNX -StaticMacAddress 00155D730101
Note: The commands from Steps 2 to 4 are not persistent. It is recommended to create a script that runs after each OS reboot.
Step 2. Configure a Subnet Locator and Route records on each Hyper-V Host (Host 1 and Host 2), mtlae14 & mtlae15:
    New-NetVirtualizationLookupRecord -CustomerAddress 172.16.14.5 -ProviderAddress 192.168.20.114 -VirtualSubnetID 5001 -MACAddress "00155D720100" -Rule "TranslationMethodEncap"
    New-NetVirtualizationLookupRecord -CustomerAddress 172.16.14.6 -ProviderAddress 192.168.20.114 -VirtualSubnetID 5001 -MACAddress "00155D720101" -Rule "TranslationMethodEncap"
    New-NetVirtualizationLookupRecord -CustomerAddress 172.16.15.5 -ProviderAddress 192.168.20.115 -VirtualSubnetID 5001 -MACAddress "00155D730100" -Rule "TranslationMethodEncap"
    New-NetVirtualizationLookupRecord -CustomerAddress 172.16.15.6 -ProviderAddress 192.168.20.115 -VirtualSubnetID 5001 -MACAddress "00155D730101" -Rule "TranslationMethodEncap"
    # Add customer route
    New-NetVirtualizationCustomerRo...
142. ...ng
This section applies to the ibdiagpath tool only. A tool command may require defining the destination device or port to which it applies. The following addressing modes can be used to define the IB ports:
Using a Directed Route to the destination (tool option -d): This option defines a directed route of output port numbers from the local port to the destination.
Using port LIDs (tool option -l): In this mode, the source and destination ports are defined by means of their LIDs. If the fabric is configured to allow multiple LIDs per port, then any of them is valid for defining a port.
Using port names defined in the topology file (tool option -n): This option refers to the source and destination ports by the names defined in the topology file. Therefore, this option is relevant only if a topology file is specified to the tool. In this mode, the tool uses the names to extract the port LIDs from the matched topology, and then operates as with the -l option.
For further information on the following tools, please refer to the tool's man page.
Table 29 Diagnostic Utilities
ibdiagnet: Scans the fabric using directed route packets and extracts all the available information regarding its connectivity and devices. It is only supported in Windows Server 2012 and above, or Windows Client 8.1 and above.
ibportstate: Enables querying the logical (link) and physi...
143. ...nism, the adapters can overcome any TCP/IP issues and eliminate the risk of data loss.
Table 15 Flow Control Options
FlowControl (default value: 0)
  When Rx Pause is enabled, the receiving adapter generates a flow control frame when its receive queue reaches a pre-defined limit. The flow control frame is sent to the sending adapter.
  When Tx Pause is enabled, the sending adapter pauses transmission if it receives a flow control frame from a link partner.
  The valid values are:
    0: Flow control is disabled
    1: Tx flow control is enabled
    2: Rx flow control is enabled
    3: Rx & Tx flow control is enabled
PerPriRxPause (default value: 0)
  When Per Priority Rx Pause is configured, the receiving adapter generates a flow control frame when the receive queue of a priority reaches a pre-defined limit. The flow control frame is sent to the sending adapter.
  Notes: This registry value is not exposed via the UI. RxPause and PerPriRxPause are mutually exclusive (i.e., at most one of them can be set).
PerPriTxPause (default value: 0)
  When Per Priority Tx Pause is configured, the sending adapter pauses the transmission of a specific priority if it receives a flow control frame from a link partner.
  Notes: This registry value is not exposed via the UI. TxPause and PerPriTxPause are mutually exclusive (i.e., at most one of them can be set).
3.6.6.2 VMQ Options
This section describes the registry k...
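Table 15 packs both pause directions into one FlowControl value and adds two mutual-exclusion rules with the per-priority keys. This is a minimal sketch that decodes FlowControl and checks those rules before the values are written. The table does not describe the value format of PerPriRxPause/PerPriTxPause, so the sketch assumes any non-zero value means "configured". The function name is hypothetical.

```python
from typing import Dict, Tuple

# FlowControl values from Table 15, decoded as (rx_pause, tx_pause).
FLOW_CONTROL: Dict[int, Tuple[bool, bool]] = {
    0: (False, False),  # flow control disabled
    1: (False, True),   # Tx flow control enabled
    2: (True, False),   # Rx flow control enabled
    3: (True, True),    # Rx & Tx flow control enabled
}


def check_pause_settings(flow_control: int, per_pri_rx_pause: int,
                         per_pri_tx_pause: int) -> Dict[str, bool]:
    """Decode FlowControl and enforce the RxPause/PerPriRxPause and
    TxPause/PerPriTxPause mutual-exclusion notes (non-zero = configured, an assumption)."""
    if flow_control not in FLOW_CONTROL:
        raise ValueError(f"FlowControl must be 0-3, got {flow_control}")
    rx_pause, tx_pause = FLOW_CONTROL[flow_control]
    if rx_pause and per_pri_rx_pause:
        raise ValueError("RxPause and PerPriRxPause are mutually exclusive")
    if tx_pause and per_pri_tx_pause:
        raise ValueError("TxPause and PerPriTxPause are mutually exclusive")
    return {
        "rx_pause": rx_pause,
        "tx_pause": tx_pause,
        "per_pri_rx_pause": bool(per_pri_rx_pause),
        "per_pri_tx_pause": bool(per_pri_tx_pause),
    }


if __name__ == "__main__":
    print(check_pause_settings(1, 1, 0))  # Tx pause plus per-priority Rx pause is allowed
```

Running such a check before editing the registry catches a combination, such as FlowControl=2 with PerPriRxPause set, that the table rules out.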
144. [Figure residue: Add Roles and Features Wizard confirmation page, with Hyper-V and the Remote Server Administration Tools (Role Administration Tools > Hyper-V Management Tools > Hyper-V GUI Management Tools, Hyper-V Module for Windows PowerShell) selected for installation]
Step 6. Reboot the system.
3.5.4.3.3 Verifying SR-IOV Support within the Host Operating System (SR-IOV Ethernet Only)
> To verify that the system is properly configured for SR-IOV:
Step 1. Go to: Start > Windows PowerShell.
Step 2. Run the following PowerShell commands:
    PS $ (Get-VmHost).IovSupport
    PS $ (Get-VmHost).IovSupportReasons
If SR-IOV is supported by the OS, the PowerShell output is as shown in the figure below.
Figure 6 Operating System Supports SR-IOV
Note: If the BIOS was updated according to the BIOS vendor instructions and you see the message displayed in the figure below, update the registry configuration as described in the (Get-VmHost).IovSupportReasons message.
Figure 7 SR-IOV Support
145. Step 3. Reboot.
Step 4. Verify that the system is configured correctly for SR-IOV as described in Steps 1-2.
3.5.4.3.4 Creating a Virtual Machine (SR-IOV Ethernet Only)
> To create a virtual machine:
Step 1. Go to: Server Manager > Tools > Hyper-V Manager.
Step 2. Go to: New > Virtual Machine and set the following:
  Name: <name>
  Startup memory: 4096 MB
  Connection: Not Connected
Figure 8 Hyper-V Manager
Step 3. Connect the virtual hard disk in the New Virtual Machine Wizard.
Step 4. Go to: Connect Virtual Hard Disk > Use an existing virtual hard disk.
Step 5. Select the location of the vhd file.
Figure 9 Connect Virtual Hard Disk
146. 3.7.2.1 Test Running
To run the test, follow the steps below:
1. Connect two servers to Mellanox adapters.
2. Verify ping between the two servers.
3. Configure the RoCE version to be:
a. RoCE V1 over IP:
i. Linux side: V1
ii. Windows side: V1.25
b. RoCE V2:
i. Linux side: V2
ii. Windows side: V2
iii. Verify that the RoCE UDP port is the same on the two servers. For the registry key, refer to Table 20 "RoCE Options" on page 114.
4. Select the server side and the client side, and run accordingly:
a. Server: nd_rping/rping -s [-v -V -d] [-S size] [-C count] [-a addr] [-p port]
b. Client: nd_rping/rping -c [-v -V -d] [-S size] [-C count] [-a addr] [-p port]
Executable Options:
-s: Server side
-P: Persistent server mode allowing multiple connections
-c: Client side
-a: Address
-p: Port
Debug Extensions:
-v: Displays ping data to stdout every test cycle
-V: Validates ping data every test cycle
-d: Shows debug prints to stdout
-S: Indicates the ping data size; must be < (64*1024)
-C: Indicates the number of ping cycles to perform
Example:
Linux server: rping -v -s -a <IP address> -C 10
Windows client: nd_rping -v -c -a <same IP as above> -C 10
3.8 Performance Tuning and Counters
For further information on WinOF performance, please refer to the Performance Tuning Guide for Mellanox Network Adapters.
This sectio
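The option tables above map directly onto an argument list. The Python sketch below assembles one for either tool; `build_rping_args` is a hypothetical helper, not part of WinOF, and it only enforces the constraints stated above (`-P` on the server side only, `-S` below 64*1024). It builds the command; it does not run it.

```python
from typing import List, Optional

MAX_PING_SIZE = 64 * 1024  # -S must be < (64*1024), per the option table


def build_rping_args(
    tool: str,
    role: str,
    addr: str,
    port: Optional[int] = None,
    size: Optional[int] = None,
    count: Optional[int] = None,
    verbose: bool = False,
    validate: bool = False,
    debug: bool = False,
    persistent: bool = False,
) -> List[str]:
    """Assemble an rping / nd_rping argument list from the documented flags (hypothetical helper)."""
    if role not in ("server", "client"):
        raise ValueError("role must be 'server' or 'client'")
    if persistent and role != "server":
        raise ValueError("-P (persistent mode) applies to the server side only")
    args = [tool]
    if verbose:
        args.append("-v")   # display ping data every cycle
    if validate:
        args.append("-V")   # validate ping data every cycle
    if debug:
        args.append("-d")   # debug prints to stdout
    args.append("-s" if role == "server" else "-c")
    if persistent:
        args.append("-P")
    args += ["-a", addr]
    if port is not None:
        args += ["-p", str(port)]
    if size is not None:
        if not 0 < size < MAX_PING_SIZE:
            raise ValueError(f"-S size must be < {MAX_PING_SIZE}")
        args += ["-S", str(size)]
    if count is not None:
        args += ["-C", str(count)]
    return args
```

With `verbose=True` and `count=10`, this reproduces the two example command lines given above for the Linux server and the Windows client.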
147. nox Adapter Diagnostics Counters
The proprietary Mellanox adapter diagnostics counter set consists of the NIC diagnostics. These counters collect information from ConnectX-3 and ConnectX-3 Pro firmware flows.
Table 25: Mellanox Adapter Diagnostics Counters
Requester length errors: Number of local length errors when the local machine generates outbound traffic.
Responder length errors: Number of local length errors when the local machine receives inbound traffic.
Requester QP operation errors: Number of local QP operation errors when the local machine generates outbound traffic.
Responder QP operation errors: Number of local QP operation errors when the local machine receives inbound traffic.
Requester protection errors: Number of local protection errors when the local machine generates outbound traffic.
Responder protection errors: Number of local protection errors when the local machine receives inbound traffic.
Requester CQE errors: Number of local CQEs with errors when the local machine generates outbound traffic.
Responder CQE errors: Number of local CQEs with errors when the local machine receives inbound traffic.
Requester Inv
148. The installation process will start once all the required steps in the wizard are completed. The client will then reboot and boot from the iSCSI target.
3 Features Overview and Configuration
Once you have installed the Mellanox WinOF VPI package, you can perform various modifications to your driver to make it suitable for your system's needs.
Changes made to the Windows registry take effect immediately, and no backup is automatically made. Do not edit the Windows registry unless you are confident regarding the changes.
3.1 Ethernet Network
3.1.1 Port Configuration
3.1.1.1 Auto Sensing
Auto Sensing enables the NIC to automatically sense the link type (InfiniBand or Ethernet) based on the cable connected to the port, and to load the appropriate driver stack (InfiniBand or Ethernet).
Auto Sensing is performed only when rebooting the machine or after disabling/enabling the adapter cards from the Device Manager. Hence, if you replace cables during runtime, the NIC will not perform Auto Sensing.
For further information on how to configure it, please r
149. o utilize the multiple CPUs in a multi-core system for receiving incoming packets and steering them to their destination. RSS can significantly improve the number of transactions per second, the number of connections per second, and the network throughput.
This parameter can be set to one of two values:
1: enable (default). Sets RSS mode.
0: disable. The hardware is configured once to use the Toeplitz hash function, and the indirection table is never changed.
Note: The I/O Acceleration Technology (IOAT) is not functional in this mode.
Table 13: Performance Registry Keys (continued)
TxHashDistribution (default: 3): Sets the algorithm used to distribute the send packets on different send rings. The adapter uses 3 methods:
1 - Size: In this method, only 2 Tx rings are used. The send packets are distributed based on the packet size: packets smaller than 128 bytes use one ring, while larger packets use the other ring.
2 - Hash: In this method, the adapter calculates a hash value based on the destination IP, the TCP source port, and the destination port. If the packet type is not IP, the packet uses ring number 0.
3 - Hash and size: In this method, 2 rings are used for each hash value: one for small packets and another for larger packets.
The valid values are: 1 (size), 2 (hash), 3 (hash and size).
Note: This registry value is no
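The three TxHashDistribution methods can be modeled as a ring-selection function. This is a minimal sketch under stated assumptions: the manual names the hash inputs (destination IP, TCP source port, destination port) but not the hash function, so `zlib.crc32` stands in for it; which of the two rings carries small packets, and how non-IP packets are handled in mode 3, are also assumptions. `TxPacket` and `select_tx_ring` are hypothetical names.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

SMALL_PACKET_LIMIT = 128  # bytes; "smaller than 128 bytes" selects the small ring


@dataclass
class TxPacket:
    length: int
    dst_ip: Optional[str] = None  # None models a non-IP packet
    src_port: int = 0
    dst_port: int = 0


def flow_hash(pkt: TxPacket) -> int:
    # Stand-in hash (assumption): the driver's real hash function is not documented here.
    key = f"{pkt.dst_ip}|{pkt.src_port}|{pkt.dst_port}".encode()
    return zlib.crc32(key)


def select_tx_ring(pkt: TxPacket, mode: int, num_rings: int) -> int:
    small = pkt.length < SMALL_PACKET_LIMIT
    if mode == 1:  # Size: exactly two rings (assumed: ring 0 small, ring 1 large)
        return 0 if small else 1
    if mode == 2:  # Hash: non-IP traffic always uses ring 0
        if pkt.dst_ip is None:
            return 0
        return flow_hash(pkt) % num_rings
    if mode == 3:  # Hash and size: a (small, large) ring pair per hash bucket
        buckets = max(1, num_rings // 2)
        bucket = 0 if pkt.dst_ip is None else flow_hash(pkt) % buckets  # non-IP: assumption
        return 2 * bucket + (0 if small else 1)
    raise ValueError("TxHashDistribution must be 1, 2 or 3")
```

Because the hash ignores packet length, small and large packets of the same flow land on the two rings of one pair in mode 3, which is the point of that method.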
150. To install Data Center Bridging using the Server Manager:
Step 1: Open the Server Manager.
Step 2: Select Add Roles and Features.
Step 3: Click Next.
Step 4: Select Features on the left panel.
Step 5: Check the Data Center Bridging checkbox.
Step 6: Click Install.
To install Data Center Bridging using PowerShell:
Step 1: Enable Data Center Bridging (DCB):
PS $ Install-WindowsFeature Data-Center-Bridging
To configure QoS on the host:
Note: The procedure below is not saved after you reboot your system. Hence, we recommend that you create a script using the steps below and run it at startup of the local machine. See the procedure below on how to add the script to the local machin
151. of interfaces inside a network adapter, or a number of physical network adapters, into a virtual interface that provides the fault-tolerance function. Fault tolerance is the only teaming type supported in adapter teaming. The non-active interfaces in a team are in standby mode and will take over the network traffic in the event of a link failure in the active interface. Only one interface is active at any given time.
Note: For InfiniBand, the only teaming mode supported is failover.
3.2.8.3 Creating a Team
Teaming is used to take over packet indications and information requests if the primary network interface fails.
The following steps describe the process of creating a team:
Step 1: Display the Device Manager.
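The fault-tolerance behavior described above (one active member, standby members taking over on a link failure) can be sketched as a small state machine. The class and member names are hypothetical, and the sketch assumes the team does not fail back to the original member when its link recovers; the manual does not say either way.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FailoverTeam:
    members: List[str]  # ordered by preference; hypothetical member names
    link_up: Dict[str, bool] = field(default_factory=dict)
    active: Optional[str] = None

    def __post_init__(self) -> None:
        for member in self.members:
            self.link_up.setdefault(member, True)
        self._elect()

    def _elect(self) -> None:
        # Keep the current active member while its link is up (no failback, assumption);
        # otherwise promote the first standby member whose link is up.
        if self.active is not None and self.link_up[self.active]:
            return
        self.active = next((m for m in self.members if self.link_up[m]), None)

    def set_link(self, member: str, up: bool) -> None:
        self.link_up[member] = up
        self._elect()

    def standby(self) -> List[str]:
        return [m for m in self.members if m != self.active]
```

Only one member is ever active, matching the rule that a fault-tolerance team carries traffic on a single interface at a time.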
152. of the product. Enter the network location or click Change to browse to a location. Click Install to create a server image of MLNX_VPI at the specified network location, or click Cancel to exit the wizard.
Step 6: To complete the extraction, click Finish.
2.6 Uninstalling Mellanox WinOF Driver
2.6.1 Attended Uninstallation
To uninstall MLNX_WinOF on a single node:
1. Click Start > Control Panel > Programs and Features > MLNX_VPI > Uninstall.
NOTE: This requires elevated administrator privileges. See Section 1.1 "Supplied Packages" on page 12 for details.
2. Double-click the .exe and follow the instructions of the install wizard.
3. Click Start > All Programs > Mellanox Technologies > MLNX_WinOF > Uninstall MLNX_WinOF.
2.6.2 Unattended Uninstallation
If no reboot options are specified, the installer restarts the computer whenever necessary without displaying any prompt or warning to the user. Use the /norestart or /forcerestart standard command-line options to control reboots.
To uninstall MLNX_WinOF in unattended mode:
Step 1: Open a CMD console.
Windows Server 2012/2012 R2: Click Start > Task Manager > Fil
153. on and de-encapsulation of the network traffic. Firewalls that block GRE tunnels between sites have to be configured to support forwarding GRE (IP Protocol 47) tunnel traffic.
For further details on configuring NVGRE, please refer to Appendix A "NVGRE Configuration Scripts Examples" on page 163.
Figure 5: NVGRE Packet Structure
NVGRE is only supported in VMQ mode, not in SR-IOV mode.
3.5.3.3 Enabling/Disabling NVGRE Offloading
To leverage NVGRE to virtualize heavy network I/O workloads, the Mellanox ConnectX-3 Pro NIC provides hardware support for GRE offload, which is enabled by default.
To enable/disable NVGRE offloading:
Step 1: Open the Device Manager.
Step 2: Go to Network adapters.
Step 3: Right-click Mellanox ConnectX-3 Pro Ethernet Adapter and select Properties.
Step 4: Go to the Advanced tab.
Step 5: Choose the Encapsulate Task Offload option.
Step 6: Set one of the following values:
Enable: GRE offloading is enabled (default).
Disabled: The Hyper-V host will still be able to transfer NVGRE traffic, but TCP and inner IP checksums will be calculated by software, which significantly reduces performance.
3.5.3.3.1 Configuring NVGRE using PowerShell
Hyper-V Network Virtualization policies can be centrally configured using PowerShell 3.0 and PowerShell Remoting.
Step 1: Windows Server 201
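For reference, the GRE header that NVGRE carries inside the outer IP packet (IP protocol 47) can be built and parsed as below. This sketch follows the NVGRE header layout of RFC 7637 (Key-present flag set, protocol type 0x6558, a 24-bit Virtual Subnet ID plus an 8-bit FlowID in the Key field); it is not a Mellanox or Hyper-V API, and the function names are hypothetical.

```python
import struct
from typing import Tuple

GRE_IP_PROTOCOL = 47               # outer IPv4 protocol / IPv6 next-header value
GRE_FLAG_KEY_PRESENT = 0x2000      # K bit; C and S bits stay 0 for NVGRE
PROTO_TRANSPARENT_ETHERNET = 0x6558


def nvgre_header(vsid: int, flow_id: int = 0) -> bytes:
    """Pack the 8-byte NVGRE (GRE with key) header for a Virtual Subnet ID."""
    if not 0 <= vsid < (1 << 24):
        raise ValueError("VSID is a 24-bit value")
    if not 0 <= flow_id < (1 << 8):
        raise ValueError("FlowID is an 8-bit value")
    key = (vsid << 8) | flow_id  # VSID in the upper 24 bits, FlowID in the low 8
    return struct.pack("!HHI", GRE_FLAG_KEY_PRESENT, PROTO_TRANSPARENT_ETHERNET, key)


def parse_nvgre_header(hdr: bytes) -> Tuple[int, int]:
    """Return (vsid, flow_id) from an NVGRE header."""
    flags, proto, key = struct.unpack("!HHI", hdr[:8])
    if not flags & GRE_FLAG_KEY_PRESENT or proto != PROTO_TRANSPARENT_ETHERNET:
        raise ValueError("not an NVGRE header")
    return key >> 8, key & 0xFF
```

For example, `nvgre_header(5001)` encodes Virtual Subnet ID 5001, the value used in the NVGRE configuration examples in Appendix A.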
154. onMethod 0, RecvCompletionMethod 2, ReceiveBuffers 1024; on operating systems that support NDIS 6.3: RssProfile 4.
Additionally, this option chooses the best processors to assign to: DefaultRecvRingProcessor, TxForwardingProcessor; on operating systems that support NDIS 6.2: RssBaseProcNumber, MaxRssProcessors; on operating systems that support NDIS 6.3: NumRSSQueues, RssMaxProcNumber.
-f: Forwarding traffic scenario. This option must be followed by two connection names. The tuning in this case is code-dependent. This option automatically sets: SendCompletionMethod 1, RecvCompletionMethod 0, ReceiveBuffers 4096, UseRSSForRawIP 0, UseRSSForUDP 0.
Additionally, this option chooses the best processors to assign to: DefaultRecvRingProcessor, TxInterruptProcessor, TxForwardingProcessor; on operating systems that support NDIS 6.2: RssBaseProcNumber, MaxRssProcessors; on operating systems that support NDIS 6.3: NumRSSQueues, RssMaxProcNumber.
Table 23: Performance Tuning Tool Application Options (continued)
-m: Manual configuration. This option must be followed by one connection name. This option assigns the provided base and number of CPUs to RssBaseProcNumber and MaxRssProcessors. Additionally, this option assigns processors inside that range to DefaultRecvRingProcessor and TxInterruptProcessor.
-r: Restore default s
155. Step 5: Click OK.
Step 6: Close the Local Area Connection dialog.
Step 7: Verify the IP configuration by running ipconfig from a CMD console:
ipconfig
3.1.3 56GbE Link Speed
3.1.3.1 System Requirements
Mellanox ConnectX-3 and ConnectX-3 Pro cards
Firmware version 2.31.5050 and above
3.1.3.2 Configuring 56GbE Link Speed
Mellanox offers a proprietary link speed of 56GbE over FDR systems. To achieve this, only the switch supporting this speed must be configured to enable it. The NIC, on the other hand, detects this configuration automatically.
To achieve 56GbE link speed over a SwitchX-based switch system:
Make sure your switch supports 56GbE and that you have the relevant switch license instal
156. ontains TCP options. The valid values are: 0 (disable), 1 (enable).
Note: This registry key is not exposed to the user via the UI.
Table 12: Offload Registry Keys (continued)
LSOIpOptions (default: eth 1, IPoIB 1): Enables the NIC to segment a large TCP packet whose IP header contains IP options. The valid values are: 0 (disable), 1 (enable). Note: This registry key is not exposed to the user via the UI.
IPChecksumOffloadIPv4 (default: eth 3, IPoIB 3): Specifies whether the device performs the calculation of IPv4 checksums. The valid values are: 0 (disable), 1 (Tx enable), 2 (Rx enable), 3 (Tx and Rx enable).
TCPUDPChecksumOffloadIPv4 (default: eth 3, IPoIB 3): Specifies whether the device performs the calculation of TCP or UDP checksums over IPv4. The valid values are: 0 (disable), 1 (Tx enable), 2 (Rx enable), 3 (Tx and Rx enable).
TCPUDPChecksumOffloadIPv6 (default: eth 3, IPoIB 3): Specifies whether the device performs the calculation of TCP or UDP checksums over IPv6. The valid values are: 0 (disable), 1 (Tx enable), 2 (Rx enable), 3 (Tx and Rx enable).
ParentBusRegPath (default: HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e97d-e325-11ce-bfc1-08002be10318}\0073): TCP checksum offload IP/IP.
3.6.5 Performance Registry Keys
This group of regist
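The Tx/Rx checksum offload values (0 to 3) behave like a two-bit mask: 1 for Tx, 2 for Rx, and 3 for both. The sketch below decodes a registry value on that reading; treating the value as a bit mask is an interpretation of the table rather than something the manual states, and the names are hypothetical.

```python
from enum import IntFlag
from typing import Tuple


class ChecksumOffload(IntFlag):
    DISABLED = 0
    TX = 1  # "1 Tx enable"
    RX = 2  # "2 Rx enable"; 3 = TX | RX


def decode_checksum_offload(value: int) -> ChecksumOffload:
    if value not in (0, 1, 2, 3):
        raise ValueError("valid values are 0, 1, 2 and 3")
    return ChecksumOffload(value)


def offload_directions(value: int) -> Tuple[bool, bool]:
    """Return (tx_enabled, rx_enabled) for a checksum offload registry value."""
    flags = decode_checksum_offload(value)
    return bool(flags & ChecksumOffload.TX), bool(flags & ChecksumOffload.RX)
```

The same decoder applies to IPChecksumOffloadIPv4, TCPUDPChecksumOffloadIPv4, and TCPUDPChecksumOffloadIPv6, since they share the value set.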
157. or each interface. The number can be different for each interface. This allows partitioning of CPUs across network adapters.
Note: Restart the network adapter when you change this registry key.
HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>\*NumaNodeID: NUMA node affinitization.
HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>\*RssBaseProcGroup: Sets the RSS base processor group for systems with more than 64 processors.
3.1.13 Ignore Frame Check Sequence (FCS) Errors
Upon receiving packets, the packets go through a checksum validation process for the FCS field. If the validation fails, the received packets are dropped.
When the Ignore FCS feature is enabled (it is disabled by default), the device does not validate the FCS field, even if the field is invalid. The registry key to enable/disable this feature is IgnoreFCS.
It is not recommended to ignore FCS, as the field guarantees the integrity of received Ethernet frames.
3.2 InfiniBand Network
3.2.1 Port Configuration
For more information on port configuration, please refer to Section 3.1.1 "Port Configuration" on page 28.
3.2.2 OpenSM - Subnet Manager
OpenSM v3.3.11 is an InfiniBand Subnet Manager. To operate one or more host machines in the InfiniBand cluster, at least one Subnet Manager is required in the fabric.
Please use th
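To show what the IgnoreFCS switch skips, the sketch below computes and checks an Ethernet FCS. It relies on the IEEE 802.3 FCS being the same CRC-32 that `zlib.crc32` computes, appended least-significant byte first; `accept_frame` is a hypothetical model of the drop decision, not driver code.

```python
import struct
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Append the IEEE 802.3 CRC-32 FCS (transmitted least-significant byte first)."""
    return frame + struct.pack("<I", zlib.crc32(frame) & 0xFFFFFFFF)


def fcs_is_valid(frame_with_fcs: bytes) -> bool:
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return (zlib.crc32(body) & 0xFFFFFFFF) == struct.unpack("<I", fcs)[0]


def accept_frame(frame_with_fcs: bytes, ignore_fcs: bool = False) -> bool:
    # Default behavior: frames that fail FCS validation are dropped.
    # With IgnoreFCS enabled, the check is skipped and corrupted frames pass.
    return ignore_fcs or fcs_is_valid(frame_with_fcs)
```

Flipping a single bit in a frame makes `accept_frame` return False unless `ignore_fcs` is set, which is exactly the integrity guarantee the manual advises against giving up.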
158. or out-of-order received packets, you can increase the number of received buffers. The valid values are 256 up to 4096.
TransmitBuffers (default: eth 2048, IPoIB 2048): The number of packets each ring sends. Increasing this value can enhance transmission performance but also consumes system memory. The valid values are 256 up to 4096.
SpeedDuplex (default: 7): The speed and duplex settings that a device supports. This registry key should not be changed; it can be used to query the device capability. The Mellanox ConnectX device is set to 7, meaning 10Gbps and Full Duplex. Note: The default value should not be modified.
MaxNumOfMCList (default: eth 128, IPoIB 128): The number of multicast addresses that are filtered by the NIC. If the OS uses more multicast addresses than were defined, it sets the port to multicast promiscuous mode, and the multicast addresses are filtered by the OS at the protocol level. The valid values are 64 up to 1024. Note: This registry value is not exposed via the UI.
QOS (default: eth 1): Enables NDIS Quality of Service (QoS). The valid values are: 1 (enable), 0 (disable). Note: This keyword is only valid for ConnectX-3 when using Windows Server 2012 and above.
Table 11: Basic Registry Keys (continued)
RxIntModerationProfile (default: eth 2, IPoIB 2): Enables the assignment of different interrupt moderation profiles for receive completions. Interrupt moderation can have a gre
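The valid ranges listed above and the multicast fallback rule can be captured in a short validation sketch. The dictionary and function names are hypothetical; only the ranges, the default of 128, and the promiscuous fallback come from the table.

```python
from typing import Dict, List, Tuple

# Inclusive valid ranges from the Basic Registry Keys table.
VALID_RANGES: Dict[str, Tuple[int, int]] = {
    "TransmitBuffers": (256, 4096),
    "MaxNumOfMCList": (64, 1024),
}


def check_value(name: str, value: int) -> int:
    low, high = VALID_RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return value


def multicast_filtering(groups: List[str], max_mc_list: int = 128) -> Tuple[str, List[str]]:
    """Return the filter mode and the addresses the NIC itself would filter on."""
    check_value("MaxNumOfMCList", max_mc_list)
    if len(groups) > max_mc_list:
        # More groups than the NIC list holds: accept all multicast traffic
        # and leave filtering to the OS at the protocol level.
        return "multicast-promiscuous", []
    return "filtered", list(groups)
```

Raising MaxNumOfMCList keeps more filtering in hardware, at the cost of a larger filter list on the NIC.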
159. Step 6: Enable SR-IOV for the Mellanox VMNIC:
1. Open the VM settings wizard.
2. Open the Network Adapter and choose Hardware Acceleration.
3. Tick the Enable SR-IOV option.
4. Click OK.
Figure 13: Enable SR-IOV on VMNIC
160. oti ... 26
Chapter 3 Features Overview and Configuration ... 28
3.1 Ethernet Network ... 28
3.1.1 Port Configurations ... 28
3.1.2 Assigning Port IP After Installation ... 30
3.1.3 56GbE Link Speed ... 32
3.1.4 RDMA over Converged Ethernet (RoCE) ... 33
3.1.5 Teaming and VLAN ... 39
3.1.6 Header Data Split ... 45
3.1.7 Ports TX Arbitration ... 45
3.1.8 Configuring Quality of Service (QoS) ... 46
3.1.9 Configuring the Ethernet Driver ... 50
3.1.10 Differentiated Services Code Point (DSCP) ... 51
3.1.11 Lossless TCP ... 54
3.1.12 Receive Side Scaling (RSS) ... 58
3.1.13 Ignore Frame Check Sequence (FCS) Errors ... 59
3.2 InfiniBand Network ... 59
3.2.1 Port Configuration ... 59
3.2.2 OpenSM - Subnet Manager ... 59
3.2.3 Modifying IPoIB Configuration ... 60
3.2.4 Displaying Adapter Related Information ... 60
3.2.5 Assigning Port IP After Installation ... 61
3.2.6 Receiv
161. ow. Each monitor run will increment the age of all non-VMQ local endpoints. When LocalEndpointMaxAge is reached, the endpoint will be removed. The valid values are 1 up to 20. Note: This registry value is not exposed via the UI.
LocalEndpointMonitorInterval (default: 60000): The time interval in ms between two consecutive runs of the local endpoint DB monitor for aging unused local endpoints. Each run will increment the age of all non-VMQ local endpoints. The valid values are 10000 up to 1200000. Note: This registry value is not exposed via the UI.
EnableQPR (default: 0): Enables query path record. The valid values are: 0 (disable), 1 (enable).
McastQueryResponseInterval (default: 2): The number of runs of the multicast monitor (which runs every 30 seconds) allowed until a response to the IGMP/MLD queries is received. If no response is received within this period, the driver leaves the multicast group. The valid values are 1 up to 10. Note: This registry value is not exposed via the UI.
3.6.8 General Registry Values
This section provides information on general registry keys that affect Mellanox driver operation.
Table 18: General Registry Values
MaxNumRssCpus (default: 4): The number of CPUs that participate in the RSS. The Mellanox adapter can open multiple receive rings; each ring can be processed by a different processor. When RSS is disabled
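The aging rule spread across LocalEndpointMaxAge and LocalEndpointMonitorInterval can be modeled as follows. This is an illustrative sketch with hypothetical names, using the documented defaults (5 runs, 60000 ms); it assumes removal happens on the run at which the age reaches LocalEndpointMaxAge.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class LocalEndpointDb:
    max_age: int = 5            # LocalEndpointMaxAge default (valid 1..20)
    interval_ms: int = 60000    # LocalEndpointMonitorInterval default
    ages: Dict[str, int] = field(default_factory=dict)
    vmq_endpoints: Set[str] = field(default_factory=set)

    def touch(self, endpoint: str, vmq: bool = False) -> None:
        """Endpoint used as a send source or receive destination: zero its age."""
        self.ages[endpoint] = 0
        if vmq:
            self.vmq_endpoints.add(endpoint)

    def monitor_run(self) -> None:
        """One monitor pass: age every non-VMQ endpoint and drop those reaching max_age."""
        for endpoint in list(self.ages):
            if endpoint in self.vmq_endpoints:
                continue  # VMQ endpoints are not aged
            self.ages[endpoint] += 1
            if self.ages[endpoint] >= self.max_age:
                del self.ages[endpoint]

    def max_idle_ms(self) -> int:
        """Upper bound on how long an unused endpoint survives."""
        return self.max_age * self.interval_ms
```

With the defaults, an unused non-VMQ endpoint is gone at most 5 x 60000 ms = 300000 ms (five minutes) after its last use.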
162. packet. The valid values are: 0 (disable), 1 (enable).
Table 13: Performance Registry Keys (continued)
RxIntModeration (default: eth 2, IPoIB 2): Sets the rate at which the controller moderates or delays the generation of interrupts, making it possible to optimize network throughput and CPU utilization. The default setting (Adaptive) adjusts the interrupt rates dynamically depending on traffic type and network usage. Choosing a different setting may improve network and system performance in certain configurations. The valid values are:
1: static
2: adaptive. The interrupt moderation count and time are configured dynamically based on traffic types and rate.
pkt_rate_low (default: eth 150000, IPoIB 150000): Sets the packet rate below which the traffic is considered latency traffic when using adaptive interrupt moderation. The valid values are 100 up to 1000000. Note: This registry value is not exposed via the UI.
pkt_rate_high (default: eth 170000, IPoIB 170000): Sets the packet rate above which the traffic is considered bandwidth traffic when using adaptive interrupt moderation. The valid values are 100 up to 1000000. Note: This registry value is not exposed via the UI.
RSS (default: eth 1, IPoIB 1): Sets the driver to use Receive Side Scaling (RSS) mode to improve the performance of handling incoming packets. This mode allows the adapter port t
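One way to read pkt_rate_low and pkt_rate_high together is as a hysteresis band: below the low mark, traffic is treated as latency traffic; above the high mark, as bandwidth traffic. The manual does not say what happens between the two marks, so the sketch below assumes the previous classification is kept. `classify_traffic` is a hypothetical name.

```python
def classify_traffic(pkt_rate: int, previous: str,
                     pkt_rate_low: int = 150000, pkt_rate_high: int = 170000) -> str:
    """Classify traffic for adaptive interrupt moderation (assumed hysteresis)."""
    for name, value in (("pkt_rate_low", pkt_rate_low), ("pkt_rate_high", pkt_rate_high)):
        if not 100 <= value <= 1000000:
            raise ValueError(f"{name} must be between 100 and 1000000")
    if pkt_rate_low > pkt_rate_high:
        raise ValueError("pkt_rate_low must not exceed pkt_rate_high")
    if pkt_rate < pkt_rate_low:
        return "latency"
    if pkt_rate > pkt_rate_high:
        return "bandwidth"
    return previous  # inside the band: keep the earlier decision (assumption)
```

A gap between the two defaults (150000 and 170000 packets/s) keeps a rate hovering near one threshold from flipping the moderation profile on every sample.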
163. packets will generate an interrupt that reschedules the polling mechanism. The valid values are 0 up to 200000. Note: This registry value is not exposed via the UI.
AverageFactor (default: eth 16, IPoIB 16): The weight of the last polling cycle in the decision whether to continue polling or give up when using the polling completion method for receiving. The valid values are 0 up to 256. Note: This registry value is not exposed via the UI.
AveragePollThreshold (default: eth 10, IPoIB 10): The average threshold polling number when using the polling completion method for receiving. If the average number is higher than this value, the adapter continues to poll. The valid values are 0 up to 1000. Note: This registry value is not exposed via the UI.
ThisPollThreshold (default: eth 100, IPoIB 100): The threshold number of the last polling cycle when using the polling completion method for receiving. If the number of packets received in the last polling cycle is higher than this value, the adapter continues to poll. The valid values are 0 up to 1000. Note: This registry value is not exposed via the UI.
Table 13: Performance Registry Keys (continued)
HeaderDataSplit (default: eth 0, IPoIB 0): Enables the driver to use header data split. In this mode, the adapter uses two buffers to receive the packet. The first buffer holds the header, while the sec
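The three polling keys can be combined into a continue-or-stop decision. The manual does not give the averaging formula, so this sketch assumes an exponentially weighted average in which the latest cycle has weight AverageFactor/256, and that either threshold alone is enough to keep polling; both are assumptions, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RxPollingState:
    average_factor: int = 16        # AverageFactor, valid 0..256
    average_threshold: int = 10     # AveragePollThreshold, valid 0..1000
    this_poll_threshold: int = 100  # ThisPollThreshold, valid 0..1000
    average: float = 0.0            # running average of packets per polling cycle

    def __post_init__(self) -> None:
        if not 0 <= self.average_factor <= 256:
            raise ValueError("AverageFactor must be between 0 and 256")
        for name, value in (("AveragePollThreshold", self.average_threshold),
                            ("ThisPollThreshold", self.this_poll_threshold)):
            if not 0 <= value <= 1000:
                raise ValueError(f"{name} must be between 0 and 1000")

    def keep_polling(self, received_this_cycle: int) -> bool:
        """Fold the last cycle into the average and decide whether to keep polling."""
        weight = self.average_factor / 256  # assumed scaling of AverageFactor
        self.average = (1 - weight) * self.average + weight * received_this_cycle
        return (self.average > self.average_threshold
                or received_this_cycle > self.this_poll_threshold)
```

With the defaults, a single burst of 150 packets keeps polling through ThisPollThreshold alone, while an idle cycle right after it stops polling because the average (about 8.8) is still below AveragePollThreshold.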
164. physical layer, preventing them from being deliverable.
Packets Received with Frame Length Error: Shows the number of inbound packets that contained an error where the frame has a length error. Packets received with frame length error are a subset of packets received errors.
Packets Received with Symbol Error: Shows the number of inbound packets that contained a symbol error or an invalid block. Packets received with symbol error are a subset of packets received errors.
Packets Received with Bad CRC Error: Shows the number of inbound packets that failed the CRC check. Packets received with bad CRC error are a subset of packets received errors.
Packets Received Discarded: Shows the number of inbound packets that were chosen to be discarded in the physical layer, even though no errors had been detected to prevent their being deliverable. One possible reason for discarding such a packet could be a buffer overflow.
(a) These error/discard counters are related to layer-2 issues, such as CRC, length, and type errors. An error/discard can also occur at the higher interface level; for example, a packet can be discarded for lack of a receive buffer. To see the sum of all error/discard packets, read the Windows Network Interface counters. Note that for IPoIB, the Mellanox counters cover IB layer-2 issues only, and the Windows Network Interface counters cover interface-level issues.
3.8.4.1.2 Proprietary Mella
165. port; auto_port2; 2: manual.
Note: In auto_portX mode, port X will have the number of VFs according to the value burnt in the device, and the other port will have no SR-IOV and will support native Ethernet (i.e., no RoCE). Setting this parameter to Manual will configure the number of VFs for each port according to the registry key MaxVFPortX.
Note: The number of VFs can be configured both at the Mellanox bus driver level and at the Network Interface level (i.e., using the Set-NetAdapterSriov PowerShell cmdlet). The number of VFs actually available to the Network Interface is the minimum of the Mellanox bus driver configuration and the Network Interface configuration. For example, if support for 8 VFs was burnt in firmware, SriovPortMode is auto_port1, and the Network Interface was allowed 32 VFs using the Set-NetAdapterSriov PowerShell cmdlet, the actual number of VFs available to the Network Interface will be 8.
MaxVFPort1, MaxVFPort2 (default: 16): MaxVFPort<i> specifies the maximum number of VFs that are allowed per port. This is the number of VFs the bus driver will open when working in manual mode.
Note: If the total number of VFs requested is larger than the number of VFs burnt in firmware, each port X (X = 1, 2) will have the number of VFs according to the following formula: SriovPortXNumVFs SriovPort1NumVFs SriovPort2NumVFs number
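The rule that the available VF count is the minimum of the bus-driver and Network Interface settings can be checked against the example in the text (8 burnt VFs, auto_port1, 32 VFs allowed through Set-NetAdapterSriov). This sketch uses hypothetical function names and mode strings; it deliberately does not model the manual-mode scaling formula, which is cut off in this excerpt.

```python
from typing import Dict


def bus_driver_vfs(mode: str, burnt_vfs: int,
                   max_vf_port1: int = 16, max_vf_port2: int = 16) -> Dict[int, int]:
    """VFs the bus driver opens per port (hypothetical model of SriovPortMode)."""
    if mode == "auto_port1":
        return {1: burnt_vfs, 2: 0}  # the other port gets no SR-IOV
    if mode == "auto_port2":
        return {1: 0, 2: burnt_vfs}
    if mode == "manual":
        if max_vf_port1 + max_vf_port2 > burnt_vfs:
            # The manual applies a scaling formula in this case; it is not modeled here.
            raise NotImplementedError("requested VFs exceed the burnt VF count")
        return {1: max_vf_port1, 2: max_vf_port2}
    raise ValueError(f"unknown SriovPortMode: {mode}")


def available_vfs(bus_driver_count: int, interface_count: int) -> int:
    """VFs visible to the Network Interface: the smaller of the two settings."""
    return min(bus_driver_count, interface_count)
```

In the documented example, `available_vfs(bus_driver_vfs("auto_port1", 8)[1], 32)` yields 8.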
166. proper mask, the driver uses the default one. For more details, please refer to http://mellanox.com/related-docs/prod_software/guid2mac_checker_user_manual.txt. Note: This registry value is not exposed via the UI.
MediumType802_3 (default: 0): Controls the way the interface is exposed to an upper level. By default, IPoIB is exposed as an InfiniBand interface. The user can change this and expose the interface as an Ethernet interface by setting this registry key. The valid values are:
0: the interface is exposed as NdisPhysicalMediumInfiniband
1: the interface is exposed as NdisPhysicalMedium802_3
Note: This registry value is not exposed via the UI.
SaTimeout (default: 1000): The time in milliseconds before retransmitting an SA query request. The valid values are 250 up to 60000.
SaRetries (default: 10): The number of times to retry an SA query request. The valid values are 1 up to 64.
McastIgmpMldGeneralQueryInterval (default: 3): The number of runs of the multicast monitor before a general query is initiated. This monitor runs every 30 seconds. The valid values are 1 up to 10.
Table 17: IPoIB Registry Keys (continued)
LocalEndpointMaxAge (default: 5): The maximum number of runs of the local endpoint DB monitor before an unused local endpoint is removed. The endpoint age is zeroed when it is used as a source in the send flow or a destination in the receive fl
167. r between 1 and 4095.
VLAN Priority: The priority is a number between 0 and 7 (0 lowest, 7 highest).
This dialog allows you to configure Virtual LANs (VLANs) for the adapter.
NOTE: After creating or configuring a VLAN, the adapter associated with the VLAN may experience a momentary loss of connectivity.
The list view has four columns:
VLAN Name: Displays the assigned VLAN name.
5. The newly created VLAN interface will appear in the Device Manager, as can be seen below.
3.1.5.2.3 Server: Configuring a Port to
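The VLAN ID and priority configured in this dialog end up in the IEEE 802.1Q tag of each frame: a 16-bit TPID (0x8100) followed by a TCI holding the 3-bit priority, a 1-bit DEI, and the 12-bit VLAN ID. The sketch below packs that tag using the ranges the dialog accepts; `vlan_tag` is a hypothetical helper, not a Mellanox API.

```python
import struct

TPID_8021Q = 0x8100


def vlan_tag(vlan_id: int, priority: int, dei: int = 0) -> bytes:
    """Pack a 4-byte 802.1Q tag: TPID, then PCP (3 bits) | DEI (1 bit) | VID (12 bits)."""
    if not 1 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must be between 1 and 4095")
    if not 0 <= priority <= 7:
        raise ValueError("VLAN priority must be between 0 and 7")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)
```

For example, VLAN 100 with priority 3 packs to the bytes `81 00 60 64`.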
168. 3.1.6 Header Data Split
The header-data split feature improves network performance by splitting the headers and data in received Ethernet frames into separate buffers. The feature is disabled by default and can be enabled in the Advanced tab (Performance Options) of the Properties window.
For further information, please refer to the MSDN library: http://msdn.microsoft.com/en-us/library/windows/hardware/ff553723(v=vs.85).aspx
3.1.7 Ports TX Arbitration
On a setup with a dual-port NIC with both ports at a link speed of 40GbE, each individual port can achieve the maximum line rate. When both ports are running simultaneously in a high-throughput scenario, the total throughput is bottlenecked by the PCIe bus, and in this case each port may not achieve its maximum of 40GbE.
Ports TX Arbitration ensures that bandwidth precedence is given to one of the ports on a dual-port NIC, enabling the preferred port to achieve the maximum throughput while the other port takes up the remaining bandwidth.
To configure Ports TX Arbitration:
Step 1: Open the Device Manager.
Step 2: Go to Network adapters.
Step 3: Right-click Mellanox ConnectX-3 Ethernet Adapter and select Properties.
Step 4: Go to the Advanced tab.
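The precedence rule can be illustrated with simple arithmetic: the preferred port is served up to its demand, and the other port gets whatever remains of the shared PCIe budget. The budget figures below are hypothetical, and real arbitration happens in the adapter, not in code like this; `arbitrate_tx` is a hypothetical name.

```python
from typing import Tuple


def arbitrate_tx(budget_gbps: float, preferred_demand: float,
                 other_demand: float) -> Tuple[float, float]:
    """Split a shared bandwidth budget, giving the preferred port precedence."""
    if min(budget_gbps, preferred_demand, other_demand) < 0:
        raise ValueError("bandwidth values must be non-negative")
    preferred = min(preferred_demand, budget_gbps)
    other = min(other_demand, budget_gbps - preferred)  # the remaining bandwidth
    return preferred, other
```

For example, with a hypothetical 50 Gb/s PCIe budget and both ports asking for 40 Gb/s, the preferred port keeps 40 Gb/s and the other port gets the remaining 10 Gb/s; with a budget of 100 Gb/s both ports run at line rate.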
169. ride the default port before the upgrade. This can be done by setting the roce udp dport parameter to the desired port in the registry, so that this port is used by both the older and the newer versions.
3.1.4.4 Configuring SwitchX-Based Switch System
To enable RoCE, the SwitchX should be configured as follows:
Ports facing the host should be configured as access ports, and use either global pause or Priority Code Point (PCP) for priority flow control.
Ports facing the network should be configured as trunk ports, and use Priority Code Point (PCP) for priority flow control.
For further information on how to configure SwitchX, please refer to the SwitchX User Manual.
3.1.4.5 Configuring Arista Switch
Step 1: Set the ports that face the hosts as trunk:
(config)# interface et10
(config-if-Et10)# switchport mode trunk
Step 2: Set the VID allowed on the trunk port to match the host VID:
(config-if-Et10)# switchport trunk allowed vlan 100
Step 3: Set the ports that face the network as trunk:
(config)# interface et20
(config-if-Et20)# switchport mode trunk
Step 4: Assign the relevant ports to LAG:
(config)# interface et10
(config-if-Et10)# dcbx mode ieee
(config-if-Et10)# speed forced 40gfull
(config-if-Et10)# channel-group 11 mode active
Step 5: Enable PFC on ports that face the network:
(config)# interface et20
(config-if-Et20)# load-interval 5
(config-if-Et20)# speed forced 40gfull
(config-if-Et20)# switchport trunk native vlan tag
(config-if-Et2
170. ring Quality of Service (QoS) on page 46.
Global Pause (Flow Control)
To use Global Pause (Flow Control) mode, disable QoS and Priority:
PS $ Disable-NetQosFlowControl
PS $ Disable-NetAdapterQos <interface name>
To confirm that flow control is enabled in the adapter parameters, go to:
Device Manager > Network adapters > Mellanox ConnectX-3 Ethernet Adapter > Properties > Advanced tab
[Screenshot: Advanced tab of the Mellanox ConnectX-3 Pro Ethernet Adapter properties, showing the flow control value set to "Rx & Tx Enabled"]
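Global Pause is standard IEEE 802.3x flow control. A congested receiver sends a MAC Control PAUSE frame that asks its link partner to stop transmitting for a number of 512-bit-time quanta. The Python sketch below only builds such a frame as bytes to show its layout. `build_pause_frame` and `pause_duration_us` are hypothetical helpers, and the adapter itself generates these frames in hardware. The source MAC in the test is an arbitrary example value.

```python
import struct

# Reserved multicast destination address for IEEE 802.3x MAC Control frames.
PAUSE_DEST_MAC = bytes.fromhex("0180c2000001")
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
MIN_FRAME_LEN = 60  # minimum Ethernet frame length, excluding the 4-byte FCS


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Return an 802.3x PAUSE frame (without FCS) requesting `quanta` pause units.

    One quantum is 512 bit times; a value of 0 tells the peer to resume at once.
    """
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause time must fit in 16 bits")
    header = PAUSE_DEST_MAC + src_mac
    body = struct.pack("!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, quanta)
    return (header + body).ljust(MIN_FRAME_LEN, b"\x00")  # zero-pad to minimum size


def pause_duration_us(quanta: int, link_gbps: float) -> float:
    """Convert pause quanta to microseconds at the given link speed."""
    return quanta * 512 / (link_gbps * 1e3)
```

At 40 Gb/s, the maximum pause request (0xFFFF quanta) corresponds to roughly 839 microseconds.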
171. rmware version or higher and to restart your computer.
• Mellanox ConnectX EN 10Gbit Ethernet Adapter <X> device detected that the link connected to port <Y> is up and has initiated normal operation.
• Mellanox ConnectX EN 10Gbit Ethernet Adapter <X> device detected that the link connected to port <Y> is down. This can occur if the physical link is disconnected or damaged, or if the other end port is down.
• Mismatch in the configurations between the two ports may affect performance. When using MSI-X, both ports should use the same RSS mode. To fix the problem, configure the RSS mode of both ports to be the same in the driver GUI.
• Mellanox ConnectX EN 10Gbit Ethernet Adapter <X> device failed to create enough MSI-X vectors. The network interface will not use MSI-X interrupts, which may affect performance. To fix the problem, configure the number of MSI-X vectors in the registry to be at least <Y>.
5.7 Extracting WPP Traces
The WinOF Mellanox driver automatically dumps trace messages that driver developers can use to debug issues that have recently occurred on the machine.
The default location for the trace file is:
%SystemRoot%\system32\LogFiles\Mlnx\Mellanox-System.etl
The automatic trace session is called Mellanox-Kernel.
To view the session, run the following command:
logman query "Mellanox-Kernel" -ets
In order to stop th
172. robably with one or more of the following:
• Old firmware version.
• Misconfigured flow control: Global Pause or PFC is configured incorrectly on the hosts, routers and switches. See Section 3.1.4 RDMA over Converged Ethernet (RoCE) on page 33.
• CPU power options are not set to Maximum Performance.
5.5 Virtualization Related Troubleshooting
Table 38 - Virtualization Related Issues
Issue: The Mellanox driver fails to load on a host machine in an SR-IOV environment, and appears with a yellow bang in the Device Manager.
Cause: The device may not have been able to find enough free resources that it can use (Code 12).
Solution:
1. Boot to BIOS and disable SR-IOV.
2. Burn firmware with a lower number of VFs.
3. Re-enable SR-IOV in BIOS.
For more information, please contact Mellanox support.
Issue: A VM running Windows Server 2008 R2 or above over ESX, with Mellanox adapter cards connected as direct pass-through, fails to power on.
Cause: ConnectX adapter network cards might be trying to use too many MSI-X vectors.
Solution:
1. Go to the vSphere Web Client.
2. Right-click the virtual machine and select Edit Settings.
3. Click the Options tab and expand Advanced.
4. Click Edit Configuration.
5. Click Add Row.
6. Add the parameter to the new row: in the Name column, add pciPassthru0.maxMSIXvectors; in the Value column, add 31.
7. Click OK, and click OK again.
For further details, please refer to
173. ry keys configure parameters that can improve adapter performance.
Table 13 - Performance Registry Keys
RecvCompletionMethod (default: eth 1, IPoIB 1): Sets the completion method of the receive packets. It affects network throughput and CPU utilization. The supported methods are:
• Polling: increases the CPU utilization, because the system polls the receive rings for incoming packets. However, it may increase the network bandwidth, since the incoming packet is handled faster.
• Adaptive: combines the interrupt and polling methods dynamically, depending on traffic type and network usage.
The valid values are: 0 - polling; 1 - adaptive.
InterruptModeration (default: eth 1, IPoIB 1): Sets the rate at which the controller moderates or delays the generation of interrupts, making it possible to optimize network throughput and CPU utilization. When disabled, the system generates an interrupt when each packet is received. In this mode, the CPU utilization increases at higher data rates, because the system must handle a larger number of interrupts. However, the latency decreases, since the packet is processed more quickly. When interrupt moderation is enabled, the system accumulates interrupts and sends a single interrupt rather than a series of interrupts. An interrupt is generated after receiving 5 packets or after the passing of 10 microseconds from receiving the first
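The moderation rule in the InterruptModeration description fires an interrupt after 5 packets, or 10 microseconds after the first packet. It can be simulated to show how moderation trades latency for fewer interrupts. This is a minimal Python sketch. It assumes the timer starts at the first packet of each pending batch, because the source sentence is cut off at that point. `moderated_interrupts` is a hypothetical helper, not part of the driver.

```python
def moderated_interrupts(arrivals_us, max_packets=5, max_delay_us=10.0):
    """Return the times (in microseconds) at which moderated interrupts fire.

    An interrupt fires when `max_packets` packets are pending, or when
    `max_delay_us` has elapsed since the first pending packet, whichever
    happens first.
    """
    interrupts = []
    pending = 0
    first = None
    for t in sorted(arrivals_us):
        # If the timer for the current batch expired before this packet,
        # that batch was already signalled at first + max_delay_us.
        if pending and t - first >= max_delay_us:
            interrupts.append(first + max_delay_us)
            pending, first = 0, None
        if pending == 0:
            first = t
        pending += 1
        if pending == max_packets:
            interrupts.append(t)  # packet-count threshold reached
            pending, first = 0, None
    if pending:
        interrupts.append(first + max_delay_us)  # timer flushes the last batch
    return interrupts
```

A burst of six packets at 0 to 5 µs produces two interrupts, at 4 µs and at 15 µs. Without moderation, the same burst would produce one interrupt per packet (six in total).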
174. s the generation of interrupts, making it possible to optimize network throughput and CPU utilization. The default setting (Adaptive) adjusts the interrupt rates dynamically, depending on the traffic type and network usage. Choosing a different setting may improve network and system performance in certain configurations.
• Send completion method: Sets the completion method of the Send packets. It may affect network throughput and CPU utilization.
• Interrupt Moderation TX Packet Count: Number of packets that need to be sent before an interrupt is generated on the send side (default: 0).
• Interrupt Moderation TX Packet Time: Maximum elapsed time (in usec) between the sending of a packet and the generation of an interrupt, even if the moderation count has not been reached (default: 0).
• Offload Options: Allows you to specify which TCP/IP offload settings are handled by the adapter rather than the operating system.
Enabling offloading services increases transmission performance, because the offload tasks are performed by the adapter hardware rather than the operating system. This frees CPU resources to work on other tasks.
• IPv4 Checksums Offload: Enables the adapter to compute the IPv4 checksum upon transmit and/or receive, instead of the CPU (default: Enabled).
• TCP/UDP Checksum Offload for IPv4 packets: Enables the adapter to compute the TCP/UDP checksum over IPv4 packets upon transmit and/or re
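The IPv4 Checksums Offload option moves the following computation from the CPU to the adapter. For illustration, here is a minimal Python version of the standard IPv4 header checksum, the 16-bit ones'-complement sum described in RFC 1071. The sample header used to check it is a common textbook example, not a capture from this adapter.

```python
def ipv4_header_checksum(header: bytes) -> int:
    """Compute the 16-bit ones'-complement checksum of an IPv4 header.

    The checksum field (bytes 10-11) must be zero when computing a new
    checksum; a header that already carries a valid checksum sums to 0.
    """
    if len(header) % 2:
        header += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF
```

For the header `4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7`, the function returns 0xB861. Re-running it on the header with 0xB861 in the checksum field returns 0, which is how a receiver validates the header.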
175. sabled. Make sure that the OpenSM (using the partitions configuration) and the physical-to-virtual PKey table mapping were configured over the same physical port.
To change the configuration of an existing port:
Step 1. Through the Device Manager in Windows OS, disable the driver on the port affected by the change you would like to make, or disable the bus driver with all the ports it carries.
Step 2. If required, configure the OpenSM to recognize the partition you would like to add or change. For further details, please refer to the section titled "Partitions" in the Mellanox OFED for Linux User Manual.
Step 3. If the change is on a VM over a Linux host, map the physical PKey table to the virtual PKey table as required. For further details, please refer to the section titled "Partitioning IPoIB Communication using PKeys" in the Mellanox OFED for Linux User Manual.
Step 4. Enable the drivers you disabled in Windows OS.
3.2.8 Teaming
Windows Server 2012 and above supports teaming as part of the operating system. However, unlike Mellanox WinOF VPI, it does not support teaming for InfiniBand adapters.
In this release, this feature is at beta level. In particular, IPv6, VMQ and configuration through PowerShell are not supported.
3.2.8.1 System Requirements
IPoIB teaming is supported in all operating systems supported by WinOF.
3.2.8.2 Adapter Teaming
InfiniBand adapter teaming can group a set
176. selected descriptor ring has no free descriptors, two modes for handling are available.
3.1.11.3 Drop Mode
In this mode, a packet arriving at a descriptor ring with no free descriptors is dropped after verifying that there are really no free descriptors. This isolates host driver execution delays from the network. It also isolates different SW entities sharing the adapter (e.g., SR-IOV VMs) from one another.
3.1.11.4 Poll Mode
In this mode, a packet arriving at a descriptor ring with no free descriptors waits until a free descriptor is posted. All processing for this packet and the following packets is halted while the free descriptor status is polled. This behavior propagates the back-pressure into the Rx buffer, which accumulates incoming packets. When the XOFF threshold is crossed, the Flow Control mechanisms mentioned earlier stop the remote transmitters, which prevents packets from being dropped.
Since this mode breaks the aforementioned isolation, the adapter offers a mitigation mechanism. This mechanism limits the amount of time a packet may wait for a free descriptor while all packet processing is halted. When the allowed time expires, the adapter reverts to the Drop Mode behavior.
3.1.11.5 Default Behavior
By default, the adapter works in Drop Mode. The adapter reverts to this mode upon initialization/restart.
3.1.11.6 Known Limitations
The feature is not available for
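The difference between the two modes can be seen in a small discrete-time simulation of one Rx descriptor ring. This Python sketch is a simplified model with stated assumptions:
• Time advances in abstract ticks.
• `max_wait` stands in for the adapter's mitigation timer.
• A packet whose wait expires is dropped (the Drop Mode fallback).
• The Rx buffer is treated as unbounded, so XOFF/flow control is not modeled.
`simulate_rx_ring` is hypothetical and does not reflect the adapter's internal implementation.

```python
from collections import deque


def simulate_rx_ring(arrivals, posts, mode="drop", max_wait=3):
    """Return (delivered, dropped) for one Rx descriptor ring.

    arrivals: tick numbers at which packets arrive.
    posts:    dict mapping tick -> number of descriptors the driver posts then.
    mode:     "drop" drops a packet as soon as no descriptor is free;
              "poll" holds it (and everything behind it) until a descriptor
              is posted or it has waited `max_wait` ticks.
    """
    if mode not in ("drop", "poll"):
        raise ValueError("mode must be 'drop' or 'poll'")
    arrivals = sorted(arrivals)
    last_tick = max(arrivals + list(posts), default=0)
    free = 0
    waiting = deque()  # arrival ticks of packets held in the Rx buffer
    delivered = dropped = 0
    i = 0
    for tick in range(last_tick + max_wait + 1):
        free += posts.get(tick, 0)
        while i < len(arrivals) and arrivals[i] == tick:
            waiting.append(tick)
            i += 1
        while waiting:
            if free > 0:
                free -= 1
                waiting.popleft()
                delivered += 1
            elif mode == "drop" or tick - waiting[0] >= max_wait:
                waiting.popleft()  # no descriptor: drop now, or once the wait expires
                dropped += 1
            else:
                break  # poll mode: processing halts until the next tick
    return delivered, dropped
```

Consider three packets at ticks 0 to 2, with descriptors posted only at tick 2. Drop Mode delivers one packet and drops two. Poll Mode with `max_wait=3` delivers all three. With `max_wait=1`, the timer expires for the first packet, so one packet is dropped.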
177. sing the SetQueuePairParams and SetCompletionQueueParams methods.
3. Set the background creation thresholds using the SetLimits method.
4. Fill the pool using the Fill method.
5. Create items (IND2QueuePair and the IND2CompletionQueue associated with it) using the CreateObjects method.
Statistics about the utilization of the resource pool are available, so that the programmer can select optimal thresholds.
3.8.1.2 Registry Tuning
The registry entries that may be added/changed by this General Tuning procedure are:
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters:
• Disable the TCP selective acks option for better CPU utilization:
SackOpts, type REG_DWORD, value set to 0.
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters:
• Enable fast datagram sending for UDP traffic:
FastSendDatagramThreshold, type REG_DWORD, value set to 64K.
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Ndis\Parameters:
• Set RSS parameters:
RssBaseCpu, type REG_DWORD, value set to 1.
3.8.1.3 Enable RSS
Receive Side Scaling (RSS) is enabled by means of the following command:
netsh int tcp set global rss=enabled
3.8.1.4 Tuning the IPoIB Network Adapter
The IPoIB Network Adapter tuning can be performed either during installation, by modifying some of the Windows registries as explained in Section 3.8.1.2 Registry Tuning on page 118, or can be set pos
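You may want to review or reproduce the General Tuning entries above outside the tuning procedure. In that case, they can be expressed as a .reg file. Below is a minimal Python sketch that renders them as text, with a few assumptions:
• "64K" is taken to mean 65536 (0x10000).
• The output is a plain string, whereas Registry Editor's own exports are UTF-16.
• `GENERAL_TUNING` and `to_reg_file` are hypothetical names.
Back up the registry before importing any such file.

```python
# Values taken from the Registry Tuning list above; 64K is assumed to be 65536.
GENERAL_TUNING = {
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters": {
        "SackOpts": 0,  # disable TCP selective acks
    },
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters": {
        "FastSendDatagramThreshold": 64 * 1024,  # fast datagram sending for UDP
    },
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Ndis\Parameters": {
        "RssBaseCpu": 1,  # first CPU used by RSS
    },
}


def to_reg_file(settings: dict) -> str:
    """Render {key path: {value name: DWORD}} as .reg file text."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, values in settings.items():
        lines.append(f"[{key}]")
        for name, value in values.items():
            if not 0 <= value <= 0xFFFFFFFF:
                raise ValueError(f"{name} does not fit in a REG_DWORD")
            lines.append(f'"{name}"=dword:{value:08x}')
        lines.append("")
    return "\r\n".join(lines)
```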
178. sion will reboot your machine.
Note: One or more of your HCA adapters has an old firmware version. We recommend upgrading to a newer firmware version to enable improved functionality and support the driver's capabilities.
Step 10. Configure your system for maximum performance by checking the maximum performance box.
[Screenshot: InstallShield dialog with the option "Check this box to configure your system for maximum performance (Recommended)"]
Note: This step requires you to reboot the machine at the end of the installation process.
Step 11. Select a Complete
Step 12. In order to complete the installation, select Complete installation. If you wish to customize the features you want installed, follow Step a and onward below.
[Screenshot: InstallShield Setup Type dialog offering Complete (all program features will be installed; requires the most disk space) or an option to choose which program features you want installed and where they will be installed (recommended for advanced users)]
a. Select the desired feature to install:
• OpenSM: installs Windows OpenSM, which is required to manage the subnet from a host. OpenSM is part of the driver and installed automatica
179. sktop of your screen.
2. Open a CMD console (click Task Manager > File > Run new task, and enter CMD).
3. Enter the following command:
> echo %PROCESSOR_ARCHITECTURE%
On an x64 (64-bit) machine, the output will be "AMD64".
Step 2. Go to the Mellanox WinOF web page at: http://www.mellanox.com > Products > InfiniBand/VPI Drivers > Windows SW/Drivers.
Step 3. Download the .exe image according to the architecture of your machine (see Step 1) and the operating system. The name of the .exe is in the following format: MLNX_VPI_WinOF-<version>_All_<OS>_<arch>.exe
Installing the incorrect .exe file is prohibited. If you do so, an error message will be displayed. For example, if you try to install a 64-bit .exe on a 32-bit machine, the wizard will display the following (or a similar) error message: "This installation package is not supported by this processor type. Contact your product vendor."
2.3 Installing Mellanox WinOF Driver
WinOF supports ConnectX-3 and ConnectX-3 Pro adapter cards. If you have a ConnectX-4 adapter card on your server, you will need to install the WinOF-2 driver. For details on how to install the WinOF-2 driver, please refer to the WinOF-2 User Manual.
This section provides instructions for two types of installation procedures:
• Attended Installation: An installation procedure that requires fr
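The architecture check in Step 1 and the naming format in Step 3 can be combined into a quick pre-install sanity check. This is a minimal Python sketch. It assumes that the <arch> token is "x64" for AMD64 machines and "x86" for 32-bit ones; only the AMD64 output is stated above. `ARCH_TOKENS`, `PACKAGE_RE` and `package_matches_machine` are hypothetical names. The package name in the example is made up to fit the documented format.

```python
import os
import re

# Assumed mapping from %PROCESSOR_ARCHITECTURE% to the <arch> token in the package name.
ARCH_TOKENS = {"AMD64": "x64", "x86": "x86"}

# MLNX_VPI_WinOF-<version>_All_<OS>_<arch>.exe
PACKAGE_RE = re.compile(
    r"^MLNX_VPI_WinOF-(?P<version>.+?)_All_(?P<os>.+)_(?P<arch>[^_]+)\.exe$"
)


def package_matches_machine(filename, processor_architecture=None):
    """Return True if the package's <arch> token matches this machine.

    If processor_architecture is not given, the PROCESSOR_ARCHITECTURE
    environment variable (the value echoed in Step 1) is used.
    """
    arch = processor_architecture or os.environ.get("PROCESSOR_ARCHITECTURE", "")
    match = PACKAGE_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected package name: {filename}")
    return ARCH_TOKENS.get(arch) == match.group("arch")
```

For example, `package_matches_machine("MLNX_VPI_WinOF-5_10_All_win2012R2_x64.exe", "AMD64")` returns True. The same call with "x86" returns False.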
180. sociated with both ports (i.e., the value on all ports is identical).
• Bad doorbells: Number of bad DoorBells.
• Responder duplicate request received (pending firmware implementation): Number of duplicate requests received when the local machine receives inbound traffic.
• Requester time out received (pending firmware implementation): Number of time-outs received when the local machine generates outbound traffic.
• Device detected stalled state: The number of times the device has entered the stalled state, per port.
• Packet detected as stalled: The number of events where the device was stalled for longer than the watermark.
3.8.4.1.3 Proprietary Mellanox QoS Counters
The proprietary Mellanox QoS counter set consists of flow statistics per VLAN priority. Each QoS policy is associated with a priority. The counter presents the priority's traffic and pause statistics.
Table 26 - Mellanox QoS Counters
Bytes/Packets IN
• Bytes Received: The number of bytes received that are covered by this priority. The counted bytes include framing characters (modulo 2^64).
• Bytes Received/Sec: The number of bytes received per second that are covered by this priority. The counted bytes include framing characters.
• Packets Received: The number of packets received that are covered by this priority (modulo 2^64).
Pac
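Counters such as Bytes Received are reported modulo 2^64. A per-second rate computed from two samples should therefore tolerate a wrap between the samples. This is a minimal Python sketch of that calculation. `rate_per_sec` is a hypothetical helper, and it assumes at most one wrap occurred between the two samples.

```python
COUNTER_MODULUS = 2 ** 64  # counters are reported modulo 2^64


def rate_per_sec(previous: int, current: int, interval_s: float) -> float:
    """Rate between two samples of a modulo-2^64 counter taken interval_s apart."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = (current - previous) % COUNTER_MODULUS  # handles a single wrap-around
    return delta / interval_s
```

A counter that moves from 2^64 - 10 to 10 within one second yields a rate of 20 per second, rather than a huge negative value.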
181. stall Windows Server 2012 R2.
Step 2. Install the Hyper-V role:
Go to Server Manager > Manage > Add Roles and Features, and set the following:
• Installation Type > Role-based or Feature-based Installation
• Server Selection > Select a server from the server pool
• Server Roles > Hyper-V (see figures below)
[Figure: Add Roles and Features Wizard, Select server roles page, with Hyper-V selected. Description: "Hyper-V provides the services that you can use to create and manage virtual machines and their resources. Each virtual machine is a virtualized computer system that operates in an isolated execution environment. This allows you to run multiple operating systems simultaneously."]
[Figure: Add Roles and Features Wizard prompt, "Add features that are required for Hyper-V"]
182. stances of the device with separate resources. In ConnectX-3/ConnectX-3 Pro adapter cards, Mellanox adapters can expose up to 126 virtual instances called Virtual Functions (VFs). These virtual functions can then be provisioned separately. Each VF can be seen as an additional device connected to the Physical Function. It also shares resources with the Physical Function.
SR-IOV is commonly used in conjunction with an SR-IOV-enabled hypervisor to provide virtual machines with direct hardware access to network resources, which increases their performance.
This guide demonstrates the setup and configuration of SR-IOV using the Mellanox ConnectX VPI adapter card family.
• An SR-IOV VF is a single-port device.
• The Mellanox device is a dual-port, single-PCI-function device. The Virtual Functions pool belongs to both ports. To define how the pool is divided between the two ports, use the PowerShell SriovPort1NumVFs and SriovPort2NumVFs settings (see Step 5 in Section 3.5.4.4.2 Enabling SR-IOV in Mellanox WinOF Package (Ethernet SR-IOV Only) on page 84).
3.5.4.1 SR-IOV Ethernet over Hyper-V
3.5.4.1.1 System Requirements
• A server and BIOS with SR-IOV support. BIOS settings might need to be updated to enable virtualization support and SR-IOV support.
• Hypervisor OS: Windows Server 2012 R2
• Virtual Machine (VM) OS: Windows Server 2012 and above
• Mellanox ConnectX-3/ConnectX-3 Pro VPI Adapter Card family with SR-IOV capability
• Mel
183. t -dev <PCI device> -image <file name>.bin b
Step 8. Reboot the system for the changes to take effect.
For more information, please contact Mellanox Support.
3.5.4.4.2 Enabling SR-IOV in Mellanox WinOF Package (Ethernet SR-IOV Only)
To enable SR-IOV in Mellanox WinOF Package:
Step 1. Install a Mellanox WinOF package that supports SR-IOV.
Step 2. Configure the HCA ports type to Ethernet. For further information, please refer to Section 3.1.1 Port Configuration on page 28.
Note: SR-IOV cannot be enabled if one of the ports is InfiniBand.
Step 3. Set the Execution Policy as specified in Section 3.3.1 PowerShell Configuration on page 67.
Step 4. Query the SR-IOV configuration with PowerShell:
PS $ Get-MlnxPCIDeviceSriovSetting
Example:
Caption : MLNX_PCIDeviceSriovSettingData Mellanox ConnectX-3 PRO VPI (MT04103) Network Adapter
Description : Mellanox ConnectX-3 PRO VPI (MT04103) Network Adapter
ElementName : HCA 0
InstanceID : PCI\VEN_15B3&DEV_1007&SUBSYS_22F5103C&REV_00\24BE05FFFFB9E2E000
Name : HCA 0
Source : 3
SystemName : LAB NALABJSS5EE
SriovEnable : False
SriovPort1NumVFs : 16
SriovPort2NumVFs : 0
SriovPortMode : 0
PSComputerName :
Step 5. Enable SR-IOV through PowerShell on both ports:
PS $ Set-MlnxPCIDeviceSriovSetting -Name "HCA 0" -SriovEnable $true -SriovPortMode 2 -SriovPort1NumVFs 8 -SriovPort2NumVFs 8
Example:
Confirm
Are you sure you w
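When scripting Steps 4 and 5, it can help to read the Get-MlnxPCIDeviceSriovSetting output back and confirm the per-port VF counts. This is a minimal Python sketch that parses the "Name : Value" listing shown in Step 4. It assumes the output has been captured as text (for example, redirected to a file) and that no value spans multiple lines. `parse_property_list` and `sriov_summary` are hypothetical helpers.

```python
def parse_property_list(text: str) -> dict:
    """Parse PowerShell 'Name : Value' list output into a dict of strings."""
    props = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")  # split on the first colon only
        if sep and name.strip():
            props[name.strip()] = value.strip()
    return props


def sriov_summary(props: dict) -> dict:
    """Extract the SR-IOV fields that Steps 4 and 5 query and set."""
    return {
        "enabled": props.get("SriovEnable", "").lower() == "true",
        "port_mode": int(props.get("SriovPortMode") or 0),
        "port1_vfs": int(props.get("SriovPort1NumVFs") or 0),
        "port2_vfs": int(props.get("SriovPort2NumVFs") or 0),
    }
```

Applied to the Step 4 example output, `sriov_summary` reports SR-IOV as disabled, with 16 VFs on port 1 and none on port 2. After Step 5, you would expect the function to report SR-IOV as enabled with 8 VFs on each port.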
184. t Path: iscsi:11.4.12.65::::iqn.2011-01:iscsiboot
(Assuming the iSCSI target IP is 11.4.12.65 and the Target Name is iqn.2011-01:iscsiboot)
060 PXEClient: PXEClient
066 Boot Server Host Name: WDS server IP address
067 Boot File Name: boot\x86\wdsnbp.com
When DHCP and WDS are NOT deployed on the same server, DHCP options 60, 66 and 67 should be empty, and the WDS option 60 must be configured.
2.9.2 Configuring the Client Machine
To configure your client:
1. Verify that the Mellanox adapter card is burned with the correct Mellanox FlexBoot version. For boot over Ethernet on adapter cards with firmware older than 2.30.8000, burn the adapter card with Ethernet FlexBoot; otherwise, use the VPI FlexBoot.
2. Verify that the Mellanox adapter card is burned with the correct firmware version.
3. Set the Mellanox adapter card as the first boot device in the BIOS settings boot order.
2.9.3 Installing iSCSI
1. Reboot your iSCSI client.
2. Press F12 when asked to proceed to iSCSI boot.
[Screenshot: FlexBoot console waiting for link up, obtaining a DHCP address, registering SAN device 0x80 for an iqn.1991-05.com.microsoft target root path, and downloading boot\x86\wdsnbp.com via TFTP]
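The Root Path option uses the iSCSI root-path syntax defined in RFC 4173: iscsi:<servername>:<protocol>:<port>:<LUN>:<targetname>. Empty fields take defaults: protocol 6 (TCP), port 3260 and LUN 0. Below is a minimal Python parser, shown only to make the field layout explicit. It handles IPv4 addresses and host names but not bracketed IPv6 literals. `parse_iscsi_root_path` is a hypothetical helper.

```python
def parse_iscsi_root_path(root_path: str) -> dict:
    """Split an RFC 4173 iSCSI root path into its fields, applying defaults."""
    prefix = "iscsi:"
    if not root_path.startswith(prefix):
        raise ValueError("root path must start with 'iscsi:'")
    # maxsplit=4 keeps any colons inside the target name intact.
    fields = root_path[len(prefix):].split(":", 4)
    if len(fields) != 5:
        raise ValueError("expected <server>:<protocol>:<port>:<LUN>:<target>")
    server, protocol, port, lun, target = fields
    return {
        "server": server,
        "protocol": int(protocol) if protocol else 6,  # 6 = TCP
        "port": int(port) if port else 3260,  # well-known iSCSI port
        "lun": lun or "0",
        "target": target,
    }
```

Applied to a root path with an empty protocol, port and LUN (four consecutive colons after the target IP), the parser returns the server address, the defaults, and the full target name, including its own colon.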
185. t by the ConnectX-3 and ConnectX-3 Pro network interface.
• Packets Sent/Sec: Shows the rate at which packets are sent by the ConnectX-3 and ConnectX-3 Pro network interface.
Bytes TOTAL
• Bytes Total: Shows the total number of bytes handled by the adapter. The counted bytes include framing characters.
• Bytes Total/Sec: Shows the total rate of bytes that are sent and received by the adapter. The counted bytes include framing characters.
• Packets Total: Shows the total number of packets handled by the ConnectX-3 and ConnectX-3 Pro network interface.
• Packets Total/Sec: Shows the rate at which packets are sent and received by the ConnectX-3 and ConnectX-3 Pro network interface.
• Control Packets: The total number of successfully received control frames.
ERRORS, DROP AND MISC. INDICATIONS
• Packets Outbound Errors: Shows the number of outbound packets that could not be transmitted because of errors found in the physical layer.
• Packets Outbound Discarded: Shows the number of outbound packets discarded in the physical layer even though no errors were detected that would prevent transmission. One possible reason for discarding packets could be to free up some buffer space.
• Packets Received Errors: Shows the number of inbound packets that contained errors in the
186. t exposed via the UI.
RxSmallPacketBypass (default: eth 0, IPoIB 0): Specifies whether received small packets bypass larger packets when received packets are indicated to NDIS. This mode is useful in bi-directional applications. Enabling this mode ensures that the ACK packet bypasses the regular packet, so the TCP/IP stack issues the next packet more quickly.
The valid values are: 0 - disable; 1 - enable.
Note: This registry value is not exposed via the UI.
ReturnPacketThreshold (default: eth 341, IPoIB 341): The allowed number of free received packets on the rings. Any number above it will cause the driver to return the packet to the hardware immediately. When the value is set to 0, the adapter uses 2/3 of the received ring size.
The valid values are: 0 to 4096.
Note: This registry value is not exposed via the UI.
NumTcb (default: eth 16, IPoIB 16): The number of send buffers that the driver allocates for sending purposes. Each buffer is of LSO size if LSO is enabled, or of MTU size otherwise.
The valid values are: 1 up to 64.
Note: This registry value is not exposed via the UI.
ThreadPoll (default: eth 10000, IPoIB 10000): The number of cycles that should pass without receiving any packet before the polling mechanism stops, when using the polling completion method for receiving. Afterwards, receiving new
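The ReturnPacketThreshold rule works as follows: counts above the threshold are returned to the hardware at once, and a value of 0 means two thirds of the receive ring size. It can be expressed as below. This is a minimal Python sketch. Rounding two thirds down to an integer and the helper names are assumptions, not documented driver behavior.

```python
def effective_return_threshold(value: int, ring_size: int) -> int:
    """Resolve the ReturnPacketThreshold registry value for a given ring size."""
    if not 0 <= value <= 4096:
        raise ValueError("ReturnPacketThreshold must be between 0 and 4096")
    if value == 0:
        return ring_size * 2 // 3  # 0 means 2/3 of the receive ring (rounded down)
    return value


def return_immediately(free_packets: int, value: int, ring_size: int) -> bool:
    """True when the number of free received packets exceeds the threshold."""
    return free_packets > effective_return_threshold(value, ring_size)
```

With a 1024-entry ring, a value of 0 resolves to 682, while the default of 341 is used as-is.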
187. t installation manually.
To improve the network adapter performance, activate the performance tuning tool as follows:
Step 1. Start the Device Manager (open a command line window and enter: devmgmt.msc).
Step 2. Open Network Adapters.
Step 3. Select the Mellanox IPoIB adapter, right-click and select Properties.
Step 4. Select the Performance tab.
Step 5. Choose one of the tuning scenarios:
• Single port traffic: Improves performance when running traffic on a single port at a time.
• Dual port traffic: Improves performance when running traffic on both ports simultaneously.
• Forwarding traffic: Improves performance for scenarios that involve both ports (for example, via IXIA).
• Multicast traffic: Improves performance when the main traffic runs on multicast.
Step 6. Click the Run Tuning button.
Clicking the Run Tuning button changes several registry entries (described below) and checks for system services that may decrease network performance. It also generates a log that includes the applied changes. Users can view this log to restore the previous values. The log path is:
%HOMEDRIVE%\Windows\System32\LogFiles\PerformanceTunning.log
This tuning needs to be performed only once after the installation is completed, and on one adapter only. This holds as long as these entries are not changed directly in the registry, or by some other installation or script.
A reboot may be required for the changes to take effect.
188. tch to the primary adapter, even though the active adapter can continue functioning as the active one. When the checkbox is unchecked, the active adapter will remain active, even though the primary can function as the active one.
[Screenshot: Mellanox ConnectX-3 Ethernet Adapter Properties, Teaming tab, Load Balancing and Fail Over Settings: Team Name, Team Type (Fault Tolerance), Primary adapter, the "Failback to Primary" and "Use primary Mac Address" options, and the list of adapters to include in the team]
Teaming provides Load Balancing and Fail Over. The administrator can configure a team of adapters and associate up to 8 Mellanox ConnectX adapters with this team. Teaming should be used to increase system reliability upon a link failure and to balance the workload.
Step 7. (Optional) Primary MAC Address: This option sets the team MAC address to be the same as the primary adapter MAC address.
[Screenshot: Mellanox ConnectX-3 Ethernet Adapter Properties, Teaming tab, Load Balancing and Fail Over Settings]
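The effect of the Failback to Primary checkbox on a Fault Tolerance team can be summarized as a selection rule. This is a minimal Python sketch of that rule as described above. It is a simplified model: there is no link-flap hold-off, and on failover the first adapter with link is chosen. `next_active` is a hypothetical helper, not the teaming driver's logic.

```python
from typing import Dict, Optional


def next_active(current: str, primary: str, link_up: Dict[str, bool],
                failback: bool) -> Optional[str]:
    """Choose the active adapter of a fault-tolerance team.

    failback mirrors the 'Failback to Primary' checkbox: when set, traffic
    returns to the primary as soon as its link is up again.
    """
    if failback and link_up.get(primary, False):
        return primary
    if link_up.get(current, False):
        return current  # keep the active adapter while it still works
    for name, up in link_up.items():
        if up:
            return name  # fail over to the first adapter with link
    return None  # no adapter in the team has link
```

Suppose the backup adapter is active and the primary's link comes back. With the checkbox set, the sketch switches back to the primary. With the checkbox cleared, it keeps the backup active.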
189. Step 2. Right-click a Mellanox network adapter under the Network adapters list, and left-click Properties. Select the Advanced tab from the Properties sheet.
[Screenshot: Advanced tab listing properties such as Bus master DMA Operations, Flow Control, Header Data Split, Interrupt Moderation, Interrupt Moderation Rx/Tx Packet Count and Packet Time, IPv4 Checksum Offload, Jumbo Packet, and the Large Send Offload options]
Step 3. Modify the configuration parameters to suit your system.
Please note the following:
a. For help on a specific parameter/option, check the help button at the bottom of the dialog.
b. If you select one of the entries Offload Options, Performance Options or Flow Control Options, you'll need to click the Properties button to modify parameters via a pop-up dialog.
3.1.10 Differentiated Services Code Point (DSCP)
DSCP is a mechanism used for classifying network traffic on I
190. ter
c. If a team loses one or more network adapters through a create or modify operation, the remaining adapters in the team are automatically notified of the change.
3.3 Management
3.3.1 PowerShell Configuration
PowerShell is a task automation and configuration management framework from Microsoft. It consists of a command-line shell and an associated scripting language built on the .NET Framework. PowerShell provides full access to COM and WMI, enabling administrators to perform administrative tasks on both local and remote Windows systems. It also provides access to WS-Management and CIM, enabling management of remote Linux systems and network devices.
Before working with it, PowerShell must be configured as follows:
Step 1. Set the Execution policy to AllSigned:
PS $ Set-ExecutionPolicy AllSigned
Execution Policy Change
The execution policy helps protect you from scripts that you do not trust. Changing the execution policy might expose you to the security risks described in the about_Execution_Policies help topic at http://go.microsoft.com/fwlink/?LinkID=135170. Do you want to change the execution policy?
[Y] Yes  [N] No  [S] Suspend  [?] Help (default is "Y"): y
Step 2. Add Mellanox to the trusted publishers by selecting [A] Always run, as shown in the example below:
PS $ Get-MlnxPCIDeviceSriovSetting
3.4 Storage Protocols
3.4.4 Deploying Windows Server 2012 and Above with SMB D
191. the system opens a single Rx ring.
The configured Rx ring number should be a power of two and less than the number of processors on the system.
Value Type: DWORD
The valid values are 1 up to the number of processors on the system.
RssBaseCpu (default: 1): The CPU number of the first CPU that RSS can use. NDIS uses the default value of 0 for the base CPU number; however, this value is configurable and can be changed. The Mellanox adapter reads this value from the registry and sets it to NDIS on driver start-up.
Value Type: DWORD
The valid values are 0 up to the number of processors on the system.
CheckFwVersion (default: 1): Configures whether the Mellanox driver validates the FW compatibility with the driver version. Skipping this check is not recommended and can cause unexpected behavior; it should be used for testing purposes only.
Value Type: DWORD
The valid values are: 0 - Don't check; 1 - Check.
MaximumWorkingThreads (default: 2): The number of working threads that can work simultaneously on receive polling. By default, the Mellanox driver creates a working thread for each Rx ring if polling or adaptive receive completion is set.
Value Type: DWORD
The valid values are 1 up to the number of Rx rings.
3.6.9 MLX BUS Registry Keys
3.6.9.4 SR-IOV Registry Keys
The SR-IOV feature can be controlled on a machine level or per device, using the same set of Registry Keys. However, only o
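The valid ranges listed above can be checked before the values are written to the registry. This is a minimal Python sketch. The Rx ring key's name is not shown in this excerpt, so it appears here as a plain `rx_rings` argument, and the function name is hypothetical. The sketch follows the "valid values" lines (upper bounds inclusive) rather than the stricter "less than the number of processors" wording.

```python
def validate_rss_settings(rx_rings: int, rss_base_cpu: int,
                          max_working_threads: int, check_fw_version: int,
                          num_processors: int) -> list:
    """Return a list of problems with the given settings (empty if valid)."""
    problems = []
    # A positive power of two has exactly one bit set.
    is_power_of_two = rx_rings > 0 and (rx_rings & (rx_rings - 1)) == 0
    if not is_power_of_two or rx_rings > num_processors:
        problems.append("Rx ring count must be a power of two, 1..number of processors")
    if not 0 <= rss_base_cpu <= num_processors:
        problems.append("RssBaseCpu must be 0..number of processors")
    if check_fw_version not in (0, 1):
        problems.append("CheckFwVersion must be 0 or 1")
    if not 1 <= max_working_threads <= rx_rings:
        problems.append("MaximumWorkingThreads must be 1..number of Rx rings")
    return problems
```

On a 16-processor machine, for example, 8 Rx rings with RssBaseCpu 1 and 2 working threads pass. A ring count of 6 is rejected because it is not a power of two.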
192. [Screenshot: Device Manager with "Mellanox Virtual Miniport Driver Team A" listed under Network adapters]
To modify an existing team, perform the following:
a. Select the desired team and click Modify.
b. Modify the team name, its type and/or the participating adapters.
c. Click the Commit button.
To remove an existing team, select the desired team and click Remove. You will be prompted to approve this action.
Notes on this step:
a. Each adapter that participates in a team has two properties:
• Status: Connected/Disconnected/Disabled
• Role: Active or Backup
b. Each network adapter that is added to or removed from a team gets refreshed (i.e., disabled and then enabled). This may cause a temporary loss of connection to the adapter.
c. If a team loses one or more network adapters through a create or modify operation, the remaining adapters in the team are automatically notified of the change.
3.1.5.2.2 VLAN Configuration to a Team
To configure a VLAN for a team, follow the steps below:
1. Open the Device Manager.
2. Go to Network Adapters.
3. Right-click the Team Adapter that was created, and click the VLAN tab.
[Screenshot: Device Manager and the team adapter Properties dialog]
193. tion Related Troubleshooting on page 151
• Section 5.3 Ethernet Related Troubleshooting on page 153
• Section 3.1.5 Teaming and VLAN on page 39
• Section 3.1.10 Differentiated Services Code Point (DSCP) on page 51
• Section 3.1.4.2 RoCEv2 on page 34
• Section 4 Utilities on page 138
• Section 4.2 part_man - Virtual IPoIB Port Creation Utility on page 138
Rev 4.80.50000 (August 30, 2014)
Added the following sections:
• Section 3.8.4.1.4 Proprietary RDMA Activity on page 135
• Section 3.6.9 MLX BUS Registry Keys on page 112
• Section 4.1 Snapshot Tool on page 138
• Section 3.2.7 Multiple Interfaces over non-default PKeys Support on page 61
• Section 5.4.1 General Diagnostic on page 156
Updated the following sections:
• Section 4.4 InfiniBand Fabric Diagnostic Utilities on page 142
• Section 4.5 Fabric Performance Utilities on page 145
• Section 4.2 part_man - Virtual IPoIB Port Creation Utility on page 138
Rev 4.70 (May 4, 2014)
Updated the following sections:
• Section 1.2 WinOF Set of Documentation on page 12
• Section 2.7 Firmware Upgrade on page 24
• Section 3.5.4.4.2 Enabling SR-IOV in Mellanox WinOF Package (Ethernet SR-IOV Only) on page 84
• Section 3.4.1.2.1 Verifying Network Adapter Config
194. tional | ? Message -match "RDMA"
1. The NETSTAT command confirms whether the File Server is listening on the RDMA interfaces.
3.5 Virtualization
3.5.1 Virtual Ethernet Adapter
The Virtual Ethernet Adapter (VEA) provides a mechanism enabling multiple Ethernet adapters on the same physical port. Each of these multiple adapters is referred to as a virtual Ethernet adapter (VEA). At present, one can have a total of two VEAs per port. The first VEA (normally the only adapter for the physical port) is referred to as a physical VEA. The second VEA, if present, is called a virtual VEA (currently only a single virtual VEA is supported). The difference between a virtual and a physical VEA is that RDMA is only available through the physical VEA. In addition, certain settings for the port can only be configured on the physical VEA (see VEA Feature Limitations on page 70).
The VEA feature is designed to extend the OS capabilities and increase the usability of the network adapter. At present, once the user binds the RDMA-capable network adapter to either a teaming interface or Hyper-V, the RDMA capability (ND and NDK) is blocked by the OS. Hence, if the user is interested in having RDMA and teaming or Hyper-V at the same time on the same physical Ethernet port, the user can take advantage of this feature by creating two VEAs: the first for RDMA and the second for the other use. The user can manage VE
195. tool name> <tool arguments>
4.6.1.1 mstdump Tool
This tool is used to create 6 mstdump files upon user request. For further information on the files created, you may refer to Table 39, "Events Causing Automatic State Dumps", on page 159.
The parameters used in this command are: <bus> <device> <function>
The PCI information can be queried from the General properties tab under Location.
Example: If the Location is "PCI Slot 3 (PCI bus 8, device 0, function 0)", run the following command:
mlxtool dbg mstdump 8 0 0
The output will indicate the files' location and the index in the file name for this execution.
Example:
mstdump succeeded. Dump files for device at location 8.0.0 were created in %systemroot%\temp directory with set index 4.
4.6.1.2 oid-stats Tool
This tool displays the OID statistics. For each invoked OID, the tool will display the following:
Oid Name | Oid ID | Total times invoked | Min Time (uS) | Max Time (uS) | Last Oid (uS) | Average Time (uS)
Example: If you wish to display the information of the "Ethernet 5" interface, run the following command:
mlxtool dbg oid-stats Ethernet 5
This command can be invoked on a specific IPoIB or ETH interface. If no interface name is provided, the information will be shown for all the interfaces.
4.6.1.3 cmd-stats Tool
This tool displays the device commands statistics. For each invoked command, the tool will display t
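The mstdump example above derives its three arguments from the Location field on the adapter's General properties tab. The following Python sketch automates that mapping. The helper names `mstdump_command` and `run_mstdump`, and the exact Location wording it parses ("PCI bus N, device N, function N"), are assumptions based on the example, not part of the Mellanox tooling.

```python
import re
import subprocess
from typing import List

# Assumed Location format, e.g. "PCI Slot 3 (PCI bus 8, device 0, function 0)"
LOCATION_RE = re.compile(r"PCI bus (\d+), device (\d+), function (\d+)")


def mstdump_command(location: str) -> List[str]:
    """Build the 'mlxtool dbg mstdump <bus> <device> <function>' argument list."""
    match = LOCATION_RE.search(location)
    if match is None:
        raise ValueError(f"unrecognized Location string: {location!r}")
    bus, device, function = match.groups()
    return ["mlxtool", "dbg", "mstdump", bus, device, function]


def run_mstdump(location: str) -> str:
    """Run the dump (Windows only; mlxtool assumed to be on PATH) and return its output."""
    result = subprocess.run(mstdump_command(location), capture_output=True,
                            text=True, check=True)
    return result.stdout
```

For the Location in the example, `mstdump_command("PCI Slot 3 (PCI bus 8, device 0, function 0)")` yields the same arguments as the manual command: `mlxtool dbg mstdump 8 0 0`.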
196. total duration of packets transmission being paused on this priority (in microseconds).
Received Pause Frames | The number of pause frames that were received to this priority from the far-end port. The untagged instance indicates the number of global pause frames that were received.
Received Pause Duration | The total duration that the far-end port was requested to pause for the transmission of packets (in microseconds).
Sent Discard Frames | The number of packets discarded by the transmitter. Note: This counter is per TC and not per priority.
3.8.4.1.4 Proprietary RDMA Activity
The Proprietary RDMA Activity counter set consists of NDK and NDSPI performance counters. These performance counters allow you to track Network Direct Kernel (RDMA) activity, including traffic rates, errors, and control plane activity.
Table 27: RDMA Activity
RDMA Activity Counters | Description
RDMA Accepted Connections | The number of inbound RDMA connections established.
RDMA Active Connections | The number of active RDMA connections.
RDMA Completion Queue Errors | This counter is not supported and is always set to zero.
RDMA Connection Errors | The number of established connections with an error before a consumer disconnected the connection.
RDMA Failed Connection Attempts | The number of inbound and outbound RDMA connection attempts that failed.
RDMA Inbound Bytes/sec | The number of bytes for all incoming RDMA traffic. This
197. tp://remtech.wordpress.com/2013/08/27/ ("Server 2012 remote desktop session host installation hangs at windows installer coordinator") | enable it after the installation is complete.
5.1.1 Installation Error Codes and Troubleshooting
5.1.1.1 Setup Return Codes
Table 32: Setup Return Codes
Error Code | Description | Troubleshooting
1603 | Fatal error during installation | Contact support
1633 | The installation package is not supported on this platform | Make sure you are installing the right package for your platform
For additional details on Windows installer return codes, please refer to: http://support.microsoft.com/kb/229683
5.1.1.2 Firmware Burning Warning Codes
Table 33: Firmware Burning Warning Codes
Error Code | Description | Troubleshooting
1004 | Failed to open the device | Contact support
1005 | Could not find an image for at least one device | The firmware for your device was not found. Please try to manually burn the firmware.
1006 | Found one device that has multiple images | Burn the firmware manually and select the image you want to burn.
1007 | Found one device for which force update is required | Burn the firmware manually with the force flag.
1008 | Found one device that has mixed versions | The firmware version or the expansion ROM version does not match.
For additional details p
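When the installer is driven from a script, a small lookup over Tables 32 and 33 can turn a numeric code into the recommended action. The sketch below is a hypothetical helper (`explain_code` is not part of the Mellanox package); it assumes the script already has the numeric code, for example from `subprocess.run(...).returncode`.

```python
from typing import Dict, Optional, Tuple

# (description, troubleshooting) pairs copied from Tables 32 and 33
SETUP_RETURN_CODES: Dict[int, Tuple[str, str]] = {
    1603: ("Fatal error during installation", "Contact support"),
    1633: ("The installation package is not supported on this platform",
           "Make sure you are installing the right package for your platform"),
}

FIRMWARE_WARNING_CODES: Dict[int, Tuple[str, str]] = {
    1004: ("Failed to open the device", "Contact support"),
    1005: ("Could not find an image for at least one device",
           "The firmware for your device was not found. Please try to manually burn the firmware"),
    1006: ("Found one device that has multiple images",
           "Burn the firmware manually and select the image you want to burn"),
    1007: ("Found one device for which force update is required",
           "Burn the firmware manually with the force flag"),
    1008: ("Found one device that has mixed versions",
           "The firmware version or the expansion ROM version does not match"),
}


def explain_code(code: int) -> Optional[str]:
    """Return a one-line explanation for a listed code, or None if it is not in either table."""
    for table in (SETUP_RETURN_CODES, FIRMWARE_WARNING_CODES):
        if code in table:
            description, action = table[code]
            return f"{code}: {description}. {action}."
    return None
```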
198. ts across an IP fabric, and uses 24 bits of the GRE key as a logical network discriminator (called a tenant network ID).
Configuring the Hyper-V Network Virtualization requires two types of IP addresses:
- Provider Addresses (PA): unique IP addresses assigned to each Hyper-V host that are routable across the physical network infrastructure. Each Hyper-V host requires at least one PA to be assigned.
- Customer Addresses (CA): unique IP addresses assigned to each Virtual Machine that participates on a virtualized network. Using NVGRE, multiple CAs for VMs running on a Hyper-V host can be tunneled using a single PA on that Hyper-V host. CAs must be unique across all VMs on the same virtual network, but they do not need to be unique across virtual networks with different Virtual Subnet IDs.
The VM generates a packet with the addresses of the sender and the recipient within the CA space. Then the Hyper-V host encapsulates the packet with the addresses of the sender and the recipient in the PA space. PA addresses are determined by using the virtualization table. The receiving Hyper-V host retrieves the packet, identifies the recipient, and forwards the original packet with the CA addresses to the desired VM.
NVGRE can be implemented across an existing physical IP network without requiring changes to the physical network switch architecture. Since NVGRE tunnels terminate at each Hyper-V host, the hosts handle all encapsulati
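To make the "24 bits of the GRE key" concrete, the following Python sketch packs and parses the 8-byte GRE header used by NVGRE. The layout (Key Present bit set, protocol type 0x6558 for Transparent Ethernet Bridging, a 24-bit Virtual Subnet ID followed by an 8-bit FlowID in the key field) follows RFC 7637. The function names are illustrative, and the sketch only shows the header layout; it is not how Hyper-V or the driver perform encapsulation.

```python
import struct

GRE_FLAG_KEY_PRESENT = 0x2000  # K bit in the first 16-bit word of the GRE header
ETHERTYPE_TEB = 0x6558         # Transparent Ethernet Bridging: payload is an Ethernet frame


def nvgre_header(vsid: int, flow_id: int = 0) -> bytes:
    """Pack flags/version, protocol type, then the key: VSID (24 bits) + FlowID (8 bits)."""
    if not 0 <= vsid < (1 << 24):
        raise ValueError("VSID must fit in 24 bits")
    if not 0 <= flow_id < (1 << 8):
        raise ValueError("FlowID must fit in 8 bits")
    key = (vsid << 8) | flow_id
    return struct.pack("!HHI", GRE_FLAG_KEY_PRESENT, ETHERTYPE_TEB, key)


def parse_nvgre_header(header: bytes) -> tuple:
    """Return (vsid, flow_id) from the first 8 bytes of an NVGRE GRE header."""
    flags, protocol, key = struct.unpack("!HHI", header[:8])
    if not flags & GRE_FLAG_KEY_PRESENT or protocol != ETHERTYPE_TEB:
        raise ValueError("not an NVGRE header")
    return key >> 8, key & 0xFF
```

With the Virtual Subnet ID used in this section's examples, `nvgre_header(5001).hex()` gives `2000655800138900`: 5001 is 0x1389, shifted into the upper 24 bits of the key.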
199. umber of resync operations when the local machine receives inbound traffic.
Table 25: Mellanox Adapter Diagnostics Counters
Mellanox Adapter Diagnostics Counters | Description
Requester Remote operation errors | Number of remote operation errors when the local machine generates outbound traffic, i.e., a NAK was received indicating that the other end encountered an error that prevented it from completing the request.
Requester transport retries exceeded errors | Number of transport retries exceeded errors when the local machine generates outbound traffic.
Requester RNR NAK retries exceeded errors | Number of RNR (Receiver Not Ready) NAK retries exceeded errors when the local machine generates outbound traffic.
Bad multicast received | Number of bad multicast packets received.
Discarded UD packets | Number of UD packets silently discarded on the receive queue due to lack of receive descriptors.
Discarded UC packets | Number of UC packets silently discarded on the receive queue due to lack of receive descriptors.
CQ overflows | Number of CQ overflows. NOTE: This value is evaluated for the entire NIC, since there are cases where a CQ might be associated with both ports, i.e., the value on all ports is identical.
EQ overflows | Number of EQ overflows. NOTE: This value is evaluated for the entire NIC, since there are cases where an EQ might be as
200. use Mellanox WinOF VPI package.
Example: Mellanox.msi.exe /a
3. Add the Mellanox driver to boot.wim:
dism /Mount-Wim /WimFile:boot.wim /index:2 /MountDir:mnt
dism /Image:mnt /Add-Driver /Driver:drivers /recurse
dism /Unmount-Wim /MountDir:mnt /commit
4. Add the Mellanox driver to install.wim:
dism /Mount-Wim /WimFile:install.wim /index:4 /MountDir:mnt
dism /Image:mnt /Add-Driver /Driver:drivers /recurse
dism /Unmount-Wim /MountDir:mnt /commit
5. Add the new boot and install images to WDS.
Notes:
1. Use index 2 for Windows setup and index 1 for WinPE.
2. When adding the Mellanox driver to install.wim, verify you are using the appropriate index for your OS flavor. To check the OS, run: imagex /info install.wim
For additional details on WDS, please refer to: http://technet.microsoft.com/en-us/library/jj648426.aspx
2.9.1.2 Configuring iSCSI Target
To configure iSCSI Target:
1. Install iSCSI Target (e.g., StarWind).
2. Add to the iSCSI target initiators the IP addresses of the iSCSI clients.
2.9.1.3 Configuring the DHCP Server
To configure the DHCP server:
1. Install a DHCP server.
2. Add to IPv4 a new scope.
3. Add the iSCSI boot client identifier (MAC/GUID) to the DHCP reservation.
4. Add to the reserved IP address the following options if DHCP and WDS are deployed on the same server:
Table 6: Reserved IP Address Options
Option | Name | Value
017 | Roo
201. user-given names.
For diagnostic tools to fully support the topology file, the user may need to provide the local system name (if the local hostname is not used in the topology file).
To specify a topology file to a diagnostic tool, use one of the following two options:
1. On the command line, specify the file name using the option '-t <topology file name>'.
2. Define the environment variable IBDIAG_TOPO_FILE.
To specify the local system name to a diagnostic tool, use one of the following two options:
1. On the command line, specify the system name using the option '-s <local system name>'.
2. Define the environment variable IBDIAG_SYS_NAME.
IB Interface Definition
The diagnostic tools installed on a machine connect to the IB fabric by means of an HCA port through which they send MADs. To specify this port to an IB diagnostic tool, use one of the following options:
1. On the command line, specify the port number using the option '-p <local port number>' (see below).
2. Define the environment variable IBDIAG_PORT_NUM.
In case more than one HCA device is installed on the local machine, it is necessary to specify the device's index to the tool as well. For this, use one of the following options:
1. On the command line, specify the index of the local device using the following option: '-i <index of local device>'.
2. Define the environment variable IBDIAG_DEV_IDX.
Addressi
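Because each of these settings can come from an environment variable instead of a command-line option, a script can prepare the environment once and reuse it for every diagnostic tool. This Python sketch does that with the four variables named above. The function names, and the tool passed to `run_diag_tool` (for example `ibdiagnet`), are assumptions for illustration.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def ibdiag_env(topo_file: Optional[str] = None, sys_name: Optional[str] = None,
               port_num: Optional[int] = None, dev_idx: Optional[int] = None,
               base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the IBDIAG_* variables set."""
    env = dict(os.environ if base is None else base)
    settings = {
        "IBDIAG_TOPO_FILE": topo_file,
        "IBDIAG_SYS_NAME": sys_name,
        "IBDIAG_PORT_NUM": port_num,
        "IBDIAG_DEV_IDX": dev_idx,
    }
    for name, value in settings.items():
        if value is not None:
            env[name] = str(value)  # environment values must be strings
    return env


def run_diag_tool(tool: str, **options) -> int:
    """Run a diagnostic tool with the IBDIAG_* environment and return its exit code."""
    return subprocess.run([tool], env=ibdiag_env(**options)).returncode
```

Only the variables that are passed are set; anything left as `None` falls back to the tool's own default, or to a value already present in the inherited environment.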
202. ute -RoutingDomainID "{11111111-2222-3333-4444-000000005001}" -VirtualSubnetID 5001 -DestinationPrefix "172.16.0.0/16" -NextHop "0.0.0.0" -Metric 255
Step 4. Configure the Provider Address and Route records on Hyper-V Host 2 (Host 2 only).
mtlae15:
$NIC = Get-NetAdapter "Port1"
New-NetVirtualizationProviderAddress -InterfaceIndex $NIC.InterfaceIndex -ProviderAddress 192.168.20.115 -PrefixLength 24
New-NetVirtualizationProviderRoute -InterfaceIndex $NIC.InterfaceIndex -DestinationPrefix "0.0.0.0/0" -NextHop 192.168.20.1
Step 5. Configure the Virtual Subnet ID on the Hyper-V Network Switch Ports for each Virtual Machine on each Hyper-V Host (Host 1 and Host 2).
Run the command below for each VM on the host the VM is running on, i.e., for mtlae14-005 and mtlae14-006 on host 192.168.20.114, and for VMs mtlae15-005 and mtlae15-006 on host 192.168.20.115.
mtlae15 only:
Get-VMNetworkAdapter -VMName mtlae15-005 | where {$_.MacAddress -eq "00155D730100"} | Set-VMNetworkAdapter -VirtualSubnetID 5001
Get-VMNetworkAdapter -VMName mtlae15-006 | where {$_.MacAddress -eq "00155D730101"} | Set-VMNetworkAdapter -VirtualSubnetID 5001
Appendix B: Windows MPI (MS-MPI)
B.1 Overview
Message Passing Interface (MPI) is meant to provide virtual topology, synchronization, and communication functionality bet
203. Table 8: DSCP Default Registry Keys Settings
Registry Key | Default Value
PriorityToDscpMappingTable_4 |
PriorityToDscpMappingTable_5 |
PriorityToDscpMappingTable_6 |
PriorityToDscpMappingTable_7 |
DscpBasedEtsEnabled | eth:0
DscpForGlobalFlowControl | 26
3.1.10.8 DSCP Sanity Testing
To verify that all QoS and DSCP settings were correct, you can capture incoming and outgoing traffic by using the ibdump tool and see the DSCP value in the captured packets, as displayed in the figure below.
[Figure: Wireshark 1.8.6 capture; Ethernet II Src Mellanox_e9:57:11 (00:02:c9:e9:57:11), Dst Mellanox_e5:56:41 (00:02:c9:e5:56:41); Internet Protocol Version 4, Src 11.7.33.148, Dst 11.7.33.149; Differentiated Services Field: DSCP 0x03, ECN 0x02 (ECT(0), ECN-Capable Transport); Total Length 1068; Flags 0x02 (Don't Fragment)]
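When reading such a capture, keep in mind that the DSCP occupies the upper six bits of the IPv4 Differentiated Services byte and ECN the lower two. The Python sketch below encodes and decodes that byte; the function names are illustrative. For instance, DSCP 3 with ECN 2 (ECT(0)), as in the capture, gives a DS byte of 0x0E, and the DscpForGlobalFlowControl default of 26 corresponds to a DS byte of 0x68.

```python
from typing import Tuple


def ds_byte(dscp: int, ecn: int = 0) -> int:
    """Combine a 6-bit DSCP and a 2-bit ECN value into the IPv4 DS (former ToS) byte."""
    if not 0 <= dscp < 64 or not 0 <= ecn < 4:
        raise ValueError("DSCP must be 0-63 and ECN 0-3")
    return (dscp << 2) | ecn


def dscp_ecn(ipv4_header: bytes) -> Tuple[int, int]:
    """Extract (DSCP, ECN) from the second byte of an IPv4 header."""
    if len(ipv4_header) < 20 or ipv4_header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    ds = ipv4_header[1]
    return ds >> 2, ds & 0x03
```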
204. ve default values during the installation of the Mellanox adapters. Most of the parameters are visible in the registry by default; however, certain parameters must be created in order to modify the default behavior of the Mellanox driver.
The adapter can be configured either from the User Interface (Device Manager > Mellanox Adapter > Right-click > Properties) or by setting the registry directly.
All Mellanox adapter parameters are located in the registry under the following registry key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002bE10318}\<Index>
The registry keys can be divided into 4 different groups:
Group | Description
Basic | Contains the basic configuration.
Offload Options | Controls the offloading operations that the NIC supports.
Performance Options | Controls the NIC operation in different environments and scenarios.
Flow Control Options | Controls the TCP/IP traffic.
Any registry key that starts with an asterisk ("*") is a well-known registry key. For more details regarding the registries, please refer to: http://msdn.microsoft.com/en-us/library/ff570865(v=VS.85).aspx
3.6.1 Finding the Index Value of the HCA
To find the <nn> value of your HCA from the Device Manager, please perform the following steps:
Step 1. Open Device Manager, and go to System devices.
Step 2. Right cl
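For scripted access, the following Python sketch builds the adapter key path shown above and reads one value with the standard `winreg` module. It assumes the `<Index>` subkey uses the usual four-digit, zero-padded form (for example `0007`) and that the script runs on Windows with permission to read HKLM. The helper names are hypothetical.

```python
import sys

NET_CLASS_KEY = (r"SYSTEM\CurrentControlSet\Control\Class"
                 r"\{4D36E972-E325-11CE-BFC1-08002bE10318}")


def adapter_key_path(index: int) -> str:
    """Return the HKLM-relative key of adapter instance <index>, ending in e.g. 0007."""
    return NET_CLASS_KEY + "\\" + format(index, "04d")


def read_adapter_value(index: int, name: str):
    """Read one registry value of the adapter (Windows only)."""
    if sys.platform != "win32":
        raise OSError("the Windows registry is only available on Windows")
    import winreg  # standard library module, present only on Windows
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, adapter_key_path(index)) as key:
        value, _value_type = winreg.QueryValueEx(key, name)
    return value
```

Since some parameters exist only after they are created, reading a value that has not been created raises `FileNotFoundError`, which a script can treat as "driver default in effect".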
205. ver creates a separate receive ring and allocates a buffer for it. In order to minimize the memory consumption, one can reduce the number of VMs that use VMQ in parallel; however, this can affect performance. The valid values are 1 up to 127. Note: This registry value is not exposed via the UI.
MaxNumMacAddrFilters | 127 | The number of different MAC addresses that the physical port supports. This registry key affects the number of supported MAC addresses that is reported to the OS. The valid values are 1 up to 127. Note: This registry value is not exposed via the UI.
Table 16: VMQ Options
Value Name | Default Value | Description
MaxNumVlanFilters | 125 | The number of VLANs that are supported for each port. The valid values are 1 up to 127. Note: This registry value is not exposed via the UI.
3.6.7 IPoIB Registry Keys
The following section describes the registry keys that are unique to IPoIB.
Table 17: IPoIB Registry Keys
Value Name | Default Value | Description
GUIDMask | 0xE7 | Controls the way the MAC is generated for the IPoIB interface. The driver uses the 8-byte GUID to generate a 6-byte MAC. This value should be either 0 or contain exactly 6 non-zero digits using binary representation. A zero (0) mask indicates its default value, 0b11100111; that is, take all except the intermediate bytes of the GUID to form the MAC address. In case of an im
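The GUIDMask rule can be modeled in a few lines. The Python sketch below is an interpretation, not the driver's code: it assumes the most significant mask bit selects the first (most significant) GUID byte, so the default 0b11100111 keeps bytes 0-2 and 5-7 and drops the two intermediate bytes, as described above. Because the default mask is symmetric, this bit-order assumption does not affect the default case. The example GUID is hypothetical.

```python
def guid_to_mac(guid: int, mask: int = 0) -> str:
    """Form a 6-byte MAC from an 8-byte GUID by keeping the bytes selected by `mask`."""
    if mask == 0:
        mask = 0b11100111  # zero means "use the default" (0xE7)
    if not 0 < mask <= 0xFF or bin(mask).count("1") != 6:
        raise ValueError("GUIDMask must be 0 or have exactly six bits set")
    guid_bytes = guid.to_bytes(8, "big")
    # Assumed ordering: bit 7 (0x80) selects byte 0, ..., bit 0 (0x01) selects byte 7
    kept = [b for i, b in enumerate(guid_bytes) if mask & (0x80 >> i)]
    return ":".join(f"{b:02x}" for b in kept)
```

For a GUID of 0x0002C903000E5F81, the default mask drops the bytes 03 and 00 and yields `00:02:c9:0e:5f:81`.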
206. vice Name: Delay drop timeout occurred on port X. Drop mode entered, packets may now be dropped.
3.1.12 Receive Side Scaling (RSS)
3.1.12.1 System Requirements
Operating Systems: Windows Server 2008 R2, Windows Server 2012, Windows Server 2012 R2, Windows 7 Client, and Windows 8.1 Client
3.1.12.2 Using RSS
Mellanox WinOF Rev 5.10 IPoIB and Ethernet drivers use NDIS 6.30 new RSS capabilities. The main changes are:
- Removed the previous limitation of 64 CPU cores
- Individual network adapter RSS configuration usage
RSS capabilities can be set per individual adapter as well as globally. To do so, set the registry keys listed below.
For instructions on how to find the interface index in the registry (<nn>), please refer to Section 3.6.2, "Finding the Index Value of the Network Interface", on page 93.
Table 10: Registry Keys Setting
Sub-key | Description
HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>\*MaxRSSProcessors | Maximum number of CPUs allotted. Sets the desired maximum number of processors for each interface. The number can be different for each interface. Note: Restart the network adapter after you change this registry key.
HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>\*RssBaseProcNumber | Base CPU number. Sets the desired base CPU number f
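As a rough illustration of how the two keys interact, the Python sketch below lists the processors an interface could use under a simplified model: consecutive CPU numbers starting at the base CPU number, capped by the maximum processor count and by the number of CPUs in the system. It ignores processor groups, NUMA placement, hyper-threading, and the other RSS keywords, so treat it as an assumption-laden planning aid (for example, to keep two adapters on non-overlapping CPUs), not as the exact Windows algorithm.

```python
from typing import List


def rss_cpus(base_proc: int, max_processors: int, cpu_count: int) -> List[int]:
    """Simplified model: CPUs base_proc .. base_proc + max_processors - 1, within cpu_count."""
    if base_proc < 0 or max_processors < 1 or cpu_count < 1:
        raise ValueError("invalid RSS settings")
    return list(range(base_proc, min(base_proc + max_processors, cpu_count)))


def overlapping(a: List[int], b: List[int]) -> bool:
    """True if two interfaces would share at least one CPU under this model."""
    return bool(set(a) & set(b))
```

Under this model, on a 16-CPU host, one adapter with base 0 and maximum 4 and a second adapter with base 4 and maximum 4 would not share any CPU.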
207. w SDK. No other source code changes are required.
3.7.1 Network Direct Interface
The Network Direct Interface (NDI) architecture provides application developers with a networking interface that enables zero-copy data transfers between applications, kernel-bypass I/O generation and completion processing, and one-sided data transfer operations.
NDI is supported by Microsoft and is the recommended method to write an RDMA application. NDI exposes the advanced capabilities of the Mellanox networking devices and allows applications to leverage advances of RDMA.
Both RoCE and InfiniBand (IB) can implement NDI.
For further information, please refer to: http://msdn.microsoft.com/en-us/library/cc904397(v=vs.85).aspx
For code examples using NDI, you may refer to: https://msdn.microsoft.com/library/cc853440(v=vs.85).aspx
3.7.2 Win-Linux nd_rping Test
The purpose of this test is to check interoperability between Linux and Windows via an RDMA ping. The Windows nd_rping was ported from Linux's RDMACM example: rping.c
- Windows
If you wish to use the built-in nd_rping.exe, you may find it in: Program Files\Mellanox\MLNX_VPI\IB\Tools
If you wish to build nd_rping.exe from scratch, you can build it using the SDK example: choose the machine's OS in the configuration manager of the solution, and build nd_rping.exe.
- Linux
Installing MLNX_OFED on a Linux server will also provide the rping application.
208. ween a set of processes. With MPI you can run one process on several hosts.
Windows MPI runs over the following protocols:
- Sockets (Ethernet)
- Network Direct (ND)
B.1.1 System Requirements
- Install HPC (Build: 4.0.3906.0).
- Validate traffic (ping) between all the MPI hosts.
- Every MPI client needs to run the smpd process, which opens the MPI channel.
- The MPI initiator server needs to run mpiexec. If the initiator is also a client, it should also run smpd.
B.2 Running MPI
Step 1. Run the following command on each MPI client:
start smpd -d -p <port>
Step 2. Install the ND provider on each MPI client in MPI ND.
Step 3. Run the following command on the MPI server:
mpiexec.exe -p <smpd_port> -hosts <num_of_hosts> <hosts_ip_list> -env MPICH_NETMASK <network_ip/subnet> -env MPICH_ND_ZCOPY_THRESHOLD 1 -env MPICH_DISABLE_ND <0/1> -env MPICH_DISABLE_SOCK <0/1> -affinity <process>
B.3 Directing MSMPI Traffic
Directing MPI traffic to a specific QoS priority may be delayed due to the following:
- Except for NetDirectPortMatchCondition, the QoS PowerShell cmdlet for NetworkDirect traffic does not support port ranges. Therefore, NetworkDirect traffic cannot be directed to ports 1-65535.
- The MSMPI directive to control the port range (namely: MPICH_PORT_RANGE 3000,3030) is not working for ND, and MSMPI chooses a random port.
B.4 Running MSMPI on the Desired Priority
Step 1. Set the default QoS policy to be the desired priority (Note: this prio
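The Step 3 command line is easy to get wrong by hand. The Python sketch below assembles it as an argument list in the same order as shown, treating `<process>` as the program to launch after `-affinity` (an assumption). The helper names, host addresses, port, netmask, and program name used with it are all hypothetical, and the `-env` values are passed through unchanged.

```python
import subprocess
from typing import Dict, List, Sequence


def mpiexec_args(smpd_port: int, hosts: Sequence[str], netmask: str,
                 env: Dict[str, str], program: str) -> List[str]:
    """Build the mpiexec.exe argument list from Step 3 (-p, -hosts, -env ..., -affinity)."""
    args = ["mpiexec.exe", "-p", str(smpd_port), "-hosts", str(len(hosts)), *hosts,
            "-env", "MPICH_NETMASK", netmask]
    for name, value in env.items():
        args += ["-env", name, value]
    args += ["-affinity", program]
    return args


def run_mpi(args: List[str]) -> int:
    """Launch the job on the MPI server (Windows with MS-MPI installed); return the exit code."""
    return subprocess.run(args).returncode
```

Building the list from the host sequence keeps the `-hosts` count consistent with the number of addresses that follow it.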
209. [Figure: Device Manager, System devices list showing two Mellanox ConnectX-3 (MT04099) Network Adapter entries]
Step 2. Right-click on the Mellanox ConnectX Ethernet network adapter and left-click Properties. Select the Port Protocol tab from the Properties window.
Note: The Port Protocol tab is displayed only if the NIC is a VPI (IB and ETH).
The figure below is an example of the displayed Port Protocol window for a dual-port VPI adapter card.
[Figure: Port Protocol tab; Current Setting: Port 1 IB, Port 2 Eth; HCA Port Type Configuration with HW Defaults and IB/ETH/AUTO options for each port]
Port Protocol Configuration
This menu displays the adapter's port type and enables you to set the network protocols for the network adapter ports.
- The network protocol is determined according to the NIC's Hardware Defaults port type. You can choose the protocol explicitly by selecting the port type to InfiniBand (IB) or Ethernet (Eth)
210. [Figure: Virtual Switch Manager, "Mellanox SRIOV Virtual Switch" properties: Connection type External network (Mellanox ConnectX-3 Ethernet Adapter), with "Allow management operating system to share this network adapter" and "Enable single-root I/O virtualization (SR-IOV)" checked. The dialog notes that SR-IOV can only be configured when the virtual switch is created, and that an external virtual switch with SR-IOV enabled cannot be converted to an internal or private switch.]
Step 3. Click Apply.
Step 4. Click OK.
Step 5. Add a VMNIC connected to a Mellanox vSwitch in the VM hardware settings:
- Under Actions, go to Settings > Add New Hardware > Network Adapter > OK.
- In the "Virtual Switch" dropdown box, choose Mellanox SR-IOV Virtual Switch.
Figure 12: Adding a VMNIC to a Mellanox V-switch
[Figure: VM Settings, Add Hardware > Network Adapter: "Specify the configuration of the network adapter"]
211. yValue8021Action flag. The NIC uses a mapping table between the CoS value and the DSCP value, configured through the RroceDscpMarkPriorityFlowControl_[0-7] registry keys.
3.1.10.3 Configuring Quality of Service for TCP and RDMA Traffic
Step 1. Verify that DCB is installed and enabled (it is not installed by default).
PS Install-WindowsFeature Data-Center-Bridging
Step 2. Import the PowerShell modules that are required to configure DCB.
PS import-module NetQos
PS import-module DcbQos
PS import-module NetAdapter
Step 3. Configure DCB.
PS Set-NetQosDcbxSetting -Willing 0
Step 4. Enable Network Adapter QoS.
PS Set-NetAdapterQos -Name "Cx3Pro ETH P1" -Enabled 1
Step 5. Enable Priority Flow Control (PFC) on the specific priorities 3 and 5.
PS Enable-NetQosFlowControl 3,5
3.1.10.4 Configuring DSCP to Control PFC for TCP Traffic
Create a QoS policy to tag all TCP/UDP traffic with CoS value 3 and DSCP value 9.
PS New-NetQosPolicy "DEFAULT" -PriorityValue8021Action 3 -DSCPAction 9
DSCP can also be configured per protocol.
PS New-NetQosPolicy "TCP" -IPProtocolMatchCondition TCP -PriorityValue8021Action 3 -DSCPAction 16
PS New-NetQosPolicy "UDP" -IPProtocolMatchCondition UDP -PriorityValue8021Action 3 -DSCPAction 32
3.1.10.5 Configuring DSCP to Control ETS for TCP Traffic
Create a QoS policy to tag all TCP/UDP traffic with CoS value 0 and DSCP value 8.
PS New-NetQosPolicy DEFAU
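The per-protocol policies in Section 3.1.10.4 can be summarized as a lookup table. The Python sketch below models them, assuming that a protocol-specific policy takes precedence over the catch-all DEFAULT policy (our reading of Windows QoS policy precedence, not something this section states), and checks whether the resulting 802.1p priority is one of the PFC-enabled priorities from Step 5. All names are illustrative.

```python
from typing import Dict, FrozenSet, NamedTuple


class QosTag(NamedTuple):
    cos: int   # PriorityValue8021Action (802.1p priority)
    dscp: int  # DSCPAction


# Values from the New-NetQosPolicy examples in Section 3.1.10.4
PROTOCOL_POLICIES: Dict[str, QosTag] = {"TCP": QosTag(3, 16), "UDP": QosTag(3, 32)}
DEFAULT_POLICY = QosTag(3, 9)
PFC_PRIORITIES: FrozenSet[int] = frozenset({3, 5})  # Enable-NetQosFlowControl 3,5


def tag_for(protocol: str) -> QosTag:
    """Pick the protocol-specific policy if one exists, else the DEFAULT policy."""
    return PROTOCOL_POLICIES.get(protocol.upper(), DEFAULT_POLICY)


def is_lossless(tag: QosTag) -> bool:
    """True if traffic with this tag lands on a PFC-enabled priority."""
    return tag.cos in PFC_PRIORITIES
```

Under these assumptions, TCP traffic is tagged CoS 3/DSCP 16 and lands on a PFC-protected priority, while the ETS example in Section 3.1.10.5 (CoS 0/DSCP 8) would not.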
212. y to a Hyper-V child partition's shared memory.
- Scaling to multiple processors, by processing packets for different virtual machines on different processors.
To enable Hyper-V with VMQ using the UI:
Step 1. Open Hyper-V Manager.
Step 2. Right-click the desired Virtual Machine (VM), and left-click Settings in the pop-up menu.
Step 3. In the Settings window, under the relevant network adapter, select "Hardware Acceleration".
Step 4. Check/uncheck the box "Enable virtual machine queue" to enable/disable VMQ on that specific network adapter.
To enable Hyper-V with VMQ using PowerShell:
Step 1. Enable VMQ on a specific VM:
Set-VMNetworkAdapter <VM Name> -VmqWeight 100
Step 2. Disable VMQ on a specific VM:
Set-VMNetworkAdapter <VM Name> -VmqWeight 0
3.5.3 Network Virtualization using Generic Routing Encapsulation (NVGRE)
Note: Network Virtualization using Generic Routing Encapsulation (NVGRE) offload is currently supported in Windows Server 2012 R2 with the latest updates for Microsoft.
3.5.3.1 System Requirements
Operating Systems: Windows Server 2012 R2
Mellanox ConnectX-3 Pro Adapter with firmware v2.30.8000 or higher
3.5.3.2 Using NVGRE
Network Virtualization using Generic Routing Encapsulation (NVGRE) is a network virtualization technology that attempts to alleviate the scalability problems associated with large cloud computing deployments. It uses Generic Routing Encapsulation (GRE) to tunnel layer 2 packe
213. [Figure: Add Roles and Features Wizard, Features page (Hyper-V Management Tools, Remote Desktop Services Tools, Windows Server Update Services Tools)]
Step 4. Confirm the installation.
[Figure: Add Roles and Features Wizard, "Confirm installation selections" page listing Hyper-V; Hyper-V Virtual Switches; Remote Server Administration Tools > Role Administration Tools > Hyper-V Management Tools (Hyper-V GUI Management Tools, Hyper-V Module for Windows PowerShell)]
Step 5. Click Install.
[Figure: Add Roles and Features Wizard, "Confirm installation selections" page]