
Mellanox OFED for FreeBSD User Manual


Contents

1. 4.2.2 Dual NUMA Architecture

List of Tables

Table 1: Document Revision History
Table 2: Abbreviations and Acronyms
Table 3: Glossary
Table 4: Reference Documents
Table 5: Mellanox OFED for FreeBSD Software Components

Document Revision History

Table 1 - Document Revision History

Release  Date              Description
2.1.6    June 2015         Added: Section 3.1.3, EEPROM Cable Module Information Reader.
                           Updated: Section 3.1.2, Packet Pacing.
2.1.5    January 15, 2015  Added: Section 3, Features Overview and Configuration.
                           Updated: Section 1, Overview; Section 2, Installation; Section 4, Performance Tuning.

About this Manual

This Preface provides general information concerning the scope and organization of this User's Manual.

Intended Audience

This manual is intended for system administrators responsible for the installation, configuration, management and maintenance of the softw
2. 1 (Aug 21 2014)
    mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.1 (Aug 18 2014)

To query stateless offload status:
    ifconfig mlxen<x>
Note: <x> is the OS-assigned interface number.

To set stateless offload status:
    ifconfig mlxen<x> [rxcsum|-rxcsum] [txcsum|-txcsum] [tso|-tso] [lro|-lro]
Note: <x> is the OS-assigned interface number.

To query interrupt coalescing settings:
    sysctl -a | grep adaptive
Example:
    sysctl -a | grep adaptive
    hw.mlxen0.conf.adaptive_rx_coal: 1
    hw.mlxen1.conf.adaptive_rx_coal: 1

To enable/disable adaptive interrupt moderation:
    sysctl hw.mlxen<x>.conf.adaptive_rx_coal=<1|0>
Note: <x> is the OS-assigned interface number.
By default, the driver uses adaptive interrupt moderation for the receive path, which adjusts the moderation time to the traffic pattern.

To query values for packet rate limits and for moderation time high and low:
    sysctl -a | grep pkt_rate
    sysctl -a | grep rx_usecs

To set the values for packet rate limits and for moderation time high and low:
    sysctl hw.mlxen<x>.conf.pkt_rate_low=<N>
    sysctl hw.mlxen<x>.conf.pkt_rate_high=<N>
    sysctl hw.mlxen<x>.conf.rx_usecs_low=<N>
    sysctl hw.mlxen<x>.conf.rx_usecs_high=<N>
Note: <x> is the OS-assigned interface number.
Example:
    sysctl -a | grep pkt_rate
    hw.mlxen0.conf.pkt_rate_low: 400000
    hw.mlxen0.conf.pkt_rate_high: 4
3. Offset  Values
    0x0000  0d 02 06 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0060  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0070  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0080  0d 00 23 08 00 00 00 00 00 00 00 05 8d 00 00 00
    0x0090  00 00 04 a0 4d 65 6c 6c 61 6e 6f 78 20 20 20 20
    0x00a0  AD 20 20 20 Oe 00 02 CS Acl A BA Be BO oy Sil 3
    0x00b0  36 2d 30 30 34 20 20 20 41 33 06 0a 0d 00 46 74
    0x00c0  00 00 00 00 4d 54 31 31 33 39 56 53 30 30 34 38
    0x00d0  9282082082089 Si 30 So Be 84 20 20 OO WO WO er
    0x00e0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x00f0  00 00 00 00 00 00 00 00 00 00 02 00 00 00 00 00

4 Performance Tuning

4.1 Interrupt Moderation

Interrupt moderation is used to decrease the frequency of network adapter interrupts to the CPU. Mellanox network adapters use an adaptive interrupt moderation algorithm by default. The algorithm checks the transmit (Tx) and receive (Rx) packet rates and modifies the Rx interrupt moderation settings accordingly.

To manually set Rx interrupt moderation, use sysctl:

Step 1. Turn OFF the interrupt moderation. Run:
    sysctl hw.mlxen<x>.conf.adaptive
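The ASCII fields in the dump above can be read back with a few lines of Python. This is a minimal sketch, not part of the driver package: it uses only the two rows that survived intact, and it assumes the QSFP+ (SFF-8436) layout in which the vendor name starts at offset 0x94 and the vendor serial number at offset 0xc4. The helper name `ascii_field` is hypothetical.

```python
# Rows copied verbatim from the EEPROM dump above (only intact rows are used).
DUMP_ROWS = {
    0x0090: "00 00 04 a0 4d 65 6c 6c 61 6e 6f 78 20 20 20 20",
    0x00C0: "00 00 00 00 4d 54 31 31 33 39 56 53 30 30 34 38",
}


def ascii_field(row_base: int, start: int, end: int) -> str:
    """Decode bytes [start, end) of one 16-byte dump row as ASCII text."""
    row = bytes.fromhex(DUMP_ROWS[row_base])  # fromhex skips the spaces
    return row[start - row_base:end - row_base].decode("ascii").rstrip()


# Assumption: SFF-8436 offsets (vendor name at 0x94, vendor serial at 0xc4).
# Both fields continue into the next row; only the part inside the row is read.
vendor_name_head = ascii_field(0x0090, 0x94, 0xA0)
serial_head = ascii_field(0x00C0, 0xC4, 0xD0)

print(vendor_name_head)  # Mellanox
print(serial_head)       # MT1139VS0048
```

The leading byte 0x0d at offsets 0x0000 and 0x0080 is the module identifier; in SFF-8436 that value denotes a QSFP+ module, which is why those offsets are assumed here.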
4. an Ethernet NIC.

Software Components

Mellanox OFED for FreeBSD contains the following software components:

Table 5 - Mellanox OFED for FreeBSD Software Components

Components     Description
mlx4_core      Handles low-level functions like device initialization and firmware commands processing. Also controls resource allocation so that the InfiniBand, Ethernet and FC functions can share a device without interfering with each other.
mlx4_en        Handles Ethernet-specific functions and plugs into the netdev mid-layer.
mlx4_ib        Handles InfiniBand-specific functions supplied by ib_core in order to interact with verbs and ULPs.
Documentation  Release Notes, User Manual

2 Installation

This chapter describes how to install and test the Mellanox driver for FreeBSD package on a single host machine with Mellanox InfiniBand and/or Ethernet adapter hardware installed.

2.1 Software Dependencies

• To install the driver software, kernel sources must be installed on the machine.
• To run the Packet Pacing feature, the kernel patch and dedicated FW version included in the Packet Pacing package need to be applied.

2.2 Downloading Mellanox Driver for FreeBSD

1. Verify that the system has a Mellanox network adapter (HCA/NIC) installed
5. logic device called mlx4_core0.

  a. Find the device interrupts:
        vmstat -ia | grep mlx4_core0 | awk '{print $1}' | sed s/irq// | sed s/://
        285
        286
        287
  b. Bind each interrupt to a desirable core:
        cpuset -x 285 -l 1
        cpuset -x 286 -l 2
        cpuset -x 287 -l 3
  c. Bind the application to the desirable core:
        cpuset -l 1-11 <app name> <server flag>
        cpuset -l 1-11 <app name> <client flag> <IP>

Specifying a range of CPUs when using the cpuset command allows the application to choose any of them. This is important for applications that execute on multiple threads.
The range argument is not supported for interrupt binding.

4.2.2 Dual NUMA Architecture

1. Find the CPU list closest to the NIC.
  a. Find the NIC's PCI location:
        pciconf -lv | grep mlx
        mlx4_core0@pci0:4:0:0: class=0x028000 card=0x000315b3 chip=0x100715b3 rev=0x00 hdr=0x00
     Usually, low PCI locations are closest to NUMA number 0, and high PCI locations are closest to NUMA number 1. Here is how to verify the locations:
  b. Find the NIC's pcib by PCI location (in this example, try PCI 4):
        sysctl -a | grep pci.4.%parent
        dev.pci.4.%parent: pcib3
  c. Find the NIC's pcib location:
        sysctl -a | grep pcib.3.%location
        dev.pcib.3.%location: slot=2 function=0 handle=\_SB_.PCI0.NPE3
     In "handle", PCI0 is the value for locations near NUMA0, and PCI1 is the value for locations near NUMA1.
  d. Find the cores list of the closest NUMA:
        sysctl -a | grep "group level=\"2\"" -A 1
        <group level="2" ca
6. number of IP addresses that are assigned to all network devices associated with the port, including VLAN devices.
• The first entry in the GID table (at index 0) for each port is always present and equal to the link-local IPv6 address of the net device that is associated with the port. Note that even if the link-local IPv6 address is not set, index 0 is still populated.
• GID format can be of 2 types: IPv4 and IPv6. An IPv4 GID is an IPv4-mapped IPv6 address (1), while an IPv6 GID is the IPv6 address itself.
• Load the following modules for RoCE support: mlx4_core, ib_core, mlx4_ib and mlx4_en.

(1) For the IPv4 address A.B.C.D, the corresponding IPv4-mapped IPv6 address is ::ffff:A.B.C.D.

3.1.2 Packet Pacing

Packet pacing, also known as "rate limit", defines a maximum bandwidth allowed for a TCP connection. Limitation is done by hardware, where each QP (transmit queue) has a rate limit value from which it calculates the delay between each packet sent.

3.1.2.1 Setting Properties for Packet Pacing

Before loading the mlxen module, set which priorities (0-7) are relevant for packet pacing sockets. Run:
    kenv hw.mlx4_en.config_prios_for_rl_rings=<priority>,<priority>,<priority>
For example:
    kenv hw.mlx4_en.config_prios_for_rl_rings=0,1,2
When no priorities are selected, the default priority supported with packet pacing sockets is priority zero.

3.1.2.2 Setting Rate
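The IPv4-mapped form described in the footnote above can be checked with Python's standard `ipaddress` module. This is a small illustrative sketch; the function name `ipv4_gid` and the sample address 192.168.1.10 are hypothetical and are not taken from the driver.

```python
import ipaddress


def ipv4_gid(ipv4: str) -> ipaddress.IPv6Address:
    """Return the IPv4-mapped IPv6 address ::ffff:A.B.C.D for A.B.C.D."""
    v4 = ipaddress.IPv4Address(ipv4)
    # 80 zero bits, 16 one bits (0xffff), then the 32-bit IPv4 address.
    return ipaddress.IPv6Address((0xFFFF << 32) | int(v4))


gid = ipv4_gid("192.168.1.10")
print(gid.packed.hex())  # 00000000000000000000ffffc0a8010a
print(gid.ipv4_mapped)   # 192.168.1.10
```

The 16-byte packed form is what a GID table entry would hold for such an address, assuming the GID is stored as the raw IPv6 address bytes.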
7. the kernel environment variable before loading the modules:
        kenv rate_limit_debug=1
  All rate limited rings statistics are available via sysctl when the debug option is enabled:
  • Rate limit value
  • Packets
  • Bytes
        sysctl -a | grep tx_ring
        hw.mlxen0.stat.tx_ring8.rate_limit_val: 409600
        hw.mlxen0.stat.tx_ring8.packets: 2284
        hw.mlxen0.stat.tx_ring8.bytes: 3177592

• Running sysctl in order to get more rate limit information without enabling the debug option:
  • Show ring ID, QP number and rate of currently running rings in CSV format (suitable for export to Excel):
        sysctl hw.mlxen0.conf.dump_rate_limit_rings=1
  • Show ring ID, QP number and rate of currently running rings in a table format:
        sysctl hw.mlxen0.conf.dump_rate_limit_rings=2

3.1.2.5 Limitations

• Max rate limited rings is 45,000
• Min rate: 250 Kbps
• Max rate: 50 Mbps
        sysctl -a | grep rate_limit_caps
        sys.device.mlx4_core0.rate_limit_caps.min_value: 250 Kbps
        sys.device.mlx4_core0.rate_limit_caps.max_value: 50 Mbps
• Different rate limits to be configured per NIC/port: 120 divided by the number of priorities. See Section 3.1.2.1.

3.1.3 EEPROM Cable Module Information Reader

In order to read the cable EEPROM info:
• Read the cable information by enabling the following sysctl parameter:
        sysctl hw.mlxen<X>.conf.eeprom_info=1
  Example:
        hw.mlxen0.conf.eeprom_info: 0 -> 0
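The per-port limit in Section 3.1.2.5 ("120 divided by the number of priorities") can be worked through as follows. This is a sketch under stated assumptions: the manual does not say how a non-integer quotient is handled, so integer division is assumed here, and the function name `rates_per_priority` is hypothetical.

```python
def rates_per_priority(priorities, total_rates=120):
    """Number of distinct rate limits available to each selected priority."""
    selected = sorted(set(priorities))
    if not selected:
        # Per Section 3.1.2.1, priority zero is used when none is selected.
        selected = [0]
    if any(p < 0 or p > 7 for p in selected):
        raise ValueError("priorities must be in the range 0-7")
    # Assumption: the quotient is rounded down.
    return total_rates // len(selected)


print(rates_per_priority([0, 1, 2]))  # 40
print(rates_per_priority([]))         # 120
```

With the priorities 0,1,2 from the kenv example in Section 3.1.2.1, each priority would get 40 distinct rates under this assumption.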
8. 50000
    hw.mlxen1.conf.pkt_rate_low: 400000
    hw.mlxen1.conf.pkt_rate_high: 450000
    sysctl -a | grep rx_usecs
    hw.mlxen0.conf.rx_usecs_low: 0
    hw.mlxen0.conf.rx_usecs_high: 128
    hw.mlxen1.conf.rx_usecs_low: 0
    hw.mlxen1.conf.rx_usecs_high: 128

Above an upper limit of packet rate, adaptive moderation sets the moderation time to its highest value. Below a lower limit of packet rate, the moderation time is set to its lowest value.

To query pause frame settings:
    ifconfig -m mlxen<x>
Note: <x> is the OS-assigned interface number.

To set pause frame settings:
    ifconfig -m mlxen<x> [-]mediaopt [rxpause,txpause]
Note: <x> is the OS-assigned interface number.

To query ring size values:
    sysctl -a | grep _size
Example:
    sysctl -a | grep _size
    hw.mlxen0.conf.rx_size: 1024
    hw.mlxen0.conf.tx_size: 1024
    hw.mlxen1.conf.rx_size: 1024
    hw.mlxen1.conf.tx_size: 1024

To modify rings size:
    sysctl hw.mlxen<x>.conf.rx_size=<N>
    sysctl hw.mlxen<x>.conf.tx_size=<N>
Note: <x> is the OS-assigned interface number.

To obtain additional device statistics:
    sysctl -a | grep mlx | grep stat

The driver defaults to the following parameters:
• Both ports are activated (i.e., a net device is created for each port)
• The number of Rx rings for each port is the nearest power of 2 of number of CPU cores, limited by 16.
• LRO is
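The relationship between the four thresholds above can be sketched in Python. The manual states only the two end cases (highest moderation time above pkt_rate_high, lowest below pkt_rate_low); the linear interpolation between the limits is an assumption made for illustration, and `moderation_usecs` is a hypothetical name, not a driver function. The defaults are the values shown in the example output.

```python
def moderation_usecs(pkt_rate,
                     pkt_rate_low=400000, pkt_rate_high=450000,
                     rx_usecs_low=0, rx_usecs_high=128):
    """Pick an Rx moderation time (microseconds) for a measured packet rate."""
    if pkt_rate >= pkt_rate_high:
        return rx_usecs_high   # stated in the manual
    if pkt_rate <= pkt_rate_low:
        return rx_usecs_low    # stated in the manual
    # Assumption: scale linearly between the two limits.
    span = pkt_rate_high - pkt_rate_low
    return rx_usecs_low + (pkt_rate - pkt_rate_low) * (rx_usecs_high - rx_usecs_low) // span


print(moderation_usecs(500000))  # 128
print(moderation_usecs(100000))  # 0
print(moderation_usecs(425000))  # 64
```

Widening the gap between pkt_rate_low and pkt_rate_high would, under this assumption, make the moderation time change more gradually with load.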
9. Mellanox Technologies
Connect. Accelerate. Outperform.

Mellanox OFED for FreeBSD User Manual
Rev 2.1.6
www.mellanox.com

NOTE:
THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT ("PRODUCT(S)") AND ITS RELATED DOCUMENTATION ARE PROVIDED BY MELLANOX TECHNOLOGIES "AS-IS" WITH ALL FAULTS OF ANY KIND AND SOLELY FOR THE PURPOSE OF AIDING THE CUSTOMER IN TESTING APPLICATIONS THAT USE THE PRODUCTS IN DESIGNATED SOLUTIONS. THE CUSTOMER'S MANUFACTURING TEST ENVIRONMENT HAS NOT MET THE STANDARDS SET BY MELLANOX TECHNOLOGIES TO FULLY QUALIFY THE PRODUCT(S) AND/OR THE SYSTEM USING IT. THEREFORE, MELLANOX TECHNOLOGIES CANNOT AND DOES NOT GUARANTEE OR WARRANT THAT THE PRODUCTS WILL OPERATE WITH THE HIGHEST QUALITY. ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT ARE DISCLAIMED. IN NO EVENT SHALL MELLANOX BE LIABLE TO CUSTOMER OR ANY THIRD PARTIES FOR ANY DIRECT, INDIRECT, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES OF ANY KIND (INCLUDING, BUT NOT LIMITED TO, PAYMENT FOR PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY FROM THE USE OF THE PRODUCT(S) AND RELATED DOCUMENTATION EVEN IF ADVISED OF THE POSSIBILITY OF SUC
10. FreeBSD Package Contents
        1.1.1 Tarball Package
    1.2 mlx4 driver
Chapter 2 Installation
    2.1 Software Dependencies
    2.2 Downloading Mellanox Driver for FreeBSD
    2.3 Installing Mellanox Driver for FreeBSD
    2.4 Firmware Programming
        2.4.1 Installing Firmware Tools
        2.4.2 Downloading Firmware
        2.4.3 Updating Firmware Using flint
        2.4.4 Updating Firmware Using mlxburn (.mlx file)
    2.5 Driver Usage and Configuration
Chapter 3 Features Overview and Configuration
    3.1 Ethernet Network
        3.1.1 RDMA over Converged Ethernet (RoCE)
        3.1.2 Packet Pacing
        3.1.3 EEPROM Cable Module Information Reader
Chapter 4 Performance Tuning
    4.1 Interrupt Moderation
    4.2 Tuning for NUMA Architecture
        4.2.1 Single NUMA Architecture
11. H DAMAGE.

Mellanox Technologies
350 Oakmead Parkway, Suite 100
Sunnyvale, CA 94085
U.S.A.
www.mellanox.com
Tel: (408) 970-3400
Fax: (408) 970-3403

Copyright 2015. Mellanox Technologies. All Rights Reserved.

Mellanox, Mellanox logo, BridgeX, ConnectX, Connect-IB, CoolBox, CORE-Direct, GPUDirect, InfiniBridge, InfiniHost, InfiniScale, Kotura, Kotura logo, MetroX, MLNX-OS, PhyX, ScalableHPC, SwitchX, TestX, UFM, Virtual Protocol Interconnect, Voltaire and Voltaire logo are registered trademarks of Mellanox Technologies, Ltd.

CyPU, ExtendX, FabricIT, FPGADirect, HPC-X, Mellanox Care, Mellanox CloudX, Mellanox Open Ethernet, Mellanox PeerDirect, Mellanox Virtual Modular Switch, MetroDX, NVMeDirect, StPU, Switch-IB, Unbreakable-Link are trademarks of Mellanox Technologies, Ltd.

All other trademarks are property of their respective owners.

Document Number: 2950

Table of Contents

Table of Contents
List of Tables
Document Revision History
About this Manual
Chapter 1 Overview
    1.1 Mellanox OFED for
12. The following example shows a system with an installed Mellanox HCA:
        pciconf -lv | grep Mellanox -C 3
        mlx4_core0@pci0:7:0:0: class=0x028000 card=0x000615b3 chip=0x100315b3 rev=0x00 hdr=0x00
            vendor = 'Mellanox Technologies'
            device = 'MT27500 Family [ConnectX-3]'
            class  = network

2. Download the tarball image to your host.
   The image name has the format MLNX_OFED_FreeBSD-<ver>.tgz. You can download it from http://www.mellanox.com > Products > Software > Ethernet Drivers > FreeBSD
   or
   http://www.mellanox.com > Products > Software > InfiniBand/VPI drivers > FreeBSD
3. Use the md5sum utility to confirm the file integrity of your tarball image.

2.3 Installing Mellanox Driver for FreeBSD

Prior to installing the driver, please re-compile and install the kernel with NO OFED options/devices enabled.

1. Extract the tarball.
2. Compile and load the needed modules in the following order of dependencies:

Ethernet Driver:

mlx4_core:
  a. Go to the mlx4 directory. Run:
        cd modules/mlx4
  b. Clean any previous dependencies. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys clean cleandepend
  c. Compile the mlx4_core module. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys
     Note: For packet pacing support, add CONFIG_RATELIMIT=yes. This option has a kernel patch dependency.
  d. Install the mlx4_core module. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys install
  e. Lo
13. ad the mlx4_core module. Run:
        kldload mlx4

mlxen:
  a. Go to the mlxen directory. Run:
        cd modules/mlxen
  b. Clean any previous dependencies. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys clean cleandepend
  c. Compile the mlxen module. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys
     Note: For packet pacing support, add CONFIG_RATELIMIT=yes. This option has a kernel patch dependency.
  d. Install the mlxen module. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys install
  e. Load the mlxen module. Run:
        kldload mlxen

InfiniBand Driver:

mlx4_core:
  Run the same steps as specified for the Ethernet driver above.

ib_core:
  a. Go to the ib_core directory. Run:
        cd modules/ibcore
  b. Clean any previous dependencies. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys clean cleandepend
  c. Compile the ib_core module. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys
  d. Install the ib_core module. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys install
  e. Load the ib_core module. Run:
        kldload ibcore

mlx4_ib:
  a. Go to the mlx4_ib directory. Run:
        cd modules/mlx4ib
  b. Clean any previous dependencies. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys clean cleandepend
  c. Compile the mlx4_ib module. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys
  d. Install the mlx4_ib module. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys install
  e. Load the mlx4_ib m
14. are and hardware of VPI (InfiniBand, Ethernet) adapter cards. It is also intended for application developers.

Common Abbreviations and Acronyms

Table 2 - Abbreviations and Acronyms

Abbreviation/Acronym  Whole Word/Description
B      (Capital) 'B' is used to indicate size in bytes or multiples of bytes (e.g., 1KB = 1024 bytes, and 1MB = 1048576 bytes)
b      (Small) 'b' is used to indicate size in bits or multiples of bits (e.g., 1Kb = 1024 bits)
FW     Firmware
HCA    Host Channel Adapter
HW     Hardware
IB     InfiniBand
LSB    Least significant byte
lsb    Least significant bit
MSB    Most significant byte
msb    Most significant bit
NIC    Network Interface Card
SW     Software
VPI    Virtual Protocol Interconnect
PFC    Priority Flow Control
PR     Path Record
RDS    Reliable Datagram Sockets
RoCE   RDMA over Converged Ethernet
SL     Service Level
QoS    Quality of Service
ULP    Upper Level Protocol
VL     Virtual Lane

Glossary

The following is a list of concepts and terms related to InfiniBand in general and to Subnet Managers in particular. It is included here for ease of reference, but the main reference remains the InfiniBand Architecture Specification.

Table 3 - Glossary

Channel Adapter (CA), Host Channel Adapter (HCA)
    An IB device that terminates an IB link and executes transport functions. This m
15. ay be an HCA (Host CA) or a TCA (Target CA).

HCA Card
    A network adapter card based on an InfiniBand channel adapter device.
IB Devices
    Integrated circuit implementing InfiniBand compliant communication.
In-Band
    A term assigned to administration activities traversing the IB connectivity only.
Local Port
    The IB port of the HCA through which IBDIAG tools connect to the IB fabric.
Master Subnet Manager
    The Subnet Manager that is authoritative, that has the reference configuration information for the subnet. See Subnet Manager.
Multicast Forwarding Tables
    A table that exists in every switch providing the list of ports to which a received multicast packet is forwarded. The table is organized by MLID.
Network Interface Card (NIC)
    A network adapter card that plugs into the PCI Express slot and provides one or more ports to an Ethernet network.
Unicast Linear Forwarding Tables (LFT)
    A table that exists in every switch providing the port through which packets should be sent to each LID.
Virtual Protocol Interconnect (VPI)
    A Mellanox Technologies technology that allows Mellanox channel adapter devices (ConnectX) to simultaneously connect to an InfiniBand subnet and a 10GigE subnet (each subnet connects to one of the adapter ports).

Related Documentation

Table 4 - Reference Documents

Document Name                                                     Description
InfiniBand Architecture Specification, Vol. 1, Release 1.2.1      T
16. card's PSID:
        flint -d <pci> q
3. Download the desired firmware from the Mellanox website:
   http://www.mellanox.com/page/firmware_download

2.4.3 Updating Firmware Using flint

1. Unzip the firmware binary file and burn it:
        flint -d <pci> -i <img.bin> b
2. Reboot the server.

2.4.4 Updating Firmware Using mlxburn (.mlx file)

1. Extract the relevant firmware package (ConnectX3/ConnectX3-Pro).
2. From the extracted directory, run:
        mlxburn -d <NIC's pci slot> -fw <mlx file> -conf_dir_list
   Example:
        mlxburn -d <NIC's pci slot> -fw fw-ConnectX3-rel.mlx -conf_dir_list
3. Reboot the server.

2.5 Driver Usage and Configuration

To assign an IP address to the interface:
    ifconfig mlxen<x> <ip>
Note: <x> is the OS-assigned interface number.

To check driver and device information:
    pciconf -lv | grep mlx
    flint -d <pci> q
Example:
    pciconf -lv | grep mlx
    mlx4_core0@pci0:7:0:0: class=0x028000 card=0x003715b3 chip=0x100315b3 rev=0x00 hdr=0x00
    flint -d pci0:7:0:0 q
    Image type:   FS2
    FW Version:   214301 WAS
    Device ID:    4099
    Description:  Node             Port1            Port2            Sys image
    GUIDs:        0002c903002ffcc0 0002c903002ffcc1 0002c903002ffcc2 0002c903002ffcc3
    MACs:                          0002c92ffcc0     0002c92ffcc1
    VSD:
    PSID:         MT_1020120019

To check driver version:
    dmesg
Example:
    dmesg
    mlx4_core: Mellanox ConnectX core driver v2
17. che-level="2">
        <cpu count="12" mask="fff">0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11</cpu>
        <group level="2" cache-level="2">
        <cpu count="12" mask="fff000">12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23</cpu>
     Note: Each list of cores refers to a different NUMA.

2. Tune Mellanox NICs to work on desirable cores.
  a. Pin both interrupts and application processes to the relevant cores.
  b. Find the closest NUMA to the NIC.
  c. Find the NIC's device name by its PCI location:
        sysctl -a | grep pci4
        dev.mlx4_core.0.%parent: pci4
     This means the NIC on PCI number 4 has a logic device called mlx4_core0.
  d. Find the device interrupts:
        vmstat -ia | grep mlx4_core0 | awk '{print $1}' | sed s/irq// | sed s/://
        338
        339
        340
  e. Bind each interrupt to a core from the closest NUMA cores list.
     Note: It is best to avoid core number 0.
        cpuset -x 338 -l 1
        cpuset -x 339 -l 2
        cpuset -x 340 -l 3
  f. Bind the application to the closest NUMA cores list.
     Note: It is best to avoid core number 0.
        cpuset -l 1-11 <app name> <server flag>
        cpuset -l 1-11 <app name> <client flag> <IP>

For best performance, change the CPU's BIOS configuration to performance mode.
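The mask attribute in the topology output above encodes the same core list as the comma-separated numbers: bit N set means core N belongs to the group. A minimal Python sketch of that decoding follows; the function name `cores_from_mask` is hypothetical, and the two masks are the ones shown in the example.

```python
def cores_from_mask(mask_hex):
    """Return the core numbers whose bits are set in a hexadecimal CPU mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


numa0_cores = cores_from_mask("fff")      # cores 0 through 11
numa1_cores = cores_from_mask("fff000")   # cores 12 through 23

# Following the note above, leave core 0 out of the binding list.
usable_numa0 = [core for core in numa0_cores if core != 0]

print(numa0_cores[0], numa0_cores[-1])    # 0 11
print(numa1_cores[0], numa1_cores[-1])    # 12 23
print(usable_numa0[0], usable_numa0[-1])  # 1 11
```

The last list corresponds to the range 1-11 passed to cpuset -l in the steps above.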
18. d of bits.
   • A rate limited ring corresponding to the requested rate will be created and associated with the relevant socket.
   • Rate limited traffic will be transmitted when data is sent via the socket.
2. Modify the rate limited value using the same socket.
3. Destroy the relevant ring upon TCP socket completion.

3.1.2.3.1 Error Detection

Failures can be detected by using the getsockopt() interface to query a specific socket:
    getsockopt(s, SOL_SOCKET, SO_MAX_PACING_RATE, &pacing_rate, &size);

    socket pacing_rate    Holds the rate limit value associated with the socket.

If the value of pacing_rate is 0, there was an error: the rate limit could not be set on the socket, and the socket is not rate limited. For a more detailed message, look at the output of dmesg.

To see if the socket is rate limited, data must be sent using the socket. Only then will the value returned by getsockopt() be valid.

3.1.2.4 Feature Characteristics

• MLNX_OFED for FreeBSD supports up to 45,000 rate limited TCP connections.
• Each TCP connection is mapped to a specific QP.
• User interface:
  • Rate limited tx ring amount:
        sysctl -a | grep rate_limit_tx_ring
        hw.mlxen0.conf.rate_limit_tx_rings: 0
        hw.mlxen1.conf.rate_limit_tx_rings: 0
  • Native tx ring amount:
        sysctl -a | grep native_tx_ring
        hw.mlxen0.conf.native_tx_rings: 8
        hw.mlxen1.conf.native_tx_rings: 8
• Debugging:
  • Setting
19. enabled with 32 concurrent sessions per Rx ring

3 Features Overview and Configuration

3.1 Ethernet Network

3.1.1 RDMA over Converged Ethernet (RoCE)

RoCE allows InfiniBand (IB) transport applications to work over an Ethernet network. RoCE encapsulates the InfiniBand transport and the GRH headers in Ethernet packets bearing a dedicated ether type (0x8915). Thus, any VERB application that works in an InfiniBand fabric can also work in an Ethernet fabric. RoCE is enabled only for drivers that support VPI (currently, only mlx4).

When working with RDMA applications over the Ethernet link layer, the following points should be noted:
• The presence of a Subnet Manager (SM) is not required in the fabric. Thus, operations that require communication with the SM are managed in a different way in RoCE. This does not affect the API.
• Since the SM is not present, querying a path is impossible. Therefore, the path record structure must be filled with the relevant values before establishing a connection. It is recommended to work with RDMA-CM to establish a connection, as it takes care of filling the path record structure.
• Since LID is a layer 2 attribute of the InfiniBand protocol stack, it is not set for a port and is displayed as zero when querying the port.
• With RoCE, APM is not supported.
• The GID table for each port is populated with N+1 entries, where N is the
20. he InfiniBand Architecture Specification that is provided by IBTA

Support and Updates Webpage

Please visit http://www.mellanox.com > Products > Software > Ethernet Drivers > FreeBSD Drivers for downloads, FAQ, troubleshooting, future updates to this manual, etc.

1 Overview

This document provides information on the Mellanox driver for FreeBSD and instructions for installing the driver on Mellanox ConnectX adapter cards supporting the following uplinks to servers:
• ConnectX-3/ConnectX-3 Pro:
  • InfiniBand: QDR, FDR10, FDR
  • Ethernet: 10GigE, 40GigE

The driver release introduces the following capabilities:
• Single/Dual port
• Up to 16 Rx queues per port
• Up to 32 Tx queues per port (according to number of CPUs)
• MSI-X or INTx
• Adaptive interrupt moderation
• Hardware Tx/Rx checksum calculation
• Large Send Offload (i.e., TCP Segmentation Offload)
• Large Receive Offload
• VLAN Tx/Rx acceleration (Hardware VLAN stripping/insertion)
• Net device statistics

1.1 Mellanox OFED for FreeBSD Package Contents

1.1.1 Tarball Package

The Mellanox OFED for FreeBSD package includes the following directories:
• modules - contains the relevant Makefiles
• ofed - source code

1.2 mlx4 driver

mlx4 is the low level driver implementation for the ConnectX adapters designed by Mellanox Technologies. The ConnectX can operate as an InfiniBand adapter and as
21. odule. Run:
        kldload mlx4ib

ipoib:
  a. Go to the ipoib directory. Run:
        cd modules/ipoib
  b. Clean any previous dependencies. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys clean cleandepend
  c. Compile the ipoib module. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys
  d. Install the ipoib module. Run:
        make -m $HEAD/share/mk SYSDIR=$HEAD/sys install
  e. Load the ipoib module. Run:
        kldload ipoib

To load a module on reboot, add
    mlx4_load="YES"
    mlxen_load="YES"
    ibcore_load="YES"
    mlx4ib_load="YES"
    ipoib_load="YES"
to the /boot/loader.conf file (create it if it does not exist).

Run kldstat in order to verify which modules are loaded on your server.

2.4 Firmware Programming

The adapter card was shipped with the most current firmware available. This section is intended for future firmware upgrades, and provides instructions for (1) installing Mellanox firmware update tools (MFT), (2) downloading FW, and (3) updating adapter card firmware.

2.4.1 Installing Firmware Tools

Step 1. Download the current Mellanox Firmware Tools package (MFT) from www.mellanox.com > Products > Adapter IB/VPI SW > Firmware Tools.
        The tools package to download is "MFT SW for FreeBSD" (tarball name is mft-X.X.X.tgz).
Step 2. Extract the tarball and run the installation script.

2.4.2 Downloading Firmware

1. Retrieve the device's PCI slot (i.e. pci0:x:0:0). Run:
        mst status
2. Verify your
22. rx_coal=0
        Note: <x> is the OS-assigned interface number.
Step 2. Turn ON the interrupt moderation. Run:
        sysctl hw.mlxen<x>.conf.adaptive_rx_coal=1
Step 3. Set the threshold values for packet rate limits and for moderation time. Run:
        sysctl hw.mlxen<x>.conf.pkt_rate_low=<N>
        sysctl hw.mlxen<x>.conf.rx_usecs_low=<N>
        sysctl hw.mlxen<x>.conf.pkt_rate_high=<N>
        sysctl hw.mlxen<x>.conf.rx_usecs_high=<N>

Above an upper limit of packet rate, adaptive moderation sets the moderation time to its highest value. Below a lower limit of packet rate, the moderation time is set to its lowest value.

4.2 Tuning for NUMA Architecture

4.2.1 Single NUMA Architecture

When using a server with single NUMA, no tuning is required. Also, make sure to avoid using core number 0 for interrupts and applications.

1. Find a CPU list:
        sysctl -a | grep "group level=\"2\"" -A 1
        <group level="2" cache-level="2">
        <cpu count="12" mask="fff">0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11</cpu>
2. Tune Mellanox NICs to work on desirable cores.
  a. Find the NIC's PCI location:
        pciconf -lv | grep mlx
        mlx4_core0@pci0:2:0:0: class=0x028000 card=0x000315b3 chip=0x100715b3 rev=0x00 hdr=0x00
  b. Find the NIC's device name by its PCI location:
        sysctl -a | grep pci2
        dev.mlx4_core.0.%parent: pci2
     This means the NIC on PCI number 2 has a
23. s for Packet Pacing

Rates that are being used with packet pacing must be defined in advance.

New Rates Configuration

• Newly configured rates must be within a certain range determined by the firmware, and they can be read through sysctl.
  • For a minimum value, run:
        sysctl sys.device.<DEVICE_NAME>.rate_limit_caps.min_value
  • For a maximum value, run:
        sysctl sys.device.<DEVICE_NAME>.rate_limit_caps.max_value
• The number of configured rates is also determined by the firmware. In order to check how many rates can be defined, run:
        sysctl hw.<INTERFACE_NAME>.conf.num_rates
• A new rate can be added by any of the following methods:
  • Add a rate per index, from index 1 to num_rates:
        sysctl hw.<INTERFACE_NAME>.conf.rate_limit_1=400000
        sysctl hw.<INTERFACE_NAME>.conf.rate_limit_120=500000
    In order to read which rate was defined for a specific index (for example, index 1), run:
        sysctl hw.<INTERFACE_NAME>.conf.rate_limit_1
    Each index can be defined with a rate only once.
  • Add a rate in an unknown index:
        sysctl hw.<INTERFACE_NAME>.conf.add_rate=600000
    This adds the defined rate to the next available index. If all rates were already defined with an index, the new rate will not be added.

Rates are determined and then saved in bits per second. Rates requested for a new socket are added in bytes per second.

Limitation: Ra
24. te values must be multiples of 1000.

• There are two burst levels that can be defined for each index:
  • Burst low: Burst capacity is limited to a lower range to allow better pacing.
  • Burst high: Burst capacity is limited to a higher range to allow better bandwidth.
  To change the burst level per index, run:
        sysctl hw.<INTERFACE_NAME>.conf.burst_size_1=<burst_high|burst_low>
  In order to read which burst level was defined for an index (for example, index 1), run:
        sysctl hw.<INTERFACE_NAME>.conf.burst_size_1
• To display the packet pacing configuration, run:
        sysctl hw.<INTERFACE_NAME>.conf.rate_limit_show
        hw.<INTERFACE_NAME>.conf.rate_limit_show:
        INDEX  CURRENTLY USED  BURST  RATE [bit/s]
        1      0               HIGH   400,000
        2      0               LOW    500,000
        3      0               LOW    0
        4      0               LOW    0
  where:
        Index           Rate index
        Currently used  Number of rings which are currently running, configured with the index's rate
        Burst           Burst level configured for the index's rate
        Rate            Rate configured for the relevant index
  All rates are shown in bits per second.

3.1.2.3 Using Packet Pacing Sockets

1. Create a rate limited socket according to the desired rate using the setsockopt() interface, based on the previous section:
        setsockopt(s, SOL_SOCKET, SO_MAX_PACING_RATE, &pacing_rate, sizeof(pacing_rate));

        SO_MAX_PACING_RATE  Marks the socket as a rate limited socket.
        pacing_rate         Defined rate in bytes/sec. The type is unsigned int.
   Note: The same value entered via sysctl, in bytes instea
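Because rates are configured through sysctl in bits per second but requested on a socket in bytes per second, the value passed to setsockopt() has to be converted. The following Python sketch shows that arithmetic only and does not touch a real socket; `socket_pacing_rate` is a hypothetical helper, and packing the result as a native unsigned int mirrors the option type stated above.

```python
import struct


def socket_pacing_rate(bits_per_second):
    """Convert a configured rate (bit/s) to the socket value (bytes/s)."""
    if bits_per_second <= 0 or bits_per_second % 1000 != 0:
        # The manual requires rate values to be multiples of 1000.
        raise ValueError("rate must be a positive multiple of 1000 bit/s")
    return bits_per_second // 8   # every multiple of 1000 is divisible by 8


rate = socket_pacing_rate(400000)   # index 1 in the example table
optval = struct.pack("I", rate)     # unsigned int, as the option expects

print(rate)  # 50000
```

A socket meant to use the 400,000 bit/s rate configured at index 1 would therefore request 50000 bytes/sec.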
