MLNX_EN for Linux User Manual
Contents
1. Note: for receive side time stamping, currently only HWTSTAMP_FILTER_NONE and HWTSTAMP_FILTER_ALL are supported.

5.2.2 Getting Time Stamping

Once time stamping is enabled, the time stamp is placed in the socket ancillary data. recvmsg() can be used to get this control message for regular incoming packets. For send time stamps, the outgoing packet is looped back to the socket's error queue with the send time stamp(s) attached. It can be received with recvmsg(flags=MSG_ERRQUEUE). The call returns the original outgoing packet data, including all headers prepended down to and including the link layer, the scm_timestamping control message, and a sock_extended_err control message with ee_errno==ENOMSG and ee_origin==SO_EE_ORIGIN_TIMESTAMPING. A socket with such a pending bounced packet is ready for reading as far as select() is concerned. If the outgoing packet has to be fragmented, then only the first fragment is time stamped and returned to the sending socket.

Mellanox Technologies | Rev 2.2-1.0.1 | Driver Features

Note: When time stamping is enabled, VLAN stripping is disabled. For more information, please refer to Documentation/networking/timestamping.txt in kernel.org.

5.3 Flow Steering

Note: Flow Steering is applicable to the mlx4 driver only.

Flow steering is a new model which steers network flows, based on flow specifications, to specific QPs. Those flows can be either unicast or multicast network flows. In order to m
2. mlx4_en: Handles Ethernet specific functions and plugs into the netdev mid-layer.
mstflint: An application to burn a firmware binary image.
Software modules: Sources of all software modules (under conditions mentioned in the modules' LICENSE files).
Documentation: Release Notes, README.
(Table 5 - MLNX_EN Package Content, continued; columns: Components, Description)

2 Driver Installation

2.1 Software Dependencies

To install the driver software, kernel sources must be installed on the machine. The MLNX_EN driver cannot coexist with OFED software on the same machine. Hence, when installing MLNX_EN, all OFED packages should be removed (this is done by the mlnx_en install script).

2.2 Installing the Driver

Step 1. Download the driver package from the Mellanox site:
    http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=27&menu_section=35

Step 2. Install the driver:
    > tar xzvf mlnx-en-2.0-3.0.0.tgz
    > cd mlnx-en-2.0-3.0.0
    > ./install.sh

To install mlnx-en-2.0-3.0.0 on XenServer 6.1:
    rpm -ihv RPMS/xenserver6u1/i386/`uname -r`/mlnx-en*rpm

The package consists of several source RPMs. The install script rebuilds the source RPMs and then installs the created binary RPMs. The created kernel module binaries are located at:

For KMP RPMs installation:
• On SLES (mellanox
3. 5.4.6.2 Additional Ethernet VF Configuration Options

• Guest MAC configuration
By default, guest MAC addresses are configured to be all zeroes. In the MLNX_EN guest driver, if a guest sees a zero MAC, it generates a random MAC address for itself. If the administrator wishes the guest to always start up with the same MAC, he/she should configure guest MACs before the guest driver comes up. The guest MAC may be configured by using:
    ip link set dev <PF device> vf <NUM> mac <LLADDR>
For legacy guests, which do not generate random MACs, the administrator should always configure their MAC addresses via ip link, as above.

• Spoof checking
Spoof checking is currently available only on upstream kernels newer than 3.1.
    ip link set dev <PF device> vf <NUM> spoofchk [on | off]

5.4.6.3 Mapping VFs to Ports using the mlnx_get_vfs.pl Tool

To map the PCI representation in BDF to the respective ports:
    mlnx_get_vfs.pl
The output is as follows:
    BDF 0000:04:00.0
     Port 1: 2
      vf0 0000:04:00.1
      vf1 0000:04:00.2
     Port 2: 2
      vf2 0000:04:00.3
      vf3 0000:04:00.4
     Both: 1
      vf4 0000:04:00.5

5.5 Ethtool

ethtool is a standard Linux utility for controlling network drivers and hardware, particularly for wired Ethernet devices. It can be used to:
• Get identification and diagnostic information
• Get extended device statistics
• Control speed, duplex, autonegotiation and
4. <N>
ethtool -S eth<x>: Obtains additional device statistics.
ethtool -t eth<x>: Performs a self diagnostics test.
ethtool -s eth<x> msglvl [N]: Changes the current driver message level.
ethtool -T eth<x>: Shows time stamping capabilities.
ethtool -l eth<x>: Shows the number of channels.
ethtool -L eth<x> [rx <N>] [tx <N>]: Sets the number of channels.
ethtool --show-priv-flags eth<x>: Shows driver private flags and their states (on/off). Private flags are:
  • pm_qos_request_low_latency
  • mlx4_rss_xor_hash_function
  • qcn_disable_32_14_4_e
ethtool --set-priv-flags eth<x> <priv flag> <on/off>: Enables/disables the driver feature matching the given private flag.

5.6 Ethernet VXLAN

5.6.1 Prerequisites
• HCA: ConnectX-3 Pro
• Firmware 2.31.5020 or higher
• RHEL7, Ubuntu 14.04 or upstream kernel 3.12.10 (or higher)
• DMFS enabled

5.6.2 Enabling VXLAN

To enable the VXLAN offloads support, load the mlx4_core driver with Device-Managed Flow Steering (DMFS) enabled. The recommended way is to create an /etc/modprobe.d/mlx4.conf file with the following contents:
    # enable DMFS; enable VXLAN offloads on CX3-pro
    options mlx4_core log_num_mgm_entry_size=-1

To verify that VXLAN is enabled, check that the Ethernet net device created by the mlx4_en driver advertises the NETIF_F_GSO_UDP_TUNNEL feature, which can
5. PR: Path Record
RDS: Reliable Datagram Sockets
RoCE: RDMA over Converged Ethernet
SL: Service Level
QoS: Quality of Service
ULP: Upper Level Protocol
VL: Virtual Lane
(Table 2 - Abbreviations and Acronyms, Sheet 2 of 2; columns: Abbreviation / Acronym, Whole Word / Description)

Glossary

The following is a list of concepts and terms related to InfiniBand in general and to Subnet Managers in particular. It is included here for ease of reference, but the main reference remains the InfiniBand Architecture Specification.

Table 3 - Glossary
Channel Adapter (CA), Host Channel Adapter (HCA): An IB device that terminates an IB link and executes transport functions. This may be an HCA (Host CA) or a TCA (Target CA).
HCA Card: A network adapter card based on an InfiniBand channel adapter device.
IB Devices: Integrated circuit implementing InfiniBand compliant communication.
In-Band: A term assigned to administration activities traversing the IB connectivity only.
Local Port: The IB port of the HCA through which IBDIAG tools connect to the IB fabric.
Master Subnet Manager: The Subnet Manager that is authoritative, that has the reference configuration information for the subnet. See Subnet Manager.
Multicast Forwarding Tables: A table that exists in every switch providing the list of ports to forward received multicast packet. The table
6. be used. mlnx_qos gets a list of a mapping between UPs to TCs. For example, mlnx_qos -i eth0 -p 0,0,0,0,1,1,1,1 maps UPs 0-3 to TC0, and UPs 4-7 to TC1.

5.1.4 Quality of Service Properties

The different QoS properties that can be assigned to a TC are:
• Strict Priority (see "Strict Priority")
• Minimal Bandwidth Guarantee (ETS) (see "Minimal Bandwidth Guarantee (ETS)")
• Rate Limit (see "Rate Limit")

5.1.4.1 Strict Priority

When a TC's transmission algorithm is set to 'strict', this TC has absolute (strict) priority over other strict priority TCs coming before it, as determined by the TC number: TC 7 is the highest priority, TC 0 is the lowest. It also has absolute priority over non-strict TCs (ETS).

This property needs to be used with care, as it may easily cause starvation of other TCs.

A higher strict priority TC is always given the first chance to transmit. Only if the highest strict priority TC has nothing more to transmit will the next highest TC be considered. Non-strict priority TCs will be considered last to transmit.

This property is extremely useful for low-latency, low-bandwidth traffic: traffic that needs to get immediate service when it exists, but is not of high enough volume to starve other transmitters in the system.

5.1.4.2 Minimal Bandwidth Guarantee (ETS)

After servicing the strict priority TCs, the amount of bandwidth (BW) left on the wire may be split among o
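The strict priority rule from section 5.1.4.1 can be illustrated with a small model. This is only a sketch of the arbitration order described in the text (highest-numbered strict TC with pending traffic first, ETS TCs considered last); it is not the adapter's scheduler, and the function name and data shapes are hypothetical.

```python
def next_to_transmit(tsa, pending):
    """Pick which traffic class is served next.

    tsa:     dict mapping TC number -> "strict" or "ets"
    pending: dict mapping TC number -> number of queued packets
    Returns ("strict", tc), ("ets", [tcs sharing the leftover bandwidth]),
    or None when nothing is queued.
    """
    ready = [tc for tc, queued in pending.items() if queued > 0]

    # A strict TC with traffic always wins; higher TC number = higher priority.
    strict_ready = [tc for tc in ready if tsa.get(tc) == "strict"]
    if strict_ready:
        return ("strict", max(strict_ready))

    # Only when every strict TC is idle are the ETS TCs considered.
    ets_ready = sorted(tc for tc in ready if tsa.get(tc) == "ets")
    if ets_ready:
        return ("ets", ets_ready)
    return None
```

For example, with TC2 and TC7 strict and TC0/TC1 as ETS, traffic queued on TC2 is served before anything on TC0 or TC1, which is why a busy strict TC can starve the others.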
7. either a value in a triplet or a single value) should be less than or equal to the respective value of the num_vfs parameter.

The example above loads the driver with 5 VFs (num_vfs). The standard use of a VF is a single VF per a single VM. However, the number of VFs varies upon the working mode requirements.

Step 9. Reboot the server.
Note: If SR-IOV is not supported by the server, the machine might not come out of boot/load.

Step 10. Load the driver and verify that SR-IOV is supported. Run:
    lspci | grep Mellanox
    03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
    03:00.1 InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] (rev b0)
    03:00.2 InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] (rev b0)
    03:00.3 InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] (rev b0)
    03:00.4 InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] (rev b0)
    03:00.5 InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] (rev b0)
Where:
• "03:00" represents the Physical Function
• "03:00.X" represents the Virtual Function connected to the Physical Function

5.4.3 Enabling SR-IOV and Para Virtualization on the Same Setup

To enable SR-IOV and Para Virtualization on the same setup:
Step 1. Create a bridge
8. modprobe mlx4_en
The result is a new net device appearing in the 'ifconfig -a' output. For details on driver usage and configuration, please refer to Section 3, "Ethernet Driver Usage and Configuration".
Note: On Ubuntu OS, the "mlnx-en" service is responsible for loading the mlx4_en driver upon boot.

2.4 Unloading the Driver

To unload the Ethernet driver:
    > modprobe -r mlx4_en

2.5 Uninstalling the Driver

To uninstall the mlnx_en driver:
    > /sbin/mlnx_en_uninstall.sh

3 Ethernet Driver Usage and Configuration

To assign an IP address to the interface:
    > ifconfig eth<x> <ip>
    ('x' is the OS assigned interface number)

To check driver and device information:
    > ethtool -i eth<x>
Example:
    > ethtool -i eth2
    driver: mlx4_en
    version: 2.1.8 (Oct 06 2013)
    firmware-version: 2.30.3110
    bus-info: 0000:1a:00.0

To query stateless offload status:
    > ethtool -k eth<x>

To set stateless offload status:
    > ethtool -K eth<x> [rx on|off] [tx on|off] [sg on|off] [tso on|off] [lro on|off]

To query interrupt coalescing settings:
    > ethtool -c eth<x>

To enable/disable adaptive interrupt moderation:
    > ethtool -C eth<x> adaptive-rx on|off
By default, the driver uses adaptive interrupt moderation for the receive path, which adjusts
9. 1
        skprio: 2 (tos: 8)
        skprio: 3
        skprio: 4 (tos: 24)
        skprio: 5
        skprio: 6 (tos: 16)
        skprio: 7
        skprio: 8
        skprio: 9
        skprio: 10
        skprio: 11
        skprio: 12
        skprio: 13
        skprio: 14
        skprio: 15
    UP: 7
tc: 1 ratelimit: 4 Gbps, tsa: ets, bw: 70%
    UP: 1
    UP: 2
    UP: 3
tc: 2 ratelimit: 2 Gbps, tsa: strict
    UP: 4
    UP: 5
    UP: 6

5.1.5.2 tc and tc_wrap.py

The 'tc' tool is used to set up the sk_prio to UP mapping, using the mqprio queue discipline. In kernels that do not support mqprio (such as 2.6.34), an alternate mapping is created in sysfs. The 'tc_wrap.py' tool will use either the sysfs or the 'tc' tool to configure the sk_prio to UP mapping.

Usage:
    tc_wrap.py -i <interface> [options]

Options:
    --version                             show program's version number and exit
    -h, --help                            show this help message and exit
    -u SKPRIO_UP, --skprio_up=SKPRIO_UP   maps sk_prio to UP. LIST is <=16 comma separated UP. Index of element is sk_prio.
    -i INTF, --interface=INTF             Interface name

Example: set skprio 0-2 to UP0, and skprio 3-7 to UP1 on eth4.
    UP 0
        skprio: 0
        skprio: 1
        skprio: 2 (tos: 8)
        skprio: 7
        skprio: 8
        skprio: 9
        skprio: 10
        skprio: 11
        skprio: 12
        skprio: 13
        skprio: 14
        skprio: 15
    UP 1
        skprio: 3
        skprio: 4 (tos: 24)
        skprio: 5
        skprio: 6 (tos: 16)
    UP 2
    UP 3
    UP 4
    UP 5
    UP 6
    UP

5.1.5.3 Additional Tools

The tc tool compiled with the sch_mqprio module is required to support kernel v2.6.32 or higher. This is a part of
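The -u option of tc_wrap.py takes a list in which the position of each element is the sk_prio and the value is the UP. The sketch below only builds that list from ranges; it does not run tc_wrap.py. The helper name and the (first, last, up) tuple form are hypothetical, and the sketch assumes the ranges must cover sk_prio 0 upward without gaps.

```python
def skprio_up_list(ranges):
    """Build the comma separated UP list whose index is the sk_prio.

    ranges: iterable of (first_skprio, last_skprio, up) tuples.
    """
    ranges = list(ranges)
    size = max(last for _, last, _ in ranges) + 1
    if size > 16:
        raise ValueError("at most 16 sk_prio entries are allowed")

    ups = [None] * size
    for first, last, up in ranges:
        if not 0 <= up <= 7:
            raise ValueError("UP must be in the range 0-7")
        for skprio in range(first, last + 1):
            ups[skprio] = up

    if None in ups:
        raise ValueError("every sk_prio up to the highest one needs a UP")
    return ",".join(str(up) for up in ups)
```

With the mapping named in the example text (skprio 0-2 to UP0, skprio 3-7 to UP1), skprio_up_list([(0, 2, 0), (3, 7, 1)]) gives the string that would follow -u.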
10. Pause (where <i> is in the range 0...7)
Counter: Description
rx_pause_prio_<i>: The total number of PAUSE frames received from the far-end port.
rx_pause_duration_prio_<i>: The total time in microseconds that the far-end port was requested to pause transmission of packets.
rx_pause_transition_prio_<i>: The number of receiver transitions from XON state (non-paused) to XOFF state (paused).
tx_pause_prio_<i>: The total number of PAUSE frames sent to the far-end port.
tx_pause_duration_prio_<i>: The total time in microseconds that transmission of packets has been paused.
Table 11 - Port Pause (where <i> is in the range 0...7), continued:
tx_pause_transition_prio_<i>: The number of transmitter transitions from XON state (non-paused) to XOFF state (paused).

Table 12 - VPort Statistics (where <i>=<empty_string> is the PF, and ranges 1...NumOfVf per VF)
Counter: Description
vport<i>_rx_unicast_packets: Unicast packets received successfully.
vport<i>_rx_unicast_bytes: Unicast packet bytes received successfully.
vport<i>_rx_multicast_packets: Multicast packets received successfully.
vport<i>_rx_multicast_bytes: Multicast packet bytes received successfully.
vport<i>_rx_broadcast_packets: Broadc
11. be seen by running ethtool -k $DEV | grep udp
For example:
    ethtool -k eth0 | grep udp_tnl
    tx-udp_tnl-segmentation: on

Make sure that the VXLAN tunnel is set over UDP port 4789, which is the ConnectX firmware default. If using the standard Linux bridge and not Open vSwitch, set the following in /etc/modprobe.d/vxlan.conf:
    options vxlan udp_port=4789

5.6.3 Important Notes

• VXLAN tunneling adds 50 bytes (14-eth + 20-ip + 8-udp + 8-vxlan) to the VM Ethernet frame. Please verify that either the MTU of the NIC that sends the packets (e.g. the VM virtio-net NIC or the host side veth device) or the uplink takes the tunneling overhead into account. This means that the MTU of the sending NIC has to be decremented by 50 bytes (e.g. 1450 instead of 1500), or the uplink NIC MTU has to be incremented by 50 bytes (e.g. 1550 instead of 1500).
• From upstream 3.15-rc1 and onward, it is possible to use an arbitrary UDP port for VXLAN. Note that this requires firmware version 2.31.2800 or higher. Additionally, you need to enable this kernel configuration option: CONFIG_MLX4_EN_VXLAN=y.
• On upstream kernels 3.12/3.13, GRO with VXLAN is not supported.

5.7 Quantized Congestion Control

Congestion control is used to reduce packet drops in lossy environments and to mitigate congestion spreading and resulting victim flows in lossless environments. The Quantized Congestion Notification (QCN) IEEE standard (802.1Qau) provides congestion control for long-lived flows in lim
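The MTU arithmetic from the first note in section 5.6.3 can be checked with a few lines. This is a small worked example using only the header sizes stated there; the function names are illustrative.

```python
# Header sizes listed in the note: outer Ethernet, IP, UDP and VXLAN.
VXLAN_OVERHEAD = {"eth": 14, "ip": 20, "udp": 8, "vxlan": 8}
OVERHEAD_BYTES = sum(VXLAN_OVERHEAD.values())  # 50 bytes


def sender_mtu(uplink_mtu):
    """MTU to set on the sending NIC when the uplink MTU is fixed."""
    return uplink_mtu - OVERHEAD_BYTES


def uplink_mtu(sender_mtu_bytes):
    """MTU the uplink needs when the sending NIC keeps its MTU."""
    return sender_mtu_bytes + OVERHEAD_BYTES
```

Either adjustment is sufficient on its own: lower the sender to 1450 with a 1500-byte uplink, or raise the uplink to 1550 and leave the sender at 1500.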
12. flow control for Ethernet devices
• Control checksum offload and other hardware offload features
• Control DMA ring sizes and interrupt moderation

The following are the ethtool supported options:

Table 7 - ethtool Supported Options
Options: Description
ethtool -i eth<x>: Checks driver and device information. For example:
    > ethtool -i eth2
    driver: mlx4_en (MT_0DD0120009_CX3)
    version: 2.1.6 (Aug 2013)
    firmware-version: 2.30.3000
    bus-info: 0000:1a:00.0
ethtool -k eth<x>: Queries the stateless offload status.
ethtool -K eth<x> [rx on|off] [tx on|off] [sg on|off] [tso on|off] [lro on|off] [gro on|off] [gso on|off]: Sets the stateless offload status.
    TCP Segmentation Offload (TSO) and Generic Segmentation Offload (GSO) increase outbound throughput by reducing CPU overhead. They work by queuing up large buffers and letting the network interface card split them into separate packets.
    Large Receive Offload (LRO) increases inbound throughput of high-bandwidth network connections by reducing CPU overhead. It works by aggregating multiple incoming packets from a single stream into a larger buffer before they are passed higher up the networking stack, thus reducing the number of packets that have to be processed. LRO is available in kernel versions 3.1 for untagged traffic.
    Note: LRO will be done whenever possible. Othe
13. is organized by MLID.
Network Interface Card (NIC): A network adapter card that plugs into the PCI Express slot and provides one or more ports to an Ethernet network.
Unicast Linear Forwarding Tables (LFT): A table that exists in every switch providing the port through which packets should be sent to each LID.
Virtual Protocol Interconnect (VPI): A Mellanox Technologies technology that allows Mellanox channel adapter devices (ConnectX) to simultaneously connect to an InfiniBand subnet and a 10GigE subnet (each subnet connects to one of the adapter ports).

Related Documentation

Table 4 - Reference Documents
Document Name: Description
InfiniBand Architecture Specification, Vol. 1, Release 1.2.1: The InfiniBand Architecture Specification that is provided by IBTA.
IEEE Std 802.3ae-2002 (Amendment to IEEE Std 802.3-2002), Document # PDF: SS94996: Part 3: Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications. Amendment: Media Access Control (MAC) Parameters, Physical Layers, and Management Parameters for 10 Gb/s Operation.

Support and Updates Webpage

Please visit http://www.mellanox.com > Products > Ethernet Drivers > Linux Drivers for downloads, FAQ, troubleshooting, future updates to this manual, etc.
14. mlnx-en-kmp RPM): /lib/modules/kernel-ver/updates/mellanox-mlnx-en
• On RHEL (kmod-mellanox-mlnx-en RPM): /lib/modules/kernel-ver/extra/mellanox-mlnx-en

For non-KMP RPMs (mlnx-en RPM):
• On SLES: /lib/modules/kernel-ver/updates/mlnx-en
• On RHEL: /lib/modules/kernel-ver/extra/mlnx-en

The mlnx_en installer supports 2 modes of installation. The install script selects the mode of driver installation depending on the running OS/kernel version.
1. Kernel Module Packaging (KMP) mode, where the source rpm is rebuilt for each installed flavor of the kernel. This mode is used for RedHat and SUSE distributions.
2. Non-KMP installation mode, where the sources are rebuilt with the running kernel. This mode is used for vanilla kernels.
Note: If the vanilla kernel is installed as an rpm, please use the "--disable-kmp" flag when installing the driver.

The kernel module sources are placed under /usr/src/mellanox-mlnx-en-2.0/.

To recompile the driver:
    > cd /usr/src/mellanox-mlnx-en-2.0/
    > scripts/mlnx_en_patch.sh
    > make
    > make install

The uninstall and performance tuning scripts are installed.
Note: If the driver was installed without kmp support, the sources would be located under /usr/src/mlnx-en-2.0/.

2.3 Loading the Driver

Step 1. Make sure no previous driver version is currently loaded:
    > modprobe -r mlx4_en
Step 2. Load the new driver version:
    >
15. of rpg_min_dec_fac according to priority, use spaces between values and -1 for unknown values.
--rpg_min_rate=RPG_MIN_RATE_LIST: Set value of rpg_min_rate according to priority, use spaces between values and -1 for unknown values.
--cndd_state_machine=CNDD_STATE_MACHINE_LIST: Set value of cndd_state_machine according to priority, use spaces between values and -1 for unknown values.

To get the current QCN configuration sorted by priority:
    mlnx_qcn -i eth2 -g parameters

To show QCN's statistics sorted by priority:
    mlnx_qcn -i eth2 -g statistics

Example output when running mlnx_qcn -i eth2 -g parameters:
    priority 0:
        rpg_enable: 0
        rppp_max_rps: 1000
        rpg_time_reset: 1464
        rpg_byte_reset: 150000
        rpg_threshold: 5
        rpg_max_rate: 40000
        rpg_ai_rate: 10
        rpg_hai_rate: 50
        rpg_gd: 8
        rpg_min_dec_fac: 2
        rpg_min_rate: 10
        cndd_state_machine: 0
    priority 1:
        rpg_enable: 0
        rppp_max_rps: 1000
        rpg_time_reset: 1464
        rpg_byte_reset: 150000
        rpg_threshold: 5
        rpg_max_rate: 40000
        rpg_ai_rate: 10
        rpg_hai_rate: 50
        rpg_gd: 8
        rpg_min_dec_fac: 2
        rpg_min_rate: 10
        cndd_state_machine: 0
    priority 7:
        rpg_enable: 0
        rppp_max_rps: 1000
        rpg_time_reset: 1464
        rpg_byte_reset: 150000
        rpg_threshold: 5
        rpg_max_rate: 40000
        rpg_ai_rate: 10
        rpg_hai_rate: 50
        rpg_gd: 8
        rpg_min_dec_fac: 2
        rpg_min_rate: 10
        cndd_state_machine: 0

5.7.2 Setting QCN Configuration

Setting the QCN parameters r
16. ports on device). This applies to all ConnectX HCAs on the host.
• If its format is a string: The string specifies the num_vfs parameter separately per installed HCA. The string format is "bb:dd.f-v,bb:dd.f-v,…"
  • bb:dd.f = bus:device.function of the PF of the HCA
  • v = number of VFs to enable for that HCA, which is either a single value or a triplet, as described above.

For example:
• num_vfs=5 - The driver will enable 5 VFs on the HCA and this will be applied to all ConnectX HCAs on the host.
• num_vfs=00:04.0-5,00:07.0-8 - The driver will enable 5 VFs on the HCA positioned in BDF 00:04.0 and 8 on the one in 00:07.0.
• num_vfs=1,2,3 - The driver will enable 1 VF on physical port 1, 2 VFs on physical port 2 and 3 dual port VFs (applies only to a dual port HCA when all ports are Ethernet ports).
• num_vfs=00:04.0-5;6;7,00:07.0-8;9;10 - The driver will enable:
  • HCA positioned in BDF 00:04.0
    • 5 single VFs on port 1
    • 6 single VFs on port 2
    • 7 dual port VFs
  • HCA positioned in BDF 00:07.0
    • 8 single VFs on port 1
    • 9 single VFs on port 2
    • 10 dual port VFs
  Applies when all ports are configured as Ethernet in dual port HCAs.

Notes:
• PFs not included in the above list will not have SR-IOV enabled.
• Triplets and single port VFs are only valid when all ports are configured as Ethernet. When an InfiniBand port exists, only the num_vfs=a syntax is valid, where "a" is a single value that represents the number of VFs.
• The seco
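The num_vfs forms described above can be made concrete with a small parser. This is a hedged sketch, not driver code: the separators were lost in this copy of the manual, so it assumes the conventional mlx4_core syntax in which a BDF is written bb:dd.f, a dash separates the BDF from its VF count, commas separate HCAs (or the three values of a host-wide triplet), and semicolons separate the values of a per-HCA triplet. The function and key names are hypothetical.

```python
import re

# Assumed BDF form: two hex digits, colon, two hex digits, dot, function 0-7.
_BDF = re.compile(r"^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def _counts(text, sep):
    """Parse either a single VF count or a port1/port2/dual-port triplet."""
    values = [int(v) for v in text.split(sep)]
    if len(values) == 1:
        return {"vfs": values[0]}
    if len(values) == 3:
        return {"port1": values[0], "port2": values[1], "dual": values[2]}
    raise ValueError("expected a single value or a triplet: %r" % text)


def parse_num_vfs(value):
    """Return {bdf: counts}; the key "*" means 'all ConnectX HCAs on the host'."""
    if "-" not in value:
        return {"*": _counts(value, ",")}
    result = {}
    for item in value.split(","):
        bdf, _, counts = item.partition("-")
        if not _BDF.match(bdf):
            raise ValueError("bad BDF: %r" % bdf)
        result[bdf] = _counts(counts, ";")
    return result
```

For instance, parse_num_vfs("00:04.0-5;6;7,00:07.0-8;9;10") yields one triplet per HCA, matching the last example in the list above.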
17. this help message and exit
-p LIST, --prio_tc=LIST: Maps UPs to TCs. LIST is 8 comma separated TC numbers. Example: 0,0,0,0,1,1,1,1 maps UPs 0-3 to TC0, and UPs 4-7 to TC1.
-s LIST, --tsa=LIST: Transmission algorithm for each TC. LIST is comma separated algorithm names for each TC. Possible algorithms: strict, ets. Example: ets,strict,ets sets TC0,TC2 to ETS and TC1 to strict. The rest are unchanged.
-t LIST, --tcbw=LIST: Set minimal guaranteed %BW for ETS TCs. LIST is comma separated percents for each TC. Values set to TCs that are not configured to ETS algorithm are ignored, but must be present. Example: if TC0,TC2 are set to ETS, then 10,0,90 will set TC0 to 10% and TC2 to 90%. Percents must sum to 100.
-r LIST, --ratelimit=LIST: Rate limit for TCs (in Gbps). LIST is a comma separated Gbps limit for each TC. Example: 1,8,8 will limit TC0 to 1Gbps, and TC1,TC2 to 8 Gbps each.
-i INTF, --interface=INTF: Interface name
-a: Show all interface's TCs

Get Current Configuration

Set ratelimit: 3Gbps for tc0, 4Gbps for tc1 and 2Gbps for tc2

Configure QoS: map UP 0,7 to tc0, 1,2,3 to tc1 and 4,5,6 to tc2; set tc0,tc1 as ets and tc2 as strict; divide ets 30% for tc0 and 70% for tc1:
    mlnx_qos -i <interface> -s ets,ets,strict -p 0,1,1,1,2,2,2 -t 30,70,0
    tc: 0 ratelimit: 3 Gbps, tsa: ets, bw: 30%
        UP: 0
            skprio: 0
            skprio:
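The list arguments described in the option help above are easy to get wrong, so a pre-check before invoking the tool can be useful. The sketch below validates the -p, -s and -t lists as they are described in the help text; it does not call mlnx_qos. The function names are hypothetical, and two points are assumptions: a -p list shorter than 8 entries is accepted, and the "must sum to 100" rule is applied to the whole -t list, including the ignored entries of non-ETS TCs.

```python
def parse_prio_tc(arg):
    """Turn a -p list into {UP: TC}; the index of each element is the UP."""
    tcs = [int(item) for item in arg.split(",")]
    if not 1 <= len(tcs) <= 8:
        raise ValueError("expected at most 8 TC numbers")
    if any(not 0 <= tc <= 7 for tc in tcs):
        raise ValueError("TC numbers must be in the range 0-7")
    return dict(enumerate(tcs))


def ets_shares(tsa_arg, tcbw_arg):
    """Check -s/-t lists together and return {TC: percent} for the ETS TCs."""
    algorithms = tsa_arg.split(",")
    percents = [int(item) for item in tcbw_arg.split(",")]
    if any(alg not in ("strict", "ets") for alg in algorithms):
        raise ValueError("algorithms must be 'strict' or 'ets'")
    if len(algorithms) != len(percents):
        raise ValueError("one percent value per TC must be present")
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    # Values given for non-ETS TCs are ignored, as the help text states.
    return {tc: pct
            for tc, (alg, pct) in enumerate(zip(algorithms, percents))
            if alg == "ets"}
```

Using the help text's own example, ets_shares("ets,strict,ets", "10,0,90") reports 10% for TC0 and 90% for TC2, with the placeholder value for the strict TC1 dropped.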
18. up your BIOS. The figures used in this section are for illustration purposes only. For further information, please refer to the appropriate BIOS User Manual.

Step 1. Enable "SR-IOV" in the system BIOS.
    [Figure: BIOS setup utility screen with SR-IOV set to Enabled]
Step 2. Enable "Intel Virtualization Technology".
    [Figure: BIOS setup utility screen with Intel Virtualization Technology enabled]
Step 3. Install the hypervisor that supports SR-IOV.
Step 4. Depending on your system, update the /boot/grub/grub.conf file to include a similar command line load parameter for the Linux kernel. For example, to Intel systems, add:
    default=0
    timeout=5
    splashimage=(hd0,0)/grub/splash.xpm.gz
    hiddenmenu
    title Red Hat Enterprise Linux Server (2.6.32-36.x86-645)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-36.x86-64 ro root=/dev/VolGroup00/LogVol00 rhgb quiet intel_iommu=on
        initrd /initrd-2.6.32-36.x86-64.img
Note: Please make sure the parameter "intel_iommu=on" exists when updating the /boot/grub/grub.conf file, otherwise SR-IOV cannot be loaded.
Step 5. Install the MLNX_EN driver for Linux that supports SR-IOV.
SR-IOV can be enabled and managed by using one of the following methods:
• Burn firmware with SR-IOV support, where the number of virtual functions (VFs) will be set to 16:
    --enable-sriov
• Run the mlxconfig tool and set the SRIOV_EN parameter to "1" without re-burning the firmware:
    SRIOV_EN = 1
For further information, please refer to sect
19. use the VFs will cause the machine to hang.
Step 2. Run the script below. Please be aware that uninstalling the driver deletes all of the driver's files, but does not unload the driver.
    /sbin/mlnx_en_uninstall.sh
    MLNX_EN uninstall done
Step 3. Restart the server.

Ethernet Virtual Function Configuration when Running SR-IOV

VLAN Guest Tagging (VGT) and VLAN Switch Tagging (VST)

When running ETH ports on VFs, the ports may be configured to simply pass through packets as is from VFs (VLAN Guest Tagging), or the administrator may configure the Hypervisor to silently force packets to be associated with a VLAN/QoS (VLAN Switch Tagging). In the latter case, untagged or priority-tagged outgoing packets from the guest will have the VLAN tag inserted, and incoming packets will have the VLAN tag removed. Any VLAN-tagged packets sent by the VF are silently dropped. The default behavior is VGT.

The feature may be controlled on the Hypervisor from userspace via iproute2 / netlink:
    ip link set { dev DEVICE | group DEVGROUP } [ { up | down } ]
        ...
        [ vf NUM [ mac LLADDR ] [ vlan VLANID [ qos VLAN-QOS ] ] ... [ spoofchk { on | off } ] ]
        ...
Use:
    ip link set dev <PF device> vf <NUM> vlan <vlan_id> [qos <qos>]
where:
• NUM = 0..max-vf-num
• vlan_id = 0..4095 (4095 means "set VGT")
• qos = 0..7

For example:
• ip link set dev eth2 vf 2 vlan <vlan_id> qos 3 - sets VST mode for VF #2 belonging to PF eth2, with qos = 3
• ip link set dev eth2 vf 2 vlan 4095 - sets mode for VF 2 back to VGT
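The ip link form above, together with its value ranges, can be wrapped in a small helper that refuses out-of-range values before anything is executed. This is an illustrative sketch: the helper name is hypothetical, it only builds the argument list, and running the result (for example with subprocess.run) requires root privileges on the hypervisor.

```python
def vf_vlan_command(pf_device, vf_num, vlan_id, qos=None):
    """Build 'ip link set dev <PF> vf <NUM> vlan <vlan_id> [qos <qos>]'.

    vlan_id 4095 switches the VF back to VGT; any other value selects VST.
    """
    if vf_num < 0:
        raise ValueError("VF number must be non-negative")
    if not 0 <= vlan_id <= 4095:
        raise ValueError("vlan_id must be in the range 0-4095")
    if qos is not None and not 0 <= qos <= 7:
        raise ValueError("qos must be in the range 0-7")

    command = ["ip", "link", "set", "dev", pf_device,
               "vf", str(vf_num), "vlan", str(vlan_id)]
    if qos is not None:
        command += ["qos", str(qos)]
    return command
```

Returning a list rather than a string keeps the device name from being interpreted by a shell if the command is later passed to subprocess.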
20. vim /etc/sysconfig/network-scripts/ifcfg-bridge0
    DEVICE=bridge0
    TYPE=Bridge
    IPADDR=12.195.15.1
    NETMASK=255.255.0.0
    BOOTPROTO=static
    ONBOOT=yes
    NM_CONTROLLED=no
    DELAY=0

Step 2. Change the related interface (in the example below, bridge0 is created over eth5):
    DEVICE=eth5
    BOOTPROTO=none
    STARTMODE=on
    HWADDR=00:02:c9:2e:66:52
    TYPE=Ethernet
    NM_CONTROLLED=no
    ONBOOT=yes
    BRIDGE=bridge0

Step 3. Restart the service network.

Step 4. Attach a virtual NIC to the VM.
    ifconfig -a
    eth6  Link encap:Ethernet  HWaddr 52:54:00:E7:77:99
          inet addr:13.195.15.5  Bcast:13.195.255.255  Mask:255.255.0.0
          inet6 addr: fe80::5054:ff:fee7:7799/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:481 errors:0 dropped:0 overruns:0 frame:0
          TX packets:450 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:22440 (21.9 KiB)  TX bytes:19232 (18.7 KiB)
          Interrupt:10 Base address:0xa000

Step 5. Add the MAC 52:54:00:E7:77:99 to the /sys/class/net/eth5/fdb table on the HV.

5.4.4 Assigning a Virtual Function to a Virtual Machine

This section describes a mechanism for adding an SR-IOV VF to a Virtual Machine.

5.4.4.1 Assigning the SR-IOV Virtual Function to the Red Hat KVM VM Server

Step 1. Run the virt-manager.
Step 2. Double click on the virtual machine and open its Properties.
Step 3. Go to D
21. with an invalid value: failure

The following are the flow specific parameters:

Table 6 - Flow Specific Parameters
             ether    tcp4/udp4                                   ip4
Mandatory    dst                                                  src-ip/dst-ip
Optional     vlan     src-ip, dst-ip, src-port, dst-port, vlan    src-ip, dst-ip, vlan

• RFS
RFS is an in-kernel logic responsible for load balancing between CPUs by attaching flows to CPUs that are used by the flow's owner applications. This domain allows the RFS mechanism to use the flow steering infrastructure to support the RFS logic by implementing the ndo_rx_flow_steer, which, in turn, calls the underlying flow steering mechanism with the RFS domain.

Enabling the RFS requires enabling the 'ntuple' flag via ethtool. For example, to enable ntuple for eth0, run:
    ethtool -K eth0 ntuple on

RFS requires the kernel to be compiled with the CONFIG_RFS_ACCEL option. This option is available in kernels 2.6.39 and above. Furthermore, RFS requires Device Managed Flow Steering support.

Note: RFS cannot function if LRO is enabled. LRO can be disabled via ethtool.

• All of the rest
The lowest priority domain serves the following users:
  • The mlx4 Ethernet driver attaches its unicast and multicast MAC addresses to its QP using L2 flow specifications.

Note: Fragmented UDP traffic cannot be steered. It is treated as 'other' protocol by hardware (from the first packet) and not considere
22. www.mellanox.com > products > Firmware tools
    mlxburn -fw ./fw-ConnectX3-rel.mlx -dev /dev/mst/mt4099_pci_cr0 -conf ./MCX341A-XCG_Ax.ini
If the current firmware version is the same as the one provided with MLNX_EN, run it in combination with the '--force-fw-update' parameter.
Note: This configuration option is supported only in HCAs whose configuration file (INI) is included in MLNX_EN.

Step 7. Create the text file /etc/modprobe.d/mlx4_core.conf if it does not exist, otherwise delete its contents.

Step 8. Insert an "options" line in the /etc/modprobe.d/mlx4_core.conf file to set the number of VFs, the protocol type per port, and the allowed number of virtual functions to be used by the physical function driver (probe_vf).
    options mlx4_core num_vfs=5 probe_vf=1
Note: If the HCA does not support SR-IOV, please contact Mellanox Support: support@mellanox.com

Parameter: Recommended Value
num_vfs:
• If absent, or zero: no VFs will be available.
• If its value is a single number in the range of 0-63: The driver will enable the num_vfs VFs on the HCA and this will be applied to all ConnectX HCAs on the host.
• If it is a triplet x,y,z (applies only if all ports are configured as Ethernet), the driver creates:
  • x single port VFs on physical port 1
  • y single port VFs on physical port 2 (applies only if such a port exists)
  • z n-port VFs (where n is the number of physical
23. 1 Overview

This document provides information on the MLNX_EN Linux driver and instructions for installing the driver on Mellanox ConnectX adapter cards supporting 10Gb/s and 40Gb/s Ethernet.

The MLNX_EN driver release exposes the following capabilities:
• Single/Dual port
• Up to 16 Rx queues per port
• 6 Tx queues per port
• Rx steering mode: Receive Core Affinity (RCA)
• MSI-X or INTx
• Adaptive interrupt moderation
• HW Tx/Rx checksum calculation
• Large Send Offload (i.e., TCP Segmentation Offload)
• Large Receive Offload
• Multi-core NAPI support
• VLAN Tx/Rx acceleration (HW VLAN stripping/insertion)
• Ethtool support
• Net device statistics
• SR-IOV support
• Flow steering
• Ethernet Time Stamping (at beta level)

1.1 Package Contents

This driver kit contains the following:

Table 5 - MLNX_EN Package Content
Components: Description
mlx4 driver: mlx4 is the low level driver implementation for the ConnectX adapters designed by Mellanox Technologies. The ConnectX can operate as an InfiniBand adapter and as an Ethernet NIC. To accommodate the two flavors, the driver is split into modules: mlx4_core, mlx4_en, and mlx4_ib. Note: mlx4_ib is not part of this package.
mlx4_core: Handles low-level functions like device initialization and firmware commands processing. Also controls resource allocation so that the InfiniBand, Ethernet and FC functions can share a device without interfering with each other.
24. AGE

Mellanox Technologies
350 Oakmead Parkway Suite 100
Sunnyvale, CA 94085
U.S.A.
www.mellanox.com
Tel: (408) 970-3400
Fax: (408) 970-3403

Mellanox Technologies, Ltd.
Beit Mellanox
PO Box 586 Yokneam 20692
Israel
www.mellanox.com
Tel: +972 (0)74 723 7200
Fax: +972 (0)4 959 3245

Copyright 2014. Mellanox Technologies. All Rights Reserved.

Mellanox, Mellanox logo, BridgeX, ConnectX, Connect-IB, CORE-Direct, InfiniBridge, InfiniHost, InfiniScale, MetroX, MLNX-OS, PhyX, ScalableHPC, SwitchX, UFM, Virtual Protocol Interconnect and Voltaire are registered trademarks of Mellanox Technologies, Ltd.
ExtendX, FabricIT, Mellanox Open Ethernet, Mellanox Virtual Modular Switch, MetroDX, TestX, Unbreakable-Link are trademarks of Mellanox Technologies, Ltd.
All other trademarks are property of their respective owners.

Document Number: 2950
Rev 2.2-1.0.1

Table of Contents
Table of Contents
List of Tables
Chapter 1 Overview
    1.1 Package Contents
Chapter 2 Driver Installation
    2.1 Software Dependencies
    2.2 Installing the Driver
    2.3 Loading the D
25. E is off or fails, then do it in software
SOF_TIMESTAMPING_RAW_HARDWARE: return original raw hardware time stamp
SOF_TIMESTAMPING_SYS_HARDWARE: return hardware time stamp transformed to the system time base
SOF_TIMESTAMPING_SOFTWARE: return system time stamp generated in software
SOF_TIMESTAMPING_TX/RX determine how time stamps are generated. SOF_TIMESTAMPING_RAW/SYS determine how they are reported.
To enable time stamping for a net device:
An admin-privileged user can enable/disable time stamping by calling ioctl(sock, SIOCSHWTSTAMP, &ifreq) with the following values:
Send side time stamping:
Enabled by ifreq.hwtstamp_config.tx_type. Possible values for hwtstamp_config->tx_type:
enum hwtstamp_tx_types {
    /*
     * No outgoing packet will need hardware time stamping;
     * should a packet arrive which asks for it, no hardware
     * time stamping will be done.
     */
    HWTSTAMP_TX_OFF,
    /*
     * Enables hardware time stamping for outgoing packets;
     * the sender of the packet decides which are to be
     * time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE
     * before sending the packet.
     */
    HWTSTAMP_TX_ON,
    /*
     * Enables time stamping for outgoing packets just as
     * HWTSTAMP_TX_ON does, but also enables time stamp insertion
     * directly into Sync packets. In this case, transmitted Sync
     * packets will not receive a time stamp via the socket error
     * queue.
     */
    HWTSTAMP_TX_ONESTEP_SYNC,
};
Note: for send side time s
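The SOF_TIMESTAMPING_* flags above are combined into a single bit mask that is passed to setsockopt(). The following is a minimal Python sketch of that combination, not part of the driver package. The numeric values are assumptions taken from the Linux header linux/net_tstamp.h, the option number 37 for SO_TIMESTAMPING is an assumption that holds on common Linux architectures, and the helper names are hypothetical.

```python
import socket

# Assumed bit values from linux/net_tstamp.h (not defined by this manual).
SOF_TIMESTAMPING_TX_HARDWARE = 1 << 0
SOF_TIMESTAMPING_TX_SOFTWARE = 1 << 1
SOF_TIMESTAMPING_RX_HARDWARE = 1 << 2
SOF_TIMESTAMPING_RX_SOFTWARE = 1 << 3
SOF_TIMESTAMPING_SOFTWARE = 1 << 4
SOF_TIMESTAMPING_SYS_HARDWARE = 1 << 5
SOF_TIMESTAMPING_RAW_HARDWARE = 1 << 6

# Assumption: 37 is the SO_TIMESTAMPING option number on most Linux builds.
SO_TIMESTAMPING = getattr(socket, "SO_TIMESTAMPING", 37)


def raw_hardware_flags():
    """Generate stamps in hardware (TX and RX), report them as raw hardware time."""
    return (SOF_TIMESTAMPING_TX_HARDWARE
            | SOF_TIMESTAMPING_RX_HARDWARE
            | SOF_TIMESTAMPING_RAW_HARDWARE)


def enable_timestamping(sock, flags):
    """Apply the flag mask to an open socket (Linux only)."""
    sock.setsockopt(socket.SOL_SOCKET, SO_TIMESTAMPING, flags)
```

The TX/RX bits select how stamps are generated and the RAW/SYS/SOFTWARE bits select how they are reported, so a usable mask normally contains at least one bit from each group.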
26. January 2014 | Added Section 5.10, "Ethernet Performance Counters," on page 46
2.0-3.0.0 | October 2013 | Added the following sections:
• Section 5.4, "Single Root IO Virtualization (SR-IOV)," on page 28
• Section 5.3, "Flow Steering," on page 26
• Section 5.2, "Time-Stamping Service," on page 23
About this Manual
This Preface provides general information concerning the scope and organization of this User's Manual.
Intended Audience
This manual is intended for system administrators responsible for the installation, configuration, management and maintenance of the software and hardware of VPI (InfiniBand, Ethernet) adapter cards. It is also intended for application developers.
Common Abbreviations and Acronyms
Table 2 - Abbreviations and Acronyms
Abbreviation / Acronym | Whole Word / Description
B | (Capital) 'B' is used to indicate size in bytes or multiples of bytes (e.g., 1KB = 1024 bytes, and 1MB = 1048576 bytes)
b | (Small) 'b' is used to indicate size in bits or multiples of bits (e.g., 1Kb = 1024 bits)
FW | Firmware
HCA | Host Channel Adapter
HW | Hardware
IB | InfiniBand
LSB | Least significant byte
lsb | Least significant bit
MSB | Most significant byte
msb | Most significant bit
NIC | Network Interface Card
SW | Software
VPI | Virtual Protocol Interconnect
PFC | Priority Flow Control
27. Mellanox Technologies
MLNX_EN for Linux User Manual
Rev 2.2-1.0.1
www.mellanox.com
NOTE: THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT ("PRODUCT(S)") AND ITS RELATED DOCUMENTATION ARE PROVIDED BY MELLANOX TECHNOLOGIES "AS-IS" WITH ALL FAULTS OF ANY KIND AND SOLELY FOR THE PURPOSE OF AIDING THE CUSTOMER IN TESTING APPLICATIONS THAT USE THE PRODUCTS IN DESIGNATED SOLUTIONS. THE CUSTOMER'S MANUFACTURING TEST ENVIRONMENT HAS NOT MET THE STANDARDS SET BY MELLANOX TECHNOLOGIES TO FULLY QUALIFY THE PRODUCT(S) AND/OR THE SYSTEM USING IT. THEREFORE, MELLANOX TECHNOLOGIES CANNOT AND DOES NOT GUARANTEE OR WARRANT THAT THE PRODUCTS WILL OPERATE WITH THE HIGHEST QUALITY. ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT ARE DISCLAIMED. IN NO EVENT SHALL MELLANOX BE LIABLE TO CUSTOMER OR ANY THIRD PARTIES FOR ANY DIRECT, INDIRECT, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES OF ANY KIND (INCLUDING, BUT NOT LIMITED TO, PAYMENT FOR PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY FROM THE USE OF THE PRODUCT(S) AND RELATED DOCUMENTATION EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAM
28. aintain flexibility, domains and priorities are used. Flow steering uses a methodology of flow attribute, which is a combination of L2-L4 flow specifications, a destination QP and a priority. Flow steering rules could be inserted either by using ethtool or by using InfiniBand verbs. The verbs abstraction uses an opposed terminology of a flow attribute (ibv_flow_attr), defined by a combination of specifications (struct ibv_flow_spec).
5.3.1 Enable/Disable Flow Steering
Flow Steering is disabled by default and regular L2 steering is performed instead (B0 Steering). When using SR-IOV, flow steering is enabled if there is adequate amount of space to store the flow steering table for the guest/master.
To enable Flow Steering:
Step 1. Open the /etc/modprobe.d/mlnx.conf file.
Step 2. Set the parameter log_num_mgm_entry_size to -1 by writing the option mlx4_core log_num_mgm_entry_size=-1.
Step 3. Restart the driver.
To disable Flow Steering:
Step 1. Open the /etc/modprobe.d/mlnx.conf file.
Step 2. Remove the options mlx4_core log_num_mgm_entry_size=-1.
Step 3. Restart the driver.
5.3.2 Flow Domains and Priorities
Flow steering defines the concept of domain and priority. Each domain represents a user agent that can attach a flow. The domains are prioritized. A higher priority domain will always supersede a lower priority domain when their flow specifications overlap. Setting a lower priority value will result in higher pr
29. ast packets received successfully
vport<i>_rx_broadcast_bytes | Broadcast packet bytes received successfully
vport<i>_rx_dropped | Received packets discarded due to out-of-buffer condition
vport<i>_rx_errors | Received packets discarded due to receive error condition
vport<i>_tx_unicast_packets | Unicast packets sent successfully
vport<i>_tx_unicast_bytes | Unicast packet bytes sent successfully
vport<i>_tx_multicast_packets | Multicast packets sent successfully
vport<i>_tx_multicast_bytes | Multicast packet bytes sent successfully
vport<i>_tx_broadcast_packets | Broadcast packets sent successfully
vport<i>_tx_broadcast_bytes | Broadcast packet bytes sent successfully
vport<i>_tx_errors | Packets dropped due to transmit errors
Table 13 - SW Statistics
Counter | Description
rx_lro_aggregated | Number of packets aggregated
rx_lro_flushed | Number of LRO flush to the stack
rx_lro_no_desc | Number of times LRO description was not found
rx_alloc_failed | Number of times failed preparing receive descriptor
rx_csum_good | Number of packets received with good checksum
rx_csum_none | Number of packets received with no checksum indication
tx_chksum_offload | Number of packets transmitted with checksum off
30. binary firmware image file, enter the following command:
> mstflint -d <pci device> -i <image_name.bin> b
For burning firmware using the MFT package, please check the MFT user's manual under www.mellanox.com > Products > Adapter IB/VPI SW > Firmware Tools.
After burning new firmware to an adapter card, reboot the machine so that the new firmware can take effect.
5 Driver Features
5.1 Quality of Service
Quality of Service (QoS) is a mechanism for assigning a priority to a network flow (socket, rdma_cm connection) and managing its guarantees, limitations and its priority over other flows. This is accomplished by mapping the user's priority to a hardware TC (traffic class) through a 2/3 stages process. The TC is assigned with the QoS attributes and the different flows behave accordingly.
5.1.1 Mapping Traffic to Traffic Classes
Mapping traffic to TCs consists of several actions which are user controllable, some controlled by the application itself and others by the system/network administrators. The following is the general mapping traffic to Traffic Classes flow:
1. The application sets the required Type of Service (ToS).
2. The ToS is translated into a Socket Priority (sk_prio).
3. The sk_prio is mapped to a User Priority (UP) by the system administrator (some applications set sk_prio directly).
4. The UP is mapped to TC by the network/system administ
31. concurrent sessions per Rx ring
Some of these values can be changed using module parameters, which can be displayed by running:
> modinfo mlx4_en
To set non-default values to module parameters, add to the /etc/modprobe.conf file:
options mlx4_en <param_name>=<value> <param_name>=<value> ...
Values of all parameters can be observed in /sys/module/mlx4_en/parameters/.
4 Firmware Programming
The adapter card was shipped with the most current firmware available. This section is intended for future firmware upgrades, and provides instructions for (1) installing Mellanox firmware update tools (MFT), (2) downloading FW, and (3) updating adapter card firmware.
4.1 Installing Firmware Tools
The driver package compiles and installs the Mellanox 'mstflint' utility under /usr/local/bin/. You may also use this tool to burn a card-specific firmware binary image. See the file /tmp/mlnx_en/src/utils/mstflint/README for details.
Alternatively, you can download the current Mellanox Firmware Tools package (MFT) from www.mellanox.com > Products > Adapter IB/VPI SW > Firmware Tools. The tools package to download is "MFT_SW for Linux" (tarball name is mft-X.X.X.tgz).
For help in identifying your adapter card, please visit http://www.mellanox.com/content/pages.php?pg=firmware_HCA_FW_identification.
4.2 Updating Adapter Card Firmware
Using a card specific
32. d as UDP traffic
5.4 Single Root IO Virtualization (SR-IOV)
Single Root IO Virtualization (SR-IOV) is a technology that allows a physical PCIe device to present itself multiple times through the PCIe bus. This technology enables multiple virtual instances of the device with separate resources. Mellanox adapters are capable of exposing, in ConnectX-3 adapter cards, up to 126 virtual instances called Virtual Functions (VFs). These virtual functions can then be provisioned separately. Each VF can be seen as an additional device connected to the Physical Function. It shares the same resources with the Physical Function, and its number of ports equals those of the Physical Function.
SR-IOV is commonly used in conjunction with an SR-IOV enabled hypervisor to provide virtual machines direct hardware access to network resources, hence increasing their performance.
In this chapter we will demonstrate setup and configuration of SR-IOV in a Red Hat Linux environment using the Mellanox ConnectX VPI adapter cards family.
5.4.1 System Requirements
To set up an SR-IOV environment, the following is required:
• MLNX_EN Driver
• A server/blade with an SR-IOV-capable motherboard BIOS
• Hypervisor that supports SR-IOV, such as Red Hat Enterprise Linux Server Version 6.*
• Mellanox ConnectX VPI Adapter Card family with SR-IOV capability
5.4.2 Setting Up SR-IOV
Depending on your system, perform the steps below to set
33. ecific Parameters
Table 7 - ethtool Supported Options
Table 8 - Port IN Counters
Table 9 - Port OUT Counters
Table 10 - Port VLAN Priority Tagging (where <i> is in the range 0...7)
Table 11 - Port Pause (where <i> is in the range 0...)
Table 12 - VPort Statistics (where <i>=<empty_string> is the PF, and ranges 1,...,NumOfVf per VF)
Table 13 - SW Statistics
Table 14 - Per Ring SW Statistics (where <i> is the ring I, per configuration)
Document Revision History
Table 1 - Document Revision History
Release | Date | Description
2.2-1.0.1 | May 2014 |
Added the following sections:
• Section 5.4.6.3, "Mapping VFs to Ports using the mlnx_get_vfs.pl Tool," on page 39
• Section 5.5, "Ethtool," on page 39
• Section 5.6, "Ethernet VXLAN," on page 41
• Section 5.7, "Quantized Congestion Control," on page 42
• Section 5.8, "pm_qos usage on ingress Packet Traffic," on page 45
• Section 5.9, "XOR RSS Hash Function," on page 45
Updated the following section:
• Section 5.4.2, "Setting Up SR-IOV," on page 29
Removed the following sections:
• Burning Firmware with SR-IOV
• Performance
2.1-1.0.0
34. equires updating its value for each priority. -1 indicates no change in the current value.
Example for setting 'rpg_enable' in order to enable QCN for priorities 3, 5, 6:
[command not recoverable from the source text]
Example for setting 'rpg_hai_rate' for priorities 1, 6, 7:
[command not recoverable from the source text]
5.8 pm_qos usage on ingress Packet Traffic
The pm_qos API is used by mlx4_en to enforce a minimum DMA latency requirement on the system when ingress traffic is detected. Additionally, it decreases packet loss on systems configured with an abundant power state profile when Flow Control is disabled.
The pm_qos API contains the following functions:
• pm_qos_add_request()
• pm_qos_update_request()
• pm_qos_remove_request()
The pm_qos feature is both global and static: once a request is issued, it is enforced on all CPUs and does not change in time. MLNX_OFED v2.2-1.0.0 provides an option to trigger a request when required and to remove it when no longer required. It is disabled by default and can be set/unset through the ethtool priv-flags. For further information on how to enable/disable this feature, please refer to Table 7, "ethtool Supported Options," on page 40.
5.9 XOR RSS Hash Function
The device has the ability to use XOR as the RSS distribution function, instead of the default Toeplitz function. The XOR function can be better distributed among the driver's receive queues in a small number of streams, where it distribu
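Because every QCN parameter takes one value per priority, setting a few priorities means filling the remaining positions with the "no change" value. The following Python sketch builds such a value list. It assumes 8 priorities indexed from 0 and a "no change" value of -1, as in the tool's option help. The helper name is hypothetical and is not part of mlnx_qcn.

```python
NUM_PRIORITIES = 8   # assumption: priorities 0..7
NO_CHANGE = -1       # assumption: -1 leaves the current value untouched


def qcn_value_list(updates, num_priorities=NUM_PRIORITIES):
    """Return a space-separated per-priority value list.

    `updates` maps a priority index to its new value; every other
    priority gets NO_CHANGE.
    """
    values = [NO_CHANGE] * num_priorities
    for prio, value in updates.items():
        if not 0 <= prio < num_priorities:
            raise ValueError("priority out of range: %r" % (prio,))
        values[prio] = value
    return " ".join(str(v) for v in values)


# Enable QCN (value 1) for priorities 3, 5 and 6 only.
enable_list = qcn_value_list({3: 1, 5: 1, 6: 1})
```

The resulting string is what would follow a parameter option such as rpg_enable on the command line.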
35. es
tx_511_bytes_packets | Number of transmitted 256 to 511 octet frames
tx_1023_bytes_packets | Number of transmitted 512 to 1023 octet frames
tx_1518_bytes_packets | Number of transmitted 1024 to 1518 octet frames
tx_1522_bytes_packets | Number of transmitted 1519 to 1522 octet frames
tx_1548_bytes_packets | Number of transmitted 1523 to 1548 octet frames
tx_gt_1548_bytes_packets | Number of transmitted 1549 or greater octet frames
Table 10 - Port VLAN Priority Tagging (where <i> is in the range 0...7)
Counter | Description
rx_prio_<i>_packets | Total packets successfully received with priority <i>
rx_prio_<i>_bytes | Total bytes in successfully received packets with priority <i>
rx_novlan_packets | Total packets successfully received with no VLAN priority
rx_novlan_bytes | Total bytes in successfully received packets with no VLAN priority
tx_prio_<i>_packets | Total packets successfully transmitted with priority <i>
tx_prio_<i>_bytes | Total bytes in successfully transmitted packets with priority <i>
tx_novlan_packets | Total packets successfully transmitted with no VLAN priority
tx_novlan_bytes | Total bytes in successfully transmitted packets with no VLAN priority
Table 11 - Port
36. etails > Add hardware > PCI host device
[Figure: Virtual Machine Manager "Add new virtual hardware" dialog; the hardware type list includes "Physical Host Device"]
Step 4. Choose a Mellanox virtual function according to its PCI device (e.g., 00:03.1).
Step 5. If the Virtual Machine is up, reboot it; otherwise start it.
Step 6. Log into the virtual machine and verify that it recognizes the Mellanox card. Run:
lspci | grep Mellanox
00:03.0 InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function] (rev b0)
Step 7. Add the device to the /etc/sysconfig/network-scripts/ifcfg-ethX configuration file. The MAC address for every virtual function is configured randomly, therefore it is not necessary to add it.
5.4.5 Uninstalling SR-IOV Driver
To uninstall the SR-IOV driver, perform the following:
Step 1. For Hypervisors, detach all the Virtual Functions (VF) from all the Virtual Machines (VM) or stop the Virtual Machines that use the Virtual Functions. Please be aware, stopping the driver when there are VMs that
37. 5.4.3 Enabling SR-IOV and Para Virtualization on the Same Setup
5.4.4 Assigning a Virtual Function to a Virtual Machine
5.4.5 Uninstalling SR-IOV Driver
5.4.6 Ethernet Virtual Function Configuration when Running SR-IOV
5.5 Ethtool
5.6 Ethernet VXLAN
5.6.1 Prerequisites
5.6.2 Enabling VXLAN
5.6.3 Important Notes
5.7 Quantized Congestion Control
5.7.1 QCN Tool - mlnx_qcn
5.7.2 Setting QCN Configuration
5.8 pm_qos usage on ingress Packet Traffic
5.9 XOR RSS Hash Function
5.10 Ethernet Performance Counters
Chapter 6 Performance
List of Tables
Table 1 - Document Revision History
Table 2 - Abbreviations and Acronyms
Table 3 - Glossary
Table 4 - Reference Documents
Table 5 - MLNX_EN Package Content
Table 6 - Flow Sp
38. lnx_qos tool or by the lldpad daemon if DCBX is used.
Socket applications can use setsockopt (SK_PRIO, value) to directly set the sk_prio of the socket. In this case the ToS to sk_prio fixed mapping is not needed. This allows the application and the administrator to utilize more than the 4 values possible via ToS.
In case of VLAN interface, the UP obtained according to the above mapping is also used in the VLAN tag of the traffic.
5.1.3 Map Priorities with tc_wrap.py/mlnx_qos
A network flow that can be managed by QoS attributes is described by a User Priority (UP). A user's sk_prio is mapped to a UP, which in turn is mapped into a TC.
Indicating the UP
• When the user uses sk_prio, it is mapped into a UP by the 'tc' tool. This is done by the tc_wrap.py tool, which gets a list of <= 16 comma separated UPs and maps the sk_prio to the specified UP. For example, tc_wrap.py -i eth0 -u 1,5 maps sk_prio 0 of the eth0 device to UP 1 and sk_prio 1 to UP 5.
• Setting set_egress_map in VLAN maps the skb_priority of the VLAN to a vlan_qos. The vlan_qos represents a UP for the VLAN device.
• In RoCE, rdma_set_option with RDMA_OPTION_ID_TOS could be used to set the UP.
• When creating QPs, the sl field in the ibv_modify_qp command represents the UP.
Indicating the TC
After mapping the skb_priority to a UP, one should map the UP into a TC. This assigns the user priority to a specific hardware traffic class. In order to do that, mlnx_qos should
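The positional mapping described for tc_wrap.py (the n-th entry of the comma separated list is the UP for sk_prio n) can be sketched in Python as below. This is an illustration of the mapping rule only, not the tool's implementation. The function name is hypothetical, and the UP range 0 to 7 is an assumption based on the 3-bit VLAN priority field.

```python
MAX_SK_PRIO_ENTRIES = 16   # the list holds at most 16 UPs
MAX_UP = 7                 # assumption: UPs are 3-bit values, 0..7


def sk_prio_to_up(up_list):
    """Map sk_prio n to the n-th UP of a comma separated list such as '1,5'."""
    ups = [int(item) for item in up_list.split(",")]
    if len(ups) > MAX_SK_PRIO_ENTRIES:
        raise ValueError("at most 16 UPs may be given")
    for up in ups:
        if not 0 <= up <= MAX_UP:
            raise ValueError("UP out of range: %d" % up)
    return dict(enumerate(ups))


# The list '1,5' maps sk_prio 0 to UP 1 and sk_prio 1 to UP 5.
mapping = sk_prio_to_up("1,5")
```

An application that sets its socket priority directly would then be classified by looking up that priority in the resulting dictionary.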
39. ion "mlxconfig - Changing Device Configuration Tool" in the MFT User Manual (www.mellanox.com > Products > Software > Firmware Tools).
Step 6. Verify the HCA is configured to support SR-IOV.
[root@selene ~]# mstflint -dev <PCI Device> dc
1. Verify in the [HCA] section that the following fields appear:
[HCA]
num_pfs = 1
total_vfs = <0-126>
sriov_en = true
Parameter | Recommended Value
num_pfs | 1. Note: This field is optional and might not always appear.
total_vfs | When using firmware version 2.31.5000 and above, the recommended value is 126. When using firmware version 2.30.8000 and below, the recommended value is 63. Note: Before setting the number of VFs in SR-IOV, please make sure your system can support that amount of VFs. Setting a number of VFs larger than what your Hardware and Software can support may cause your system to cease working.
sriov_en | true
Note: If SR-IOV is supported, to enable SR-IOV (if it is not enabled), it is sufficient to set "sriov_en = true" in the INI.
2. Add the above fields to the INI if they are missing.
3. Set the total_vfs parameter to the desired number if you need to change the number of total VFs.
4. Reburn the firmware using the mlxburn tool if the fields above were added to the INI, or the total_vfs parameter was modified. If mlxburn is not installed, please download it from the Mellanox website http
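The total_vfs recommendation depends on the firmware version, so a provisioning script may want to compare version strings before editing the INI. The following Python sketch encodes only the two cases stated in the table. Versions between 2.30.8000 and 2.31.5000 are not covered by the table, so the sketch returns None for them. The function names are hypothetical.

```python
def parse_fw_version(version):
    """Turn a dotted firmware version such as '2.31.5000' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def recommended_total_vfs(fw_version):
    """Return the recommended total_vfs value, or None if the table is silent."""
    version = parse_fw_version(fw_version)
    if version >= (2, 31, 5000):
        return 126
    if version <= (2, 30, 8000):
        return 63
    return None  # not specified for versions between the two thresholds
```

For example, recommended_total_vfs("2.31.5000") selects the 126 VF recommendation, while an older image such as "2.30.3000" selects 63.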
40. iority. In addition to the domain, there is priority within each of the domains. Each domain can have at most 2^12 priorities in accordance to its needs.
The following are the domains at a descending order of priority:
• Ethtool
The Ethtool domain is used to attach an RX ring, specifically its QP, to a specified flow. Please refer to the most recent ethtool manpage for all the ways to specify a flow.
Examples:
• ethtool -U eth5 flow-type ether dst 00:11:22:33:44:55 loc 5 action 2
All packets that contain the above destination MAC address are to be steered into rx-ring 2 (its underlying QP), with priority 5 (within the ethtool domain).
• ethtool -U eth5 flow-type tcp4 src-ip 1.2.3.4 dst-port 8888 loc 5 action 2
All packets that contain the above source IP address and destination port are to be steered into rx-ring 2. When the destination MAC is not given, the user's destination MAC is filled automatically.
• ethtool -u eth5
Shows all of ethtool's steering rules.
When configuring two rules with the same priority, the second rule will overwrite the first one, so this ethtool interface is effectively a table. Inserting Flow Steering rules in the kernel requires support from both the ethtool in the user space and in the kernel (v2.6.28).
MLX4 Driver Support
The mlx4 driver supports only a subset of the flow specification the ethtool API defines. Asking for an unsupported flow specification will result
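The statement that the ethtool interface "is effectively a table" can be pictured as a dictionary keyed by the rule's loc value: inserting a second rule at the same loc replaces the first. The Python sketch below is only a model of that behaviour, not ethtool or driver code, and the function and field names are hypothetical.

```python
def insert_rule(table, loc, flow, rx_ring):
    """Store a steering rule at priority `loc`; an existing rule there is replaced."""
    table[loc] = {"flow": flow, "rx_ring": rx_ring}
    return table


rules = {}
# Both example rules use loc 5, so only the second one remains.
insert_rule(rules, 5, "flow-type ether dst 00:11:22:33:44:55", 2)
insert_rule(rules, 5, "flow-type tcp4 src-ip 1.2.3.4 dst-port 8888", 2)
```

To keep both rules active, each would need its own loc value.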
41. iproute2 package v2.6.32-19 or higher. Otherwise, an alternative custom sysfs interface is available.
• mlnx_qos tool (package: ofed-scripts) requires python >= 2.5
• tc_wrap.py (package: ofed-scripts) requires python >= 2.5
5.2 Time-Stamping Service
Time Stamping is currently at beta level. Please be aware that everything listed here is subject to change.
Time Stamping is currently supported in ConnectX-3/ConnectX-3 Pro adapter cards only.
Time stamping is the process of keeping track of the creation of a packet. A time-stamping service supports assertions of proof that a datum existed before a particular time. Incoming packets are time-stamped before they are distributed on the PCI, depending on the congestion in the PCI buffers. Outgoing packets are time-stamped very close to placing them on the wire.
5.2.1 Enabling Time Stamping
Time-stamping is off by default and should be enabled before use.
To enable time stamping for a socket:
Call setsockopt() with SO_TIMESTAMPING and with the following flags:
SOF_TIMESTAMPING_TX_HARDWARE: try to obtain send time stamp in hardware
SOF_TIMESTAMPING_TX_SOFTWARE: if SOF_TIMESTAMPING_TX_HARDWARE is off or fails, then do it in software
SOF_TIMESTAMPING_RX_HARDWARE: return the original, unmodified time stamp as generated by the hardware
SOF_TIMESTAMPING_RX_SOFTWARE: if SOF_TIMESTAMPING_RX_HARDWAR
42. ited bandwidth-delay product Ethernet networks. It is part of the IEEE Data Center Bridging (DCB) protocol suite, which also includes ETS, PFC, and DCBX. QCN is conducted at L2, and is targeted for hardware implementations. QCN applies to all Ethernet packets and all transports, and both the host and switch behavior is detailed in the standard.
The QCN user interface allows the user to configure QCN activity. QCN configuration and retrieval of information is done by the mlnx_qcn tool. The command interface provides the user with a set of changeable attributes, and with information regarding QCN's counters and statistics. All parameters and statistics are defined per port and priority. The QCN command interface is available if and only if the hardware supports it.
5.7.1 QCN Tool - mlnx_qcn
mlnx_qcn is a tool used to configure QCN attributes of the local host. It communicates directly with the driver, thus does not require setting up a DCBX daemon on the system.
mlnx_qcn enables the user to:
• Inspect the current QCN configurations for a certain port sorted by priority
• Inspect the current QCN statistics and counters for a certain port sorted by priority
• Set values of chosen QCN parameters
Usage:
mlnx_qcn -i <interface> [options]
Options:
--version | Show program's version number and exit
-h, --help | Show this help message and exit
-i INTF, --interface=INTF | Interface na
43. load
tx_queue_stopped | Number of times transmit queue suspended
tx_wake_queue | Number of times transmit queue resumed
tx_timeout | Number of times transmitter timeout
tx_tso_packets | Number of packets that were aggregated
Table 14 - Per Ring SW Statistics (where <i> is the ring I, per configuration)
Counter | Description
rx<i>_packets | Total packets successfully received on ring <i>
rx<i>_bytes | Total bytes in successfully received packets on ring <i>
tx<i>_packets | Total packets successfully transmitted on ring <i>
tx<i>_bytes | Total bytes in successfully transmitted packets on ring <i>
6 Performance
For further information on Linux performance, please refer to the Performance Tuning Guide for Mellanox Network Adapters.
44. me
-g TYPE, --get_type=TYPE | Type of information to get (statistics/parameters)
--rpg_enable=RPG_ENABLE_LIST | Set value of rpg_enable according to priority, use spaces between values and -1 for unknown values.
--rppp_max_rps=RPPP_MAX_RPS_LIST | Set value of rppp_max_rps according to priority, use spaces between values and -1 for unknown values.
--rpg_time_reset=RPG_TIME_RESET_LIST | Set value of rpg_time_reset according to priority, use spaces between values and -1 for unknown values.
--rpg_byte_reset=RPG_BYTE_RESET_LIST | Set value of rpg_byte_reset according to priority, use spaces between values and -1 for unknown values.
--rpg_threshold=RPG_THRESHOLD_LIST | Set value of rpg_threshold according to priority, use spaces between values and -1 for unknown values.
--rpg_max_rate=RPG_MAX_RATE_LIST | Set value of rpg_max_rate according to priority, use spaces between values and -1 for unknown values.
--rpg_ai_rate=RPG_AI_RATE_LIST | Set value of rpg_ai_rate according to priority, use spaces between values and -1 for unknown values.
--rpg_hai_rate=RPG_HAI_RATE_LIST | Set value of rpg_hai_rate according to priority, use spaces between values and -1 for unknown values.
--rpg_gd=RPG_GD_LIST | Set value of rpg_gd according to priority, use spaces between values and -1 for unknown values.
--rpg_min_dec_fac=RPG_MIN_DEC_FAC_LIST | Set value
45. nd parameter in a triplet is valid only when there are more than 1 physical port. In a triplet, x+z<=63 and y+z<=63; the maximum number of VFs on each physical port must be 63.
Parameter | Recommended Value
port_type_array | Specifies the protocol type of the ports. It is either one array of 2 port types 't1,t2' for all devices, or a list of BDF to port_type_array 'bb:dd.f-t1;t2,...' (string). Valid port types: 1-ib, 2-eth, 3-auto, 4-N/A. If only a single port is available, use the N/A port type for port2 (e.g. '1,4').
probe_vf |
• If absent or zero: no VF interfaces will be loaded in the Hypervisor/host.
• If num_vfs is a number in the range of 1-63, the driver running on the Hypervisor will itself activate that number of VFs. All these VFs will run on the Hypervisor. This number will apply to all ConnectX HCAs on that host.
• If it is a triplet x,y,z (applies only if all ports are configured as Ethernet), the driver probes:
  • x single port VFs on physical port 1
  • y single port VFs on physical port 2 (applies only if such a port exists)
  • z n-port VFs (where n is the number of physical ports on the device). Those VFs are attached to the hypervisor.
• If its format is a string: the string specifies the probe_vf parameter separately per installed HCA. The string format is "bb:dd.f-v,bb:dd.f-v,...", where bb:dd.f = bus:device.function of the PF of the HCA and v = number of VFs to use in the PF dri
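The triplet constraint can be checked before the module parameters are written. The Python sketch below assumes the constraint reads x+z <= 63 and y+z <= 63, that is, at most 63 VFs per physical port once the z multi-port VFs are counted on both ports. The function name is hypothetical.

```python
MAX_VFS_PER_PORT = 63  # per the note: at most 63 VFs on each physical port


def triplet_is_valid(x, y, z):
    """Check a VF triplet: x on port 1, y on port 2, z on both ports."""
    if min(x, y, z) < 0:
        return False
    # Each dual-port VF consumes one slot on both physical ports.
    return x + z <= MAX_VFS_PER_PORT and y + z <= MAX_VFS_PER_PORT
```

For instance, the triplet 60,3,3 passes the check, while 60,0,4 fails because port 1 would carry 64 VFs.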
46. or | Number of received frames with a length/type field value in the (decimal) range [1535:1501]
rx_lt_64_bytes_packets | Number of received 64-or-less-octet frames
rx_127_bytes_packets | Number of received 65-to-127-octet frames
rx_255_bytes_packets | Number of received 128-to-255-octet frames
rx_511_bytes_packets | Number of received 256-to-511-octet frames
rx_1023_bytes_packets | Number of received 512-to-1023-octet frames
rx_1518_bytes_packets | Number of received 1024-to-1518-octet frames
rx_1522_bytes_packets | Number of received 1519-to-1522-octet frames
rx_1548_bytes_packets | Number of received 1523-to-1548-octet frames
rx_gt_1548_bytes_packets | Number of received 1549-or-greater-octet frames
Table 9 - Port OUT Counters
Counter | Description
tx_packets | Total packets successfully transmitted
tx_bytes | Total bytes in successfully transmitted packets
tx_multicast_packets | Total multicast packets successfully transmitted
tx_broadcast_packets | Total broadcast packets successfully transmitted
tx_errors | Number of frames that failed to transmit
tx_dropped | Number of transmitted frames that were dropped
tx_lt_64_bytes_packets | Number of transmitted 64-or-less-octet frames
tx_127_bytes_packets | Number of transmitted 65-to-127-octet frames
tx_255_bytes_packets | Number of transmitted 128-to-255-octet fram
47. rator.
5. TCs hold the actual QoS parameters.
QoS can be applied on the following types of traffic. However, the general QoS flow may vary among them:
• Plain Ethernet: Applications use regular inet sockets and the traffic passes via the kernel Ethernet driver
• RoCE: Applications use the RDMA API to transmit using QPs
• Raw Ethernet QP: Applications use the VERBs API to transmit using a Raw Ethernet QP
5.1.2 Plain Ethernet Quality of Service Mapping
Applications use regular inet sockets and the traffic passes via the kernel Ethernet driver. The following is the Plain Ethernet QoS mapping flow:
1. The application sets the ToS of the socket using setsockopt (IP_TOS, value).
2. ToS is translated into the sk_prio using a fixed translation:
TOS 0 <=> sk_prio 0
TOS 8 <=> sk_prio 2
TOS 24 <=> sk_prio 4
TOS 16 <=> sk_prio 6
3. The Socket Priority is mapped to the UP:
• If the underlying device is a VLAN device, egress_map is used, controlled by the vconfig command. This is a per-VLAN mapping.
• If the underlying device is not a VLAN device, the tc command is used. In this case, even though the tc manual states that the mapping is from the sk_prio to the TC number, the mlx4_en driver interprets this as a sk_prio to UP mapping.
Mapping the sk_prio to the UP is done by using tc_wrap.py -i <dev name> -u 0,1,2,3,4,5,6,7
4. The UP is mapped to the TC as configured by the m
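The fixed ToS to sk_prio translation in step 2 is small enough to express as a lookup table. The following Python sketch shows the table together with the setsockopt call from step 1. The function names are hypothetical, and only the four ToS values listed above are covered.

```python
import socket

# Fixed ToS -> sk_prio translation from step 2 above.
TOS_TO_SK_PRIO = {0: 0, 8: 2, 24: 4, 16: 6}


def sk_prio_for_tos(tos):
    """Return the socket priority the listed ToS value is translated to."""
    try:
        return TOS_TO_SK_PRIO[tos]
    except KeyError:
        raise ValueError("ToS %d is not one of the four listed values" % tos)


def set_tos(sock, tos):
    """Step 1: set the ToS of an IPv4 socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
```

A socket configured with set_tos(sock, 24) would therefore be handled with sk_prio 4 in the later mapping steps.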
48. river
2.4 Unloading the Driver
2.5 Uninstalling the Driver
Chapter 3 Ethernet Driver Usage and Configuration
Chapter 4 Firmware Programming
4.1 Installing Firmware Tools
4.2 Updating Adapter Card Firmware
Chapter 5 Driver Features
5.1 Quality of Service
5.1.1 Mapping Traffic to Traffic Classes
5.1.2 Plain Ethernet Quality of Service Mapping
5.1.3 Map Priorities with tc_wrap.py/mlnx_qos
5.1.4 Quality of Service Properties
5.1.5 Quality of Service Tools
5.2 Time-Stamping Service
5.2.1 Enabling Time Stamping
5.2.2 Getting Time Stamping
5.3 Flow Steering
5.3.1 Enable/Disable Flow Steering
5.3.2 Flow Domains and Priorities
5.4 Single Root IO Virtualization (SR-IOV)
5.4.1 System Requirements
5.4.2 Setting Up SR-IOV
49. rwise GRO will be done. Generic Receive Offload (GRO) is available throughout all kernels.
ethtool -c eth<x> | Queries interrupt coalescing settings.
ethtool -C eth<x> adaptive-rx on|off | Enables/disables adaptive interrupt moderation. By default, the driver uses adaptive interrupt moderation for the receive path, which adjusts the moderation time to the traffic pattern.
ethtool -C eth<x> [pkt-rate-low N] [pkt-rate-high N] [rx-usecs-low N] [rx-usecs-high N] | Sets the values for packet rate limits and for moderation time high and low values. Above an upper limit of packet rate, adaptive moderation will set the moderation time to its highest value. Below a lower limit of packet rate, the moderation time will be set to its lowest value.
Table 7 - ethtool Supported Options
Options | Description
ethtool -C eth<x> [rx-usecs N] [rx-frames N] | Sets the interrupt coalescing settings when the adaptive moderation is disabled. Note: usec settings correspond to the time to wait after the *last* packet is sent/received before triggering an interrupt.
ethtool -a eth<x> | Queries the pause frame settings.
ethtool -A eth<x> [rx on|off] [tx on|off] | Sets the pause frame settings.
ethtool -g eth<x> | Queries the ring size values.
ethtool -G eth<x> [rx <N>] [tx | Modifies the rings size
50. tamping currently only HWTSTAMP_TX_OFF and HWTSTAMP_TX_ON are supported.
Receive side time stamping: enabled by ifreq.hwtstamp_config.rx_filter.
Possible values for hwtstamp_config->rx_filter (enum hwtstamp_rx_filters):
HWTSTAMP_FILTER_NONE: time stamp no incoming packet at all
HWTSTAMP_FILTER_ALL: time stamp any incoming packet
HWTSTAMP_FILTER_SOME: return value: time stamp all packets requested plus some others
HWTSTAMP_FILTER_PTP_V1_L4_EVENT: PTP v1, UDP, any kind of event packet
HWTSTAMP_FILTER_PTP_V1_L4_SYNC: PTP v1, UDP, Sync packet
HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ: PTP v1, UDP, Delay_req packet
HWTSTAMP_FILTER_PTP_V2_L4_EVENT: PTP v2, UDP, any kind of event packet
HWTSTAMP_FILTER_PTP_V2_L4_SYNC: PTP v2, UDP, Sync packet
HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ: PTP v2, UDP, Delay_req packet
HWTSTAMP_FILTER_PTP_V2_L2_EVENT: 802.AS1, Ethernet, any kind of event packet
HWTSTAMP_FILTER_PTP_V2_L2_SYNC: 802.AS1, Ethernet, Sync packet
HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ: 802.AS1, Ethernet, Delay_req packet
HWTSTAMP_FILTER_PTP_V2_EVENT: PTP v2/802.AS1, any layer, any kind of event packet
HWTSTAMP_FILTER_PTP_V2_SYNC: PTP v2/802.AS1, any layer, Sync packet
HWTSTAMP_FILTER_PTP_V2_DELAY_REQ: PTP v2/802.AS1, any layer, Delay_req packet
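To make the restriction above concrete, the following sketch packs a hwtstamp_config structure (three C ints: flags, tx_type, rx_filter) and rejects any value outside the combinations this excerpt lists as supported. The numeric constants are taken from the Linux header linux/net_tstamp.h; the helper name `build_hwtstamp_config` is hypothetical, and the sketch only builds the buffer. Passing it to the SIOCSHWTSTAMP ioctl through an ifreq is deliberately left out.

```python
import struct

# Constants as defined in the Linux UAPI header linux/net_tstamp.h.
HWTSTAMP_TX_OFF = 0
HWTSTAMP_TX_ON = 1
HWTSTAMP_FILTER_NONE = 0
HWTSTAMP_FILTER_ALL = 1
HWTSTAMP_FILTER_PTP_V2_EVENT = 12  # defined by the kernel, not supported here

# Only these values are described as supported in this manual excerpt.
SUPPORTED_TX_TYPES = {HWTSTAMP_TX_OFF, HWTSTAMP_TX_ON}
SUPPORTED_RX_FILTERS = {HWTSTAMP_FILTER_NONE, HWTSTAMP_FILTER_ALL}


def build_hwtstamp_config(tx_type, rx_filter):
    """Pack struct hwtstamp_config { int flags; int tx_type; int rx_filter; }."""
    if tx_type not in SUPPORTED_TX_TYPES:
        raise ValueError("unsupported tx_type: %r" % (tx_type,))
    if rx_filter not in SUPPORTED_RX_FILTERS:
        raise ValueError("unsupported rx_filter: %r" % (rx_filter,))
    # flags is reserved and must be zero.
    return struct.pack("iii", 0, tx_type, rx_filter)
```

Validating before the ioctl is a design choice for the sketch; a real application could instead issue the request and inspect the rx_filter value the driver writes back.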
51. tes each TCP/UDP stream to a different queue. MLNX_OFED v2.2-1.0.0 provides an option to change the working RSS hash function from Toeplitz to XOR (and vice versa) through ethtool priv-flags. For further information, please refer to Table 7, "ethtool Supported Options" on page 40.
5.10 Ethernet Performance Counters
Counters are used to provide information about how well an operating system, an application, a service, or a driver is performing. The counter data helps determine system bottlenecks and fine-tune the system and application performance. The operating system, network, and devices provide counter data that an application can consume to provide users with a graphical view of how well the system is performing.
The counter index is a QP attribute given in the QP context. Multiple QPs may be associated with the same counter set. If multiple QPs share the same counter, its value represents the cumulative total.
ConnectX-3 supports 127 different counters, which are allocated as follows:
• 4 counters reserved for PF: 2 counters for each port
• 2 counters reserved for VF: 1 counter for each port
• All other counters, if they exist, are allocated on demand
• RoCE counters are available only through sysfs, located under:
  /sys/class/infiniband/mlx4_*/ports/*/counters/
  /sys/class/infiniband/mlx4_*/ports/*/counters_ext/
Physical Function can also read Virtual Functions' port counters
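For readers unfamiliar with the Toeplitz function named above, the following is a minimal sketch of the generic 32-bit Toeplitz hash: for every set bit of the input (most significant bit first), the 32-bit window of the key starting at that bit offset is XORed into the result. It illustrates the general technique only. The key `EXAMPLE_KEY` is an arbitrary placeholder, and nothing here is taken from the mlx4 driver or describes how the adapter computes its hash.

```python
# Arbitrary 40-byte placeholder key (hypothetical, not a driver default).
EXAMPLE_KEY = bytes(range(1, 41))


def toeplitz_hash(key, data):
    """Return the 32-bit Toeplitz hash of `data` under `key` (both bytes)."""
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least 4 bytes longer than the input")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        # Test input bit i, counting from the most significant bit of byte 0.
        if data[i // 8] & (0x80 >> (i % 8)):
            # XOR in the 32-bit key window that starts at bit offset i.
            window = (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
            result ^= window
    return result
```

In RSS, the input would typically be the concatenated address and port fields of a packet, and the low-order bits of the result select a receive queue; that mapping step is outside the scope of this sketch.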
52. the moderation time to the traffic pattern.
To set the values for packet rate limits and for moderation time high and low:
    ethtool -C eth<x> pkt-rate-low N pkt-rate-high N rx-usecs-low N rx-usecs-high N
Above an upper limit of packet rate, adaptive moderation will set the moderation time to its highest value. Below a lower limit of packet rate, the moderation time will be set to its lowest value.
To set interrupt coalescing settings when adaptive moderation is disabled:
    ethtool -C eth<x> rx-usecs N rx-frames N
Note: usec settings correspond to the time to wait after the last packet is sent/received before triggering an interrupt.
To query pause frame settings:
    ethtool -a eth<x>
To set pause frame settings:
    ethtool -A eth<x> rx on|off tx on|off
To query ring size values:
    ethtool -g eth<x>
To modify rings size:
    ethtool -G eth<x> rx <N> tx <N>
To obtain additional device statistics:
    ethtool -S eth<x>
To perform a self-diagnostics test:
    ethtool -t eth<x>
The driver defaults to the following parameters:
Both ports are activated (i.e., a net device is created for each port).
The number of Rx rings for each port is the nearest power of 2 of the number of CPU cores, limited by 16.
LRO is enabled with 32
53. ther TCs according to a minimal guarantee policy. If, for instance, TC0 is set to an 80% guarantee and TC1 to 20% (the TCs sum must be 100), then the BW left after servicing all strict priority TCs will be split according to this ratio. Since this is a minimal guarantee, there is no maximum enforcement. This means, in the same example, that if TC1 did not use its share of 20%, the remainder will be used by TC0.
5.1.4.3 Rate Limit
Rate limit defines a maximum bandwidth allowed for a TC. Please note that a 10% deviation from the requested values is considered acceptable.
5.1.5 Quality of Service Tools
5.1.5.1 mlnx_qos
mlnx_qos is a centralized tool used to configure QoS features of the local host. It communicates directly with the driver and thus does not require setting up a DCBX daemon on the system.
The mlnx_qos tool enables the administrator of the system to:
• Inspect the current QoS mappings and configuration. The tool will also display maps configured by the TC and vconfig set_egress_map tools, in order to give a centralized view of all QoS mappings.
• Set UP to TC mapping
• Assign a transmission algorithm to each TC (strict or ETS)
• Set minimal BW guarantee to ETS TCs
• Set rate limit to TCs
Note: For unlimited ratelimit, set the ratelimit to 0.
Usage: mlnx_qos -i <interface> [options]
Options:
--version    show program's version number and exit
-h, --help    show
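The minimal-guarantee policy described above (bandwidth left after strict-priority TCs is split by the configured percentages, and any share a TC does not use goes to the others) can be sketched as a small work-conserving allocation. This is an illustrative model under stated assumptions, not the adapter's scheduler: the function name `ets_split`, the notion of a per-TC "demand", and the iterative redistribution are all assumptions introduced for the example.

```python
def ets_split(available_bw, guarantees, demands):
    """Split available_bw among ETS TCs.

    guarantees: dict TC -> minimal guarantee in percent (must sum to 100)
    demands:    dict TC -> offered load, in the same unit as available_bw
    Returns a dict TC -> allocated bandwidth.
    """
    if sum(guarantees.values()) != 100:
        raise ValueError("the TC guarantees must sum to 100")
    alloc = {tc: 0.0 for tc in guarantees}
    active = {tc for tc in guarantees if demands.get(tc, 0) > 0}
    remaining = float(available_bw)
    while active and remaining > 1e-9:
        total_weight = sum(guarantees[tc] for tc in active)
        if total_weight == 0:
            break
        satisfied = set()
        used = 0.0
        for tc in active:
            # Each still-hungry TC is offered its weighted share of what is left.
            share = remaining * guarantees[tc] / total_weight
            grant = min(share, demands[tc] - alloc[tc])
            alloc[tc] += grant
            used += grant
            if alloc[tc] >= demands[tc] - 1e-9:
                satisfied.add(tc)
        remaining -= used
        if not satisfied:
            break  # every active TC took its full share; nothing to redistribute
        active -= satisfied  # hand the unused share to the remaining TCs
    return alloc
```

With guarantees of 80 and 20 for TC0 and TC1, two saturated TCs receive an 80/20 split, while an idle TC1 leaves the whole remaining bandwidth to TC0, matching the example in the text. Rate limits are not modeled here.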
54. through sysfs, located under /sys/class/net/eth*/vf*_statistics/.
To display the network device Ethernet statistics, you can run:
    ethtool -S <devname>
Table 8 Port IN Counters
Counter: Description
rx_packets: Total packets successfully received
rx_bytes: Total bytes in successfully received packets
rx_multicast_packets: Total multicast packets successfully received
rx_broadcast_packets: Total broadcast packets successfully received
rx_errors: Number of receive packets that contained errors preventing them from being deliverable to a higher-layer protocol
rx_dropped: Number of receive packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol
rx_length_errors: Number of received frames that were dropped due to an error in frame length
rx_over_errors: Number of received frames that were dropped due to overflow
rx_crc_errors: Number of received frames with a bad CRC that are not runts, jabbers, or alignment errors
rx_jabbers: Number of received frames with a length greater than MTU octets and a bad CRC
rx_in_range_length_error: Number of received frames with a length/type field value in the (decimal) range [1500:46] (42 is also counted for VLAN-tagged frames)
rx_out_range_length_err
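When the counters listed above are collected by a script, the text printed by ethtool -S has to be turned into name/value pairs. The sketch below parses the usual "name: value" layout of that output; the helper name `parse_ethtool_stats` is hypothetical and the sample text and its numbers are made up for illustration.

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S <devname>` output into a dict of counter -> int."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or not value:
            continue  # skips the "NIC statistics:" header and blank lines
        try:
            stats[name] = int(value)
        except ValueError:
            continue  # ignore any non-numeric entries
    return stats


# Made-up sample in the layout ethtool prints; values are illustrative only.
SAMPLE_OUTPUT = """NIC statistics:
     rx_packets: 1200
     rx_bytes: 96000
     rx_errors: 0
"""
```

In practice the text would come from running the command, for example through subprocess, and two snapshots taken some time apart can be subtracted to obtain rates.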
55. ver for that HCA, which is either a single value or a triplet, as described above. For example:
probe_vf=5
    The PF driver will activate 5 VFs on the HCA, and this will be applied to all ConnectX HCAs on the host.
probe_vf=00:04.0-5,00:07.0-8
    The PF driver will activate 5 VFs on the HCA positioned in BDF 00:04.0 and 8 for the one in 00:07.0.
probe_vf=1,2,3
    The PF driver will activate 1 VF on physical port 1, 2 VFs on physical port 2, and 3 dual-port VFs (applies only to a dual-port HCA when all ports are Ethernet ports). This applies to all ConnectX HCAs in the host.
probe_vf=00:04.0-5;6;7,00:07.0-8;9;10
    The PF driver will activate:
    HCA positioned in BDF 00:04.0: 5 single VFs on port 1, 6 single VFs on port 2, 7 dual-port VFs
    HCA positioned in BDF 00:07.0: 8 single VFs on port 1, 9 single VFs on port 2, 10 dual-port VFs
    Applies when all ports are configured as Ethernet in dual-port HCAs.
Notes:
PFs not included in the above list will not activate any of their VFs in the PF driver.
Triplets and single-port VFs are only valid when all ports are configured as Ethernet. When an InfiniBand port exists, only the probe_vf=a syntax is valid, where "a" is a single value that represents the number of VFs.
The second parameter in a triplet is valid only when there is more than 1 physical port.
Every value
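The single-value and triplet forms described above can be illustrated with a small parser. This is a sketch under explicit assumptions: it assumes that "," separates per-device entries, "-" separates a BDF from its VF counts, ";" separates the three values of a per-device triplet, and that a bare "a,b,c" is a triplet applied to all HCAs. The function name `parse_probe_vf` and the "*" key used for "all HCAs" are hypothetical and not part of the driver.

```python
def _counts(fields):
    """Convert a list of strings into a tuple of 1 or 3 VF counts."""
    counts = tuple(int(f) for f in fields)
    if len(counts) not in (1, 3):
        raise ValueError("expected a single value or a triplet: %r" % (fields,))
    return counts


def parse_probe_vf(value):
    """Return {bdf or "*": (n,) or (port1_vfs, port2_vfs, dual_port_vfs)}."""
    if "-" not in value:
        # Global form: "5" or "1,2,3", applied to every HCA on the host.
        return {"*": _counts(value.split(","))}
    result = {}
    for entry in value.split(","):
        # ASSUMED per-device form: "<bus:dev.fn>-<n>" or "<bus:dev.fn>-<a>;<b>;<c>"
        bdf, sep, counts = entry.rpartition("-")
        if not sep or not bdf:
            raise ValueError("missing BDF in entry: %r" % (entry,))
        result[bdf] = _counts(counts.split(";"))
    return result
```

A tuple of length one stands for "this many VFs", while a triplet stands for single-port VFs on port 1, single-port VFs on port 2, and dual-port VFs, in that order.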