RoCE with Priority Flow Control Application Guide
Contents
1.
  1.4.1 Untagged Ethernet
  1.4.2 802.1Q VLANs
  1.5 Priority Mapping
  1.5.1 Developing RDMA Applications
  1.6 Performance
  1.7 Port Statistics
  1.8 DCBX Consideration
  2 RoCE and PFC Example Setup
  2.1 Best Test Bed Configuration
  2.2 MLNX-OS Switch Configuration
  2.3 Host Configuration
  2.4 Verification Procedures
  2.4.1 Network Protocol (ICMP)
  2.4.2 RoCE Performance Verification
  2.4.3 Port Priority Counters
  3 Various Switch Configuration
  3.1 Mellanox SwitchX Based Systems
  3.2 Arista Switches (EoS)
  3.3 Cisco Nexus 5020
  3.4 HP Comware 7
2. If the interface is configured to work with 802.1q VLAN tags, it is possible to enable flow control either with global pause or with PFC.

To configure a VLAN interface:

Step 1. Verify that VLAN support is enabled by the kernel. Usually this requires loading the 802.1q module. Run:
    modprobe 8021q
Step 2. Add a VLAN device. PFC cannot be used on an interface without a VLAN. Run:
    vconfig add <interface name> <vlan id>
  For example:
    vconfig add eth1 100
Step 3. Assign an IP address to the VLAN interface. This creates a new entry in the GID table (as index 1).
    ifconfig <interface name>.<vlan id> <ip>/<netmask>
  For example:
    ifconfig eth1.100 10.10.10.10/24
Step 4. For applications, specify only the IP address of a VLAN device so that the traffic is sent in VLAN-tagged frames.

To configure the mlx4_en Ethernet driver to support PFC (Priority-based Flow Control policy on TX[7:0] and RX[7:0]):

The parameters of mlx4_en are:
    pfctx: PFC policy on TX[7:0]. Per-priority bit mask (default is 0).
    pfcrx: PFC policy on RX[7:0]. Per-priority bit mask (default is 0).

Each bit of pfctx and pfcrx represents a priority level (0-7). To turn on PFC on priority 0, use 0x1; to turn on PFC on priority 1, use 0x2; to turn on PFC on all prior
3. For example:
    cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_packets
    1740380

RoCE traffic is not shown in the associated Ethernet device's counters, since it is offloaded by the hardware and does not go through the Ethernet network driver.

1.8 DCBX Consideration

It is possible to turn on LLDP with DCBX TLVs for automatic PFC configuration from the switch to the host. To do that, the LLDP protocol should be turned on both on the switch and on the host. In addition, DCBX TLVs should be enabled on both the switch and the host.

2 RoCE and PFC Example Setup

The objective of the example in this chapter is to run RoCE over L2 with PFC enabled. This example configures PFC with priority 3 enabled.

Note: The solution in this chapter is described and discussed in the Mellanox Community Solutions space: http://community.mellanox.com/docs/DOC-1414
A more enhanced solution for RoCE lossless and TCP lossy flows, configured over an L2 Ethernet network enabled with PFC, can be found at: http://community.mellanox.com/docs/DOC-1415

2.1 Best Test Bed Configuration

It is recommended to set up the network as follows:
  - Mellanox Ethernet switch (e.g. SX1036), MLNX-OS version 3.3.4304
  - 3x hosts, OS: RH6.4
  - 3x ConnectX-3, MLNX_OFED 2.1

Figure 2: Network Setup (VLAN100)

2.2 MLNX-OS Switch Configuration

Configur
4. To configure global pause on Cisco Nexus 5020:

Step 1. Enter configuration mode. Run:
    switch# configure terminal
Step 2. Create the MAC ACL and enter ACL configuration mode. Run:
    switch(config)# mac access-list name
Step 3. Create a rule in the MAC ACL. Run:
    switch(config-mac-acl)# [sequence-number] {permit | deny} source destination protocol
Step 4. Create a named object that represents a class of traffic. Run:
    switch(config)# class-map type qos class-name
  Note: Class map names can contain alphabetic, hyphen, or underscore characters, are case sensitive, and can be up to 40 characters.
Step 5. Configure a traffic class by matching packets based on the ACL name. Run:
    switch(config-cmap-qos)# match access-group name acl-name
Step 6. Create a named object that represents a set of policies that are to be applied to a set of traffic classes. Run:
    switch(config-cmap-qos)# policy-map type qos policy-name
  Note: Policy map names can contain alphabetic, hyphen, or underscore characters, are case sensitive, and can be up to 40 characters.
Step 7. Create a class. Run:
    switch(config-cmap-qos)# class class-name
Step 8. Configure one or more QoS group values to match for classification of traffic into this class map. Run:
    switch(config-pmap-c-qos)#
5. dcbx version ieee-v2.5    (you may try cee instead)
    (conf-if-ten-1/1)# no shut
    (conf)# interface vlan 10    (the VLAN ID that the port is set up for)
    (conf-if-vl-10)# tagged ten 1/1    (the port connected to the Mellanox adapter)
6. Refer to the MLNX_OFED User Manual for additional information.

1.6 Performance

To verify that RoCE is working and performing as expected, run a benchmark test such as ib_write_bw or any other test.

To run ib_write_bw, run the following on the server side:
    ib_write_bw
  For example:
    [example command illegible in source]
And the following command on the client side:
    ib_write_bw <server name>
  For example:
    [example command illegible in source]

    RDMA_Write BW Test
    Dual-port       : OFF
    Transport type  : IB
    Connection type : RC
    TX depth        : 128
    CQ Moderation   : 100
    Mtu             : 1024[B]
    Link type       : Ethernet
    Gid index       : 0
    Max inline data : 0[B]
    rdma_cm QPs     : ON
    Data ex. method : rdma_cm

    #bytes   #iterations   BW peak[Gb/sec]   BW average[Gb/sec]   MsgRate[Mpps]
    65536    5000          [illegible]       36.58                0.069764

For additional information on this command and other performance commands, refer to the Performance Tuning guide on Mellanox.com, located at http://www.mellanox.com/page/products_dyn?product_family=27&mtag=linux_d
7. over its allocated size.

Step 14. Enter system class configuration mode. Run:
    switch(config)# system qos
Step 15. Specify the policy map to use as the service policy for the system. Run:
    switch(config-sys-qos)# service-policy type qos input policy-name
  Note: Policy map configuration has three modes:
    - network-qos: network-wide (system QoS) mode
    - qos: classification mode (system QoS input or interface input only)
    - queuing: queuing mode (input and output at system QoS and interface)
Step 16. Create a policy. Run:
    switch(config-sys-qos)# service-policy type network-qos policy-name
Step 17. Specify the interface to be changed. Run:
    switch(config)# interface type slot/port
Step 18. Enable LLC for the selected interface. Set receive and/or transmit on or off. Run:
    flowcontrol [receive {on | off}] [transmit {on | off}]

This example tags all traffic as lossless:
    switch# configure terminal
    switch(config)# mac access-list test
    switch(config-mac-acl)# 10 permit any
    switch(config)# class-map type qos test1
    switch(config-cmap-qos)# match access-group name test
    switch(config-cmap-qos)# policy-map type qos test1
    switch(config-cmap-qos)# class test1
    switch(config-pmap-c-qos)# set qos-group 4
    switch(config)# class-map type network-qos test1
    switch(config-cmap-nq)# match qos-group 4
    switch(config-cmap-nq)# policy-map type network-qos test1
    switch(config-cmap-nq)# class type network-qos test1
    s
8. [Output of the SKB priority to UP mapping command: a listing of UP 0 through UP 7 with the skprio values assigned to each, including the per-VLAN entries "skprio: 0 (vlan 100)" through "skprio: 7 (vlan 100)". The individual values are not recoverable from the source.]

Step 2. For TCP/IP applications, map the SKB priority to the VLAN priority (on egress from the device) using the vconfig set_egress_map command:
    vconfig set_egress_map <vlan device> <skb priority> <vlan qos>
  Outbound packets with a particular SKB priority are tagged with the given VLAN priority. The default VLAN priority is 0. This command applies to a specific VLAN. For example:
    for i in {0..7}; do vconfig set_egress_map eth1.100 $i 3 ; done
    Set egress mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100
    Set egress mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100
    Set egress mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100
    Set egress mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100
    Set egr
9. [Figure 3, continued: Wireshark packet list showing alternating ICMP echo (ping) requests from 11.11.100.1 to 11.11.100.2 and replies from 11.11.100.2 to 11.11.100.1. The decoded frame (102 bytes on wire, Ethernet II between two Mellanox adapters) carries an 802.1Q Virtual LAN header (type 0x8100) with PRI: 3, CFI: 0, ID: 100, followed by type IP (0x0800).]

2.4.2 RoCE Performance Verification

To test that RoCE is running over the configured setup, direct RoCE traffic from two servers (S2 and S3) to one server (S1).

Figure 4: RoCE Test Setup (VLAN100)

Run the following performance tests:

Step 1. On host S1, run:
10. 9
        hw_ver:          0x1
        board_id:        MT_1090120019
        phys_port_cnt:   2
            port: 1
                state:       PORT_ACTIVE (4)
                max_mtu:     4096 (5)
                active_mtu:  1024 (3)
                sm_lid:      0
                port_lid:    0
                port_lmc:    0x00
                link_layer:  Ethernet
            port: 2
                state:       PORT_ACTIVE (4)
                max_mtu:     4096 (5)
                active_mtu:  4096 (5)
                sm_lid:      4
                port_lid:    3
                port_lmc:    0x00
                link_layer:  InfiniBand

  If it is InfiniBand, run connectx_port_config to change the port's designation to Ethernet.
Step 3. Configure the IP address of the interface so that the link becomes active.

All applications that run over InfiniBand verbs should work on RoCE links if they use GRH headers (that is, if the use of GRH is specified in the address vector).

1.4 Transport Modes

RDMA encapsulated in an Ethernet frame can be configured as 802.1q tagged or untagged.

Global pause and PFC cannot run together on the same host. If one server runs two adapter cards, each with 2 ports, all ports work either in PFC or in global pause.

The combination of PFC and global pause configuration can be misleading. If PFC is enabled, global pause does not work, even if it is also enabled. To make sure global pause is working, make sure PFC is disabled and global pause is enabled. Unlike PFC, which is configured per host, global pause is configured per interface and cannot be set globally.

Table 2: PFC / Global Pause Configuration Relation
11. DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Mellanox Technologies
350 Oakmead Parkway, Suite 100
Sunnyvale, CA 94085, U.S.A.
www.mellanox.com
Tel: (408) 970-3400
Fax: (408) 970-3403

Copyright 2014 Mellanox Technologies. All Rights Reserved.

Mellanox, Mellanox logo, BridgeX, ConnectX, Connect-IB, CoolBox, CORE-Direct, InfiniBridge, InfiniHost, InfiniScale, MetroX, MLNX-OS, TestX, PhyX, ScalableHPC, SwitchX, UFM, Virtual Protocol Interconnect and Voltaire are registered trademarks of Mellanox Technologies, Ltd.
ExtendX, FabricIT, HPC-X, Mellanox Open Ethernet, PeerDirect, Mellanox Virtual Modular Switch, MetroDX and Unbreakable-Link are trademarks of Mellanox Technologies, Ltd.
All other trademarks are property of their respective owners.

Contents
  About this Manual
  Revision History
  1 Overview
  1.1 Software Dependencies
  1.2 Firmware Dependencies
  1.3 General Guidelines
  1.4 Transport Modes
12. Mellanox Technologies
Connect. Accelerate. Outperform.

Running RoCE Over L2 Network Enabled with PFC
Application Guide
Rev 1.5

www.mellanox.com

NOTE: THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT ("PRODUCT(S)") AND ITS RELATED DOCUMENTATION ARE PROVIDED BY MELLANOX TECHNOLOGIES "AS-IS" WITH ALL FAULTS OF ANY KIND AND SOLELY FOR THE PURPOSE OF AIDING THE CUSTOMER IN TESTING APPLICATIONS THAT USE THE PRODUCTS IN DESIGNATED SOLUTIONS. THE CUSTOMER'S MANUFACTURING TEST ENVIRONMENT HAS NOT MET THE STANDARDS SET BY MELLANOX TECHNOLOGIES TO FULLY QUALIFY THE PRODUCT(S) AND/OR THE SYSTEM USING IT. THEREFORE, MELLANOX TECHNOLOGIES CANNOT AND DOES NOT GUARANTEE OR WARRANT THAT THE PRODUCTS WILL OPERATE WITH THE HIGHEST QUALITY. ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT ARE DISCLAIMED. IN NO EVENT SHALL MELLANOX BE LIABLE TO CUSTOMER OR ANY THIRD PARTIES FOR ANY DIRECT, INDIRECT, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES OF ANY KIND (INCLUDING, BUT NOT LIMITED TO, PAYMENT FOR PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY FROM THE USE OF THE PRODUCT(S) AND RELATED
13. {allowed vlan vlan-id | native vlan vlan-id}
Step 7. Set PFC mode for the selected interface. Specify auto to negotiate PFC capability. Specify on to force-enable PFC. Run:
    switch(config-if)# priority-flow-control mode {auto | on}
Step 8. (Optional) Enable IEEE 802.3x link-level flow control for the selected interface. Set receive and/or transmit on or off. Run:
    switch(config-if)# flowcontrol [receive {on | off}] [transmit {on | off}]
Step 9. Enable the VLAN. The default value is no shutdown (enabled). You cannot shut down the default VLAN, VLAN1, or VLANs 1006 to 4094. Run:
    switch(config-vlan)# no shutdown

The following is an example of how to configure two ports:
    switch# configure terminal
    switch(config)# vlan 50
    switch(config-vlan)# name roce
    switch(config)# interface ethernet 1/3
    switch(config-if)# switchport mode trunk
    switch(config-if)# switchport trunk allowed vlan 50
    switch(config-if)# priority-flow-control mode on
    switch(config-if)# flowcontrol receive on transmit on
    switch(config-vlan)# state active
    switch(config-vlan)# no shutdown
    switch(config)# interface ethernet
    switch(config-if)# switchport mode trunk
    switch(config-if)# switchport trunk allowed vlan 50
    switch(config-if)# priority-flow-control mode on
    switch(config-if)# flowcontrol receive on transmit on
    switch(config-vlan)# state active
    switch(config-vlan)# no shutdown
14. andard practices for preventing accidents before you work on any equipment.

Revision History

Table 1: Document Revision History

    Revision   Date           Description
    1.5        Nov 2014       Added Force10 S4810 Dell switch configuration example
    1.4        Sep 2014       Added Comware 7 HP switch configuration example
    1.3        May 2014       Minor updates related to the need of MLNX_OFED for RoCE
    1.2        January 2014   Updated configuration flows and description
    1.1        2011           First revision

1 Overview

RDMA over Converged Ethernet (RoCE) enables InfiniBand transport over Ethernet networks. It encapsulates InfiniBand transport and GRH headers in Ethernet packets using an IEEE-assigned Ethertype.

Classic Ethernet is a best-effort protocol: in the event of congestion, Ethernet discards packets and relies on higher-level protocols to provide retransmission and other reliability mechanisms.

IEEE 802.3x pause allows a congested receiver to signal the other side of the link to pause transmission for a short period of time. Pause functionality is applied to all the traffic on the link.

Priority Flow Control (PFC), IEEE 802.1Qbb, applies pause functionality to specific classes of traffic on the Ethernet link. For example, PFC can provide lossless service for the RoCE tra
15. c/net/vlan/eth1.100
    Set egress mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100
    Set egress mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100
    Set egress mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100
    Set egress mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100

2.4 Verification Procedures

2.4.1 Network Protocol (ICMP)

A basic sanity test is to ping between two servers and check that the ICMP traffic runs over the desired priority on the network. In Figure 3, you can see the ICMP packets carrying VLAN 100 and priority 3.

Figure 3: ICMP Packets
[Wireshark capture of ICMP echo (ping) requests and replies between 11.11.100.1 and 11.11.100.2]
16. d restart
Step 3. Verify that PFC is enabled. Run:
    RX=`cat /sys/module/mlx4_en/parameters/pfcrx`; printf "0x%x\n" $RX
    0x08
Step 4. Configure the VLAN interface. Run:
    modprobe 8021q
    vconfig add eth1 100
    ifconfig eth1.100 11.11.100.1/24 up
Step 5. Map skb_prio to UP. Run:
    tc_wrap.py [arguments illegible in source]
  [Output: a listing of UP 0 through UP 7 in which the skprio values 0 through 15, and the per-VLAN entries "skprio: 0 (vlan 100)" through "skprio: 7 (vlan 100)", all appear under UP 3.]
Step 6. Set the egress map of the VLAN:
    for i in {0..7}; do vconfig set_egress_map eth1.100 $i 3 ; done
    Set egress mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100
    Set egress mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100
    Set egress mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100
    Set egress mapping on device eth1.100 Should be visible in /pro
17. 3.5 Dell Force10 S4810

List of Figures
  Figure 1: IP ToS to SKB Priority Static Mapping
  Figure 2: Network Setup
  Figure 3: ICMP Packets
  Figure 4: RoCE Test Setup

List of Tables
  Table 1: Document Revision History
  Table 2: PFC / Global Pause Configuration Relation

About this Manual

This manual describes how to configure RoCE on Mellanox adapters with a lossless transport layer (PFC or global pause).

Audience

This manual is intended for server and network administrators who intend to configure RoCE applications.

Document Conventions

The following conventions are used in this document:
  NOTE: Identifies important information that contains helpful suggestions.
  CAUTION: Alerts you to the risk of personal injury, system damage, or loss of data.
  WARNING: Warns you that failure to take or avoid a specific action might result in personal injury or a malfunction of the hardware or software. Be aware of the hazards involved with electrical circuitry and be familiar with st
18. e the SX1036 as follows:

Step 1. Create and configure the required VLAN interface on the switch:
    switch (config) # interface ethernet 1/1-1/3 switchport mode hybrid
    switch (config) # interface ethernet 1/1 switchport hybrid allowed-vlan all
    switch (config) # interface ethernet 1/2 switchport hybrid allowed-vlan all
    switch (config) # interface ethernet 1/3 switchport hybrid allowed-vlan all
Step 2. Add the following PFC configuration to the switch:
    switch (config) # dcb priority-flow-control enable
    switch (config) # dcb priority-flow-control priority 3 enable
    switch (config) # interface ethernet 1/1-1/3 dcb priority-flow-control mode on force
Step 3. Verify your configuration. Run:
    switch (config) # show dcb priority-flow-control
    PFC enabled
    Priority Enabled List  : 3
    Priority Disabled List : 0 1 2 4 5 6 7
    [TC / Lossless table not recoverable from the source]
    Interface   PFC admin   PFC oper
    1/1         On          Enabled
    1/2         On          Enabled
    1/3         On          Enabled
    switch (config) #

2.3 Host Configuration

To configure the servers in the network, apply the following configuration to each host in the setup:

Step 1. Enable PFC. Add the following to the file /etc/modprobe.d/mlx4_en.conf:
    options mlx4_en pfctx=0x08 pfcrx=0x08
Step 2. Restart the openibd daemon. Run:
    /etc/init.d/openib
19. eatures cannot be configured together on the same interface.

Refer to the Ethernet Quality of Service (QoS) section of the MLNX-OS User Manual for more information.

3.2 Arista Switches (EoS)

The flow below describes how to configure PFC or global pause on Arista switches via EoS.

To configure PFC on Arista switches:

Step 1. Set DCBX mode. Run:
    switch(config-if-Et10)# dcbx mode ieee
Step 2. Enable PFC on a specific interface. Run:
    switch(config-if-Et10)# priority-flow-control mode on
Step 3. Set priority X as lossless (no-drop). Run:
    switch(config-if-Et10)# priority-flow-control priority <X> no-drop

The following is an example of how to configure PFC on port Et10 for priority 3:
    switch(config)# interface et10
    switch(config-if-Et10)# dcbx mode ieee
    switch(config-if-Et10)# priority-flow-control mode on
    switch(config-if-Et10)# priority-flow-control priority 3 no-drop

To configure global pause on Arista switches:

Step 1. Enable global pause per interface (receive). Run:
    switch(config-if-Et10)# flowcontrol receive on
Step 2. Enable global pause per interface (send). Run:
    switch(config-if-Et10)# flowcontrol send on

The following is an example of how to configure global pause on port Et10:
    switch(config)# interface et10
    switch(config-if-Et10)# f
20. ess mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100
    Set egress mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100
    Set egress mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100
    Set egress mapping on device eth1.100 Should be visible in /proc/net/vlan/eth1.100

To verify the configuration, run:
    cat /proc/net/vlan/eth1.100
  [Output not recoverable from the source. It lists the VLAN ID (100), the frame and byte counters for received and transmitted traffic, the underlying device, and the INGRESS and EGRESS priority mappings.]

Figure 1: IP ToS to SKB Priority Static Mapping

The mapping of IP ToS to SKB priority (kernel priority) is static and cannot be modified by the user.

1.5.1 Developing RDMA Applications

The application (rdma_cm) must choose a value for IP ToS according to the desired TC and call the rdma_set_option method:
    rdma_set_option(id, RDMA_OPTION_ID, RDMA_OPTION_ID_TOS, &tos, sizeof tos);
21. ffic and best-effort service for the standard Ethernet traffic. PFC can provide different levels of service to specific classes of Ethernet traffic (using IEEE 802.1p traffic classes).

This document focuses on the configuration of RoCE with a lossless transport layer.

1.1 Software Dependencies

To use RoCE over Mellanox ConnectX hardware, installing MLNX_OFED is recommended.

Inbox drivers:
  - In RHEL 6, the High Performance Network add-on can be installed instead of MLNX_OFED. Refer to the following link for additional information: http://www.redhat.com/products/enterprise-linux-add-ons/high-performance-network
  - In RHEL 7, SLES 12, and Ubuntu 14.04, the support is inbox.

1.2 Firmware Dependencies

It is recommended to use the latest firmware available on the Mellanox.com site to use RoCE over the Mellanox ConnectX adapter card family.

1.3 General Guidelines

Since RoCE encapsulates InfiniBand traffic in Ethernet frames, the corresponding net device must be up and running. In the case of Mellanox hardware, MLNX_OFED must be installed, mlx4_en must be loaded, and the corresponding interface must be configured.

Step 1. Make sure that MLNX_OFED is installed.
Step 2. Verify that the link_layer field is Ethernet:
    ibv_devinfo
    hca_id: mlx4_0
        transport:       InfiniBand (0)
        fw_ver:          [illegible]
        node_guid:       [illegible]
        sys_image_guid:  [illegible]
        vendor_id:       0x02c9
        vendor_part_id:  409
22. ities, use 0xff.

Step 1. Change the values of pfctx and pfcrx in the line below, in the file /etc/modprobe.d/mlx4_en.conf (create the file if it does not exist):
    options mlx4_en pfctx=0x08 pfcrx=0x08
Step 2. Restart the network driver. Run:
    /etc/init.d/openibd restart

To show the value of the PFC parameters in the driver, run:
    RX=`cat /sys/module/mlx4_en/parameters/pfcrx`; printf "0x%x\n" $RX
    0x8
    TX=`cat /sys/module/mlx4_en/parameters/pfctx`; printf "0x%x\n" $TX
    0x8

Note: The values of pfctx and pfcrx should be set according to the priority on which you need flow control.

1.5 Priority Mapping

There are two ways to map the kernel priority (skb_prio) to the user priority (UP) that is attached to the VLAN tag of the Ethernet frame:
  1. RoCE traffic priority mapping (kernel bypass)
  2. TCP/IP traffic priority mapping
Both mappings are required when the host carries both kinds of flow.

To map SKB priority to user priority:

Step 1. Map SKB priority to user priority (UP) for RoCE applications. 16 SKB priority values are available, and each is mapped to a single UP value in the range 0-7. This command is applied to all RoCE traffic for all VLANs configured on the host. For example:
    tc_wrap.py [arguments illegible in source]
  Therefore, to map all SKB priorities to a specific egress VLAN priority (e.g. 3):
    tc_wrap.py [arguments illegible in source]
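The pfctx and pfcrx values used above are per-priority bit masks, where bit N enables PFC on priority N. As a minimal sketch (the helper name pfc_mask is hypothetical and is not part of MLNX_OFED), the mask for a set of priorities could be computed in POSIX shell like this:

```shell
#!/bin/sh
# pfc_mask PRIO [PRIO...]
# Hypothetical helper: prints the bit mask (as hex) that has one bit set
# for each given priority in the range 0-7.
pfc_mask() {
    mask=0
    for prio in "$@"; do
        case "$prio" in
            [0-7]) mask=$(( mask | (1 << prio) )) ;;
            *) echo "invalid priority: $prio" >&2; return 1 ;;
        esac
    done
    printf '0x%02x\n' "$mask"
}

pfc_mask 3                  # prints 0x08 (priority 3 only)
pfc_mask 0 1 2 3 4 5 6 7    # prints 0xff (all priorities)
```

The printed value is what would go into the options mlx4_en line, for example pfctx=0x08 pfcrx=0x08 for priority 3.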
23. lowcontrol receive on
    switch(config-if-Et10)# flowcontrol send on

Note: PFC and Flow Control features cannot be configured together on the same interface.

3.3 Cisco Nexus 5020

The flow below describes how to configure PFC or global pause on Cisco Nexus 5020 switches.

To configure PFC on Cisco Nexus 5020:

Step 1. Enter configuration mode. Run:
    switch# configure terminal
Step 2. Enter VLAN configuration sub-mode. If the VLAN does not exist, the system first creates the specified VLAN. Run:
    switch(config)# vlan {vlan-id | vlan-range}
Step 3. Name the VLAN. Up to 32 alphanumeric characters may be used. Run:
    switch(config-vlan)# name vlan-name
  Note: The names of VLAN1 and the internally allocated VLANs cannot be changed. The default value is VLANxxxx, where xxxx represents four numeric digits (including leading zeroes) equal to the VLAN ID number.
Step 4. Specify the interface to configure and enter interface configuration mode. The interface can be a physical Ethernet port or a port channel. Run:
    switch(config)# interface {type slot/port | port-channel number}
Step 5. Configure the interface as a trunk port. Run:
    switch(config-if)# switchport mode trunk
Step 6. (Optional) Configure necessary parameters for a trunk port. Run:
    switch(config-if)# switchport trunk
24.
    PFC Configuration    Global Pause                           Status
    -----------------    -----------------------------------    ------------------------------------------------
    Enabled per host     Enabled per interface (e.g. eth1)      PFC operates on all VLAN interfaces.
    Enabled per host     Disabled on all interfaces             PFC operates on all VLAN interfaces.
    Disabled per host    Enabled per interface (e.g. eth1)      Global pause operates on the enabled interface
                                                                (eth1). PFC does not operate.
                                                                Note: This is the default configuration for all
                                                                Ethernet interfaces.
    Disabled per host    Disabled on all interfaces             Neither global pause nor PFC operates.

1.4.1 Untagged Ethernet

In the case of untagged Ethernet frames (without a VLAN), the port should be enabled with global pause flow control. Verify that /sys/module/mlx4_en/parameters/pfctx and pfcrx are set to 0 to enable global pause.

To enable or disable global pause, run:
    ethtool -A eth<x> [rx on|off] [tx on|off]
  For example:
    ethtool -A eth1 rx on tx on

To check the global pause status, run:
    ethtool -a eth<x>
  For example:
    ethtool -a eth1
    Pause parameters for eth1:
    Autonegotiate: off
    RX: on

1.4.2 802.1Q VLANs

Tagged Ethernet frames carry a 3-bit priority field. The value of this field is derived from the InfiniBand Service Level (SL) field by taking the 3 least significant bits of the SL field (4 bits).

For RoCE traffic to use VLAN-tagged frames, you need to specify the GID table entries that are derived from the VLAN devices when creating address vectors.
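The derivation of the VLAN priority from the Service Level described above amounts to keeping the low 3 bits of the 4-bit SL. A minimal sketch, assuming a hypothetical helper named sl_to_vlan_prio that takes the SL as a plain decimal number (0-15, without leading zeros):

```shell
#!/bin/sh
# sl_to_vlan_prio SL
# Hypothetical helper: prints the 3-bit VLAN priority that corresponds to a
# 4-bit InfiniBand Service Level, i.e. the 3 least significant bits of SL.
sl_to_vlan_prio() {
    sl=$1
    case "$sl" in
        ''|*[!0-9]*) echo "SL must be a decimal integer" >&2; return 1 ;;
    esac
    if [ "$sl" -gt 15 ]; then
        echo "SL must be in the range 0-15" >&2
        return 1
    fi
    echo $(( sl & 7 ))
}

sl_to_vlan_prio 3     # prints 3
sl_to_vlan_prio 11    # prints 3 (binary 1011 keeps its low bits 011)
```

Note that two Service Levels that differ only in the top bit, such as 3 and 11, end up on the same VLAN priority.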
25. obal pause on Mellanox systems based on MLNX-OS (refer to http://www.mellanox.com/page/ethernet_switch_overview).

To configure PFC on SwitchX based systems:

Step 1. Enable PFC globally. Run:
    switch (config) # dcb priority-flow-control enable
Step 2. Enable a specific priority on the switch (all ports). Run:
    switch (config) # dcb priority-flow-control priority <level> enable
Step 3. Enable PFC on a specific interface. Run:
    switch (config) # interface ethernet 1/1 dcb priority-flow-control mode on force

The following is an example of how to configure PFC on port 1/1 for priority 3:
    switch (config) # dcb priority-flow-control enable
    switch (config) # dcb priority-flow-control priority 3 enable
    switch (config) # interface ethernet 1/1 dcb priority-flow-control mode on force
    switch (config) #

To configure global pause on SwitchX based systems:

Step 1. Enable global pause per interface (receive). Run:
    switch (config) # interface ethernet 1/1 flowcontrol receive on
Step 2. Enable global pause per interface (send). Run:
    switch (config) # interface ethernet 1/1 flowcontrol send on

The following is an example of how to configure global pause on port 1/1:
    switch (config) # interface ethernet 1/1
    switch (config interface ethernet 1/1) # flowcontrol receive on
    switch (config interface ethernet 1/1) # flowcontrol send on
    switch (config interface ethernet 1/1) #

Note: PFC and Flow Control f
26. river

An additional option for testing RoCE is the ibv_rc_pingpong command.

To run ibv_rc_pingpong, run the following on the server side:
    ibv_rc_pingpong <options>
  For example:
    [example command illegible in source]
And the following command on the client side:
    ibv_rc_pingpong <options> <server name>
  For example:
    [example command illegible in source]
    local address:  LID 0x0000, QPN [illegible], PSN 0xcb5e18, GID [illegible]
    remote address: LID 0x0000, QPN 0x0005a3, PSN [illegible], GID fe80::f652:14ff:fe17:1b81
    8192000 bytes in 0.01 seconds = 10400.89 Mbit/sec
    1000 iters in 0.01 seconds = 6.30 usec/iter

For additional information, refer to the Performance Tuning Guidelines on Mellanox.com.

1.7 Port Statistics

It is possible to read port statistics in the same manner as for regular InfiniBand ports. The information is available from sysfs at:
    /sys/class/infiniband/<device>/ports/<port number>/counters/

The supported counters are:
  - port_rcv_packets
  - port_xmit_packets
  - port_rcv_data
  - port_xmit_data

Note: These counters count only InfiniBand data and do not account for Ethernet traffic.

To read the number of transmitted packets, run:
    cat /sys/class/infiniband/<dev>/ports/<port>/counters/port_xmit_packets
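To read all four supported counters in one go, the cat command above can be wrapped in a small loop. This is a sketch only: the function name read_port_counters and its optional third argument (an alternative sysfs root, useful for trying the function on a machine without an adapter) are assumptions made for illustration, while the counter names and the directory layout are the ones listed above.

```shell
#!/bin/sh
# read_port_counters DEVICE PORT [SYSFS_ROOT]
# Hypothetical helper: prints "<counter> <value>" for each supported counter,
# or "<counter> unavailable" when the counter file cannot be read.
read_port_counters() {
    dev=$1
    port=$2
    root=${3:-/sys/class/infiniband}
    dir="$root/$dev/ports/$port/counters"
    for name in port_rcv_packets port_xmit_packets port_rcv_data port_xmit_data; do
        if [ -r "$dir/$name" ]; then
            printf '%s %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s unavailable\n' "$name"
        fi
    done
}

# Device name and port number as in the example elsewhere in this guide.
read_port_counters mlx4_0 1
```

Running it before and after a benchmark and comparing the two readings gives a quick view of how much RoCE traffic the port carried.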
27. set qos-group qos-group-value
  Note: The range of qos-group-value is 2-5. There is no default value.
Step 9. Associate a class map with the policy map and enter configuration mode for the specified system class. Run:
    switch(config)# class-map type network-qos class-name
Step 10. Configure the traffic class by matching packets based on a list of QoS group values. Run:
    switch(config-cmap-nq)# match qos-group qos-group-value
  Note: QoS group values range from 0 to 5. QoS group 0 is equivalent to class-default and QoS group 1 is equivalent to class-fcoe. QoS groups 0 and 1 are reserved for default classes and cannot be configured.
Step 11. Create a named object that represents a set of policies that are to be applied to a set of traffic classes. Run:
    switch(config-cmap-nq)# policy-map type network-qos policy-name
Step 12. Associate a class map with the policy map and enter configuration mode for the specified system class. Run:
    switch(config-cmap-nq)# class type network-qos class-name
Step 13. Configure a no-drop class. Run:
    switch(config-pmap-nq-c)# pause no-drop [pfc-cos pfc-cos-value]
  Note: If no parameter is specified, the default policy is drop. The range of pfc-cos-value is 0-7. This option is supported only for an ACL-based system class. The drop policy is a simple tail drop, where arriving packets are dropped if the queue goes
28. [ib_write_bw command lines for host S1 illegible in source]
Step 2. On host S2, run:
    [ib_write_bw command line illegible in source]
Step 3. On host S3, run:
    [ib_write_bw command line illegible in source]

2.4.3 Port Priority Counters

Check the host port priority counters (traffic and pause counters):
    ethtool -S eth<x> | grep prio_3
    rx_prio_3_packets: [illegible]
    rx_prio_3_bytes: [illegible]
    tx_prio_3_packets: [illegible]
    tx_prio_3_bytes: [illegible]
    rx_pause_prio_3: 14812
    rx_pause_duration_prio_3: 0
    rx_pause_transition_prio_3: 0
    tx_pause_prio_3: 0
    tx_pause_duration_prio_3: 47848
    tx_pause_transition_prio_3: 7406

Check the switch port priority counters (traffic and pause counters):
    show interfaces ethernet 1/1 counters priority 3
    Rx
      333364 packets
      333364 unicast packets
      0 multicast packets
      0 broadcast packets
      362177148 bytes
      14814 pause packets
      49168 pause duration seconds
    Tx
      [illegible] packets
      [illegible] unicast packets
      [illegible] multicast packets
      [illegible] broadcast packets
      368845148 bytes
      0 pause packets

3 Various Switch Configuration

3.1 Mellanox SwitchX Based Systems

The flow below describes how to configure PFC or gl
29. witch(config-pmap-nq-c)# pause no-drop
    switch(config)# system qos
    switch(config-sys-qos)# service-policy type qos input test1
    switch(config-sys-qos)# service-policy type network-qos test1
    switch(config)# interface ethernet 1/2
    switch(config-if)# flowcontrol receive on transmit on

3.4 HP Comware 7

To configure global pause on a Comware 7 HP switch, follow this example:
    interface FortyGigE2/0/7
     port link-mode bridge
     port link-type trunk
     port trunk permit vlan all
     flow-control
     flow-interval 5

To configure PFC on a Comware 7 HP switch, follow this example:
    interface FortyGigE2/0/6
     port link-mode bridge
     port link-type trunk
     port trunk permit vlan all
     priority-flow-control auto
     priority-flow-control no-drop dot1p 3
     lldp tlv-enable dot1-tlv
     qos trust dot1p

3.5 Dell Force10 S4810

To configure PFC on a Force10 S4810 Dell switch, follow this example:
    configure terminal
    enable
    (conf)# service-class dynamic dot1p
    (conf)# interface ten 1/1    (the port connected to the Mellanox adapter)
    (conf-if-ten-1/1)# description to NIC
    (conf-if-ten-1/1)# mtu 12000
    (conf-if-ten-1/1)# portmode hybrid
    (conf-if-ten-1/1)# switchport
    (conf-if-ten-1/1)# pfc priority 3    (or whatever priority you want to use, 0-7)
    (conf-if-ten-1/1)# protocol lldp
    (conf-if-ten-1/1)#