Recently, I came across a challenge where I had to deploy a QFX VCF as a server farm switching block and configure it for connectivity with VMware ESXi hosts. Initially it seemed a very simple deployment: I configured LAGs (port bundles / EtherChannels for non-Juniper folks) without LACP on the ports connected to the ESXi hosts and allowed VLANs as per the server team's requirements. The reference topology is as under:
Port xe-0/0/0 on both switches was configured as a LAG (ae0) for the data VLAN, and port xe-0/0/45 on both switches was configured as a LAG (ae1) for ESXi host MGMT VLAN 11 and vMotion VLAN 12. During testing, neither LAG behaved as expected. I tried many things to figure out the problem and finally narrowed it down: it was not related to device behavior but to a design consideration, namely how Juniper switches should be configured for connectivity with VMware ESXi hosts. I looked for a reference document from Juniper or VMware but unfortunately could not find anything on Juniper and VMware connectivity, whereas for Cisco and HP network connectivity many documents were publicly available.
After extensive troubleshooting I turned my focus to the load-balancing mechanism over the member interfaces of the LAG, and after burning the midnight oil I finally found the root cause of these unexpected results and the solution to the problem. Here are my findings:
Juniper's default load balancing over LAG member interfaces is based on the “layer 2 payload”, and it takes into consideration the source IP, destination IP, source port and destination port. To get matching behavior on the VMware ESXi hosts, Active/Active NIC teaming must be enabled with “Route Based on IP Hash” (reference: VMware KB 2006129). Once the correct load balancing was configured on the ESXi host, the data VLAN LAG started working as expected.
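For reference, the switch side of the data VLAN LAG can be sketched roughly as below. This is only a sketch under assumptions, not a copy of my production configuration: it assumes ELS-style syntax on the QFX, that the two switches are VCF members 0 and 1 (so the second port shows up as xe-1/0/0), and a hypothetical VLAN name DATA with VLAN ID 10, since the data VLAN ID is not given above. There is no LACP stanza under ae0 because the bundle is static. The lines starting with # are notes for the reader.

```
# Allow two aggregated interfaces (ae0 and ae1) on the fabric
set chassis aggregated-devices ethernet device-count 2

# Hypothetical data VLAN; replace the name and ID with your own
set vlans DATA vlan-id 10

# One member port from each switch (member IDs 0 and 1 are assumed)
set interfaces xe-0/0/0 ether-options 802.3ad ae0
set interfaces xe-1/0/0 ether-options 802.3ad ae0

# Static LAG (no LACP) carrying the data VLAN as a tagged trunk
set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae0 unit 0 family ethernet-switching vlan members DATA
```

On the ESXi side, the vSwitch or port group using these two uplinks needs both NICs active and the teaming policy set to “Route Based on IP Hash”, as described above.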
However, the LAG configured for ESXi MGMT and vMotion was still not working with the arrangement described above. On further exploring the VMware documentation, I learned that link aggregation or bundling is not recommended for the VMkernel port, which is used for vMotion.
The final configuration for the vMotion and ESXi MGMT VLANs is therefore not to configure a LAG on the Juniper switches at all. Instead, configure the participating physical interfaces as tagged (trunk) interfaces that allow both the vMotion and ESXi MGMT VLANs, and in addition set the ESXi MGMT VLAN as the native VLAN on the physical interfaces (I have not been able to understand the reason behind this requirement). On the ESXi side, enable “Route Based on Originating Port ID”, which is the default load-balancing mechanism.
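A rough sketch of that final switch-side configuration follows. As before, it rests on assumptions: ELS-style syntax, VCF member IDs 0 and 1 (so the second port appears as xe-1/0/45), and the hypothetical VLAN names ESXI-MGMT and VMOTION for VLANs 11 and 12. The lines starting with # are notes for the reader.

```
# Remove the ports from the earlier LAG and delete ae1
delete interfaces xe-0/0/45 ether-options 802.3ad
delete interfaces xe-1/0/45 ether-options 802.3ad
delete interfaces ae1

# VLAN names are hypothetical; the IDs come from the topology above
set vlans ESXI-MGMT vlan-id 11
set vlans VMOTION vlan-id 12

# Plain trunk on the first switch, MGMT VLAN 11 as native VLAN
set interfaces xe-0/0/45 native-vlan-id 11
set interfaces xe-0/0/45 unit 0 family ethernet-switching interface-mode trunk
set interfaces xe-0/0/45 unit 0 family ethernet-switching vlan members [ ESXI-MGMT VMOTION ]

# Same settings on the port of the second switch
set interfaces xe-1/0/45 native-vlan-id 11
set interfaces xe-1/0/45 unit 0 family ethernet-switching interface-mode trunk
set interfaces xe-1/0/45 unit 0 family ethernet-switching vlan members [ ESXI-MGMT VMOTION ]
```

With this in place each uplink is an independent trunk, which fits “Route Based on Originating Port ID” on the ESXi side because no bundle is expected on the switch.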