Channel: All Ethernet Switching posts

Re: Software upgrade techniques for QFX10008 / MC-Lag


I guess you would need to better define what you [personally] think "hitless upgrade" is.  If you truly mean hitless, as in losing no more than a few bits, then the answer is no.  But if by hitless you mean "little to no impact on user traffic (especially TCP-based traffic)", then the answer is more likely yes.

 

With MC-LAG (and any similar technology), what is synced between the switches is L2 MAC state, not L3.  Each switch has its own separate control plane, which is one reason some prefer MC-LAG over, say, VC (Virtual Chassis).  In VC you can sync both L2 and L3, so VC failover should be less disruptive.  But anytime you need to reboot a device during the upgrade, you should expect some disruption.

 

For MC-LAG upgrades (my experience is with EX9200) you can minimize disruption via the following steps (IMHO):

 

NOTE: This assumes all connections are dual-homed via MC-LAG AE; no single-homed devices.

 

1.  Isolate the MC-LAG "Backup Peer" (the standby) from the "Master Peer".  Just go in, deactivate all interfaces on the "Backup Peer", and commit.  This 'should' be MC-LAG hitless, in that the other peer should now handle all traffic.

2.  Upgrade the "Backup Peer" like any standalone node.  The easiest way is to load the new SW on both REs and reboot the entire switch.  The switch will come back up isolated, running the new SW.

3.  As closely synchronized as possible, bring all interfaces on the "Master Peer" down, and then bring up all interfaces on [what was the old] "Backup Peer".  Your outage time will depend on how long it takes the switch to learn proper forwarding for L2 (initially it will flood) and then L3; fewer routes, faster convergence.  From past experience, I expect this time is generally between 30 and 120 seconds.  Generally users see little to no impact.

4.  You can now run the network and test on this single "new Master Peer" running the newer code.  You can do this for as short or as long a time as you like.  You now always have an easy back-out to the older code and set-up, by just re-enabling (delete deactivate) the node running the prior code and either isolating or powering off the node with the newer code.  This is a nice option to have, IMHO.

5.  At some point you can upgrade the "old Master Peer" to the newer code, reboot it, and then delete deactivate on its interfaces.  I would suggest you bring up just the ICL/ICCP interfaces first (or, more likely, the shared AE; that is the only config option I suggest) and then, after a little while, bring up the actual MC-LAG AE interfaces.  This will cause a slight "hitless" hit, not so much due to the MC-LAG nodes but more due to the attached LAG'd switches.  It is much easier for a switch to react to a link going from up to down than from down to up with no bit/byte/traffic loss.  Down/up is always harder to control.
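
For anyone who wants to script this, here is a minimal sketch of how the deactivate/activate steps above could be generated as Junos configuration-mode command lists.  It only builds strings and does not talk to a switch.  The interface names (ae0 as the shared ICL/ICCP AE, ae1 and ae2 as MC-LAG AEs) and the helper names are hypothetical; "activate interfaces ..." is the configuration-mode statement that undoes a "deactivate" (what I called "delete deactivate" above).

```python
from typing import List


def deactivate_cmds(interfaces: List[str]) -> List[str]:
    """Config-mode commands that deactivate each interface, then commit."""
    return [f"deactivate interfaces {ifd}" for ifd in interfaces] + ["commit"]


def activate_cmds(interfaces: List[str]) -> List[str]:
    """Config-mode commands that re-activate each interface, then commit."""
    return [f"activate interfaces {ifd}" for ifd in interfaces] + ["commit"]


def staged_bringup(icl: List[str], mclag: List[str]) -> List[List[str]]:
    """Step 5: ICL/ICCP links in one commit, MC-LAG AEs in a later commit."""
    return [activate_cmds(icl), activate_cmds(mclag)]


if __name__ == "__main__":
    # Hypothetical interface names, for illustration only.
    icl_links = ["ae0"]
    mclag_links = ["ae1", "ae2"]

    print("# Step 1: isolate the Backup Peer")
    for line in deactivate_cmds(icl_links + mclag_links):
        print(line)

    print("# Step 5: staged bring-up on the upgraded old Master Peer")
    for batch in staged_bringup(icl_links, mclag_links):
        for line in batch:
            print(line)
```

Keeping the ICL/ICCP links and the MC-LAG AEs in separate commits is what gives you the "after a little while" gap from step 5; how long to wait between the two commits is your call.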

 

I think the only really hitless upgrade is for standalone nodes whose RE/control plane is based on dual VMs (anything really new within Juniper generally has such an architecture), where you upgrade the backup VM to the new code and then perform a VM 'switch' for the RE/control plane without interrupting the data plane at all.  This is quite hard to accomplish when the data plane (I/O modules) also has a processor that likely needs some microcode upgrade at the same time the RE/control plane requires an OS upgrade.  You are bound to lose 'some bits'.

 

So what is your definition of "hitless"?  It has never really been defined well (IMHO) in the 15-plus years that people have thrown this [marketing?] term around.
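
One way to pin the definition down is to measure it.  The sketch below assumes you run a fixed-interval probe (a ping, for example) through the MC-LAG pair during the cutover and record each reply as True or False.  The function names, the sample data, and the tolerance values are all made up for illustration; pick a tolerance that matches what your applications can actually ride through.

```python
from typing import List


def longest_outage_seconds(replies: List[bool], interval_s: float) -> float:
    """Longest run of consecutive lost probes, expressed in seconds."""
    longest = 0
    run = 0
    for ok in replies:
        run = 0 if ok else run + 1
        longest = max(longest, run)
    return longest * interval_s


def is_hitless(replies: List[bool], interval_s: float, tolerance_s: float) -> bool:
    """'Hitless' here means the worst gap stays within the chosen tolerance."""
    return longest_outage_seconds(replies, interval_s) <= tolerance_s


if __name__ == "__main__":
    # Made-up sample: 5 replies, 3 losses, 5 replies, probed every 0.5 s.
    sample = [True] * 5 + [False] * 3 + [True] * 5
    print(longest_outage_seconds(sample, 0.5))
    print(is_hitless(sample, 0.5, tolerance_s=0.0))  # strict "no loss at all"
    print(is_hitless(sample, 0.5, tolerance_s=2.0))  # "users barely notice"
```

The same capture passes or fails depending only on the tolerance you pick, which is really my point about the term.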

 

Hopefully this may help you, maybe not.

