B-04 | Arista Smart System Upgrade (SSU)ΒΆ
OverviewΒΆ
Arista's Smart System Upgrade (SSU) is a feature to minimize traffic loss when upgrading from one SSU-supported EOS version to a newer SSU-supported EOS version. SSU is also referred to as βhitlessβ upgrades. The SSU feature allows a switch to maintain packet forwarding (Data Plane) while the Management and Control plane perform the OS upgrade.
Arista Smart System Upgrade
Additional information about this feature can be found in the Arista TOI for Smart System Upgrade
In our workshop lab topology you will see that each leaf in your pod is directly connected to the access point and RaspberryPi client. Traditionally, a firmware upgrade on the lead in the pod would cause the access point, wireless clients connected to the access point, and the raspberry pi client to lose network connectivity. In this lab, we will use Arista SSU on the leaf switch in your pod to perform a firmware upgrade minimizing any network disruptions for both wired and wireless clients.
PrerequisitesΒΆ
- Continuous POE should be configured to maintain POE power delivery to connected devices.
- Must be running an EOS version that includes the SSU feature.
- Must be upgrading to a new EOS version that also includes the SSU feature.
- Spanning-tree must be running in MST mode or disabled.
- Spanning-tree edge ports must have portfast and BPDUGuard enabled.
- Switches running BGP must be configured with
graceful-restartotherwise routes are lost and the hardware may fail to forward traffic.
CaveatsΒΆ
As you can imagine, disconnecting the Control and Management plane off the data plane will come with some caveats depending on what features you are running! While that's the case, most instances where SSU is valuable is where we have no resiliency or ability to "route around" the switch in maintenance.
- SSU only supports upgrades. Hitless image downgrades are not supported.
- If a new EOS version includes an FPGA upgrade, the FPGA upgrade will be suppressed. FPGA upgrades require a full reboot of a switch to apply.
- Some switch features, when in use, will prevent SSU from starting. See the Arista TOI for more details
First, download the desired EOS image to the switch flash storage using CloudVision Change Control.
For detailed instructions on using the Action Download File feature in Change Control, see the Change Control Action Download File Guide.
Perform Arista SSUΒΆ
Let's begin the hands-on portion of this lab. SSU can be triggered on the command line or via CloudVision. For this lab we will be triggering an SSU upgrade using the command line, preferably using the serial port of the switch. As in
Console via SSH
Using the console we will get a more in depth look at the logs as the switch upgrades. However, SSH is fine if you do not have a console cable. There is example output below you can refer to.
-
Connect to the
pod<##>-leaf1switch serial port (where ## is a 2 digit character between 01-20 that was assigned to your lab/Pod). The login username/password isarista/arista. Typeenableto enter privileged mode. -
Type
show versionto show the current running version of the switch.EOS Version
Smart System Upgrade (SSU) is supported on EOS versions 4.31.5M and later on CCS-710P-12/16P switches. In this example, the switch is currently running EOS-4.34.0F. This is a supported EOS version for Arista SSU.
show version
Example Output
Arista CCS-710P-16P Hardware version: 11.02 Serial number: WTW22220037 Hardware MAC address: 2cdd.e9fd.af50 System MAC address: 2cdd.e9fd.af50 Software image version: 4.34.0F Architecture: i686 Internal build version: 4.34.0F-41661064.4340F Internal build ID: 8346ed5e-061a-4a70-9c36-b6eee6fc0848 Image format version: 3.0 Image optimization: Strata-4GB Uptime: 2 hours and 5 minutes Total memory: 3952472 kB Free memory: 2193016 kB -
Type
dirto show the list of files in theflash:filesystem. You should note that there are some EOS image versions already on the flash storage of the leaf1a switch. Choose the latest EOS image version which is our target update version for this lab.dir
Example Output
Directory of flash:/ -rw- 25449761 Jul 21 2024 AristaCloudGateway-1.0.0-1.swix -rw- 32184348 Apr 14 21:54 AristaCloudGateway-1.0.2-1.swix -rw- 32566624 Jun 6 13:20 AristaCloudGateway-1.0.3-1.swix -rw- 18480 Jul 2 05:51 AsuFastPktTransmit.log -rw- 783781597 Dec 6 2021 EOS-4.29.1FX-710P-DHCP.swi -rwx 834335804 Jul 1 03:33 EOS-4.31.6M.swi -rwx 894262536 May 5 14:16 EOS-4.34.0F.swi -rw- 1531686247 Jul 2 03:42 EOS-4.34.1F.swi drwx 4096 Jun 15 00:24 Fossil -rw- 11360 Jul 2 05:51 SsuRestore.log -rw- 11360 Jul 2 05:51 SsuRestoreLegacy.log drwx 4096 Dec 6 2021 aboot -rw- 27 Jul 2 05:47 boot-config -rw- 32 Jul 2 05:47 boot-extensions drwx 4096 Jul 2 08:01 debug drwx 4096 Feb 4 2023 fastpkttx.backup drwx 16384 Dec 6 2021 lost+found drwx 4096 Jul 2 08:00 persist drwx 4096 Feb 4 2023 schedule -rw- 5224 Jul 2 05:47 startup-config drwx 4096 Jul 21 2024 tpm-data -rw- 0 Jun 29 20:50 zerotouch-config drwx 4096 Jun 23 20:34 ztp-debug 7527178240 bytes total (2043318272 bytes free) on flash: -
Type
show reload fast-boot. This command will show you an output of warnings or incompatibilities with the current configuration of the switch. As mentioned in the prerequisites section above, if any configuration is set in a way that prevents SSU from starting, the reasons will be listed here.show reload fast-boot ```
Example Output when SSU will proceed with caution. In this case, MLAG is compatible
pod00-leaf1a#show reload fast-boot Warnings in the current configuration detected: If you are performing an upgrade, and the Release Notes for the new version of EOS indicate that MLAG is not backwards-compatible with the currently installed version (4.34.1F), the upgrade will result in packet loss. Mlag is configured -
Now that we have confirmed our configuration is ready to allow SSU, let's prepare the switch for the upgrade process by setting the new boot image in the configuration of the switch. Issue the following commands:
β² Setting the boot flash will take a few seconds!
configure boot system flash:EOS-4.34.1F.swi exit write
-
Before we apply the new firmware, let's start a ping test which will run during the switch upgrade process. We will see that the ping traffic will continue to flow through the switch even while its software is being upgraded.
- Please make sure that your laptop is connected to the wireless network called
ATD-##-PSK. Use the PSK you configured in the previous lab to associate with this wireless network. -
Open a terminal and ping your gateway using the commands below. If you want to increase the interval to get more granular results, feel free!
MAC OS
Open Terminal and run the following, please replace ## with your pod number (1-12)
-
Now leave this window open for the following steps. We will see ping packets being sent and received every second. You are now pinging the gateway IP address for your pod from your wireless device connected to your pods access point. The ping traffic must traverse the
leaf1switch to reach the gateway. We should be able to observe how traffic is affected while the switch is upgrading during SSU.
- Please make sure that your laptop is connected to the wireless network called
-
Now, in a standard firmware upgrade process, you would issue a normal reload command. However, in this lab, we want to trigger a SSU upgrade. This is where we use the command below, go ahead and issue that command now. π
-
As the SSU process proceeds, you can watch the output on the serial console showing the switch preparing itself for reboot. The switch will reboot shortly, and you should see the normal output of a switch reboot.
ASU vs SSU
During the SSU reboot process, you may see messages referring to Arista Smart Upgrade, or ASU. ASU is a previous version of SSU, and some references to ASU still exist in code for the SSU process.
-
When you see the following message in your serial console of the switch, the switch is now rebooting. # yaml reloading /mnt/flash/EOS-4.34.1F.swi ```
-
SSH to the switch is not possible since the management plane of the switch is rebooting. However, the dataplane is still functional. Open the ping terminal window we started in
step 6and note that ping packets are still being sent and received even though the switch is in the middle of its reboot process. -
Below is the output of the full process, with highlights on terminal messages indicating the progress of the upgrade.
Example Output
pod00-leaf1#reload fast-boot now Running AsuPatchDb:doPatch( version=4.34.0F-38783123.4315M, model=Strata ) #(1)! Optimizing image for current system - this may take a minute... No warnings or unsupported configuration found. 2024-07-31 17:51:14.459848 Kernel Files /mnt/flash/EOS-4.34.1F.swi extracted from SWI #(2)! 2024-07-31 17:51:16.439052 ProcOutput passed to Kernel ['crashkernel=512-4G:45M,4G-8G:59M,8G-32G:89M,32G-:121M', 'nmi_watchdog=panic', 'tsc=reliable', 'pcie_ports=native', 'reboot=p', 'usb-storage.delay_use=0', 'pti=off', 'crash_kexec_post_notifiers', 'watchdog.stop_on_reboot=0', 'mds=off', 'nohz=off', 'printk.console_no_auto_verbose=1', 'CONSOLESPEED=9600', 'console=ttyS0', 'gpt', 'Aboot=Aboot-norcal6-6.2.1-2-25288791', 'net_ma1=pci0000:00/0000:00:12.0/usb1/.*$', 'platform=raspberryisland', 'scd.lpc_irq=3', 'scd.lpc_res_addr=0xf00000', 'scd.lpc_res_size=0x100000', 'block_flash=pci0000:00/0000:00:14.7/mmc_host/.*$', 'block_usb1=pci0000:00/0000:00:12.0/usb1/1-1/1-1.1/.*$', 'block_usb2=pci0000:00/0000:00:12.0/usb1/1-1/1-1.4/.*$', 'block_drive=pci0000:00/0000:00:11.0/.*host./target.:0:0/.*$', 'sid=RaspberryIsland16', 'log_buf_len=2M', 'systemd.show_status=0', 'sdhci.append_quirks2=0x40', 'amd_iommu=off', 'nvme_core.default_ps_max_latency_us=0', 'SWI=/mnt/flash/EOS-4.34.1F.swi', 'arista.asu_hitless'] Proceeding with reload No qualified FPGAs to upgrade #(3)! waiting for platform processing ..........................................................ok Shutting down packet drivers 2024-07-31 17:52:54.117479 bringing fab down 2024-07-31 17:52:54.437128 bringing fifo down reloading /mnt/flash/EOS-4.34.1F.swi #(4)! Shutting down management interface(s) 1 block umount: /mnt/flash: target is busy. [71576.090602][T15227] kexec_core: Starting new kernel [ 2.363505][ T296] Running e2fsck on: /mnt/flash [ 2.803117][ T303] e2fsck on /mnt/flash took 1s [ 3.116739][ T370] Running e2fsck on: /mnt/crash [ 3.181814][ T375] e2fsck on /mnt/crash took 0s Mounting SWIM Filesystem Optimization Strata-4GB root squash found Optimization Strata-4GB all squashes found Mounting optimization Strata-4GB #(5)! Switching rootfs Welcome to Arista Networks EOS 4.34.1F Architecture: i386 [ 43.938521] sh[2099]: Starting EOS initialization stage 1 Starting NorCal initialization: [ OK ] [ 48.023047] sh[2181]: Starting EOS initialization stage 2 Completing EOS initialization (press ESC to skip): [ OK ] Model: CCS-710P-16P Serial Number: WTW22200366 System RAM: 3952504 kB Flash Memory size: 7.1G pod00-leaf1 login: Wait for the SSU process to complete. This can take up to 10 minute. When you see the following console message, the switch management plane has finished its reboot. pod00-leaf1 login:- Remember the mention, there may still be some mention of ASU in the code. This is the SSU process kicking off.
- Extracting the boot image we configured
- No FPGAs to upgrade, recall if there were, this would require a full reload!
- Reloading the management and control plane to the new software image
- Mounting the software on to the hardware,
Stratain this case is the family of ASICs
-
You can now login with the username/password and type
enableto get back to privileged commands mode. Check the new current running version of the switch with the commandshow version. You should see the switch has upgraded toEOS-4.34.1FExample Output
Arista CCS-710P-16P Hardware version: 11.04 Serial number: WTW23350461 Hardware MAC address: 2cdd.e9f6.e9f2 System MAC address: 2cdd.e9f6.e9f2 Software image version: 4.34.1F Architecture: i686 Internal build version: 4.34.1F-37710335.4314M Internal build ID: d26721db-c526-41ec-bf9d-0a14b4edfcf5 Image format version: 3.0 Image optimization: Strata-4GB Uptime: 5 minutes Total memory: 3952504 kB Free memory: 2880904 kB -
After the management plane boots up, there are still some processes running before SSU can be considered successful. Run the following command to watch for the log message indicating SSU is fully successful. You should see the message
reload hitless reconciliation completeabout 2 minutes after the switch completes its reload.show log follow | inc hitless
-
As our final step, take another look at the terminal window that was running the consistent pings. You should see pings continue to flow without issue during the upgrade. Only towards the end of the process you may see 1 or 2 pings lost as the ASIC reconnects to the updated management plane.
800 ms Cutover ποΈ
The below example output was sending pings every 100ms, as you see below the cutover time to the new EOS image disrupted the dataplane for all of 700-800ms!! This is fast enough you may not even notice the disruption on an active zoom call!
Example Output
me@MacBook-Pro ~ % ping -i 0.1 10.0.111.1 64 bytes from 9.9.9.9: icmp_seq=0 ttl=51 time=10.974 ms 64 bytes from 9.9.9.9: icmp_seq=1 ttl=51 time=10.147 ms 64 bytes from 9.9.9.9: icmp_seq=2 ttl=51 time=10.583 ms 64 bytes from 9.9.9.9: icmp_seq=3 ttl=51 time=10.657 ms ... truncated for brevity 64 bytes from 9.9.9.9: icmp_seq=4885 ttl=51 time=10.586 ms 64 bytes from 9.9.9.9: icmp_seq=4886 ttl=51 time=10.954 ms 64 bytes from 9.9.9.9: icmp_seq=4887 ttl=51 time=10.632 ms Request timeout for icmp_seq 4888 Request timeout for icmp_seq 4889 Request timeout for icmp_seq 4890 Request timeout for icmp_seq 4891 Request timeout for icmp_seq 4892 Request timeout for icmp_seq 4893 Request timeout for icmp_seq 4894 Request timeout for icmp_seq 4895 64 bytes from 9.9.9.9: icmp_seq=4896 ttl=51 time=11.481 ms 64 bytes from 9.9.9.9: icmp_seq=4897 ttl=51 time=11.637 ms 64 bytes from 9.9.9.9: icmp_seq=4898 ttl=51 time=11.237 ms 64 bytes from 9.9.9.9: icmp_seq=4899 ttl=51 time=10.859 ms 64 bytes from 9.9.9.9: icmp_seq=4900 ttl=51 time=10.595 ms 64 bytes from 9.9.9.9: icmp_seq=4901 ttl=51 time=10.516 ms -
We see from this example output above, only 8 pings were lost (100 ms each) at the end of the process near the 10 minute mark after starting the ping test. If you we're pinging at 1 per second, you should see 1 maybe 2 pings drop.
-
As a last item to leave you with, you can validate your reload reason to confirm why the switch last reloaded using the command below.
show reload cause
π€ AI Lab AssistantΒΆ
Want to automate all the commands above? Use our embedded AI agent to execute the entire lab automatically!
π B04 Lab Automation Agent
Let the AI handle the SSU upgrade while you focus on learning the concepts!