The adoption of new Catalyst 9800 not only brought new ways to configure Wireless Controllers, but also several new mechanisms to troubleshoot them. It has some awesome troubleshooting features that will help to identify root causes faster spending less time and effort.
There are several new key troubleshooting differentiators compared to previous WLC models:
Let’s start by a user reporting wireless client connectivity issue. He was kind to provide the client mac address and timestamp for the problem, so scope starts already partially delimited.
First feature that I would use to troubleshoot is the Trace-on-failure. The Catalyst 9800 can keep track of predefined failure conditions and show the number of events per each one, with details about failed events. This feature allows to be proactive and detect issues that could be occurring in our network even without clients reporting them. There is nothing required for this feature to work, it is continuously working in the background without the need of any debug command.
How to collect Trace-on-failure:
show wireless stats trace-on-failure.
Show different failure conditions detected and number of events.
show logging profile wireless start last 2 days trace-on-failure
With those commands we can identify which are the failure events detected in the last few day by the controller, and check if there is any reported event for client mac and timestamp provided by the user.
In case that there is no event for user reported issue or we need more details I would use the next feature, Always-on-tracing.
The Catalyst 9800 is continuously logging control plane events per process into a memory buffer, copying them to disk regularly. Each process logs can span several days, even in the case of a fully controller.
This feature allows to check events that have occurred in the past even without having any debugs enabled. This can be very useful to get context and actions that caused a client or AP disconnections, to check client roaming patterns, or the SSIDs where client had connected. This is a huge advantage if we compare with previous platforms where we had to enable “debug client” command after issue occurred and wait for next occurrence.