We've been having intermittent issues with close to 100% CPU usage and a sort of live lockup, where the Fortigate responds but packets do not flow. One of the scanning engines (ipsengine or ipsmonitor) monopolises CPU time and needs to be restarted several times (three, that's the magic number: run diagnose test application ipsmonitor 99 and wait several minutes between attempts), or the unit has to be rebooted, with resulting network chaos. And packets, like the spice, must flow.
It turns out that one of the design decisions made in the initial configuration was not ideal - it completely disables the use of the onboard dedicated traffic ASICs...
Unfortunately, schools need quite paranoid and intense filters, and this comes at a cost (in terms of power, and price!).
Obviously, if you have a gadget that is designed to offload traffic from a busy CPU, you want to be using it.
In the initial configuration, a bridged switch interface (a software switch) was created to allow fairly transparent connectivity between one firewall and two core/dist/agg switches. This disables offloading because, in this configuration, every packet must be handled by the general-purpose CPU.
If you want multiple redundant links to the firewall and you have L3-capable switches, you may be better off routing two interfaces with appropriate weightings and IP addressing, or using LACP if your core infrastructure is MLAG (or Virtual Chassis) capable. Interestingly, nobody pointed this out when I had a ticket open with Fortinet about the performance issues (and they had the full config files) - it ought to be a performance red flag.
If you don't create a software switch, you can make use of features like "fast path", so long as the traffic meets the requirements for offloading. These features offload some of the processes/sessions that would otherwise tax your unit's CPU, giving you faster throughput, the ability to handle more traffic/connections, or both.
Aside from a software switch, a few other features disable offloading too, such as enabling sFlow/NetFlow and strict protocol header checking.
You should read the FortiASIC guide for the model of Fortigate and the version of FortiOS you're using. Our 300C uses the CP6 and NP2 ASIC chipsets, and the guide I used covers FortiOS 5.2.8. It will help you understand the limitations, and some tweaks that may, or may not, be relevant to your model and unique circumstances. The general best practices documentation is probably also worth your time.
You should also note that fastpath (and related offloads) requires the inbound and outbound (ingress/egress) interfaces to be on the same NP chip, unless there is an interconnect between the chips (Fortigate's terminology seems to be EEI, which exists on the 300C) - check your model. For ours, any of the first 8 ports are acceptable; the last two ports are not on the ASIC. These are useful for management and, particularly, TFTP uploads of firmware (ASIC accelerated ports cannot do TFTP - handy to know if your unit ends up very unhappy).
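The port-placement rule above can be sketched as a small lookup. This is only an illustration in Python, not anything FortiOS provides: the port names, the split of eight accelerated ports across two chips called "np_a" and "np_b", the assumed interconnect and the helper can_offload are all hypothetical, so take the real port-to-processor layout from the FortiASIC guide for your model.

```python
from typing import Optional

# Hypothetical layout for illustration only: eight accelerated ports split
# across two NP chips, plus two ports that are not attached to the ASIC.
PORT_TO_NP: dict[str, Optional[str]] = {
    **{f"port{n}": "np_a" for n in range(1, 5)},
    **{f"port{n}": "np_b" for n in range(5, 9)},
    "port9": None,
    "port10": None,
}

# Pairs of chips joined by an interconnect (assumed to exist in this sketch).
INTERCONNECTED = {frozenset({"np_a", "np_b"})}


def can_offload(ingress: str, egress: str) -> bool:
    """Return True if the port placement alone would allow fast path."""
    chip_in = PORT_TO_NP.get(ingress)
    chip_out = PORT_TO_NP.get(egress)
    if chip_in is None or chip_out is None:
        return False  # at least one port is unknown or not on an NP chip
    if chip_in == chip_out:
        return True  # both interfaces sit on the same chip
    # Different chips: only acceptable when they are interconnected.
    return frozenset({chip_in, chip_out}) in INTERCONNECTED
```

This only checks where the interfaces sit; the other offload requirements (no software switch, no sFlow/NetFlow, and so on) still apply on top of it.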
My other attempts to improve performance have mainly centred around massively reducing the logging levels (annoyingly, heavy logging greatly decreases Fortigate performance).
Turning off SSL Certificate Inspection (not even full SSL interception) turned out to be a terrible idea (without it, https:// connections are a mystery to the unit - with the expected results... :( ).
I also tweaked some of the session timers as per some guide I read somewhere - shortening unnecessarily long "open" sessions clears out the state table a little more regularly, clawing back some resources.
It's early days, as I only made the interface changes on Friday (the end of our half-term break) - school is back in session now, so we'll see how it holds up and if it helps with our "live lockup" issue, or the issue where SSO doesn't always work...!
I'm seeing (System>FortiView>All Sessions>filter FortiASIC Accelerated) around 17-20% of sessions being passed to the ASIC now, and there seems to be a bit of a decrease in overall CPU usage. We did hit 99% CPU usage this morning, but traffic still flowed - and that's progress, of a sort!
Yay, accelerated connections.
Yay, lower CPU... I think.
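For tracking the accelerated share over time, the percentage is just accelerated sessions over total sessions. A minimal Python sketch follows; the function name accelerated_share and the counts in the example are hypothetical, not measurements from our unit, and you would read the real numbers off the FortiView session list yourself.

```python
def accelerated_share(accelerated: int, total: int) -> float:
    """Return the percentage of sessions that were offloaded to the ASIC."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= accelerated <= total:
        raise ValueError("accelerated must be between 0 and total")
    return 100.0 * accelerated / total


# Hypothetical counts for illustration only.
print(f"{accelerated_share(1850, 10000):.1f}%")  # prints "18.5%"
```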