Monday, 23 October 2017

RTFM: Speeding up your (Fortigate) firewall performance

I witnessed the installation of the (yes, single) Fortigate 300C firewall the school uses. The configuration, installation and design were not mine, although I've maintained it for several years now.

We've been having intermittent issues with close to 100% CPU usage and a sort of live lockup, where the Fortigate responds but packets do not flow. Some of the scanning engines (ipsengine or ipsmonitor) monopolise CPU time and need to be restarted several times (three, that's the magic number: run diagnose test application ipsmonitor 99 and wait several minutes between attempts), or the unit has to be rebooted, with resulting network chaos. And packets, like the spice, must flow.


The "death knell" for the 300C was a member of senior management unilaterally decreeing (without asking IT) that a whole year of pupils could have twice as many devices - a year ahead of schedule, and before the planned replacement to deal with the load... So I've been looking for ways to eke out a little more performance until we can afford/acquire a replacement for it.

It turns out that one of the design decisions that was made was not ideal - it completely disables the use of the onboard dedicated traffic ASICs...

Unfortunately, schools need quite paranoid and intense filters, and this comes at a cost (in terms of power, and price!).

Obviously, if you have a gadget that is designed to offload traffic from a busy CPU, you want to be using it.

In the initial configuration, a bridged switch interface was created to allow fairly transparent connectivity between one firewall and two core/dist/agg switches. This disables offloading, as in this configuration, every packet must be handled by the general purpose CPU.

If you want multiple redundant links to the firewall and you have L3-capable switches, you may be better off routing two interfaces with appropriate weightings and IP addressing, or using LACP if you have MLAG (or Virtual Chassis) capable core infrastructure. Interestingly, nobody pointed this out when I was having performance issues and had a ticket open with Fortinet (and they had full config files); it ought to be a performance red flag.

If you don't create a software switch, traffic can make use of features like "fast path", so long as the requirements are met for that traffic to be offloaded. Such features offload some of the processes/sessions that would otherwise tax your unit's CPU, leading to faster throughput, a greater ability to handle lots of traffic/connections, or both.

Aside from software switches, a few other features disable offloading too, such as enabling sFlow/NetFlow and strict protocol header checking.

You should read the FortiASIC guide for your model of Fortigate and the version of FortiOS you're using. Our 300C uses the CP6 and NP2 ASIC chipsets, and the guide I used covers FortiOS 5.2.8. It will help you understand the limitations, and some tweaks that may, or may not, be relevant to your model and unique circumstances. The general best practices documentation is probably also worth your time.

You should also note that fastpath (and related offloads) requires the inbound and outbound (ingress/egress) interfaces to be on the same NP chip, unless there is an interconnect between the chips (Fortinet's term seems to be EEI, which the 300C has). Check your model. For ours, any of the first 8 ports are acceptable; the last two ports are not on the ASIC. Those two are useful for management and, particularly, TFTP uploads of firmware (ASIC-accelerated ports cannot do TFTP, which is handy to know if your unit ends up very unhappy).
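To make the same-chip rule concrete, here is a rough Python sketch of the check. It is only an illustration: the port names, the chip labels and the 4/4 split between chips are my own assumptions, not taken from the FortiASIC guide, and a real session still has to meet the other offload requirements.

```python
# Hypothetical layout for illustration only: port names, chip labels and the
# 4/4 split are assumptions. Ports missing from the map are CPU-only.
EXAMPLE_LAYOUT = {f"port{n}": ("np_a" if n <= 4 else "np_b") for n in range(1, 9)}


def offload_candidate(ingress, egress, port_to_chip, chips_interconnected):
    """Return True if traffic between the two ports could be offloaded.

    Both ports must sit on an NP chip, and either on the same chip or on
    chips joined by an interconnect. This checks the port rule only.
    """
    chip_in = port_to_chip.get(ingress)
    chip_out = port_to_chip.get(egress)
    if chip_in is None or chip_out is None:
        # At least one port is handled by the general purpose CPU.
        return False
    return chip_in == chip_out or chips_interconnected


if __name__ == "__main__":
    # With an interconnect, any pair of ASIC ports qualifies.
    print(offload_candidate("port1", "port8", EXAMPLE_LAYOUT, True))
    # A CPU-only port never qualifies.
    print(offload_candidate("port1", "port9", EXAMPLE_LAYOUT, True))
```

Swap in the real port-to-chip mapping from the documentation for your model before trusting the answer.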

My other attempts to improve performance have mainly centred on massively reducing the logging levels (annoyingly, turning logging on greatly decreases Fortigate performance).
Turning off SSL Certificate Inspection (not even full SSL interception) turned out to be a terrible idea (without it, https:// connections are a mystery to the unit, with the expected results... :( ).
I also tweaked some of the session timers as per some guide I read somewhere - shortening unnecessarily long "open" sessions clears out the state table a little more regularly, clawing back some resources.
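As a back-of-the-envelope illustration of why shorter timers help, the steady-state number of entries in a table is roughly the arrival rate multiplied by how long each entry lingers (Little's law). The sketch below uses made-up numbers, not measurements from our unit or Fortigate defaults.

```python
def steady_state_sessions(abandoned_per_sec, timeout_sec):
    """Estimate idle entries held in the state table (Little's law).

    abandoned_per_sec: sessions per second that go idle without closing cleanly.
    timeout_sec: how long an idle session lingers before it is expired.
    """
    return abandoned_per_sec * timeout_sec


if __name__ == "__main__":
    # Illustrative numbers only: 50 abandoned sessions per second.
    for timeout in (3600.0, 600.0):
        idle = steady_state_sessions(50.0, timeout)
        print(f"timeout {timeout:.0f}s -> about {idle:.0f} idle entries")
```

The point is only that the table size scales linearly with the timeout, so cutting an overly long timer cuts the idle entries by the same factor.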

It's early days, as I only made the interface changes on Friday (the end of our Half Term break). School is back in session now, so we'll see how it holds up and if it helps with our "live lockup" issue, or the issue where SSO doesn't always work...!
I'm seeing (System>FortiView>All Sessions>filter FortiASIC Accelerated) around 17-20% of sessions being passed to the ASIC now, and there seems to be a bit of a decrease in overall CPU usage. Although we hit 99% CPU usage this morning, traffic still flowed, and that's progress, of a sort!

Yay, accelerated connections.
Yay, lower CPU... I think. 
Sometimes, going back to the manuals and reading what they have to say about things is a good move. In fact, it's always a good move when things are not... right.
