Thursday 16 June 2016

Synology RackStation RS3614xs+

Some time back, we received two Synology RS3614xs+ units so that we could address our rather painful lack of on campus storage (my Thruk monitoring system has a lot of machines that complain about lack of disk space cluttering up the place; most typically are over 90% full, which is not ideal). Off campus storage, thanks to Google Apps for Education, is not exactly a problem any more.

Synology RS3614xs+
We chose these after carefully surveying the low-mid range NAS/SAN market. They had a number of features which appealed, including dynamic flash/SSD caching, AD integration, integration with Google Drive, and 10 gigabit network interfaces; they also feature redundant power supplies and the possibility of adding more storage through expansion chassis. I'm particularly excited about having them set up as a redundant pair. The price/performance seemed about the best we were going to get. And it never hurts when a fellow sysadmin has had similar units in production for a while, and rather likes them.

One thing I'm particularly excited to see is faster student logins, once shared profiles end up cached on SSDs and spewed into the LAN at multi-gigabit speeds.

So, it's probably about time we configured these and put them into production...


This is actually the third time I'll be setting these up. I can't now recall why I initially nuke 'n paved the first time; this time it's because the high availability simply refuses to get itself configured (Synology helpdesk had no useful ideas), so now it's time to start from a clean slate and see if that allows me to set up the pair, and also if it fixes a glitch my colleague doesn't have - every time it updates the DSM software, it forgets all the Google Drive accounts you've configured. This happens quite often, and is a minor pain if you only have one or two. If you decide to set up tens or hundreds, you might start having issues with such shenanigans.

When you order 10 gigabit network cards, solid state or "traditional" hard drives, make sure you check the compatibility list. We chose Synology's own 10Gb interfaces, 4TB WD enterprise hard drives and Kingston SSDs (which is what the vendors suggested, and we agreed to after checking compatibility; several tried to offer things not on the compatibility list, and we suggested that they might like to try again).

RAID. Is it RAID-undant today?

Whilst exactly what will work for your environment probably differs from ours, what I chose was a single RAID 6 array (two parity drives) and a "hot spare" disk, with 2 SSDs so that we have read and write SSD caching; this means I have 9 drives for storage, one as a hot spare and 2 SSDs. Your rough storage amount will therefore be about 7 (n-2) times the size of the hard drive you base your RAID array on (4TB drives in our case, with about 25TiB of space once in the array; the arithmetical shortfall is because of the difference between the decimal units HDD manufacturers sell in (megabytes, terabytes) and the binary units OSs typically display (mebibytes, tebibytes)).

You should carefully consider the challenges of RAID in large disk environments - it is no longer uncommon for a RAID array to fail twice in a row - the second time whilst trying to rebuild (NB: goodbye, ALL data in that failure scenario) - as the mean time between errors on multiple large disks is getting uncomfortably close to the average RAID rebuild time. The BER/URE rate is the "nail in the coffin" of this challenge. And the larger the disks you use and the bigger your RAID, the worse it gets. RAID 5 is definitely dead in the water. You may also like to consider building lots of smaller RAID 1 mirrors, which is what I did on the server I configured last year (six drives; three RAID 1 arrays; two of traditional HDDs, one of SSD).

Putting these things into a redundant pair has some features of a RAID 1 arrangement. But you had still better make a plan to make regular backups offsite somewhere... This also echoes that a minimum of three (ideally completely) independent copies of the data is what you require.
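
For anyone who wants to check the sums, here is a rough Python sketch of both calculations: the usable space of a RAID 6 group, and the chance of hitting an unrecoverable read error (URE) while reading a pile of disks end to end. The URE rates of one error per 10^14 and 10^15 bits are assumptions (commonly quoted spec-sheet figures, not numbers from our drives), the function names are my own, and the model treats errors as independent, which real disks don't strictly obey.

```python
import math

TB = 10**12   # terabyte: the decimal unit on the drive label
TIB = 2**40   # tebibyte: the binary unit most operating systems report


def raid6_usable_tib(drives_in_array: int, drive_tb: float) -> float:
    """Usable capacity of one RAID 6 group in TiB (two drives' worth goes to parity)."""
    if drives_in_array < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return (drives_in_array - 2) * drive_tb * TB / TIB


def p_ure_during_read(bytes_read: float, bits_per_error: float) -> float:
    """Chance of at least one URE while reading bytes_read.

    Assumes independent errors at a constant rate of one per bits_per_error bits.
    """
    bits = bytes_read * 8
    # 1 - (1 - 1/rate)^bits, via log1p/expm1 so tiny probabilities don't round away
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))


if __name__ == "__main__":
    print(f"RAID 6, 9 x 4TB: {raid6_usable_tib(9, 4):.1f} TiB usable")
    # Hypothetical RAID 5 rebuild over the same 9 drives:
    # all 8 surviving drives have to be read in full.
    survivors_bytes = 8 * 4 * TB
    for rate in (1e14, 1e15):  # assumed spec-sheet URE rates
        chance = p_ure_during_read(survivors_bytes, rate)
        print(f"URE rate 1 in {rate:.0e} bits: {chance:.0%} chance of a read error")
```

The second figure is the one that makes single-parity arrays so uncomfortable at this size: with one drive already gone there is nothing left to correct a bad read. RAID 6 still has its second parity at that point, which is a large part of why I went that way.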

Buy "cold spare" disks to keep in your IT stores in case a drive fails; the quicker you replace a failed drive in a RAID array, the better. You do not need to be caught short or unable to obtain a spare disk within a reasonable timeframe. Fetching one out of the cupboard is about the optimal "turn around" on this! Make sure those specialised disks aren't used for "random" purposes, unless you're using your general HDD model (which you probably won't be). Mark them!

Enterprise storage, even at the "low end", is not exactly cheap; you may need to explain why your "server drives" cost so much more than the 1TB drives they can buy in just about every shop that sells IT gadgets - and perhaps why your enterprise storage cost per terabyte is "so much" higher. (Things to ponder: expected duty cycle (24x7x365 vs 8x5 at most); BER/URE; performance; data storage redundancy; not losing data; relative IO loads; security features... etc.) It's really a completely different game.

If your school is anything like mine, you may have to toss in an analogy. I think I would probably use "it's like trying to play football (soccer) with a rugby (or American football) ball".

You might be able to make a fairly reliable storage fabric out of consumer drives. You should probably build several storage arrays and make sure every single byte is in several places at once. It will not be fun, and people will probably look at you funny when you tell them about it. If you have the space and skills to build the sort of things "big data" and cloud providers do*, go right ahead. Ceph might be a good starting point if you want to go that route; there are others of course. Backblaze will give you some insights into the challenges of doing this sort of thing "affordably" at petabyte scale. You're probably not going to want to do this in practice, and "paying someone" to get this right is probably the right move - i.e. buy a purpose-built storage device and integrated software platform (like a Synology, or something more fancy if you have the cash).

* hint: they build things with enough redundancy that they literally don't care when an entire machine dies - indeed, they may leave it there for quite some time before it even makes sense to expend manpower on replacing it; if you've not seen them, Google's data centre pages are fascinating. Eventually, you get to the point that entire datacentres can fail without noticeably affecting the use of their platforms (at least unless you're really paying attention to latency). 

Setting up the Synology RackStations in a HA pair

Assumptions

I'm going to assume you've already installed any drives and network cards, and I'm further going to assume you want to use 10Gb interfaces; you can get much of this right with (built in) copper 1Gb/s interfaces, but as I have no screenshots of that, you're on your own. Setup should be similar though. I'm also going to assume you have two factory default RackStations of the same model and DSM version with the same drives installed in the same bays on both (basically, you need two more or less identical units).

You should also recognise that it's inherently an Active-Passive setup (that's what they support), so you won't gain any performance increases or be able to handle more load as you would expect from an Active-Active pair (you can probably expect a slight hit, in fact). However, you should gain more reliability - both in terms of uptime, and in terms of data availability/reliability. But make backups elsewhere too... 

Downloads

You may want to get a copy of Synology Assistant from the download site. It makes life easier at first.

Certainly, you will need a copy of the latest DSM (6.0.1 at the time of writing); it's a 247 megabyte file and may take some time to wind its way to your PC if your internet is slow.

The canny will verify the MD5 hash after downloading the DSM file.
  • Microsoft's FCIV can be used for this, but you may like to use things that are already built in; 
  • In PowerShell 4.0+, you can try PowerShell's Get-FileHash -Algorithm MD5 <file>
For DSM 6.0.1, the MD5 hash is d8668a7a15b889de9fc714b5485e2b07 but you should verify on the Synology website. Other platforms have MD5 hashing tools as well.
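
If you'd rather have something that works the same way on any platform, a few lines of Python will do the job. This is only a minimal sketch using the standard hashlib module; the function names and the file name in the usage note are my own placeholders, not anything Synology provides.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to spare memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_md5: str) -> bool:
    """Compare a file's MD5 with the published value, ignoring case and stray whitespace."""
    return md5_of_file(path) == published_md5.strip().lower()
```

You would call it as matches_published_hash("dsm.pat", "d8668a7a15b889de9fc714b5485e2b07"), where dsm.pat stands in for whatever your downloaded file is actually called and the hash is the one you've checked against the Synology website.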

LAN Setup

Make sure your DHCP server is set to hand out IP addresses to devices on your LAN; if you make use of only static assignments on your server LAN, you're going to have "fun times"; you can probably engineer around this by using a client LAN to do the initial setup, and then manually configure the correct IP addresses at a later date.

In factory default state, the RackStations seem not to use the 10GbE interfaces; connect a normal gigabit LAN cable to the relevant port (LAN 1 is a good bet) on each unit and plug that into a switch. Connect one 10GbE interface on each unit to a switch, and connect another 10GbE interface directly to the corresponding port on the other unit. You may want to do this to one unit at a time. Remember that the secondary heartbeat interface should be direct port-port links at layer one (no switches etc. between them) and must be over the fastest available interface link (so if you want 10 gig, you need to buy two single interface cards for each chassis, or a dual interface card).

Take a note of the serial numbers of each unit so you can identify each one individually. I like to know specifically which unit I'm working with until such time as they have working DNS A and PTR records. A hardback notebook works well for this, and is useful for taking notes all over the place. I get one every year. Paper works even if everything IT is broken; it's a handy "backup" for certain types of documentation and info. I also usually document steps of configuration changes I may want to roll back like this, step by step. Replicate useful ones as howtos / electronic documentation in your onsite Wiki with copy/paste code snippets, if relevant.

Synology Assistant

Install and launch Synology Assistant; if your devices are powered up and connected to the same L2 network segment as your workstation, you'll be able to discover them with the utility; click on Search.
Synology Assistant finds two devices. How convenient. 
It should find both (or just one if you've only connected one) of the devices. Mine is in a slightly different state from a "virgin" machine in the screenshot above - it's been reset to factory defaults. Double click on the device you want to connect to; your system default web browser should launch and connect to the unit.

Installing DSM on the device

In the web interface, you'll want to tell the system to install DSM, the Synology NAS software/firmware.

A factory reset device DSM installation screen
Hovering over / Clicking on the Device Info link at the bottom will show you the device serial number, so you know which one you're playing with.
Again, because this is a factory reset, the screen will be slightly different to yours on a factory fresh unit. In this case, click Re-Install. In any case, run through the basic installation steps until the normal DSM login interface pops up.

Browse for the DSM software .pat file you downloaded earlier. Click Install Now.



Wait for the format and install to complete.



Eventually, you'll be greeted with a login window where you can set up some basic account details:


Unfortunately, no sign of the serial number on this page, so hopefully you made a note of which IP address corresponds with which unit... Or you can check the Synology Assistant again, which will tell you both. Interestingly, once the DSM is installed, the 10GbE interfaces are enabled and will get DHCP addresses, so you can end up with multiple copies of the same unit showing up in Synology Assistant.

Once you've set up your account, click next.

Pick a suitable set of update settings for your environment; automatic can be nice, but if you have maintenance windows, a manual version is better - but keep your eyes peeled for any CVEs or other security announcements from Synology. I ticked the SMART monitor.

Remember updates usually imply reboots and some brief related down-time; having this automated might be OK on a native, client facing file server, but would probably make a server with an iSCSI target on the unit freak out. In general, automatic updates on a HA pair are going to be a very bad idea.
You're then done with the basic install, will see some tips and get dumped onto the "desktop" of DSM.

A quick tour of a few basic features...
And, you're in!
Synology require static IP addresses to be set up on the devices for HA; in the Control Panel>Network, pick your network interface and change from DHCP to Static, using the correct settings and IP address.

Repeat this process for the second unit. 
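
Before committing any static address (one per unit now, plus another for the HA cluster itself later), it's worth a quick sanity check that it sits inside your server subnet but outside any DHCP pool. A minimal sketch with Python's standard ipaddress module follows; the subnet and pool boundaries in the usage note are invented for illustration, so substitute your own.

```python
import ipaddress


def check_static_address(address: str, subnet: str, dhcp_first: str, dhcp_last: str) -> str:
    """Return 'ok', or a short reason why the address is a poor choice for a static assignment."""
    ip = ipaddress.ip_address(address)
    net = ipaddress.ip_network(subnet)
    if ip not in net:
        return "outside subnet"
    if ip in (net.network_address, net.broadcast_address):
        return "network or broadcast address"
    # The pool is described by its first and last address, as most DHCP servers show it
    if ipaddress.ip_address(dhcp_first) <= ip <= ipaddress.ip_address(dhcp_last):
        return "inside DHCP pool"
    return "ok"
```

For example, with a hypothetical 10.0.10.0/24 server LAN whose DHCP pool runs from 10.0.10.100 to 10.0.10.200, check_static_address("10.0.10.21", "10.0.10.0/24", "10.0.10.100", "10.0.10.200") passes, while 10.0.10.150 would be flagged as inside the pool. It won't tell you whether something else is already squatting on the address, so a ping or a look at your DHCP leases is still a good idea.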

NB If you want to change default ports for things, do so now (on both devices). 

Configuring Synology High Availability


Next, you should probably get your High Availability Pair set up.

Open up the Main Menu, and go to High Availability Manager:

Open up High Availability Manager from Main Menu.

Run the High Availability setup wizard.
Enter details for the passive server in the pair.
I think you really ought to be using https, even if you trust your LAN; by default this is on port 5001.
The performance penalties are, IMO, outweighed by the security win. 
Tell it which interface to use for the Heartbeat.
Remember it has to be, or match, the fastest available connection speed.
So if you've got 10GbE, heartbeat must be on a 10GbE port.  
You're going to need another IP address.
Make sure it's not going to be handed out at a later date by DHCP to another device;
this must be statically configured on the Synology. 

 Are you sure?
Are you really, really sure?
Let's do some waiting. 

Let's do some more waiting.
If you've set up the RAID and have data in there, this stage may take some time,
as it will all have to be duplicated across before the setup completes. 
 Eventually, you'll be greeted with a (hopefully happy looking) completed HA Pair:

Happiness is a healthy storage array. 
Now, it's onwards and upwards to make use of your shiny new redundant storage.

Make sure you physically separate the two units as much as you can (ideally in separate datacenters - doing so will be the subject of a further post, once our current fibre expansion project is finished and can support this).

Synology of course have their own guide to doing this.

Further Steps

After this, I suggest that you:

  • Create your RAID 6 array (or whatever other data storage structure you require) and any volumes you may require; not all available file system formats are necessarily compatible with HA, so watch out what you pick. 
    • You can do this on the master device before enrolling it in HA if you wish (even after a "factory reset", my unit remembered the RAID setup and data that was already on it, which rather surprised me - but that's a bonus from a "oops, I lost all my data" point of view, if you're not thinking about what a factory default reset might entail! This is also why I don't have a screenshot step-by-step for this - but it's pretty straightforward). 
      • Consider whether or not you want to use iSCSI targets (which you can use as "virtual" hard drives over the LAN - a cheap SAN, if you will). 
      • Or if you're going to use it as a more basic "file server".
      • Or both. 
  • Secure your login(s) with two factor authentication (2FA); leverage the Google Authenticator app for your mobile device platform to do so. Control Panel > User > Select Username > Edit; go through the 2FA ("2-step verification") enrollment. 
  • Enrol your device in Active Directory, if you use it at your site. Note that this typically gives Domain Admins full admin rights to the Synology - you probably want this. 
    • Get each admin to login and enrol in 2FA. 
    • When you're in the user admin area of the control panel, you may like to tick Enforce 2-step verification for the following users under Advanced. 
  • Set up HTTPS redirection (i.e. always manage over HTTPS), possibly with other certificates from your own trusted PKI or a commercial vendor. Control Panel > Network > DSM settings. You can change the ports too, if you want to (preferably before you set up HA); at minimum, tick the redirect box. You should probably untick Windows and Bonjour discovery, unless you want those. You can also consider enabling HSTS. If you don't install your own certificates, you'll likely have to get through security warnings in your browser; you may want to get into the habit of checking the certificate to make sure you've not been MITMd (SHA fingerprints are a useful verification datum).
  • Make sure you're happy with the NTP server used in Control Panel>Regional Options>Time; you may like to use an on-campus NTP server or a round-robin DNS NTP service name if you have several. 
  • Install any other packages you might like to use through Package Center. Realise that the more functions you add, the less secure the platform is, the more load you put on it, and the more complex maintenance becomes. As a primarily SOHO NAS platform, they have a lot of things you really don't need in an enterprise LAN; choose carefully! If you like the sound of some of the features, you might like to get another Synology box to play with them (or do so before you press your existing one(s) into production, and get rid of those you don't "need" before you go into production). 
  • TEST your storage before it goes into production - use it for a few weeks before starting to switch over a few services, and then eventually your entire desired storage needs of it. 
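
On the certificate checking point above: if you're living with the self-signed certificate for a while, you can grab its SHA-256 fingerprint from a machine you trust (ideally over a direct cable), write it in the notebook, and compare it whenever the browser complains. Here is a small sketch using only Python's standard library; the function names are mine, the host name in the usage note is a placeholder, and 5001 is the default HTTPS management port mentioned earlier.

```python
import hashlib
import ssl


def sha256_fingerprint(der_bytes: bytes) -> str:
    """Format the SHA-256 digest of a DER-encoded certificate as AA:BB:..., as browsers show it."""
    hexdigest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(hexdigest[i:i + 2] for i in range(0, len(hexdigest), 2))


def server_fingerprint(host: str, port: int = 5001) -> str:
    """Fetch whatever certificate the server presents (no validation) and fingerprint it."""
    pem = ssl.get_server_certificate((host, port))
    return sha256_fingerprint(ssl.PEM_cert_to_DER_cert(pem))
```

Calling server_fingerprint("nas1.example.internal") (a made-up name) returns a colon-separated string you can compare against the fingerprint in your browser's certificate viewer. Note that this deliberately does not validate the certificate chain; it only tells you which certificate you were handed, so the comparison against a known-good value is the actual check.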

3 comments:

  1. Any updates on how this has fared so far?

  2. Fairly well, although I did have one failover meltdown, where they both went active.
    I factory reset the passive unit and redid the pairing.

    We've migrated quite a bit of stuff across over the last month (school holiday) - the kids get back tomorrow, so we'll see how the "stress test" goes!

  3. ... I strongly suspect non-resilience in failover is a big part of why Synology are about 1/10th of the "normal" brands you think about when deploying this sort of thing.

    A colleague of mine who works at a research institute has been running two Synologies for a long time - but with only one active, and the other as a target for backups, essentially (i.e. manual failover); this may be more reliable, and certainly will cause less heart failure, should things go "squiffy".

    Quite how they ever both went active when they have a layer one connection between the two heartbeat ports, I have no idea; the logging seems really poor.
