Hi everyone. In January and into February, SR suffered a major hard drive crash on its main database. Fortunately we maintain caches of the content on the delivery nodes, so if you saw SR behaving strangely, that was why. I happened to be in Singapore running a workshop at the time (everything happens at the worst possible moment), so there was no way to get the server up in the meantime. The crash was also very frustrating because, besides taking out a major drive, it also (somehow) corrupted data on our solid state drives, which the database runs from and pages are delivered from. Super Frustrating.
Anyway, if you’ve noticed pages or comments gone here and there, or weird things like more spam and fake comments appearing than usual, this is all being corrected as we bring the services back online. Fortunately the new hardware seems to be running more or less smoothly for now, but I have to admit, I kinda have my fingers crossed.
For those who are techy-inclined, the drives that crashed were Western Digital 1TB Caviar Blacks in a RAID 1 configuration. Turns out that those drives aren’t long-term compatible with RAID, despite RAID’s moniker, “Redundant Array of INEXPENSIVE Disks”. My mom jokes that it is actually a Redundant Array of Incompatible Disks. Hah! Also, the Intel Server-Raid is actually a firmware RAID, not a hardware RAID, so electrically faulting drives don’t get handled properly, i.e. when one disk went down, it took the whole machine with it. <LAME>.
Anyhow, it’s back up and delivering content (it seems) as fast as ever. I took the opportunity to upgrade the RAM and the disks some more, so it should be even faster.
Also, it seems like this was partially caused by our WEATHER DATA: the data set we loaded was so large that the backups were overloading the disks and stalling the server. Cheerio!