StudentsReview ™ :: Blog

Major Server Hard Drive Crash

Oh, here's why the drive crashed: there were no disks in it!
And of course, my ultra-stylish-arty shot.

Hi everyone. In January and into February, SR saw a major hard drive crash on its main database.  Fortunately we maintain caches of the content on the delivery nodes, so if you saw SR behaving strangely, that was why.  I happened to be in Singapore running a workshop at the time (everything happens at the worst possible moment), so there was no way to get the server up in the meantime.  The crash was also very frustrating because, besides taking out a major drive, it also (somehow) corrupted data on our solid state drives, from which the database runs and pages are delivered.  Super frustrating.

Anyway, if you’ve noticed pages gone, comments gone here and there, or weird things like more spam and fake comments appearing than usual, this is all being corrected as we bring the services back online.  Fortunately the new hardware seems to be running more or less smoothly for now, but I have to admit, I kinda have my fingers crossed.

For those who are techy-inclined, the drives that crashed were Western Digital 1TB Caviar Blacks in a RAID 1 configuration.  Turns out that those drives aren’t long-term compatible with RAID, despite its moniker “Redundant Array of INEXPENSIVE Drives”.  My mom jokes that it is actually a Redundant Array of Incompatible Drives. Hah!  Also, the Intel Server-Raid is actually a firmware RAID, not a hardware RAID, so electrically faulting drives don’t get handled properly; i.e., when one disk went down, it took the whole machine with it. <LAME>.

Anyhow, it’s back up and delivering content (it seems) as fast as ever. I took the opportunity to upgrade the RAM and the disks some more, so it should be even faster.

Also, it seems like this was partially caused by our WEATHER DATA, which we had loaded and which was so large that the backups were overloading the disks and stalling the server.  Cheerio!

Tags: server

Wow! Look at that traffic!

I guess acceptance letters are arriving for students because the SR servers are noticing! :D  Around 170,000 pages are going out of our new servers per day lately, so a lot of you must have gotten into a lot of schools.  Congratulations!!! I know it was a hard process, and it’s over for a little while at least :)

We’re also looking to bring lots of exciting new features to you, not the least of which is the long-overdue ability to edit reviews after they’ve been made :P  Plus, lots of little fixes for issues that have affected review and data quality are about to go in!  And when the additional delivery nodes come online, as they will shortly, we’ll be able to handle even more reviews!!

One particular feature is the ability to write “comparison” reviews.  I for one have been itching to write an MIT vs. Michigan review, given that I’ve attended both schools :)

Tags: admissions, server

New SR Server — Scooty Puff SR. — The Doombringer.

We custom-fabbed our own server!

Here you can see the design and internals of `Doombringer’, the custom SR server.  She’s fast, really fast in fact; she can deliver webpages at the capacity of the gigabit uplinks!  She can handle almost all of the visitors to SR in a single second!  It took 4 months to spec her out, purchase the parts, drill custom holes in a custom case, and put her together.
And check out the super-huge pile of stuff going into it!


Tags: server

At long last...

The blogging system is back online. Talk about taking forever to figure out how to make WordPress multiuser run on a different port… But we did it! Almost all the new delivery nodes are operational, and I still have to get the videos running off of the main server as well, but SR is almost fully functional again. Yay for a slow 6 months!

Tags: server

Setting up the new server: Murphy's Fifth

Hi everyone, I’d like to apologize for all the problems you may have had with SR in the last month. I have been working on building a new co-located server cluster for SR to handle the traffic. While that was going fine, I planned on installing it and then slowly migrating SR from the old servers to the new cluster as time permitted (i.e. in an orderly way). The old SR main server had been running for 3-4 years continuously, so I wasn’t too worried; I figured everything would be fine.

Of course, Murphy had something to say about that, so while the building was taking place, the old SR main server hard drive crashed (in August). I was able to get all the data off, but the new server wasn’t fully put together yet. So SR automatically switched over to the backup servers, which generally are OK for a week or so, but then they started to crash under the load as well. This was responsible for all of those `can’t connect’ database errors we received emails about. Thanks for the notes; it really helps to know that people use SR and to hear from people! :)

The new server did not come online at the new co-location provider until around Sept 15, which left us with about a month of flaky downtime that we could not really do anything about. And it’s funny: all the systems I put in place over the last 2 years to prevent exactly this problem from occurring (prompted by an explosion at our old provider several years ago) only barely held up under the growth in traffic. You know, I idly wonder if while Murphy is playing his game, he also listens to `Walter Murphy: A Fifth of Beethoven’… I certainly would, dancing around, poking at cables, kicking people’s disks. Whoops! I didn’t mean to knock over that water into your server!

Tags: server