Hi everyone, in January and into February, SR suffered a major hard drive crash affecting its main database. Fortunately, we keep caches of the content on the delivery nodes, which kept the site running in the meantime; that’s why SR was behaving strangely. I happened to be in Singapore running a workshop at the time (everything happens at the worst possible moment), so there was no way to get the server back up in the meantime. The crash was also very frustrating because, besides taking out a major drive, it also (somehow) corrupted data on our solid state drives, which run the database and deliver the pages. Super frustrating.
Anyway, if you’ve noticed missing pages, comments gone here and there, or weird things like more spam and fake comments than usual, this is all being corrected as we bring the services back online. Fortunately, the new hardware seems to be running more or less smoothly for now, but I have to admit I kinda have my fingers crossed.
For those who are techy-inclined, the drives that crashed were Western Digital 1TB Caviar Blacks in a RAID 1 configuration. It turns out that those drives aren’t compatible with RAID in the long term, despite its moniker, “Redundant Array of INEXPENSIVE Disks”. My mom jokes that it is actually a Redundant Array of Incompatible Drives. Hah! Also, the Intel server RAID is actually a firmware RAID, not a hardware RAID, so electrically faulting drives don’t get handled properly: when one disk went down, it took the whole machine with it. <LAME>.
Anyhow, it’s back up and (it seems) delivering content as fast as ever. I took the opportunity to upgrade the RAM and the disks some more, so it should be even faster.
Also, it seems this was partially caused by our WEATHER DATA: once loaded, it was so large that the backups were overloading the disks and stalling the server. Cheerio!
I guess acceptance letters are arriving for students, because the SR servers are noticing! :D Lately, around 170,000 pages a day are going out from our new servers, so a lot of you must have gotten into a lot of schools. Congratulations!!! I know it was a hard process, and it’s over for a little while at least.
We’re also looking to bring lots of exciting new features to you, not the least of which is the long-overdue ability to edit reviews after they’ve been posted :P Plus, lots of little issues that have affected review and data quality are about to be fixed! And once the additional delivery nodes come online, as they will shortly, we’ll be able to handle even more reviews!!
One particular feature is the ability to write “comparison” reviews. I for one have been itching to write an MIT vs. Michigan review, given that I’ve attended both schools.
Here you can see the design and internals of ‘Doombringer’, the custom SR server. She’s fast, really fast in fact: she can deliver web pages at the full capacity of the gigabit uplinks! She can handle almost all of the visitors to SR in a single second! She took 4 months to spec out, buy parts for, assemble, and drill custom holes into a custom case for.
And check out the super-huge pile of stuff going into it!
OK, I’m not quite certain what is going on here, but I’m seeing a lot of popups coming from the SR server lately. We have all popup settings blocked with our ad partners, so nothing should be showing up, but popups still seem to be appearing… This is very, very, very perplexing…
I am trying to get this fixed as soon as possible.
The blogging system is back online. Talk about taking forever to figure out how to make WordPress MU (multi-user) run on a different port… But we did it!
Almost all the new delivery nodes are operational now. I still have to get the videos running off of the main server as well, but SR is almost fully functional again. Yay for a slow 6 months!
Hi everyone, I’d like to apologize for all the problems you may have had with SR in the last month. I have been building a new co-located server cluster for SR to handle the traffic. That was going fine, and my plan was to install it and then slowly migrate SR from the old servers to the new cluster as time permitted (i.e. in an orderly way). The old SR main server had been running continuously for 3-4 years, so I wasn’t too worried about it.
Of course, Murphy had something to say about that: while the building was taking place, the old SR main server’s hard drive crashed in August. I was able to get all the data off, but the new server wasn’t fully put together yet. So SR automatically switched over to the backup servers, which are generally OK for a week or so, but then they started to crash under the load as well. This was responsible for all of those ‘can’t connect’ database errors we received emails about. Thanks for the notes; it really helps to know that people use SR, and it’s great to hear from you!
So the new server did not come online at the new co-location provider until around Sept 15, which left us with about a month of flaky downtime that we could not really do anything about. And it’s funny: all the systems I put in place over the last 2 years to prevent exactly this problem (prompted by an explosion at our old provider several years ago) only barely held up under the growth in traffic.
You know, I idly wonder if, while Murphy is playing his game, he also listens to ‘Walter Murphy: A Fifth of Beethoven’… I certainly would, dancing around, poking at cables, kicking people’s disks. Whoops! I didn’t mean to knock that water over into your server!
So, my goal was to have the new server in place by mid-June, but let me tell you, receiving lots of dead parts totally put a cramp in that :P That said, the new server is really fast and almost ready to go. I mean… *really fast*. Well, I look forward to sending it in next week!
A few weeks ago, we added the ability to upload entire galleries of campus photos! This is a big improvement over the single-image upload we’ve had for the past 9 years or so :P We also rebuilt all the pie charts and the SAT/ACT graphs!
AND we added graphs of rating distributions to the university breakdown page. So now, if you click on “Educational Quality”, it breaks down to show you how many A-F ratings for educational quality a particular school has received, and this can be broken down even further by major!
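The breakdown described above is essentially a grouped count. Here is a minimal Python sketch of the idea, assuming each rating is a simple record of school, major, category, and letter grade on a plain A, B, C, D, F scale; the `Rating` shape and `grade_distribution` name are hypothetical and not SR’s actual schema or code.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional

# Assumed plain five-letter scale; SR's real scale may differ.
GRADES = ["A", "B", "C", "D", "F"]


class Rating(NamedTuple):
    # Hypothetical record shape, for illustration only.
    school: str
    major: str
    category: str  # e.g. "Educational Quality"
    grade: str     # one of GRADES


def grade_distribution(
    ratings: Iterable[Rating],
    school: str,
    category: str,
    major: Optional[str] = None,
) -> Dict[str, int]:
    """Count A-F ratings for one category at one school, optionally for one major."""
    counts = Counter(
        r.grade
        for r in ratings
        if r.school == school
        and r.category == category
        and (major is None or r.major == major)
    )
    # Report every grade in A-F order, including grades nobody gave.
    return {g: counts.get(g, 0) for g in GRADES}


ratings = [
    Rating("MIT", "Physics", "Educational Quality", "A"),
    Rating("MIT", "Physics", "Educational Quality", "B"),
    Rating("MIT", "History", "Educational Quality", "A"),
    Rating("Michigan", "Physics", "Educational Quality", "C"),
]
print(grade_distribution(ratings, "MIT", "Educational Quality"))
print(grade_distribution(ratings, "MIT", "Educational Quality", major="Physics"))
```

Passing `major=None` gives the school-wide graph; passing a major narrows the same count, which mirrors the “broken down even further by major” step.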
The next update: new tuition figures for all of the schools!
Well, this happened in March/April, but now that the blog system works, I can finally write about it! To think that 9 years ago, around this time, I started SR with the assistance of 4 friends and the help of my parents, right after I graduated from college… It brings back memories. We’d sit in my inexpensive, unfurnished apartment, above a lunatic old woman, thinking of ideas for SR and how it was going to improve the whole college process, while she pounded the ceiling with her broom and came up to accuse us of doing all sorts of rather… ah, unsavory things. What an imagination she had… (use your imagination).
Lately many of you have noticed that SR has slowed down a lot, and that lots of graphs have been falling out of date. Thank you for letting us know! This has been happening because the traffic to SR has grown way past what our 3 servers can handle. We made some improvements this academic year that made the site about 10 times faster overall, but the traffic (at least on peak days) quickly saturated the newfound capacity. To cope, we rely on 5 levels of disk caching (built up over the past 10 years), which both fill up the disks and create a huge disk-read load. Without getting boring, suffice it to say that every time an update to the site is made, basically everything has to be rewritten, re-analyzed, and copied, which drives the number of users we can handle from 100,000 down to like 10. Not 10,000… just ten. It gets that slow. It seems bad, but that’s the tradeoff we made to be able to handle everyone within our current setup and budget: as long as things don’t change every 5 minutes, we gain a whole lot of speed.
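To make that tradeoff concrete, here is a toy Python model of stacked caches in front of an expensive page build. It assumes each cache level is a simple in-memory key-value store and that a site update clears every level; SR’s real caches live on disk and surely work differently, and the `LayeredCache` class and its method names are hypothetical.

```python
from typing import Callable, Dict, List


class LayeredCache:
    """Toy model of several stacked caches in front of an expensive page build.

    Hypothetical sketch: the level count and the 'clear everything on update'
    policy are assumptions chosen to illustrate the tradeoff, not SR's design.
    """

    def __init__(self, levels: int, build_page: Callable[[str], str]) -> None:
        self.levels: List[Dict[str, str]] = [{} for _ in range(levels)]
        self.build_page = build_page
        self.expensive_builds = 0  # how often a page had to be regenerated from scratch

    def get(self, key: str) -> str:
        # Check the fastest level first, then progressively slower ones.
        for i, level in enumerate(self.levels):
            if key in level:
                value = level[key]
                # Copy the hit into the faster levels above it.
                for faster in self.levels[:i]:
                    faster[key] = value
                return value
        # Missed at every level: do the expensive work once and fill all levels.
        self.expensive_builds += 1
        value = self.build_page(key)
        for level in self.levels:
            level[key] = value
        return value

    def site_updated(self) -> None:
        # An update may change almost any page, so every level is emptied.
        for level in self.levels:
            level.clear()


cache = LayeredCache(5, lambda page: f"<html>{page}</html>")
for _ in range(1000):
    cache.get("mit")
print(cache.expensive_builds)  # 1000 reads, only one expensive build

cache.site_updated()
cache.get("mit")
print(cache.expensive_builds)  # the update forced a rebuild
```

While nothing changes, almost every read is a cheap hit; each update throws that away and sends the next reads for every page back to the slow path, which is why frequent changes hurt so much.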
Unfortunately, the time when that was an OK tradeoff has passed. To keep the systems running, we’ve had to drop a lot of things over the years that made SR awesome, like dynamic PDFs, dynamic analysis, etc. Since those were important features, we will soon be moving from 3 dedicated, geo-located servers to co-located, high-bandwidth servers of our own design. The new servers are… for lack of a better word… awesome. I tried to understand and characterize how awesome they are, but they are beyond my comprehension (even though I designed them). Each one is more than 10x faster than our original 3 combined, and comes with satellite distribution nodes… I’ll upload pictures when they are finished being constructed!