Linode having some issues

dpcan · on Oct 28, 2009

If you go back in time and watch threads during almost ALL web hosting outages, the #1 COMPLAINT is always lack of communication with users.

PLEASE, web hosting companies, get the point: we want constant communication, even if you have nothing new to report. WE are smart too. Let us know what you're trying, what's working, what's not; maybe your USERS can help you fix the problem.

thaumaturgy · on Oct 28, 2009

I've been on the other side of this wall, though not for anything near this large of a service.

A lot of shops run with minimal staff, because -- let's face it -- cost is a very large factor of competition in the hosting industry.

Then a problem occurs, and if it were an easy problem to solve, they'd just solve it straight off and then do damage control in PR. Or, the problem is more like this one, and it requires some hard and fast thinking on the part of the staff that are available.

If you're one of those staff, the last thing you want to do is distract yourself by logging in to a forum every twenty minutes, or checking email, or writing status updates every thirty minutes. You instead feel incredible pressure to get the problem fixed immediately, meaning you don't even stop for a bathroom break if you can help it.

As a sysadmin, you'll also justify your decisions by saying that telling the users what's going on won't really change anything; it's not like they're assisting you in troubleshooting. Even if they're intelligent, they still aren't familiar with the specific systems and network topology and other good stuff involved in your operations, so you'll probably spend a lot more time answering their suggestions than you would if you just got in and figured it out yourself.

Though as a user, that kind of attitude is extremely frustrating.

dpcan · on Oct 28, 2009

You have an excellent point because I too host websites for hundreds of people and have been in the middle of a disaster. It's hard to keep updating a blog, but I got around to it about every hour regardless. Usually it just said what I was working on and that nothing had changed. Two sentences, but everyone knew what was going on. I also changed my outgoing voice mail recording which helped a ton. Most people didn't even leave messages.

It's REALLY hard, but well worth it in the end because you get praise for your communication instead of hate mail for abandoning your customers.

yrb · on Oct 28, 2009

Things just really don't work on human time/relationship scales anymore, especially when it comes to internet services. Communications can move in milliseconds now, but diagnosing and fixing complex issues is still on the order of hours/years/lifetimes!

It doesn't help that your customers often know about issues you are having before you do. So say ~30 min to get notified of an issue (which is not unrealistic for a non-obvious issue that isn't something you actively monitor), a quick 10 min investigation to assess what the scale of the issue is, another 3-5 min to post a couple of notifications, and that is the better part of an hour gone, and you haven't even got around to trying to come up with a fix.

benofsky · on Oct 28, 2009

This is why Slicehost rock, they always communicate brilliantly. What I find amazing here is that the Linode representative hasn't even said sorry in the first post they made and hasn't in the second post either, crazy.

chrischen · on Oct 28, 2009

Slicehost seems to cost more.

EDIT: Wait? Does Slicehost not cost more? Please, correct me if I'm wrong. But from what I'm seeing you get less RAM, Bandwidth, and Space for the same price. Btw, I use Slicehost, and just recently they had problems with my server. Otherwise they're flawless though. Great support too.

jrockway · on Oct 28, 2009

Not sure why this is downmodded, but Slicehost is more expensive than Linode. Both have good service and web interfaces, but Linode gives you more RAM for less cash.

They are down today, I guess, but that is to be expected for something you pay $20 a month for.

bham · on Oct 28, 2009

... and does not have bandwidth pooling, does it?

tlrobinson · on Oct 28, 2009

Slicehost does: http://www.slicehost.com/articles/2008/9/17/bandwidth-poolin...

bham · on Nov 3, 2009

Oh, I missed that. Great.

jamesbritt · on Oct 28, 2009

"This is why Slicehost rock, they always communicate brilliantly."

Um, ok, here's some counter-anecdotal info: My company at the time had an account on Slicehost, and every so often our instance would just go offline.

No word from anyone, just dead. After some e-mail, we learned that there was some concern of malicious activity, so the image was shut down. Now, if Slicehost was getting complaints, and was planning on shutting down the instance, why couldn't they give some warning so some action could be taken? Or even say something right after shutting down a suspicious instance? Instead, we were left wondering why our site was not up.

I never figured out just how the instance was supposedly compromised, and eventually left Slicehost for eApps.

Nowadays I use Linode, or The Planet.

I know a few people who swear by Slicehost, but not everyone has a rosy experience.

dpcan · on Oct 28, 2009

I agree. I love Slicehost, and I really love how they haven't changed in terms of customer service since they joined up with Rackspace.

cmelbye · on Oct 28, 2009

I've never tried Slicehost before, but I've actually had really good results with Rackspace's "fanatical" support. Slicehost's support could have only gotten better! ;-)

Maxious · on Oct 28, 2009

My ISP Internode always says when they expect the next status update to be. Usually there's a deadline of 2-3 hours between updates and recently there was a period of 72 hours with several cascading issues ( http://advisories.internode.on.net/item/6611/ ). There were status updates all the way through and staff communicating with users out-of-band to discover the extent of the issues so they could resolve them.

ShabbyDoo · on Oct 28, 2009

What is notable to me is the surprise experienced by those who purchased hosting in multiple geographies to survive a datacenter-wide issue. I hadn't thought much before about the effect of centralized IT management decisions on availability. Perhaps those who really need uptime will now consider using another hosting provider as fail-over. Not sure how the DNS issues would play out as network-level load balancing isn't something I know much about.

zefhous · on Oct 28, 2009

It's disconcerting to discover the cause of my server problems today via a thread on Hacker News instead of from an email or their RSS feed or something.

That being said, I think Linode is great and I've had a great experience with them so far.

axod · on Oct 27, 2009

Looks like they did an update on their host machines. The update unfortunately meant that many hosts marked linodes as being shut down, when they should have been marked up.

This meant many linodes were unreachable (No network).

linode.com was extremely slow also, don't know if it's related.

They're rebooting hosts to fix the issue, which is just horrible.

Here's a copy of the original post, since their own website and forum are slow/offline:

http://kovaya.com/linode.html

fjabre · on Oct 28, 2009

Bump Technologies uses Linode and I can't access their site. http://www.bumptechnologies.com/

That's rough. I'm sure some heads are gonna roll come tomorrow morning.

EDIT: I just recently signed up for Linode and love their service. Mine is still up. The only thing I was a little freaked out by is that their main site uses ColdFusion. Poor judgement?

petercooper · on Oct 28, 2009

And Hacker News uses a dialect of Lisp. Point?

tsuraan · on Oct 28, 2009

It doesn't look like it was a graceful shutdown either; I had to enter maintenance mode and fsck by hand. Lots of bad free block counts, but now that it's booted, I'm not seeing any corruption. I just use my linode for fun, so I don't really care too much, but it would be nice to get a better explanation than that initial explanation of hosts being erroneously marked as down.

ajju · on Oct 28, 2009

My first warning about this was my "Host down" sensor firing from the monitoring service, almost an hour after they noticed the issue.

I really love Linode and my experience with them so far has been great but such a lag in communicating critical issues is not acceptable.
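
For anyone who wants a similar "host down" sensor of their own, here is a minimal Python sketch. It is not how any particular monitoring service works; the function names, the default port, and the three-failures-in-a-row threshold are all assumptions picked for illustration.

```python
import socket


def is_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False


def should_alert(history, threshold=3):
    """Return True once the last `threshold` probe results are all failures.

    `history` is a list of booleans, oldest first (True = probe succeeded).
    Requiring several failures in a row avoids alerting on one dropped probe.
    """
    if len(history) < threshold:
        return False
    return not any(history[-threshold:])
```

The idea would be to call is_reachable on a schedule (say once a minute), append each result to a list, and send the alert when should_alert returns True. The threshold trades detection speed against false alarms.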

kellishaver · on Oct 28, 2009

Personally, I haven't noticed any downtime today on our linodes in Dallas and Atlanta. I don't know. I get the complaint about lack of communication; on the other hand, I'd rather have them working to solve the problem than sitting and updating a blog.

Was it a dumb move to push the update to multiple data centers in the middle of the day? Maybe, but then, without knowing all the details of how and why it was done, that's a big assumption to jump to.

Linode is still, hands-down, the best hosting provider I've ever used (and yes, I had a slicehost account for a long time), so I'll give them a pass on this one. Sometimes problems just happen.

shpxnvz · on Oct 28, 2009

It's not unreasonable to be shocked at an upgrade in the middle of the business day for the majority of their customers.

Every significant online business I've worked with does deployments and upgrades off-peak, without exception. In fact one prefers to do production changes on Friday evenings so that the technical staff has the entire weekend to come in and resolve any problems that arise before the following Monday morning peak.

kellishaver · on Oct 28, 2009

Oh, no, I didn't mean to imply that. I don't think anyone here is being unreasonable. I just meant that I've had enough awesome from Linode that I'm willing to give them a pass on this one. Everyone screws up now and then. :)

brlewis · on Oct 28, 2009

Guess I lucked out. My linode in Fremont was unaffected.

techiferous · on Oct 28, 2009

I lucked out, too. The world's not fair: I'm hosting a bunch of stupid sites on my Linode in New Jersey (like http://histaniputyourpictureonthewebihopeyoudontmind.com/ ), whereas I bet some other people with Linode problems are losing money. I feel for them.

EDIT: I just checked my uptime and it looks like my linode was rebooted. But I never noticed (and I was actually logged in and working there a lot today).
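
A quick way to spot a reboot you didn't notice is to work out the boot time and compare it with the outage window. A minimal Python sketch, assuming a Linux guest where /proc/uptime is available; the helper names are made up for illustration.

```python
from datetime import datetime, timedelta


def parse_uptime_seconds(text):
    """Parse the contents of /proc/uptime; the first field is seconds since boot."""
    return float(text.split()[0])


def boot_time(uptime_seconds, now=None):
    """Estimate when the machine booted by subtracting uptime from the current time."""
    now = now or datetime.now()
    return now - timedelta(seconds=uptime_seconds)


def read_boot_time(path="/proc/uptime"):
    """Read the uptime file and return the estimated boot time (Linux only)."""
    with open(path) as f:
        return boot_time(parse_uptime_seconds(f.read()))
```

If read_boot_time() lands inside the window when hosts were being rebooted, the machine went down even if nothing looked wrong from inside it.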

buugs · on Oct 27, 2009

The forum page isn't loading for me, what problems are they having?

My VPS is still up and has been up (at the Dallas datacenter) and seems to be working fine.

robotadam · on Oct 27, 2009

Here's the initial note from the forum. I copied it before and can't reload, so I don't know the user that posted it.

"During a shared library update distributed to our hosts, a number of the hosts incorrectly have marked Linodes as being shut down. To recover from this we may be issuing host reboots to upgrade their software to our latest stack, and then bringing the Linodes to their last state. We're working on this now and expect to have additional updates shortly. We'll also be notifying those affected via our support ticket system. Please stand by."

andrewtj · on Oct 28, 2009

Mine is also at Dallas (dallas98) but has only been up for ~20 minutes and is extremely slow.

edit: Looks like my VPS was offline for 20 ~ 30 minutes and has settled down now.

spokey · on Oct 28, 2009

For what it's worth, I also have a VPS in Dallas that is running fine right now, and has been up since my last manual reboot (more than 30 days ago).

bengtan · on Oct 28, 2009

One of my linode servers got affected. It was shown as running but I couldn't ping it. Reboot, and still could not ping.

Then I logged in via the LISH console and found out that eth0 wasn't up. Manually configured the network settings, and it's working again.

Those of you who are affected might like to try this too.
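
To check for the same symptom (eth0 not up) without eyeballing it, something like the Python sketch below could be run from the console. It only reads the interface state from sysfs on a Linux guest; it does not configure anything, since the actual network settings are specific to each Linode and aren't given here. The function names are hypothetical.

```python
import os


def interface_state(name="eth0", sysfs_root="/sys/class/net"):
    """Return the kernel's operstate for an interface, or None if it does not exist."""
    path = os.path.join(sysfs_root, name, "operstate")
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def interface_is_up(name="eth0", sysfs_root="/sys/class/net"):
    """True only when operstate reads "up".

    Some virtual interfaces report "unknown" even while passing traffic,
    so treat a False here as a hint to look closer, not as proof.
    """
    return interface_state(name, sysfs_root) == "up"
```

If interface_is_up() returns False after a reboot, that matches the situation described above and the network settings are worth checking by hand.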

brettbender · on Oct 28, 2009

I recently (in the last week) signed up for the most basic Linode package for a pet project. Many of the posts on the forums note Linode's stellar track record, which is a bit reassuring, but doesn't really offset the lack of communication Linode has had with this.

va_coder · on Oct 28, 2009

It's funny how our standards change. I've seen a lot of internal-to-the-organization hosting providers be down for days and not seem to care much.

It's good that we demand more, but in retrospect the service we have today is pretty good and much better than it was in the past.

dreur · on Oct 28, 2009

Another post, saying they will improve their process:

http://www.linode.com/forums/viewtopic.php?t=4768&start=...

bk · on Oct 28, 2009

Yeah, all my linode sites are dead (timeout) and no shell, at the very least for 1 hour, NJ datacenter.

Sucks. Will be interesting to see how this unfolds.

dwiel · on Oct 28, 2009

I've not noticed any downtime or lagging in Dallas