Depending on which side of Hanlon's razor you come down on, the only conclusion I can draw from this is that they are either incompetent or dishonest. I have a very hard time believing that this issue remained unknown to them for years.
As for the post, it's pretty much just documentation. I didn't see any apology. And the only promise of a better tomorrow is a vague "Working to better support concurrent-request Rails apps on Cedar".
I also didn't see any mention of refunds for all of the extra dynos that were needed due to the degrading performance of their service - or all the extra support hours where they told everyone 'not our problem!'.
They apologized in the last post. Also, self-critical language like "fallen short of [our] promise" and "we failed to..." is a de facto apology and acceptance of responsibility even when the word 'sorry' only appeared earlier.
I can understand how this developed. Things worked well for most customers. Many of those with problems got them under control with more dynos or multi-worker setups. Heroku's Rails roots biased them towards a "keep it simple, throw hardware at it, or look for optimizations in the app/sql/db" mindset. Well, many of their Rails/Bamboo customers complaining about latency, even in the presence of this growing issue, may have also (or even primarily) had other app issues. (When supporting developers, especially many beginning/free-plan developers, it doesn't take long for your conditional probability P((we have a real problem)|(customer thinks we have a problem)) to go very low, and P((customer app has a problem)|(customer thinks we have a problem)) to go very high.)
Even when Heroku had a unitary (and thus 'smart') router, they surely got latency complaints that were completely due to customer app issues or under-provisioning, so they stuck with the 'optimize app or throw dynos at it' recommendation for too long. And, when they habitually threw more hardware at the Bamboo routing mesh, they were unwittingly making the pile-up issues for Bamboo web dynos worse. Some key data about the uneven pre-accept queueing at dynos was missing, which, combined with habits of thought that had worked so far, gave them a blind spot.
Despite the growing issue, adding dynos at the margin would still always help (at least a little) — as well as adding to Heroku revenues. Even without any nefarious intent, a 'problem' that fits neatly into your self-conception ("we give people the dyno knob to handle any scaling issues and it works"), and is also correlated with rising business, may not be recognized promptly. That's just a natural human biased-perception issue, not incompetence or dishonesty.
In short, Heroku needs to hire someone with some operations research experience. This is a mathematical modelling problem, not really a code problem.
Break out Mathematica, Matlab or R and model the damn problem. Then go research the solutions already available (hint: look at how grocery stores manage their checkout lines; this is a well-studied queueing problem).
I think apologies are over-demanded by our somewhat hysterical media, which likes nothing better than to humble/humiliate a public figure (because it sells papers); and this flows through into expectations of private and corporate behaviour. But I've never had much use for apologies from other people. Years of abuse make "sorry" an entirely debased term in my lexicon. I've seen statements of regret that omit the word and are all the more sincere for it.
Much more useful than an apology is an acceptance of fault (which is not the same thing); an expression of desire to improve, and a sincere and demonstrable commitment to doing so.
[NB: don't mean to imply that Heroku have necessarily achieved all of that here]
"We failed to explain how our product works. We failed to help our customers scale. We failed our community at large. I want to personally apologize, and commit to resolving this issue"
I fully agree that they are either incompetent or dishonest. I hope this response gets more press, because Heroku had better be beyond perfect from this point on. There is no excuse for this.
Your razor presents a false dilemma. They may be very competent but have no intention of catering to non-concurrent applications, either because they didn't think about the scenario or because the way RoR operates is silly.
Well, 'package' system is a big word. It does not have versioning, checksums, or signatures. An import of a package may bring in (1) a version that is API-incompatible; (2) a version that is API compatible but has new bugs; and (3) a version that has been trojaned/backdoored/whatever.
The only solution is doing your own package management in $GOPATH: tracking a bunch of Git/Mercurial repositories and finding out by hand which commits are sane and which are not.
I didn't realize NuGet let you do remote path import. Nor that it was around in 2004. Nor that it's part of the C# language spec.
Goroutine + Channels are vastly different than the TPL. The comparison is so far off that I can't even come up with a clever analogy. There's more to concurrency than threads, and more to communication than locks.
> I didn't realize NuGet let you do remote path import. Nor that it was around in 2004. Nor that it's part of the C# language spec.
Fair enough, if you want to limit yourself to 2004 and compiler-specific support.
> Goroutine + Channels are vastly different than the TPL
If you take off your Go-coloured glasses and read up on what the TPL does, you will see that goroutines and channels are indeed available as Tasks and Queues.
I don't know enough about Go, but goroutines and channels sound quite similar to F# agents and the MailboxProcessor concept, which, as I understand it, is not quite the same as the TPL. The former tackles concurrency, while the latter is about parallelism. C# 5 is getting there with async/await, I guess.
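For anyone who hasn't seen them, here's roughly what goroutines + channels look like (a toy example of my own, not a translation of any TPL code): the channel is both the work queue and the synchronization point, with no locks or task objects in sight.

```go
package main

import "fmt"

// sumSquares fans n jobs out to a pool of worker goroutines over a
// channel and collects the squared results over another channel.
func sumSquares(n, workers int) int {
	jobs := make(chan int)
	results := make(chan int)
	for w := 0; w < workers; w++ {
		go func() { // each worker is a goroutine reading the shared channel
			for j := range jobs {
				results <- j * j
			}
		}()
	}
	go func() {
		for i := 1; i <= n; i++ {
			jobs <- i
		}
		close(jobs) // workers' range loops exit once the channel drains
	}()
	sum := 0
	for i := 0; i < n; i++ {
		sum += <-results
	}
	return sum
}

func main() {
	fmt.Println(sumSquares(5, 3)) // prints 55
}
```

Whether that maps cleanly onto Tasks plus a concurrent queue is the crux of the disagreement above; the channel semantics (blocking send/receive, `range` over a closed channel) are what people usually mean when they say the model is different.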
Here's a dining philosophers implementation in F# and Go (among other languages)
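For a flavour of the Go side, here is a hedged sketch of my own (not the implementation referenced above): each fork is modelled as a buffered channel of capacity one, and the last philosopher acquires forks in the opposite order to break the symmetry that would otherwise deadlock.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// dine runs the classic dining philosophers problem and returns the
// total number of meals eaten. Forks are capacity-1 channels; receiving
// picks a fork up, sending puts it back.
func dine(philosophers, meals int) int {
	forks := make([]chan struct{}, philosophers)
	for i := range forks {
		forks[i] = make(chan struct{}, 1)
		forks[i] <- struct{}{} // fork starts on the table
	}
	var eaten int64
	var wg sync.WaitGroup
	for i := 0; i < philosophers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			first, second := forks[i], forks[(i+1)%philosophers]
			if i == philosophers-1 {
				first, second = second, first // resource ordering breaks the deadlock cycle
			}
			for m := 0; m < meals; m++ {
				<-first // pick up both forks
				<-second
				atomic.AddInt64(&eaten, 1) // eat
				second <- struct{}{} // put both forks back
				first <- struct{}{}
			}
		}(i)
	}
	wg.Wait()
	return int(eaten)
}

func main() {
	fmt.Println(dine(5, 3)) // prints 15
}
```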
I was curious what would happen if you inhaled it. Nothing specific on their site, but it does say:
The coating has been found to be safe for use in nonfood contact areas
of food processing plants. The coating meets FDA and USDA regulations for those
types of applications.
I am surprised no one has linked to the YouTube video provided by the manufacturer about applying the product[1]. They recommend using a respirator during application. That doesn't necessarily mean anything (you won't get sued for telling someone to use a respirator with a product that's actually safe), but it at least indicates they aren't yet comfortable telling people to apply this without protective equipment. Like many things that require a respirator to apply, it likely becomes much safer once dry (the carrier solvent evaporates away, leaving the actual coating nonvolatile).
I remember seeing old videos of this tech used inside a ketchup bottle. I wonder if we're going to see food-safe versions soon on products like toothpaste and peanut butter :)
I remember reading it's already food-safe, as in if you scraped it with a knife and munched on it, it'd still pass through you without any effects, but sorry, I don't have a source for that.
The delusion I have, which he didn't address, is that the price/performance ratio is atrocious. Sticking with Heroku, the $3200/mo Ronin DB could essentially be a one-time purchase if you are colocating, or a tenth of the monthly price if you go dedicated.
The cost of the database to an enterprise is not dominated by how much the license for the software costs, the hardware costs, or the electricity costs. It is dominated by how much you have to pay in headcount to manage it. At many companies, that app would have two people whose full-time job would be "Do everything Heroku ops would do, except slower and suckier."
I get this same question from some potential customers for AR, too: "Why would I pay you $X a month when I could buy a competitor's dedicated appliance for $X and then just pay for phone calls?" "Who is going to maintain that appliance?" "What do you mean maintain?" "Like, if a security vulnerability is discovered in a technology AR uses, I stay up all night. Do you have a guy like that?" "... No." "Do you know what a guy like that costs?" "... Uh, plumber money?" "Doctor money." "Shoot."
The cost can be so out of whack that it can become cheaper to hire talent. Two things make this even truer: the hire might have spare time for other work, and you gain the things PaaS offerings are poor at: raw processing speed, memory, and SSDs.
I've seen it over and over again with SaaS: server monitoring, logs, statistics, ...
We had an internal fight recently about how to manage our logs. Group A wanted to spend thousands a month on Splunk. Group B spent a weekend setting up logstash + kibana and deploying it in production.
Server monitoring? Nagios. Stats? statsd+graphite. Even if you have to outsource setting it up, it'll probably be cheaper from the very first month.
1) Many companies already have most of these skills, or can acquire them cheaply and reliably via a service contract. The cloud isn't the only way to outsource operations.
2) The performance/reliability constraints imposed by certain popular cloud platforms can create very expensive development work, distracting developers from doing things that actually create value for customers.
I worked with a successful firm that was close to outgrowing their dedicated server and was considering going cloud. They sketched out an architecture that would get everything to fit into nice little EC2-shaped pieces. But there was a problem: it would take a ton of developer time.
Instead, they spent a fraction of the expected developer cost on some serious hardware, and they were done nigh on instantly. Their developers then used that saved time to do things that actually delivered more value to the customers.
The cloud is awesome when it fits, but it doesn't always fit.
The author was speaking more about SaaS (like Google Apps), and not just IaaS hosting (like EC2) or PaaS (like Heroku). EC2 and Heroku don't compete primarily on price, but on the flexibility to scale up/down as needed. If you can offer more of a commitment, there are much cheaper options available.
Does Heroku actually charge the face value prices for real customers with that level of need? I would think that discounts would readily be made available to keep the business, but I have trouble finding anecdotes from people who have spent a lot of money on Heroku.
Thanks! I wonder why he didn't just link it in the blog article. Your comment should be more towards the top, I just had to spend a few minutes clicking around his site to get to that article.
To Mr.Minimal: The archive link only shows last month, so I had to hover the subtly-bolded numbers in the calendar box to be able to browse older titles and hope for the best ... I feel like I must have missed some completely obvious way to navigate your blog?
(edit: the above remark is intended as constructive criticism--I tried my best wording it as such, but reading it back, that maybe didn't quite come out right)
We just finished doing some video encoding testing on a few different platforms, and EC2 (along with EC2-based offerings) is considerably slower and more expensive. Despite being 10x more expensive than a 3930K, a cc2.8xlarge instance was only 1.75x faster.
I think EC2 is almost always going to be more expensive than bare metal servers, unless you are taking advantage of the ability to pay by the hour, or leveraging flexible pricing. Reserved instances will save quite a bit of money, but spot pricing can do you even better. Check out this post from a few months ago from someone who is using spot instances for core services, and estimates 70% savings vs on demand pricing.
How about comparing EC2 to a Virtual Private Server? That's a bit more of an apples-to-apples comparison.
Serverbear notes that Amazon's 7.5GB Large instances (which cost $180+/month) benchmark at ~650 on UnixBench, with 30 MB/s for their disk. In comparison, an 8GB VM from Digital Ocean costs only $80/month. I don't have the numbers for the 8GB VM, but the smaller $20/month 2GB instance has a UnixBench score of ~1900 with over 300 MB/s I/O from its solid-state drive.
(I presume the larger instances have more CPU power / priority in the VM scales)
That is half the cost for triple the CPU performance and 10x better disk performance. Other smaller providers, such as RamNode, offer extremely fast I/O with RAID 10 Solid State Drives in their Virtual Private Servers (500+ MB/s).
Amazon vs Digital Ocean
serverbear.com/239-large-amazon-web-services
serverbear.com/1990-2gb-ssd--2-cpu-digitalocean
To be fair though, Amazon's CPUs are more consistent... consistently bad, but consistent. VPS CPUs and I/O are affected by their neighboring VMs, while Amazon seems to have removed that uncertainty. Nonetheless, in practice, you will always get a better performing CPU and I/O from other providers.
And if we compare both to bare-metal servers, obviously bare metal wins on price/performance but is harder to maintain, so it's hard to do an apples-to-apples comparison. But Digital Ocean VMs can be spun up and down just like Amazon instances, although Amazon has more load balancers and other infrastructure. (Nothing is stopping you from setting up HAProxy on a front-end VM to load-balance a cluster of Digital Ocean VMs. And other VPS providers, like Linode, now offer load balancers as part of their infrastructure.)
It's hard for me to see the case for Amazon's cloud offerings. Their price/performance just isn't competitive. At every point on the spectrum, low end to high end, VPS providers such as Digital Ocean offer more vertical scalability at a cheaper price than Amazon's offerings.
Unless you need some specialized VM from Amazon (ie: GPU compute), or are locked into their vendor-specific API (oh I feel sorry for you), there is no reason to use Amazon's services IMO.
In the next few months we will be migrating a number of servers to EC2. The only reason is to take advantage of latency based routing -- we really really need to reduce latency as far as possible.
Anyway, there's your reason.
The other reason is that big businesses just don't care. Margins are high enough on software that cost of EC2 over another provider is outweighed by the benefit of existing infrastructure, developer experience, and the risk limitation by choosing AWS.
Fair enough. I consider that part of the "specialty" kind of service however. I still wouldn't touch their S3 compute stuff though, even if I'd use Amazon's DNS services. I know that you can use Amazon's CDN with other provider's VPSes or your own dedicated boxes somewhere.
And certainly, for the small 2- or 3-server clusters that a small startup uses, Amazon's prices are significantly higher than other providers'.
Anyway, I'd have to check out the latency-based routing thing and how it differs from the typical GeoDNS or anycast DNS offered by a number of providers. My bet is that it's just Amazon marketing speak for GeoDNS or anycast technology.
EDIT: http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/Cre... As far as I can tell, Amazon's "Latency Based Routing" is just GeoDNS with a much better marketing name. It's all about reducing latency, but at the end of the day it is no different from GeoDNS.
That said, Route53 does seem to be a good DNS service from Amazon. $0.75 per million anycast queries per month + $0.50 per zone is a good price methinks.
So while I'd never use a compute instance at Amazon, I'd probably keep their Route53 service on my list. Looks pretty nice from what I can tell.
As noted above, it is hard to do a "fair" comparison between Amazon and the others due to the fact that Amazon offers a bit more consistency. Linode and Digital Ocean benchmarks are all over the place depending on how much CPU or IO that their neighbors are using.
Another thing to consider is the number of mistakes a company has made. While Amazon and Linode have been around for years, Amazon had the Virginia fiasco this past year (the Netflix outage), and Linode had the Bitcoin hack. Digital Ocean has only been around for a few months, so their security and reliability are basically untested.
With those caveats in mind, it is then possible to look at the inherently flawed benchmarks and work off of them. Serverbear is a good resource for comparing those things.
A Raspberry Pi is considered by many to be a minimum viable computer of sorts, the bottom of what one would consider acceptable performance.
Therefore, seeing how Amazon compares to that is an interesting exercise. I was personally floored by how poorly some EC2 instances perform at some types of tasks (Java/Clojure-related things among them).
I quickly decided Amazon was not able to serve my needs within the price-range I was willing to pay.