dennis82's comments

Interesting. How would this play with Mongo?


ETL from Mongo to MemSQL to analyze? SQL wipes the floor with Mongo's aggregation framework.
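A minimal sketch of the comparison above, not taken from the thread. It assumes a hypothetical set of documents with `user_id` and `amount` fields and uses Python's built-in `sqlite3` as a stand-in for MemSQL. The Mongo pipeline is shown only for side-by-side reading and is not executed.

```python
import sqlite3

# Hypothetical documents, as they might come out of a Mongo collection.
docs = [
    {"user_id": "a", "amount": 10},
    {"user_id": "b", "amount": 5},
    {"user_id": "a", "amount": 7},
]

# Roughly equivalent Mongo aggregation pipeline (for comparison only).
mongo_pipeline = [
    {"$group": {"_id": "$user_id", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]


def load_and_aggregate(documents):
    """Load documents into an in-memory SQL table and total amounts per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, amount INTEGER)")
    # The "ETL" step: each document becomes one row.
    conn.executemany(
        "INSERT INTO events (user_id, amount) VALUES (:user_id, :amount)",
        documents,
    )
    rows = conn.execute(
        "SELECT user_id, SUM(amount) AS total FROM events "
        "GROUP BY user_id ORDER BY total DESC"
    ).fetchall()
    conn.close()
    return rows


totals = load_and_aggregate(docs)
print(totals)
```

Whether the SQL form is actually nicer is a matter of taste; the sketch only shows that the two express the same grouping.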


They voted to strike. And yes, they assuredly are "public servants." It used to be that in exchange for working for the public and taking a lower salary, you'd get more stability and be cushioned and shielded from recessions.

Now, public workers are better compensated than the private sector and still maintain guarantees for their jobs.

So which is it? Because the way I and many other Californians see it, our state is in such dire straits because unions have squeezed the public dry, run up huge deficits to fund ridiculous pension plans, and haven't done anything beyond baseline to help the People.


We the People pay and employ public workers. Ergo, We the People are entitled to see their salaries, and in this particular case, how badly they're "ripping off" the Public in general.


no, it's categorically not good to make an application more fragile. A weakness restated is not a strength. Every database should have an optimizer, period.


If a database system does not support joins, aggregations or subqueries, as is the case with most realtime NoSQL solutions, an optimizer becomes pretty trivial. Optimizers are needed for analytical stuff. That's why most optimizers are evaluated on analytic workloads (e.g. TPC-H, TPC-W) not transactional / realtime (TPC-C).


I did not state that an optimizer should not exist for a database - I think that's key actually - but rather that the tradeoff they made this time around was fundamentally good in that - at least for now - it forces the developer to think about application performance.

If that happens to make an application more fragile, I think that is more of a code organization/tooling issue than anything else.


this is a good update and not to rain on their parade, but doesn't this seem market redundant and reinvent-the-wheel?


RethinkDB is definitely late to the party, so this is a great question. Here's how we think about it.

1. Technically, Rethink already does very useful things leading NoSQL contenders don't do. An extremely expressive query language, massive query parallelization, distributed joins, etc. You can't get that anywhere else, and we have lots of features in the pipeline to keep raising the bar.

2. I believe we have a unique take on usability and design that is very valuable to users but isn't present in other products. Rethink is pleasant to set up and (hopefully) pleasant to use. To us, design matters, and we put an inordinate amount of effort into making the product beautiful. It turns out users care about this a lot.

3. I feel like leading NoSQL contenders are somewhat stagnant. There is an enormous amount of innovation that could be done, needs to be done, and isn't being done. We'll continue releasing really useful features that nobody has, and (I think) nobody expects. I think we bring a unique philosophy to product design that's very valuable.

4. Details matter (which is why Rethink is late to the party). Once you outgrow the ten-minute blog stage, a lot of underlying architectural decisions start to really matter, and we took the time to do them right.

TL;DR: Rethink is already really good for building apps on top of it and offers things nobody else does. It will continue getting better. We'd be honored if you took it for a spin!


I'd rather have had them take their time to get it right than rush it out the door to be first-to-market.


I have to disagree; disk is ancient - it's mechanical egads! - while 10GigE is pretty commonplace now and InfiniBand and Fibre Channel are even faster.

back from my CS 101 takeaways: there are only 3 bottlenecks in a computer system: CPU, network, and IO.

looks like MemSQL is fixing the CPU and IO bottlenecks, but physics is physics so network is a pure hardware solution haha


The problem is that you can end up with larger latency over the network because it still takes a fixed amount of time for nodes to communicate. Even with a 1TB/s link between nodes you can still have a good 30ms between them, all adding even more latency. That can be mitigated somewhat by a good protocol that manages the latency properly (e.g. not blocking while waiting on ACKs and such), but it can still end up with far more latency than a few large disks would have (even better now with SSDs). That said, I do imagine that some datasets will benefit from this kind of topology (I can imagine that geospatial stuff will do well with that, since you can locate physically close things on a single machine and reduce the amount of talking needed).


30ms? In anything resembling a modern datacenter? 0.3-0.5ms is more typical these days.
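A back-of-envelope sketch of why the two comments above argue about round-trip time rather than link speed. It assumes the simplest possible model (one round trip plus payload size divided by bandwidth). The 4 KB payload is an assumed figure; the 10 Gbit/s link and the 30 ms and 0.5 ms latencies are the numbers quoted in the thread.

```python
def transfer_time_ms(payload_bytes, bandwidth_bytes_per_s, round_trip_ms):
    """Estimate one request's time as a fixed round trip plus serialization time."""
    serialization_ms = payload_bytes / bandwidth_bytes_per_s * 1000.0
    return round_trip_ms + serialization_ms


TEN_GIGE = 10e9 / 8   # 10 Gbit/s expressed in bytes per second
PAYLOAD = 4096        # assumed 4 KB message

slow = transfer_time_ms(PAYLOAD, TEN_GIGE, round_trip_ms=30.0)  # figure from the first comment
fast = transfer_time_ms(PAYLOAD, TEN_GIGE, round_trip_ms=0.5)   # figure from the reply

print(f"30 ms round trip:  {slow:.4f} ms per request")
print(f"0.5 ms round trip: {fast:.4f} ms per request")
```

Under these assumptions the serialization term is a few microseconds, so for small messages the total is almost entirely the round trip, and a faster link barely moves it.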


He was joking.


what are the major feature updates on this release compared to last one?


there are a lot of new features! Beyond distributing data across multiple machines in a cluster, there's more SQL surface area, multiple levels of redundancy for HA, and a distributed query optimizer. Some cool stuff with bi-directional lock-free skiplists too w.r.t. indexes



are you kidding me? There is absolutely NO reason whatsoever to use a NoSQL database for a financial services company. Postgres is more than capable of sustaining the necessary speeds of a startup.

Relational databases were created in the first place to solve these very problems around transactionality and analytics for finance.

This library is a beautiful example of reinventing the wheel, and otherwise creating a patchwork of unnecessary - and ultimately brittle - infrastructure.


(I work at Stripe.)

Where we use MongoDB, it's not because of speed. PostgreSQL is certainly capable of fast performance. MongoDB is useful for its ability to log freeform data as well as for its replication model. (We use sharded MongoDB in a few places, but mostly use straight replica sets.)

We use MySQL, MongoDB, PostgreSQL, and Impala. They're all useful in different places.


Mongo's probably still got the edge as a JSON store overall, but definitely check out the new JSON object dereferencing functionality coming in 9.3. There's a Russian indexing posse consisting of Oleg, Teodor, and Sasha who have been looking at doing proper indexes for JSON but haven't managed to secure funding. (Disclosure: I think they should get funded.)

These are the same guys who built hstore, full text search, GIN and GIST indexes and I think are working on a generic regular expression index type right now.


> "We use MySQL, MongoDB, PostgreSQL, and Impala."

Thanks for the clarification, but this makes it even more obvious your engineering team is introducing needless complexity into your organization.

Postgres can store unstructured data just fine, so you have a 'solution' that uses 3 OLTP stores instead of one.


PostgreSQL is awful for storing unstructured data. It is the most cumbersome, clunky syntax I've seen for a while and it lacks ORM support meaning you are forced to manually write it.

Making developers productive is an important aspect for choosing a database.


Choosing a data store based upon syntax and slightly limited ORM support isn't exactly a great idea. Both of these things can be improved rapidly with a little code.

More important questions are how is the data stored, how is it accessible, how can you scale the system, what operational constraints are there, how fast is it, what types of data modeling can be done, what consistency/transaction guarantees does it provide, etc. These are the things that will make developers productive because they will not be putting out fires all the time.


well said!


Why do you use MySQL over Postgres and vice versa?


(Clouderan here)

How are you liking Impala? We just dropped the 0.5 release yesterday, which includes the JDBC driver :D!

Edit: Awesome job on the Ruby client, it's great!


It's been great -- setup was a bit of work (we're on Ubuntu, so had to build from source), but once up and running it's allowed us to do lots of ad-hoc analysis that would have been too hard otherwise.

I've been meaning to write a MoSQL equivalent for our Impala data, but at the moment we're doing a more traditional ETL.


gdb - If you have Impala, Hadoop, and Hive right now, why use MongoDB instead of HBase and make it all work in happy harmony?


Awesome! Great to hear it's working out for you guys, looking forward to MoSQL for Impala :-)


We've been pretty happy so far. There have been a few rough edges getting it up and keeping it running, but we've been very impressed with the performance so far.

I've passed your comment on to Colin, who wrote the Ruby client -- I'm sure he'll appreciate it!


I got myself a little Impala Herd server setup, pointed it at my Impala cluster and it's working great ;).


heh, I didn't think anyone would actually use that - I originally wrote it meaning to use it as a tutorial for the blog post, then scrapped that idea.

Thanks for the kind words!


Everywhere I've worked that did high volume transaction processing had an architecture that required a piece like this. Even if you use a relational database for intake, you still need to move the data to another database for analytics. Moving the data automatically via replication sounds a lot better than the typical batch process running at 4am.


Tell this to FIS Global.

There is absolutely no reason to build a banking system on GT.M, but they did.

Although: GT.M is the only(?) NoSQL that is ACID-compliant.


> There is absolutely NO reason whatsoever to use a NoSQL database for a financial services company

Yes there is. PostgreSQL doesn't support multi master replication which makes it a terrible choice if you really want to make sure every transaction gets written. I really wonder at what point people that keep recommending PostgreSQL are going to wake up and realise what is happening in the industry.

People are scaling OUT not UP. Especially startups.


I'm sorry, postgres-xc doesn't work for your needs? [0] It has worked for me in the past.

[0] http://postgres-xc.sourceforge.net/


I would imagine that for your average startup, using solutions that don't even support transactionality will cause greater complexity issues. Especially given the enormous window before db scale out/up becomes an issue on well-designed applications.


Enormous window?

Many startups would be using AWS and it is not inconceivable that you would have Multi-AZ/Multi-Region VPSs. Scaling out != Expensive.


> People are scaling OUT not UP. Especially startups.

Startups need to scale out because many of them like to deploy on mediocre EC2 instances with the slowest SAN storage ever.

People that keep recommending PostgreSQL are rightfully ignoring this industry.


> Startups need to scale out because many of them like to deploy on mediocre EC2 instances.

No. They need to scale out because providers like AWS have outages. And so startups et al need to deploy in multiple AZ/regions in order to have as close to 100% uptime as possible. You can't do that without a well-considered multi-master style replication strategy, which PostgreSQL frankly doesn't have.

>People that keep recommending PostgreSQL are rightfully ignoring this industry.

Sure. And soon enough they will be relegated to the dustbins of history. The trends don't lie.


"The trends don't lie"

Wah. And you do not even seem to be ironic. Trends always lie, there is always a next thing that will take the opposite direction, in philosophy, in science, and particularly so in computing stuff.


In all fairness, you could use something other than Postgres that's also ACID.


this is marketing cloaked in a developer portal. I think it's great that rethinkdb is trying to distinguish themselves from Mongo, but what's the real marginal utility of a rethinkdb over Mongo?

Mongo has been around for years, and it still has problems.

Rethinkdb is just launching a new product that essentially does the same thing as Mongo, but is maybe just a little easier to use.

I think the Yet Another Database (YAD) question still hasn't been answered by this post.


Cloudera is primarily a consulting company, as are most open-source companies.

There have been only 3 successful open source companies ever: MySQL, JBoss, and Red Hat


>There have been only 3 successful open source companies ever: MySQL, JBoss, and Red Hat

See now, there's just no reason to go around saying things like that. Even if that were true (it's not), how would you know that it's true? Do you have a complete database of profitable businesses in your head? I doubt it. A significant number of open source companies are privately held, and there's really no telling how successful they are.

Anyway, right off the bat, you missed Mozilla Corporation.

Beyond that, the question of whether a company that does nothing but develop FOSS can succeed is irrelevant. The important question is whether developing FOSS can be a major part of a successful business strategy. You might start by asking Google and Apple.


Isn't Mozilla Foundation a non-profit? Wikipedia says it owns a taxable entity, the Mozilla Corporation, but gets the majority of its funds from donations from Google.

I think the companies listed only develop Open Source software and sell services. Google and Apple may use and support Open Source projects, but the real money makers are proprietary.

I agree that the list is still wrong.


Mozilla Foundation is non-profit. Mozilla Corporation is for-profit, and makes most of its revenue by selling the default search engine slot in Firefox to Google.

>I think the companies listed only develop Open Source software and sell services.

That's true, but I think it's looking at the situation completely wrong. It's sort of like saying that making left shoes is unprofitable because there are so few successful businesses that exclusively manufacture left shoes.


It is exceedingly easy to find other examples to counter. In fact, one of the easier ways is by looking at software released under a specific type of license. Example: http://en.wikipedia.org/wiki/List_of_AGPL_web_applications


Tell me, is the goalpost that you're toting around there heavy?


Cloudera isn't primarily consulting. In fact we have about 8x as many engineers in product development compared to consulting, last I counted.


Alfresco and SugarCRM seem to be doing OK. I don't have deep insight into their finances, but they're certainly still around and in business.

