Hacker News | unohoo's comments

Use Pulsar - so much better than Kafka


I am all for self-reliance, but if you really want to influence someone else, you might want to include a link to the project, especially when the only word you share has a much more prevalent meaning.


There was a Pulsar post on HN a few months ago, with interesting comments, some of them related to Kafka. I've kept Pulsar on my radar since that post:

https://news.ycombinator.com/item?id=21936252


Why is that?


If anybody has seen a good detailed comparison, I'd love to read one. The first dozen hits were pretty weak.


Millions of topics, no ZooKeeper, etc. Kafka is addressing these shortcomings on its roadmap.


The Pulsar documentation says it requires Zookeeper: https://pulsar.apache.org/docs/en/administration-zk-bk/


Oh sorry, I meant that storing topic info in ZooKeeper is what limits Kafka to a certain number of topics.


Nope, it also stores topic metadata in ZK - it's not exactly going to store that in the (near) stateless brokers, or in Bookkeeper - and BK also relies on ZK, but it's common to reuse the ZK quorum between the brokers and the bookies.

It also needs an additional ZK for cluster replication.


For a lot of projects this is hardly a problem. On the other hand, Kafka is more mature and has a huge ecosystem (Kafka Connect, Kafka Streams, KSQL, ...).


Kafka also has fewer moving parts even today, before ZooKeeper removal is complete (2 vs. Pulsar's 3).


But one of those moving parts of Pulsar, BookKeeper, means that you're no longer storing data on message brokers. Worth the extra puzzle piece for a lot of use cases.


Pulsar is less mature but does provide functional equivalents to all of the above. Pulsar IO (Kafka Connect), Pulsar SQL (KSQL), Pulsar Functions (Kafka Streams).


Nah, Pulsar Functions is nowhere near Kafka Streams - it's more like AWS Lambda.

An example off the top of my head: in Pulsar Functions you can't write the equivalent of "aggregate this stream across a 10 minute window and emit the results on window close".

But that's fine, it doesn't need to be like Kafka Streams, you can use Flink or Spark or Storm etc. to fill the same niche. In fact one of the founders of StreamNative (Pulsar's equivalent of Confluent) is a core committer on Flink.
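To make the "10 minute window, emit on close" example concrete, here is a minimal sketch of a tumbling-window aggregation in plain Python. It is not the Kafka Streams or Pulsar Functions API; the function name, the event tuple layout, and the assumption that events arrive in timestamp order are all made up for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 10 * 60  # 10-minute tumbling windows


def tumbling_window_sums(events, window_seconds=WINDOW_SECONDS):
    """Yield (window_start, key, total) each time a window closes.

    `events` is an iterable of (timestamp_seconds, key, value) tuples.
    Assumption: events arrive in timestamp order (no late data).
    """
    current_window = None
    totals = defaultdict(int)
    for ts, key, value in events:
        window_start = ts - (ts % window_seconds)
        if current_window is not None and window_start != current_window:
            # A record from a later window arrived, so the old window is closed.
            for k in sorted(totals):
                yield (current_window, k, totals[k])
            totals.clear()
        current_window = window_start
        totals[key] += value
    # Flush the last window at end of input (a real stream never ends).
    if current_window is not None:
        for k in sorted(totals):
            yield (current_window, k, totals[k])


# Hypothetical sample data: (timestamp_seconds, key, value)
events = [(0, "a", 1), (30, "b", 2), (599, "a", 3), (600, "a", 10), (1300, "b", 5)]
results = list(tumbling_window_sums(events))
```

The point is that the function has to hold state across many messages and decide when a window is done, which is the part a stream processor like Kafka Streams or Flink handles for you (plus late data, fault tolerance, and so on, which this sketch ignores).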


Kafka is not more mature, just more hyped. I just wish Aphyr's Jepsen tests would also cover more scenarios, like:
- what happens to your data if X+1 servers permanently fail in a cluster with a replication factor of X
- what happens if a single partition's data size or request rate becomes 90% of the cluster capacity
- what happens in a multi-tenancy scenario to other users' throughput and latency when one user tries to use all the capacity of the cluster
- ...


It's way more mature. I just spent a week evaluating Pulsar vs Kafka for a client and the fact Kafka has been open sourced for 10 years vs. Pulsar's 1.5 really shows in documentation, community support etc.

> what happens to your data if X+1 servers permanently fail in a cluster with a replication factor of X.

It depends on how many in-sync replica sets existed entirely within those X+1 servers. Their partitions will go offline, other ISRs will have under-replicated partitions, and the alerting you've set up as a good engineer will have told you this was happening.
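A rough sketch of that reasoning in Python, for anyone who wants to play with it. The partition names and broker ids are hypothetical, and it makes the simplifying assumption that every replica was in sync before the failure; it does not talk to a real cluster.

```python
def classify_partitions(assignments, failed_brokers):
    """Split partitions into offline and under-replicated lists.

    `assignments` maps a partition name to the broker ids holding its
    replicas. Assumption: all replicas were in sync before the failure.
    """
    failed = set(failed_brokers)
    offline, under_replicated = [], []
    for partition, replicas in sorted(assignments.items()):
        surviving = [b for b in replicas if b not in failed]
        if not surviving:
            # Every replica lived on a failed broker: partition is offline.
            offline.append(partition)
        elif len(surviving) < len(replicas):
            # Still available, but with fewer copies than configured.
            under_replicated.append(partition)
    return offline, under_replicated


# Hypothetical 5-broker cluster with replication factor 3.
assignments = {
    "orders-0": [1, 2, 3],
    "orders-1": [2, 3, 4],
    "orders-2": [3, 4, 5],
    "orders-3": [4, 5, 1],
}
offline, under_replicated = classify_partitions(assignments, failed_brokers=[1, 2, 3])
```

In this made-up layout only the partition whose replicas all sat on the failed brokers goes offline; the rest stay available but under-replicated.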

> what happens in a multi-tenancy scenario to other users' throughput and latency when one user tries to use all the capacity of the cluster

Nothing, because you're using ACLs and have configured quotas appropriately.

Bad things otherwise.

PS, also been running Kafka since 0.8.


If the replication factor is 3 and 3 servers go down in the span of 1 or 2 hours, no alert will save you.


Yes, but this is true of any system offering N - 1 safety, e.g., HDFS, Vertica, Pulsar. It's not specific to Kafka.

And you can switch to your warm replicated cluster in this scenario, if you have one. MirrorMaker 2 supports replicated offsets, so consumers can switch without losing state.

But what you're describing is going to shaft any replicated system.


Not true for HDFS, Cassandra, Pulsar, and most distributed file systems.

As soon as a segment is under-replicated, its replication factor is restored in less than 2 minutes by selecting a new machine as a replica.

Kafka tries to do it with "Kafka Cruise Control", but adding a replica to the in-sync replica list takes several hours if partitions are 300GB and servers are already busy handling regular live traffic.


> adding a replica to the in-sync replica list takes several hours if partitions are 300GB

I'd be curious to hear more about this, because I run several topics with similar partition sizes, and haven't encountered several hours for one replica, and I've routinely shifted 350GB partition replicas as part of routine maintenance.

I have encountered 2 hours to restore a broker that was shut down improperly, but yeah, assuming your replica fetchers aren't throttled to shit, or your brokers aren't overloaded (what's the request handler avg idle? 20% or lower is time to add another broker, 10% is time to add another broker right now), that's really extreme.
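For a sense of scale, here's a back-of-the-envelope sketch in Python of how long copying one replica takes at a steady rate. The two throughput figures are made-up assumptions, not measurements from either cluster discussed here, and real replication competes with live traffic.

```python
def replica_copy_hours(partition_gb, throughput_mb_per_s):
    """Hours to copy one partition replica at a steady rate (1 GB = 1024 MB)."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = partition_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


# Hypothetical rates: an unthrottled fetcher vs. a heavily throttled one.
fast = replica_copy_hours(300, 50)  # 300GB at an assumed 50 MB/s
slow = replica_copy_hours(300, 10)  # 300GB at an assumed 10 MB/s
```

At the assumed 50 MB/s a 300GB replica takes under 2 hours; at the assumed 10 MB/s it takes over 8. So both experiences are plausible depending on throttling and broker load.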


I am dealing with the exact same situation, handling a 7-year-old and a 2-year-old. It's actually worse in my case since my wife is a healthcare professional and has to be at her job - so it's effectively just me looking after the 2 kids during the day.

I tried working from 6 to 12 at night for a week, but was exhausted since I still had to wake up when the kids woke up. This week, I am trying to do 3 hours before they wake up, another hour when the younger one naps, and then finally 2 more hours once my wife is back home. Several folks in my reporting chain have similarly aged kids or a tad bit older, so they are empathetic to the situation.

My productivity is definitely impacted, but then these are uncharted waters. I am just trying to stay sane through the process :-)


Best thing you can do -- buy a townhome or house -- try to manage the minimum 10% down payment (if you can manage 20%, that's great) and then rent out the bedrooms to other single folks. I wish I had done that when I was an entry-level software engineer.


I'm just above entry level, but by no means a recent grad. Not all of us were lucky enough to land a big software gig right out of college ;)

My family probably wouldn't be too happy to suddenly have roommates. Hah. But yes, if you're young and single, that's a solid option I would think.


2 awesome things about this release: 1) No need to wrap siblings in an enclosing element - upgrade is worth it just for this change alone (j/k)

2) I'm surprised no one has mentioned it so far -- but this release uses the new fiber architecture - I saw the video presentation about fiber and think it could really help in terms of performance


What aspect of performance?


IIRC Bezos chose Seattle not just for the available talent but also because there are no income / corporate taxes in the state of WA. I believe this factor will also be significant when they choose the 2nd HQ. Texas / Florida / Nevada are the competitive places that come to mind that have no income tax. Perhaps Austin (availability of talent pool) seems likely?


Don't forget New Hampshire! We have no income tax, no sales tax, and a significant population of software developers.

https://www.bls.gov/oes/current/oes151132.htm#st


de la Where?


Check out the recently launched site Comparably https://comparably.com/

It is so much better than Glassdoor at dissecting salary data.


> It is so much better than Glassdoor at dissecting salary data.

... perhaps, if you fit in its narrow definitions, share your own salary data, and are happy to 'sign in with linkedin' or give them your email before seeing any results. I wasn't, so can't comment.


How do you have any confidence that any of their numbers are real? Unlike this thread or even Glassdoor, there's a clear incentive to submit fake data to Comparably because they require you to submit a salary before you can see any of their data (and encourage you to sign in with your LinkedIn account).


Yikes. Anybody who gives both their salary and LinkedIn information to any company is either insane or stupid.


Right, so the salaries are probably biased towards the low end.


There's value in this -- ping me if you want to discuss more.


I like the idea and would like to join the discussion.


Where: San Jose / South Bay
Company: MyCityPals (https://mycitypals.com)
Position: Co-founder

MyCityPals is a platform for people to meet new people - whether you are new to town / friends settled down / bored of doing things alone.

I am a developer and have coded the site by myself (Django / Postgres / AWS)


If you read the story, it mentions digging dirt on Sarah's family. Family includes children.

>> But what is wrong with threatening a woman's children?

Clearly you don't have children. If you don't see anything wrong with this notion, having a thoughtful discussion with you is a moot point. Do you work for Uber?


Someone @Lyft PR needs to be on this stat. I've hardly seen Lyft take any advantage of fuckups like these by Uber.


It's a complicated decision: whether or not to get into the smear war. I have more respect for Lyft if they don't.


If your competition is doing something really stupid, all you have to do is get out of the way...


That would be the worst idea. They should stay far, far away and not say word one. You do not ever get involved in controversy if you can help it. Say a word and get you and yours investigated by those looking for a follow-up story.


Some things are just too radioactive to touch.


What makes you think these hit pieces against Uber aren't Lyft's PR in the first place?


This one doesn't seem to be. An Uber SVP was talking directly to a BuzzFeed editor.

