I am all for self-reliance, but if you really want to influence someone else, you might want to include a link to the project, especially when the only word you share has a much more prevalent meaning.
Nope, it also stores topic metadata in ZK - it's not exactly going to store that in the (near) stateless brokers, or in BookKeeper - and BK also relies on ZK, but it's common to reuse the ZK quorum between the brokers and the bookies.
It also needs an additional ZK for cluster replication.
For a lot of projects this is hardly a problem. On the other hand, Kafka is more mature and has a huge ecosystem (Kafka Connect, Kafka Streams, KSQL, ...).
But one of those moving parts of Pulsar, BookKeeper, means that you're no longer storing data on message brokers. Worth the extra puzzle piece for a lot of use cases.
Pulsar is less mature but does provide functional equivalents to all of the above. Pulsar IO (Kafka Connect), Pulsar SQL (KSQL), Pulsar Functions (Kafka Streams).
Nah, Pulsar Functions is nowhere near Kafka Streams - it's more like AWS Lambda.
An example off the top of my head: in Pulsar Functions you can't write the equivalent of "aggregate this stream across a 10 minute window and emit the results on window close".
But that's fine, it doesn't need to be like Kafka Streams, you can use Flink or Spark or Storm etc. to fill the same niche. In fact one of the founders of StreamNative (Pulsar's equivalent of Confluent) is a core committer on Flink.
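To make the "aggregate across a 10 minute window and emit on window close" example concrete, here is a rough sketch in plain Python. It is not Kafka Streams, Flink, or Pulsar code; the function name, the event tuple shape, and the assumption that events arrive in timestamp order (no late data) are all mine, purely for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 10 * 60  # 10-minute tumbling window


def tumbling_window_sums(events, window_seconds=WINDOW_SECONDS):
    """Yield (window_start, {key: sum}) each time a window closes.

    `events` is an iterable of (timestamp_seconds, key, value) tuples,
    assumed to arrive in timestamp order (hypothetical input format).
    """
    current_start = None
    totals = defaultdict(int)
    for ts, key, value in events:
        # Align the timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # This event belongs to a later window, so the old one is closed.
            yield current_start, dict(totals)
            totals.clear()
            current_start = window_start
        totals[key] += value
    if current_start is not None:
        # Flush the final window when the input ends.
        yield current_start, dict(totals)


events = [(0, "a", 1), (30, "b", 2), (599, "a", 3), (600, "a", 10), (1250, "b", 5)]
results = list(tumbling_window_sums(events))
print(results)
```

The point is only that the operator has to keep state per window and decide when a window is closed, which is the part a dedicated stream processor handles for you (along with late data and fault tolerance, which this sketch ignores).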
Kafka is not more mature, just more hyped. I just wish Aphyr's Jepsen tests would also cover more scenarios, like:
- what happens to your data if X+1 servers permanently fail in a cluster with a replication factor of X.
- what happens if a single partition's data size or request rate reaches 90% of the cluster capacity
- what happens, in a multi-tenancy scenario, to other users' throughput and latency when one user tries to use all the capacity of the cluster
- ...
It's way more mature. I just spent a week evaluating Pulsar vs Kafka for a client and the fact Kafka has been open sourced for 10 years vs. Pulsar's 1.5 really shows in documentation, community support etc.
> what happens to your data if X+1 servers permanently fail in a cluster with a replication factor of X.
It depends on how many in-sync replica sets existed entirely within those X+1 servers. Their partitions will go offline, other ISRs will have under-replicated partitions, and the alerting you've set up as a good engineer will have told you this was happening.
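A small sketch of that reasoning, assuming every replica was in sync before the failure. The topic name, broker ids, and the `classify_partitions` helper are made up for illustration and are not part of any Kafka API.

```python
def classify_partitions(assignments, failed_brokers):
    """Split partitions into offline and under-replicated after a failure.

    `assignments` maps a partition name to the broker ids holding its
    replicas (all assumed in sync before the failure).
    """
    failed = set(failed_brokers)
    offline, under_replicated = [], []
    for partition, replicas in sorted(assignments.items()):
        survivors = [b for b in replicas if b not in failed]
        if not survivors:
            # Every replica lived on a failed broker: no leader can be elected.
            offline.append(partition)
        elif len(survivors) < len(replicas):
            # Still served, but with fewer copies than configured.
            under_replicated.append(partition)
    return offline, under_replicated


# Replication factor 3, six brokers, four of them fail (X+1 with X = 3).
assignments = {
    "orders-0": [1, 2, 3],
    "orders-1": [2, 3, 4],
    "orders-2": [4, 5, 6],
    "orders-3": [5, 6, 1],
}
offline, under_replicated = classify_partitions(assignments, failed_brokers=[1, 2, 3, 4])
print("offline:", offline)
print("under-replicated:", under_replicated)
```

With this layout, only the partitions whose whole replica set sat on the failed brokers go offline; the rest keep serving with fewer copies.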
> what happen in multi-tenancy scenario to other user throughput and latency when one user try to use all the capacity of the cluster
Nothing, because you're using ACLs and have configured quotas appropriately.
Yes, but this is true of any system offering N - 1 safety, e.g., HDFS, Vertica, Pulsar. It's not specific to Kafka.
And you can switch to your warm replicated cluster in this scenario, if you have one. MirrorMaker 2 supports replicated offsets, so consumers can switch without losing state.
But what you're describing is going to shaft any replicated system.
Not true for HDFS, Cassandra, Pulsar and most distributed file systems.
As soon as a segment is under-replicated, its replication factor is restored in under 2 minutes by selecting a new machine as a replica.
Kafka tries to do it with “Kafka Cruise Control”, but adding a replica to the in-sync replica list takes several hours if partitions are 300GB and servers are already busy handling regular live traffic.
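For a sense of scale, a back-of-the-envelope calculation of how long copying a 300GB partition takes at different sustained replication rates. The rates below are illustrative assumptions, not measurements from any cluster, and the helper name is made up.

```python
def replica_copy_hours(partition_gb, throughput_mb_per_s):
    """Hours needed to copy a partition at a sustained throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = partition_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


# Hypothetical sustained rates left over for replication, in MB/s.
for rate in (10, 50, 200):
    print(f"{rate} MB/s -> {replica_copy_hours(300, rate):.1f} h")
```

Whether catching up takes minutes or hours depends almost entirely on how much bandwidth the brokers have left for replica fetching.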
> adding a replica to the in-sync replica list takes several hours if partitions are 300GB
I'd be curious to hear more about this, because I run several topics with similar partition sizes, and haven't encountered several hours for one replica, and I've routinely shifted 350GB partition replicas as part of routine maintenance.
I have encountered 2 hours to restore a broker that was shut down improperly, but yeah, assuming your replica fetchers aren't throttled to shit, or your brokers aren't overloaded (what's the request handler avg idle? 20% or lower is time to add another broker, 10% is time to add another broker right now), that's really extreme.
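The rule of thumb above can be written down as a tiny check. This assumes you already have the request handler average idle value as a ratio between 0 and 1 (for example, read from the broker's `RequestHandlerAvgIdlePercent` metric); the function name and return strings are my own.

```python
def broker_capacity_advice(request_handler_avg_idle):
    """Map the request handler average idle ratio (0.0-1.0) to an action."""
    if not 0.0 <= request_handler_avg_idle <= 1.0:
        raise ValueError("idle ratio must be between 0 and 1")
    if request_handler_avg_idle <= 0.10:
        return "add another broker right now"
    if request_handler_avg_idle <= 0.20:
        return "time to add another broker"
    return "ok"


for idle in (0.45, 0.18, 0.07):
    print(f"idle={idle:.2f}: {broker_capacity_advice(idle)}")
```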
I am dealing with the exact same situation, handling a 7-year-old and a 2-year-old. It's actually worse in my case since my wife is a healthcare professional and has to be at her job - so it's effectively just me looking after the 2 kids during the day.
I tried working 1 week from 6-12 in the night, but was exhausted since I still had to wake up when kids woke up. This week, I am trying to do 3 hours before they wake up, another hour when the younger one naps and then finally 2 more hours once my wife is back home. Several folks in my reporting chain have similar aged kids or a tad bit older, so they are empathetic to the situation.
My productivity is definitely impacted, but then these are uncharted waters. I am just trying to stay sane through the process :-)
Best thing you can do -- buy a town home or home -- try to manage the minimum 10% down payment (if you can manage 20%, that's great) and then rent out the bedrooms to other single folks. I wish I had done that when I was an entry-level software engineer.
2 awesome things about this release:
1) No need to wrap siblings in an enclosing element - upgrade is worth it just for this change alone (j/k)
2) I'm surprised no one has mentioned it so far -- but this release uses the new fiber architecture - I saw the video presentation about fiber and think it could really help in terms of performance
IIRC Bezos chose Seattle not just for available talent but also for the fact that there are no income / corporate taxes in the state of WA.
I believe this factor will also be significant when they choose the 2nd HQ. Texas / Florida / Nevada are the competitive places that come to mind that have no income tax. Perhaps Austin (availability of talent pool) is likely?
> It is so much better than Glassdoor at dissecting salary data.
... perhaps, if you fit in its narrow definitions, share your own salary data, and are happy to 'sign in with linkedin' or give them your email before seeing any results. I wasn't, so can't comment.
How do you have any confidence that any of their numbers are real? Unlike this thread or even Glassdoor, there's a clear incentive to submit fake data to Comparably because they require you to submit a salary before you can see any of their data (and encourage you to sign in with your LinkedIn account).
If you read the story, it mentions digging dirt on Sarah's family. Family includes children.
>> But what is wrong with threatening a woman's children?
Clearly you don't have children. If you don't see anything wrong with this notion, having a thoughtful discussion with you is a moot point. Do you work for Uber?
That would be the worst idea. They should stay far, far away and not say word one. You do not ever get involved in controversy if you can help it. Say a word and get you and yours investigated by those looking for a follow-up story.