Hacker News

Can anyone explain how they decide which requests to reject? The blog post just mentions that excess RPS gets rejected, but couldn't rejecting arbitrary requests cause other problems?


Requests are rejected essentially when an atomic counter of in-flight requests hits the limit. It's important to note that the library doesn't actually keep any kind of queue of requests. That's not necessary, because every system already has plenty of queues in the form of socket buffers, executor queues, etc.
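The counting described above can be sketched in a few lines. This is an illustrative sketch only (the class and method names here are made up, not the library's actual API): an atomic counter of in-flight requests is bumped on entry, and a request is shed immediately if the counter is already at the limit.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch, not the library's real API: reject when an
// atomic in-flight counter reaches the limit; no request queue at all.
class InflightLimiter {
    private final AtomicInteger inflight = new AtomicInteger(0);
    private final int limit;

    InflightLimiter(int limit) {
        this.limit = limit;
    }

    // Try to claim a slot; returns false (shed the request) at the limit.
    boolean tryAcquire() {
        while (true) {
            int current = inflight.get();
            if (current >= limit) {
                return false; // at the limit: reject immediately
            }
            if (inflight.compareAndSet(current, current + 1)) {
                return true; // slot claimed
            }
            // CAS lost a race with another thread; retry the loop
        }
    }

    // Must be called exactly once per successful tryAcquire().
    void release() {
        inflight.decrementAndGet();
    }
}
```

A caller wraps request handling in `tryAcquire()`/`release()` and returns an error status when acquisition fails, which is what makes rejection sub-millisecond: there is no waiting involved.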

Yes, the basic implementation does reject arbitrary requests. We do have a partitioned limit strategy (currently experimental, which is why it wasn't brought up in the tech blog). The partitioned limiter lets you guarantee a portion of the limit to certain types of requests. For example, say you want to give priority to live vs. batch traffic: live gets 90% of the limit, batch gets 10%. If live requests only account for 50% of the limit, then batch can use up to the remaining 50%. But if there's a sudden, sustained increase in live traffic, you're guaranteed that live requests will only be rejected once they exceed 90% of the limit.
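The partitioned behavior described above can be sketched roughly as follows. This is a hypothetical illustration of the semantics, not the experimental implementation: a partition is only rejected once the global limit is reached AND that partition has exceeded its guaranteed share, so an under-guarantee partition (live, in the example) keeps getting through even when another partition (batch) has borrowed spare capacity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of partitioned limiting semantics (names invented).
// Each partition is guaranteed a fraction of the limit; unused capacity
// can be borrowed by other partitions while the global limit allows it.
class PartitionedLimiter {
    private final int limit;
    private final Map<String, Double> guarantees = new HashMap<>();
    private final Map<String, Integer> inflight = new HashMap<>();
    private int totalInflight = 0;

    PartitionedLimiter(int limit) {
        this.limit = limit;
    }

    void addPartition(String name, double fraction) {
        guarantees.put(name, fraction);
        inflight.put(name, 0);
    }

    // Reject only when the global limit is reached AND this partition
    // has already used up its guaranteed share. Note: an under-guarantee
    // partition may briefly push the total past the limit while another
    // partition is over its share; a real implementation would reclaim
    // borrowed capacity more carefully.
    synchronized boolean tryAcquire(String partition) {
        int used = inflight.get(partition);
        double guaranteed = guarantees.get(partition) * limit;
        if (totalInflight >= limit && used >= guaranteed) {
            return false;
        }
        inflight.put(partition, used + 1);
        totalInflight++;
        return true;
    }

    synchronized void release(String partition) {
        inflight.merge(partition, -1, Integer::sum);
        totalInflight--;
    }
}
```

With a limit of 10 split 90/10, batch can climb well past its 10% share while live traffic is quiet, but the moment the global limit is reached, only live (still under its 90% guarantee) keeps being admitted.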


My guess is they use a zero-length or small queue in front of the request pool. If the queue is full (indicating the server is at its concurrency limit), it returns a 429 (which is sort of weird - returning a 503 would make more sense). I don't think that's part of the library though - the library just provides the low-level building blocks.


I think their point is basically "it doesn't matter": any client whose request gets rejected automatically retries and is all but guaranteed to reach a server that is up and has capacity. The retry happens so fast that, even with a naive retry implementation, the end user won't notice the interruption.
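The naive single-retry client mentioned above is simple to sketch. This is a made-up illustration (the `sendRequest` function stands in for the real transport): on a shed request, the client just tries the next instance from its client-side load-balancer list.

```java
import java.util.List;
import java.util.function.Function;

// Hedged sketch of a naive single-retry client with client-side load
// balancing. "sendRequest" is a placeholder for the real RPC call; a
// thrown RuntimeException stands in for a 429/503 rejection.
class RetryClient {
    static String callWithRetry(List<String> instances,
                                Function<String, String> sendRequest) {
        for (int attempt = 0; attempt < 2 && attempt < instances.size(); attempt++) {
            try {
                return sendRequest.apply(instances.get(attempt));
            } catch (RuntimeException rejected) {
                // Request was shed; retry once on the next instance.
            }
        }
        throw new RuntimeException("all attempts rejected");
    }
}
```

Because a shed request fails in sub-millisecond time rather than timing out, the retry adds almost no latency, which is why the blog post quoted below claims a single client-side retry is nearly 100% successful.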

From https://medium.com/@NetflixTechBlog/performance-under-load-3...

> The discovered limit and number of concurrent requests can therefore vary from server to server, especially in a multi-tenant cloud environment. This can result in shedding by one server when there was enough capacity elsewhere. With that said, using client side load balancing a single client retry is nearly 100% successful at reaching an instance with available capacity. Better yet, there’s no longer a concern about retries causing DDOS and retry storms as services are able to shed traffic quickly in sub millisecond time with minimum impact to performance.

Edit: As for how they decide what to reject: from reading the blog post, there is a queue with a bounded size. Requests that arrive while the queue is full get rejected immediately; they don't wait in the queue and time out.



