Commit 707dd2c3 authored by Abhishek Kumar

Merge pull request #2065 from dklempner/retry_backoff

Update the connection backoff document with jitter.
parents 027994c4 e00b0c32
@@ -8,58 +8,39 @@ requests) and instead do some form of exponential backoff.
 We have several parameters:
 1. INITIAL_BACKOFF (how long to wait after the first failure before retrying)
 2. MULTIPLIER (factor with which to multiply backoff after a failed retry)
-3. MAX_BACKOFF (Upper bound on backoff)
-4. MIN_CONNECTION_TIMEOUT
+3. MAX_BACKOFF (upper bound on backoff)
+4. MIN_CONNECT_TIMEOUT (minimum time we're willing to give a connection to
+   complete)
 ## Proposed Backoff Algorithm
 Exponentially back off the start time of connection attempts up to a limit of
-MAX_BACKOFF.
+MAX_BACKOFF, with jitter.
 ```
 ConnectWithBackoff()
   current_backoff = INITIAL_BACKOFF
   current_deadline = now() + INITIAL_BACKOFF
-  while (TryConnect(Max(current_deadline, MIN_CONNECT_TIMEOUT))
+  while (TryConnect(Max(current_deadline, now() + MIN_CONNECT_TIMEOUT))
          != SUCCESS)
     SleepUntil(current_deadline)
     current_backoff = Min(current_backoff * MULTIPLIER, MAX_BACKOFF)
-    current_deadline = now() + current_backoff
+    current_deadline = now() + current_backoff +
+      UniformRandom(-JITTER * current_backoff, JITTER * current_backoff)
 ```
-## Historical Algorithm in Stubby
-Exponentially increase up to a limit of MAX_BACKOFF the intervals between
-connection attempts. This is what stubby 2 uses, and is equivalent if
-TryConnect() fails instantly.
-```
-LegacyConnectWithBackoff()
-  current_backoff = INITIAL_BACKOFF
-  while (TryConnect(MIN_CONNECT_TIMEOUT) != SUCCESS)
-    SleepFor(current_backoff)
-    current_backoff = Min(current_backoff * MULTIPLIER, MAX_BACKOFF)
-```
-The grpc C implementation currently uses this approach with an initial backoff
-of 1 second, multiplier of 2, and maximum backoff of 120 seconds. (This will
-change)
-Stubby, or at least rpc2, uses exactly this algorithm with an initial backoff
-of 1 second, multiplier of 1.2, and a maximum backoff of 120 seconds.
-## Use Cases to Consider
-* Client tries to connect to a server which is down for multiple hours, eg for
-  maintenance
-* Client tries to connect to a server which is overloaded
-* User is bringing up both a client and a server at the same time
-  * In particular, we would like to avoid a large unnecessary delay if the
-    client connects to a server which is about to come up
-* Client/server are misconfigured such that connection attempts always fail
-  * We want to make sure these don’t put too much load on the server by
-    default.
-* Server is overloaded and wants to transiently make clients back off
-* Application has out of band reason to believe a server is back
-  * We should consider an out of band mechanism for the client to hint that
-    we should short circuit the backoff.
+With specific parameters of
+MIN_CONNECT_TIMEOUT = 20 seconds
+INITIAL_BACKOFF = 1 second
+MULTIPLIER = 1.6
+MAX_BACKOFF = 120 seconds
+JITTER = 0.2
+Implementations with pressing concerns (such as minimizing the number of wakeups
+on a mobile phone) may wish to use a different algorithm, and in particular
+different jitter logic.
+Alternate implementations must ensure that connection backoffs started at the
+same time disperse, and must not attempt connections substantially more often
+than the above algorithm.
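For readers who want to experiment with the jittered algorithm this commit introduces, here is a minimal Python sketch of the `ConnectWithBackoff()` pseudocode using the parameter values quoted in the new text. It is an illustration under stated assumptions, not the gRPC implementation. The `try_connect(deadline)` callback and the injectable `now`, `sleep`, and `uniform` arguments are hypothetical names added here so the loop can be driven by a fake clock.

```python
import random
import time

# Parameter values quoted in the updated document (seconds, except
# MULTIPLIER and JITTER, which are unitless).
INITIAL_BACKOFF = 1.0
MULTIPLIER = 1.6
MAX_BACKOFF = 120.0
MIN_CONNECT_TIMEOUT = 20.0
JITTER = 0.2


def connect_with_backoff(try_connect, now=time.monotonic, sleep=time.sleep,
                         uniform=random.uniform):
    """Retry try_connect until it succeeds; return the number of attempts.

    try_connect(deadline) is a hypothetical callback (an assumption of this
    sketch): it returns True on success and should give up once the clock
    passes `deadline`.
    """
    attempts = 0
    current_backoff = INITIAL_BACKOFF
    current_deadline = now() + INITIAL_BACKOFF
    while True:
        attempts += 1
        # Every attempt gets at least MIN_CONNECT_TIMEOUT to complete.
        if try_connect(max(current_deadline, now() + MIN_CONNECT_TIMEOUT)):
            return attempts
        # SleepUntil(current_deadline): nothing to wait for if the failed
        # attempt already ran past the deadline.
        remaining = current_deadline - now()
        if remaining > 0:
            sleep(remaining)
        current_backoff = min(current_backoff * MULTIPLIER, MAX_BACKOFF)
        # Shift the next start time by up to +/- JITTER of the backoff so
        # clients that started together drift apart.
        current_deadline = now() + current_backoff + uniform(
            -JITTER * current_backoff, JITTER * current_backoff)
```

With jitter disabled (`uniform=lambda a, b: 0.0`) and attempts that fail instantly, successive attempts start at roughly 0 s, 1 s, 2.6 s, and 5.16 s. With the default `random.uniform`, each interval after the first varies by up to 20% in either direction, which is what lets backoffs started at the same time disperse.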