Commit f1795f1f
Authored 7 years ago by Sree Kuchibhotla
Committed by GitHub 7 years ago
Merge pull request #12139 from sreecha/fix_tm_avalanche
Fix thread avalanche in thread manager
Parents: abd7bce6, 419b617a
Changed file: src/cpp/thread_manager/thread_manager.cc (33 additions, 5 deletions)
@@ -158,11 +158,39 @@ void ThreadManager::MainWorkLoop() {
     }
     // If we decided to finish the thread, break out of the while loop
     if (done) break;
-    // ... otherwise increase poller count and continue
-    // There's a chance that we'll exceed the max poller count: that is
-    // explicitly ok - we'll decrease after one poll timeout, and prevent
-    // some thrashing starting up and shutting down threads
-    num_pollers_++;
+
+    // Otherwise go back to polling as long as it doesn't exceed max_pollers_
+    //
+    // **WARNING**:
+    // There is a possibility of threads thrashing here (i.e excessive thread
+    // shutdowns and creations than the ideal case). This happens if max_poller_
+    // count is small and the rate of incoming requests is also small. In such
+    // scenarios we can possibly configure max_pollers_ to a higher value and/or
+    // increase the cq timeout.
+    //
+    // However, not doing this check here and unconditionally incrementing
+    // num_pollers (and hoping that the system will eventually settle down) has
+    // far worse consequences i.e huge number of threads getting created to the
+    // point of thread-exhaustion. For example: if the incoming request rate is
+    // very high, all the polling threads will return very quickly from
+    // PollForWork() with WORK_FOUND. They all briefly decrement num_pollers_
+    // counter thereby possibly - and briefly - making it go below min_pollers;
+    // This will most likely result in the creation of a new poller since
+    // num_pollers_ dipped below min_pollers_.
+    //
+    // Now, If we didn't do the max_poller_ check here, all these threads will
+    // go back to doing PollForWork() and the whole cycle repeats (with a new
+    // thread being added in each cycle). Once the total number of threads in
+    // the system crosses a certain threshold (around ~1500), there is heavy
+    // contention on mutexes (the mu_ here or the mutexes in gRPC core like the
+    // pollset mutex) that makes DoWork() take longer to finish thereby causing
+    // new poller threads to be created even faster. This results in a thread
+    // avalanche.
+    if (num_pollers_ < max_pollers_) {
+      num_pollers_++;
+    } else {
+      break;
+    }
   };
 
   CleanupCompletedThreads();
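The capped re-entry check introduced by this change can be illustrated in isolation. The following is a minimal sketch, not gRPC code: PollerBudget, TryRejoin, and Leave are hypothetical names chosen for illustration, and the class only models the counter bookkeeping described in the comment above (a thread that has finished its work may resume polling only while the poller count is below the maximum, otherwise it should exit).

```cpp
#include <mutex>

// Hypothetical stand-in for the poller bookkeeping described in the diff.
// It is NOT the gRPC ThreadManager; it only models the num_pollers_ /
// max_pollers_ check under a mutex.
class PollerBudget {
 public:
  explicit PollerBudget(int max_pollers) : max_pollers_(max_pollers) {}

  // Called by a thread that has finished its work and wants to poll again.
  // Returns true if it may resume polling (the count is incremented),
  // false if the cap is reached and the thread should exit instead.
  bool TryRejoin() {
    std::lock_guard<std::mutex> lock(mu_);
    if (num_pollers_ < max_pollers_) {
      ++num_pollers_;
      return true;
    }
    return false;
  }

  // Called when a poller stops polling, for example because it found work.
  void Leave() {
    std::lock_guard<std::mutex> lock(mu_);
    --num_pollers_;
  }

  // Current number of threads counted as pollers.
  int num_pollers() const {
    std::lock_guard<std::mutex> lock(mu_);
    return num_pollers_;
  }

 private:
  mutable std::mutex mu_;
  const int max_pollers_;
  int num_pollers_ = 0;
};
```

In this sketch, a worker loop would call Leave() when it picks up work and TryRejoin() when the work is done, breaking out of its loop on a false return. With an unconditional increment in place of the check, every returning thread would rejoin the poller set regardless of how many pollers already exist, which is the behavior the commit removes.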