fn: agent eviction revisited (#1131)

* fn: agent eviction revisited

Previously, the hot-container eviction logic used the
number of waiters for cpu/mem resources to decide whether
to evict a container. An ejection ticker woke up
its associated container every 1 sec to reassess system
load based on the waiter count. However, this does not work
for the non-blocking agent, since there are no waiters in
non-blocking mode.

Background on blocking versus non-blocking agent:
    *) Blocking agent holds a request until
    the request is serviced or the client times out. It assumes
    the request can eventually be serviced when idle
    containers eject themselves or busy containers finish
    their work.
    *) Non-blocking mode tries to limit this wait time.
    However, the non-blocking agent has never been truly
    non-blocking. This simply means that we only
    make a request wait if we take some action in
    the system. Non-blocking agents are configured with
    a much higher hot poll frequency to make the system
    more responsive as well as to handle cases where a
    too-busy event is missed by the request. This is because
    the communication between the hot-launcher and waiting
    requests is not 1-1, and is lossy if another request
    arrives for the same slot queue and receives a
    too-busy response before the original request.

This commit introduces an evictor with which each hot
container can register itself if it has been idle for more
than 1 second. Upon registration, these idle containers
become eligible for eviction.

In the hot container launcher, in non-blocking mode,
before we emit a too-busy response, we now
attempt an eviction. If this is successful, then
we wait some more. This could result in requests
waiting longer than they used to, but only if a
container was evicted. For blocking mode, the
hot launcher uses the hot-poll period to assess whether
a request has waited too long; if so, eviction
is triggered.
This commit is contained in:
Tolga Ceylan
2018-07-19 15:04:15 -07:00
committed by GitHub
parent 8e373005a0
commit 1258baeb7f
6 changed files with 365 additions and 40 deletions

@@ -48,9 +48,6 @@ type ResourceTracker interface {
 	// machine. It must be called before GetResourceToken or GetResourceToken may hang.
 	// Memory is expected to be provided in MB units.
 	IsResourcePossible(memory, cpuQuota uint64, isAsync bool) bool
-	// returns number of waiters waiting for a resource token blocked on condition variable
-	GetResourceTokenWaiterCount() uint64
 }
 type resourceTracker struct {
@@ -77,8 +74,6 @@ type resourceTracker struct {
 	cpuAsyncUsed uint64
 	// cpu in use for async area in which agent stops dequeuing async jobs
 	cpuAsyncHWMark uint64
-	// number of waiters waiting for a token blocked on the condition variable
-	tokenWaiterCount uint64
 }
 func NewResourceTracker(cfg *AgentConfig) ResourceTracker {
@@ -142,17 +137,6 @@ func (a *resourceTracker) IsResourcePossible(memory uint64, cpuQuota uint64, isA
 	}
 }
-// returns number of waiters waiting for a resource token blocked on condition variable
-func (a *resourceTracker) GetResourceTokenWaiterCount() uint64 {
-	var waiters uint64
-	a.cond.L.Lock()
-	waiters = a.tokenWaiterCount
-	a.cond.L.Unlock()
-	return waiters
-}
 func (a *resourceTracker) allocResourcesLocked(memory, cpuQuota uint64, isAsync bool) ResourceToken {
 	var asyncMem, syncMem uint64
@@ -271,9 +255,7 @@ func (a *resourceTracker) GetResourceToken(ctx context.Context, memory uint64, c
 	isWaiting = true
 	for !a.isResourceAvailableLocked(memory, cpuQuota, isAsync) && ctx.Err() == nil {
-		a.tokenWaiterCount++
 		c.Wait()
-		a.tokenWaiterCount--
 	}
 	isWaiting = false