fn: agent eviction revisited (#1131)

* fn: agent eviction revisited

Previously, the hot-container eviction logic used the number of waiters for cpu/mem resources to decide whether to evict a container. An ejection ticker woke up its associated container every 1 sec to reassess system load based on the waiter count. However, this does not work for the non-blocking agent, since there are no waiters in non-blocking mode.

Background on blocking versus non-blocking agent:

*) Blocking agent holds a request until the request is serviced or the client times out. It assumes the request can eventually be serviced when idle containers eject themselves or busy containers finish their work.

*) Non-blocking mode tries to limit this wait time. However, the non-blocking agent has never been truly non-blocking. It simply means that a request is made to wait only if we take some action in the system. Non-blocking agents are configured with a much higher hot poll frequency to make the system more responsive, as well as to handle cases where a too-busy event is missed by the request. This is because the communication between the hot-launcher and waiting requests is not 1-1 and is lossy if another request arrives for the same slot queue and receives a too-busy response before the original request.

This change introduces an evictor with which each hot container can register itself if it is idle for more than 1 second. Upon registration, these idle containers become eligible for eviction.

In the hot container launcher, in non-blocking mode, before we attempt to emit a too-busy response, we now attempt an eviction. If this is successful, then we wait some more. This could result in requests waiting longer than they used to, but only if a container was evicted. In blocking mode, the hot launcher uses the hot-poll period to assess whether a request has waited too long; if so, eviction is triggered.
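
As a rough illustration of the register-then-evict flow described above, here is a minimal sketch. It is not the code added by this commit: the names `evictor`, `evictToken`, `Register`, `Unregister`, `EvictOne` and `launcherDecision` are hypothetical, and the sketch assumes the oldest registered idle container is evicted first, without looking at how much cpu/mem the eviction would free.

```go
package main

import (
	"fmt"
	"sync"
)

// evictToken is a hypothetical handle held by an idle container while it is
// registered. The container watches C to learn that it has been evicted.
type evictToken struct {
	id string
	C  chan struct{}
}

// evictor is a hypothetical registry of idle containers eligible for eviction.
type evictor struct {
	mu     sync.Mutex
	tokens []*evictToken
}

// Register marks a container as eligible for eviction. In the flow described
// above, a container would call this after being idle for more than 1 second.
func (e *evictor) Register(id string) *evictToken {
	tok := &evictToken{id: id, C: make(chan struct{})}
	e.mu.Lock()
	e.tokens = append(e.tokens, tok)
	e.mu.Unlock()
	return tok
}

// Unregister removes a container from the registry, for example when it
// picks up a new request and is no longer idle.
func (e *evictor) Unregister(tok *evictToken) {
	e.mu.Lock()
	defer e.mu.Unlock()
	for i, t := range e.tokens {
		if t == tok {
			e.tokens = append(e.tokens[:i], e.tokens[i+1:]...)
			return
		}
	}
}

// EvictOne evicts the longest-registered container, if any, and reports
// whether an eviction took place. (Assumption: oldest-first policy.)
func (e *evictor) EvictOne() bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	if len(e.tokens) == 0 {
		return false
	}
	tok := e.tokens[0]
	e.tokens = e.tokens[1:]
	close(tok.C) // signal the container to shut down
	return true
}

// launcherDecision mimics the non-blocking launcher step: try an eviction
// before giving up; keep waiting only if something was actually evicted.
func launcherDecision(e *evictor) string {
	if e.EvictOne() {
		return "wait"
	}
	return "too-busy"
}

func main() {
	e := &evictor{}
	tok := e.Register("container-1")
	fmt.Println(launcherDecision(e)) // an idle container was evicted
	<-tok.C                          // the container observes its eviction
	fmt.Println(launcherDecision(e)) // nothing left to evict
}
```

The point of the sketch is that the too-busy decision depends on whether an eviction succeeded rather than on a waiter count, which is why the approach can also serve non-blocking mode.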
2022-10-28 21:29:17 +03:00 · 2018-07-19 15:04:15 -07:00
parent 8e373005a0
commit 1258baeb7f
6 changed files with 365 additions and 40 deletions
--- a/api/agent/resource.go
+++ b/api/agent/resource.go
@@ -48,9 +48,6 @@ type ResourceTracker interface {
 	// machine. It must be called before GetResourceToken or GetResourceToken may hang.
 	// Memory is expected to be provided in MB units.
 	IsResourcePossible(memory, cpuQuota uint64, isAsync bool) bool
-
-	// returns number of waiters waiting for a resource token blocked on condition variable
-	GetResourceTokenWaiterCount() uint64
 }

 type resourceTracker struct {
@@ -77,8 +74,6 @@ type resourceTracker struct {
 	cpuAsyncUsed uint64
 	// cpu in use for async area in which agent stops dequeuing async jobs
 	cpuAsyncHWMark uint64
-	// number of waiters waiting for a token blocked on the condition variable
-	tokenWaiterCount uint64
 }

 func NewResourceTracker(cfg *AgentConfig) ResourceTracker {
@@ -142,17 +137,6 @@ func (a *resourceTracker) IsResourcePossible(memory uint64, cpuQuota uint64, isA
 	}
 }

-// returns number of waiters waiting for a resource token blocked on condition variable
-func (a *resourceTracker) GetResourceTokenWaiterCount() uint64 {
-	var waiters uint64
-
-	a.cond.L.Lock()
-	waiters = a.tokenWaiterCount
-	a.cond.L.Unlock()
-
-	return waiters
-}
-
 func (a *resourceTracker) allocResourcesLocked(memory, cpuQuota uint64, isAsync bool) ResourceToken {

 	var asyncMem, syncMem uint64
@@ -271,9 +255,7 @@ func (a *resourceTracker) GetResourceToken(ctx context.Context, memory uint64, c

 		isWaiting = true
 		for !a.isResourceAvailableLocked(memory, cpuQuota, isAsync) && ctx.Err() == nil {
-			a.tokenWaiterCount++
 			c.Wait()
-			a.tokenWaiterCount--
 		}
 		isWaiting = false