* fn: agent eviction revisited
Previously, the hot-container eviction logic used the
number of waiters for cpu/mem resources to decide whether
to evict a container. An eviction ticker woke up its
associated container every 1 sec to reassess system load
based on the waiter count. However, this does not work
for non-blocking agents, since there are no waiters in
non-blocking mode.
Background on blocking versus non-blocking agent:
*) A blocking agent holds a request until the request
is serviced or the client times out. It assumes the
request can eventually be serviced when idle containers
eject themselves or busy containers finish their work.
*) Non-blocking mode tries to limit this wait time.
However, the non-blocking agent has never been truly
non-blocking; it simply means that we only make a
request wait if we take some action in the system.
Non-blocking agents are configured with a much higher
hot poll frequency to make the system more responsive,
as well as to handle cases where a too-busy event is
missed by the request. This is because the communication
between the hot launcher and waiting requests is not 1-1
and is lossy: another request may arrive for the same
slot queue and receive the too-busy response before the
original request does (see the sketch below).
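As a rough illustration (plain Go channels, not the actual
fn slot-queue code), a single too-busy signal shared by
several waiters is consumed by whichever waiter happens to
receive it first, so the request it was intended for can
miss it:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	tooBusy := make(chan struct{}, 1) // shared per-slot-queue signal

	waiter := func(name string) {
		select {
		case <-tooBusy:
			fmt.Println(name, "got too-busy, giving up")
		case <-time.After(500 * time.Millisecond):
			fmt.Println(name, "timed out; it missed the signal")
		}
	}

	go waiter("original request")
	go waiter("later request") // arrives for the same slot queue

	// The hot launcher emits a single too-busy event; only one
	// of the two waiters will ever observe it.
	tooBusy <- struct{}{}

	time.Sleep(time.Second)
}
```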
Introducing an evictor with which each hot container
can register itself if it has been idle for more than
1 second. Upon registration, these idle containers
become eligible for eviction.
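A minimal sketch of the evictor idea follows; the type and
method names are illustrative, not the actual fn types:

```go
package evictor

import "sync"

// Evictor tracks hot containers that have been idle long
// enough (1 second in this change) to be evicted.
type Evictor struct {
	mu   sync.Mutex
	idle map[string]chan struct{} // container ID -> eviction signal
}

func New() *Evictor {
	return &Evictor{idle: make(map[string]chan struct{})}
}

// Register is called by a hot container once it has been idle
// for more than the idle threshold; the returned channel is
// closed if the container is chosen for eviction.
func (e *Evictor) Register(id string) <-chan struct{} {
	e.mu.Lock()
	defer e.mu.Unlock()
	ch := make(chan struct{})
	e.idle[id] = ch
	return ch
}

// Unregister is called when the container picks up new work.
func (e *Evictor) Unregister(id string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	delete(e.idle, id)
}

// Evict signals one registered idle container to shut down and
// reports whether an eviction actually happened.
func (e *Evictor) Evict() bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	for id, ch := range e.idle {
		close(ch)
		delete(e.idle, id)
		return true
	}
	return false
}
```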
In the hot container launcher, in non-blocking mode,
before we attempt to emit a too-busy response we now
attempt an evict. If this is successful, then we wait
some more. This can result in requests waiting longer
than they used to, but only if a container was evicted.
In blocking mode, the hot launcher uses the hot-poll
period to decide whether a request has waited too long,
and if so, eviction is triggered.
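Continuing the sketch above (again with illustrative names,
not the real launcher code), the launcher loop could use the
evictor like this:

```go
package evictor

import (
	"errors"
	"time"
)

type slot struct{}

// errTooBusy stands in for the too-busy response sent to clients.
var errTooBusy = errors.New("too busy")

type launcher struct {
	slots   chan slot
	hotPoll time.Duration
	ev      *Evictor
}

func (l *launcher) waitForSlot(nonBlocking bool) (slot, error) {
	for {
		select {
		case s := <-l.slots:
			return s, nil
		case <-time.After(l.hotPoll):
			// The request has waited a full hot-poll period; try to
			// free capacity by evicting an idle container.
			if l.ev.Evict() {
				continue // eviction succeeded, so wait some more
			}
			if nonBlocking {
				return slot{}, errTooBusy
			}
			// In blocking mode, keep waiting for a slot.
		}
	}
}
```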
Status calls should not directly use the client's
gRPC context deadlines/timeouts during Status
execution. Status should allow plenty of time
for the scheduler agent and docker to run and
emit useful error information.
Setting this timeout to 60 seconds, which should be
enough to surface disk I/O, docker, etc. issues.
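A sketch of the idea (not the exact runner code): Status runs
against its own fixed 60 second deadline instead of inheriting
the caller's gRPC deadline, so slow docker or disk problems
produce a useful error rather than a client-side timeout:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// statusCheck stands in for the real work (agent + docker run).
func statusCheck(ctx context.Context) error {
	select {
	case <-time.After(100 * time.Millisecond): // pretend docker run
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Status deliberately ignores any deadline on the caller's context
// and uses its own 60 second budget.
func Status(callerCtx context.Context) error {
	_ = callerCtx // caller's deadline intentionally not propagated
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()
	return statusCheck(ctx)
}

func main() {
	fmt.Println(Status(context.Background()))
}
```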
This is useful in scenarios where a gRPC client might want
to reliably observe/report the status latency metrics
and remove any possible duplicates. If the status query
was served from cache, then these latencies reflect the
last execution's latency.
* fn: runner status and docker load images
Introducing a function run for pure runner Status
calls. Previously, Status gRPC calls returned active
inflight request counts to serve as a simple health
check. However, this is not sufficient, since it does
not show whether the agent or docker is healthy. With
this change, if the pure runner is configured with a
status image, that image is executed through docker. The
call uses zero memory/cpu/tmpsize settings to ensure the
resource tracker does not block it.
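A rough sketch of that last point; the field names here are
hypothetical, not the actual fn call model:

```go
// statusCall is built with zero resource requirements so the
// resource tracker never queues or rejects the health check.
type statusCall struct {
	Image     string
	Memory    uint64 // MB; 0 => not counted by the resource tracker
	CPUs      uint64 // millicpus; 0 => not counted
	TmpFsSize uint32 // MB; 0 => no tmpfs reservation
}

func newStatusCall(image string) *statusCall {
	return &statusCall{Image: image} // Memory, CPUs, TmpFsSize all zero
}
```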
However, operators might not always have a docker
repository accessible/available for the status image, or
they might not want the status check to go over the
network. To allow such cases, and in general to allow
caching docker images, a new environment variable
FN_DOCKER_LOAD_FILE was added. If this is set, fn-agent
will, during startup, load the images that were previously
saved with 'docker save' into docker.
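Roughly, the startup behaviour looks like the following sketch
(this shells out to `docker load`; the real agent may use the
docker API directly):

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

// loadDockerImagesIfConfigured loads a tarball previously created
// with `docker save` into the local docker daemon, if the
// FN_DOCKER_LOAD_FILE environment variable is set.
func loadDockerImagesIfConfigured() {
	path := os.Getenv("FN_DOCKER_LOAD_FILE")
	if path == "" {
		return
	}
	// Equivalent to running: docker load -i <path>
	out, err := exec.Command("docker", "load", "-i", path).CombinedOutput()
	if err != nil {
		log.Fatalf("failed to load docker images from %s: %v (%s)", path, err, out)
	}
	log.Printf("loaded docker images from %s", path)
}

func main() {
	loadDockerImagesIfConfigured()
	// ... continue agent startup ...
}
```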
* Initial support for invoking triggers
* dupe method
* tighten server constraints
* runner tests not working yet
* basic route tests passing
* post rebase fixes
* add hybrid support for trigger invoke and tests
* consolidate all hybrid evil into one place
* cleanup and make triggers unique by source
* fix oops with Agent
* linting
* review fixes
LB agent reports LB placer latency. It should also report
how long it took for the runner to initiate the call, as
well as the execution time inside the container, if the
runner has accepted (committed to) the call.
* Don't try to delete an app that wasn't successfully created in the case of failure
* Allow datastore implementations to inject additional annotations on objects
* Allow for datastores transparently adding annotations on apps, fns and triggers. Change NameIn filter to Name for apps.
* Move *List types including JSON annotations for App, Fn and Trigger into models
* Change return types for GetApps, GetFns and GetTriggers on datastore to
be models.*List and move cursor generation into the datastore
* Trigger cursor handling fixed into db layer
Also changes the name generation so that it is not in the same order
as the ID (it is now random); this means we are now testing our name ordering.
* GetFns now respects cursors
* Apps now feeds cursor back
* Mock fixes
* Fixing up api level cursor decoding
* Tidy up treatment of cursors in the db layer
* Adding conditions for non-nil item lists
* fix mock test
* Fixed up a couple of incorrect response codes
* Standardise all entities on 204 with no return content on successful delete
* Fix failing Fn.delete() test
Vast commit, includes:
* Introduces the Trigger domain entity.
* Introduces the Fns domain entity.
* V2 of the API for interacting with the new entities in swaggerv2.yml
* Adds v2 end points for Apps to support PUT updates.
* Rewrites the datastore level tests into a new pattern.
* V2 routes use entity ID over name as the path parameter.
This is a small tweak to the placer latency stats. If we have a cluster of values
around the 1-2s mark, then having a single relatively broad bucket that captures
the (1s, 10s] range will obscure that. In particular, typical Prometheus quantile
estimates may be distorted by this bucket size.
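For instance (bucket boundaries and metric name illustrative only),
adding intermediate buckets between 1s and 10s gives
histogram_quantile something to interpolate within that range:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var placerLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name: "lb_placer_latency_seconds",
	Help: "Time the LB placer spent placing a call.",
	// Extra resolution between 1s and 10s instead of one wide bucket.
	Buckets: []float64{0.05, 0.1, 0.25, 0.5, 1, 1.5, 2, 2.5, 5, 7.5, 10, 30, 60},
})

func init() {
	prometheus.MustRegister(placerLatency)
}
```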