Make sure we can apply extra tags if RegisterAPIViews() is
provided with such tags. Deduplicate path/method/status and
always apply these default tags to appropriate views.
SSL related FN_NODE_CERT (and related) settings are
not very clear today. Removing this in favor of a
simple map of tls.Config objects. Three keys are
provided for this map:
TLSGRPCServer
TLSAdminServer
TLSWebServer
which correspond to server TLS settings for the
associated services.
Operators/implementers can further add more
keys to the map and add their own TLS config.
* fn: stats view/distribution improvements
*) View latency distribution is now an argument
in view creation functions. This allows easier
override to set custom buckets. It is simplistic
and assumes all latency views would use the same
set, but in practice this is already the case.
*) Removed API view creation to main, this should not
be enabled for all node types. This is consistent with
the rest of the system.
* fn: Docker samples of cpu/mem/disk with specific buckets
moves the config option for max request size up to the front end, adds the env
var for it there, adds a server test for it and removes it from agent. a
request is either gonna come through the lb (before grpc) or to the server, we
can handle limiting the request there at least now, which may be easier than
having multiple layers of request body checking. this aligns with not making
the agent as responsible for http behaviors (eventually, not at all once route
is fully deprecated).
* fn: New timeout for LB Placer
Previously, LB Placers worked hard as long as
client contexts allowed for. Adding a Placer
config setting to bound this by 360 seconds by
default.
The new timeout is not accounted during actual
function execution and only applies to the amount
of wait time in Placers when the call is not
being executed.
* fn: cleanup of docker private registry code
Start using URL parsed ServerAddress and its subdomains
for easier image ensure/pull in docker driver. Previous
code to lookup substrings was faulty without proper
URL parse and hostname tokenization. When searching
for a registry config, if image name does not contain
a registry and if there's a private registry configured,
then search for hub.docker.com and index.docker.io. This
is similar to previous code but with correct subdomain
matching.
* fn-dataplane: take port into account in auth configs
* fn: agent eviction revisited
Previously, the hot-container eviction logic used
number of waiters of cpu/mem resources to decide to
evict a container. An ejection ticker used to wake up
its associated container every 1 sec to reasses system
load based on waiter count. However, this does not work
for non-blocking agent since there are no waiters for
non-blocking mode.
Background on blocking versus non-blocking agent:
*) Blocking agent holds a request until the
the request is serviced or client times out. It assumes
the request can be eventually serviced when idle
containers eject themselves or busy containers finish
their work.
*) Non-blocking mode tries to limit this wait time.
However non-blocking agent has never been truly
non-blocking. This simply means that we only
make a request wait if we take some action in
the system. Non-blocking agents are configured with
a much higher hot poll frequency to make the system
more responsive as well as to handle cases where an
too-busy event is missed by the request. This is because
the communication between hot-launcher and waiting
requests are not 1-1 and lossy if another request
arrives for the same slot queue and receives a
too-busy response before the original request.
Introducing an evictor where each hot container can
register itself, if it is idle for more than 1 seconds.
Upon registry, these idle containers become eligible
for eviction.
In hot container launcher, in non-blocking mode,
before we attempt to emit a too-busy response, now
we attempt an evict. If this is successful, then
we wait some more. This could result in requests
waiting for more than they used to only if a
container was evicted. For blocking-mode, the
hot launcher uses hot-poll period to assess if
a request has waited for too long, then eviction
is triggered.
Status calls should not directly use client
gRPC context deadlines/timeouts during Status
execution. Status should allow plenty of time
for the scheduler agent and docker to run and
emit useful error information.
Setting this timeout to 60 seconds, which should
surface disk I/O, docker, etc. issues.