we were using httputil.DumpRequest when there is a perfectly good
req.Write method in the stdlib, which even handles the chunked encoding
that a few people ran into when they didn't provide a content length:
https://golang.org/pkg/net/http/#Request.Write -- so we shouldn't run into
that issue again. I hit this in testing and it was not fun to debug, so this
adds a test that reproduces the issue on master and fixes it here. of course,
adding a content length works too. tested this and it appears to work well;
also cleaned up the control flow a little bit in the http protocol.
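For reference, a quick sketch of the two on a request with no inferable
content length (the URL and body are made up); req.Write falls back to
chunked Transfer-Encoding, which is exactly the framing the dump-based path
was missing:

    package main

    import (
        "bytes"
        "io"
        "net/http"
        "net/http/httputil"
        "os"
    )

    func main() {
        // hide the concrete reader type so http.NewRequest cannot set a
        // Content-Length for us -- the case that tripped people up
        var body io.Reader = struct{ io.Reader }{bytes.NewReader([]byte("hello"))}

        req, err := http.NewRequest("POST", "http://example.invalid/r/app/route", body)
        if err != nil {
            panic(err)
        }

        // the old path: DumpRequest renders the request for inspection but
        // adds no wire framing for the unknown-length body
        dump, err := httputil.DumpRequest(req, true)
        if err != nil {
            panic(err)
        }
        os.Stdout.Write(dump)

        // the new path: req.Write emits proper wire format and, with no
        // Content-Length set, uses chunked Transfer-Encoding
        if err := req.Write(os.Stdout); err != nil {
            panic(err)
        }
    }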
*) Stopped using the latency previous/current stats; this
was not working as expected. Fresh starts leave these
stats at zero for a long time, and the initial samples
are skewed high by downloads, cache warm-up, etc.
*) New state to track: containers that are idle. In other
words, containers that have an unused token in the slot
queue.
*) Removed latency counts since these are no longer used
in the container start decision. This simplifies the logs.
*) isNewContainerNeeded() simplified to use the idle count
to estimate effective waiters (see the sketch below).
Removed the speculative latency-based logic and the
progress check comparison. In the agent, waitHot()'s
delayed signalling compensates for these changes. The
estimate may occasionally be wrong, but this should
correct itself on the next 200 msec signal.
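A rough sketch of the idle-count idea; the names and signature here are
illustrative only, the real function has more inputs:

    package agent

    // isNewContainerNeeded decides whether to launch another hot container.
    // Idle containers already hold an unused token in the slot queue, so any
    // waiter they can serve should not count toward launching a new one.
    func isNewContainerNeeded(queuedWaiters, idleContainers int) bool {
        // effective waiters are those no idle container can pick up
        effectiveWaiters := queuedWaiters - idleContainers
        return effectiveWaiters > 0
    }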
If we need to reissue fnproject/dind:17.12 (which fnproject/fnserver
is based upon) then let's make sure we're using the latest one
when cutting a release.
To ensure we don't accidentally use stale images lying around in
the docker cache (there probably shouldn't be *any*), call

    make clear-images

before running the build.
* fn: resource and slot cancel and broadcast improvements
*) The context argument did not wake up the waiters correctly upon
cancellation/timeout.
*) Avoid unnecessary broadcasts in slot and resource.
* fn: limit scope of context in resource/slot calls in agent
since we were sending a signal before checking whether a slot was available,
I was seeing 2 containers launch even for serial calls locally. if we only
send the signal after first checking whether a slot is available, this goes
away. 1 usec should not be too offensive an additional wait, all things
considered.
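A simplified sketch of the ordering change; the channel names and the slot
type are stand-ins for the agent's real types, and the extra ~1 usec wait is
omitted:

    package agent

    import "context"

    type slot struct{} // stand-in for the agent's real hot slot type

    // acquireSlot checks for a free slot before nudging the launcher, so
    // back-to-back serial calls reuse the warm container instead of
    // triggering a second launch.
    func acquireSlot(ctx context.Context, openSlots <-chan slot, launchSignal chan<- struct{}) (slot, error) {
        // a slot may already be free; take it without signalling at all
        select {
        case s := <-openSlots:
            return s, nil
        default:
        }

        // nothing free: signal that a new container may be needed (non-blocking)
        select {
        case launchSignal <- struct{}{}:
        default:
        }

        // wait for a slot or for the call's context to be cancelled
        select {
        case s := <-openSlots:
            return s, nil
        case <-ctx.Done():
            return slot{}, ctx.Err()
        }
    }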
* fn: cancellations in WaitAsyncResource
Added a Go context with cancel to WaitAsyncResource. Although
today the only case for cancellation is shutdown, this cleans
up agent shutdown a little bit.
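A rough sketch of the shape this takes, with the agent holding its own
shutdown context so nothing needs to be passed in (which also matches the
later removal of the ctx argument); the names and fields here are
illustrative:

    package agent

    import "context"

    type agent struct {
        shutdownCtx context.Context
        shutdown    context.CancelFunc
        asyncReady  chan struct{} // fed when async capacity frees up
    }

    func newAgent() *agent {
        ctx, cancel := context.WithCancel(context.Background())
        return &agent{shutdownCtx: ctx, shutdown: cancel, asyncReady: make(chan struct{})}
    }

    // WaitAsyncResource blocks until async capacity is available or the agent
    // is shutting down; today shutdown is the only cancellation case.
    func (a *agent) WaitAsyncResource() error {
        select {
        case <-a.asyncReady:
            return nil
        case <-a.shutdownCtx.Done():
            return a.shutdownCtx.Err()
        }
    }

    // Close cancels the context, waking anything blocked in WaitAsyncResource.
    func (a *agent) Close() { a.shutdown() }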
* fn: locked broadcast to avoid missed wake-ups
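Roughly the pattern, using sync.Cond with an illustrative token pool; the
point is that the state change and the Broadcast happen under the same lock
the waiters use to check their condition:

    package agent

    import "sync"

    type tokenPool struct {
        mu     sync.Mutex
        cond   *sync.Cond
        tokens int
    }

    func newTokenPool(n int) *tokenPool {
        p := &tokenPool{tokens: n}
        p.cond = sync.NewCond(&p.mu)
        return p
    }

    func (p *tokenPool) acquire() {
        p.mu.Lock()
        defer p.mu.Unlock()
        for p.tokens == 0 {
            p.cond.Wait() // atomically releases mu while sleeping
        }
        p.tokens--
    }

    func (p *tokenPool) release() {
        p.mu.Lock()
        p.tokens++
        p.cond.Broadcast() // broadcast under the lock: no window between the
        p.mu.Unlock()      // state change and the wake-up for waiters to miss
    }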
* fn: removed ctx arg to WaitAsyncResource and startDequeuer
The ctx argument was confusing and unnecessary.
* NOTE: the fnproject/dind release will need to be recut and the
top-level Dockerfile updated to refer to it before this change is
complete.
In many k8s environments the host docker uses an overlay network
which will take bytes away from the effective MTU of the outer
containers; e.g., vxlan needs 50 bytes, often leaving a 1450 MTU
on the container running dind and fn-api.
In such an arrangement, packets exceeding the smaller MTU may be
silently dropped as they travel across dind's docker0 bridge.
This mostly surfaces as functions failing to reliably talk to
external services. (Note that the failure may be intermittent,
depending on the profile of the resulting TCP traffic.)
A robust fix for this is to intercept the startup of the dind
dockerd and ensure that /etc/docker/daemon.json (currently
absent) contains the following setting:
    {
      "mtu": 1450
    }
(or whatever the MTU on the external interface may be). This
should be autosized so the container works in a variety of
deployments.
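As a sketch of the autosizing idea only (the real fix belongs in the dind
entrypoint, and the interface name here is an assumption), something along
these lines reads the external interface's MTU and writes it into
daemon.json before dockerd starts:

    package main

    import (
        "encoding/json"
        "fmt"
        "net"
        "os"
    )

    func main() {
        // assume the external interface is eth0; a real version would find
        // the interface that carries the default route
        iface, err := net.InterfaceByName("eth0")
        if err != nil {
            fmt.Fprintln(os.Stderr, "cannot find eth0:", err)
            os.Exit(1)
        }

        // write {"mtu": <detected>} so containers on dind's docker0 bridge
        // never exceed the MTU of the path out of the outer container
        cfg := map[string]int{"mtu": iface.MTU}
        b, err := json.MarshalIndent(cfg, "", "  ")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        if err := os.WriteFile("/etc/docker/daemon.json", b, 0644); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }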
The problem does not arise when using an embedded
/var/run/docker.sock - or when running with dind on a host that
can supply 1500-byte MTUs to containers on the 'host' docker.
this was getting bloated with various contexts, spans, and stats
administrivia that obscured what was actually going on. this adds some
helper methods to shove most of that stuff into, and simplifies the context
handling around getting a slot by moving it inside the slot acquisition
code. also removed most uses of `call.Model()` -- I'll kill this thing some
day, but if a reason is needed: the overhead of dynamic dispatch is
unnecessary, we're inside the agent's own implementation, so there's no
point going through the interface methods there.
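One way such a helper could look, assuming an opentracing-style API; the
name traceStep and the tag are made up, not the agent's actual helpers:

    package agent

    import (
        "context"
        "time"

        opentracing "github.com/opentracing/opentracing-go"
    )

    // traceStep bundles the span/stats administrivia for one step so the
    // calling code reads as plain control flow: start a span, time it, and
    // return a closer to defer.
    func traceStep(ctx context.Context, name string) (context.Context, func()) {
        span, ctx := opentracing.StartSpanFromContext(ctx, name)
        start := time.Now()
        return ctx, func() {
            span.SetTag("duration_ms", time.Since(start).Milliseconds())
            span.Finish()
        }
    }

Used roughly as: ctx, done := traceStep(ctx, "agent_get_slot"); defer done(),
instead of repeating the span and timing boilerplate at every call site.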