* NOTE: the fnproject/dind release will need to be recut and the
top-level Dockerfile updated to refer to it for this change to be
complete.
In many k8s environments the host docker uses an overlay network,
which takes bytes away from the effective MTU of the outer
containers; e.g., vxlan needs 50 bytes, often leaving a 1450-byte
MTU on the container running dind and fn-api.
In such an arrangement, packets exceeding the smaller MTU may be
silently dropped as they travel across the dind's docker0
bridge. This mostly surfaces as functions failing to talk reliably
to external services. (Note that the failure may be intermittent,
depending on the profile of the resulting TCP communication.)
A robust fix for this is to intercept the startup of the dind
dockerd and ensure that /etc/docker/daemon.json (currently
absent) contains the following setting:
{
"mtu": 1450
}
(or whatever the MTU on the external interface may be). This
should be autosized so the container works in a variety of
deployments; a sketch of that follows the note below.
The problem does not arise when using an embedded
/var/run/docker.sock, or when running dind on a host that can
supply 1500-byte MTUs to containers on the 'host' docker.
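As an illustration of the autosizing, here is a minimal Go sketch that
reads the MTU from the outer interface and writes it into
/etc/docker/daemon.json ahead of dockerd. The interface name eth0 is an
assumption, as is doing this in Go at all; a real entrypoint wrapper would
more likely be a few lines of shell run before exec'ing dockerd.

package main

import (
    "encoding/json"
    "log"
    "net"
    "os"
)

func main() {
    // assumption: eth0 is the interface whose MTU the overlay has shrunk
    iface, err := net.InterfaceByName("eth0")
    if err != nil {
        log.Fatalf("cannot inspect outer interface: %v", err)
    }
    cfg, err := json.Marshal(map[string]int{"mtu": iface.MTU})
    if err != nil {
        log.Fatal(err)
    }
    // dockerd reads this file at startup, so it must be written first
    if err := os.WriteFile("/etc/docker/daemon.json", cfg, 0644); err != nil {
        log.Fatal(err)
    }
}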
This was getting bloated with various context, span, and stats
administrivia that did a lot to obfuscate what was going on. This adds
some helper methods to shove most of that stuff into, and simplifies the
context handling around getting a slot by moving it inside the
slot-acquisition code. Also removed most uses of `call.Model()` -- I'll
kill this thing some day, but if a reason is needed: the overhead of
dynamic dispatch is unnecessary, and since we're inside the agent's
implementation we don't want to use the interface methods there.
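To make the shape of that concrete, a hypothetical sketch; the types and
the getSlot signature below are illustrative stand-ins, not the actual
agent code:

package agent

import (
    "context"
    "time"

    "github.com/opentracing/opentracing-go"
)

// Illustrative stand-ins for the real agent types.
type Slot interface{ Close() error }

type call struct{ slotDeadline time.Time }

type slotMgr struct{}

func (s *slotMgr) acquire(ctx context.Context, c *call) (Slot, error) {
    return nil, ctx.Err()
}

type agent struct{ slots slotMgr }

// getSlot shows the refactor: the deadline/span administrivia moves inside
// slot acquisition instead of being repeated at every call site, and we use
// the concrete *call rather than going through the interface via call.Model().
func (a *agent) getSlot(ctx context.Context, c *call) (Slot, error) {
    ctx, cancel := context.WithDeadline(ctx, c.slotDeadline)
    defer cancel()
    span, ctx := opentracing.StartSpanFromContext(ctx, "agent_get_slot")
    defer span.Finish()
    return a.slots.acquire(ctx, c)
}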
*) revert executor wait queue size comparison. This is too
aggressive and, with the stall check below, now unnecessary.
*) new container logic now checks whether stats are constant; if
so, we assume the system is stalled (e.g. running functions that
take a long time), which means we need to make progress and spin
up a new container.
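A minimal sketch of that stall check; the stats struct and the function
signature below are hypothetical:

package agent

// stats is an illustrative snapshot of per-container counters.
type stats struct {
    queued, running, complete uint64
}

// isNewContainerNeeded: if the counters have not moved between two polls
// while calls are waiting, assume the existing containers are tied up in
// long-running functions and spin up a new container to make progress.
func isNewContainerNeeded(prev, cur stats, waiting int) bool {
    stalled := prev == cur // constant stats across a polling interval
    return waiting > 0 && stalled
}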
* Change basic stats to use opentracing rather than Prometheus API directly
* Just ran gofmt
* Extract opentracing access for metrics to common/metrics.go (see the sketch below)
* Replace quoted strings with constants where possible
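As a hedged sketch of the kind of helper that extraction produces (the
names PublishCount and MetricCallsQueued are hypothetical, not the actual
contents of common/metrics.go):

package common

import (
    "context"

    "github.com/opentracing/opentracing-go"
    otlog "github.com/opentracing/opentracing-go/log"
)

// Hypothetical constant replacing a repeated quoted string.
const MetricCallsQueued = "calls_queued"

// PublishCount emits a stat through an opentracing span, leaving it to the
// configured tracer/collector to export to Prometheus, rather than calling
// the Prometheus API directly at each site.
func PublishCount(ctx context.Context, metric string, value int64) {
    span, _ := opentracing.StartSpanFromContext(ctx, metric)
    defer span.Finish()
    span.LogFields(otlog.Int64(metric, value))
}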
Latency stats are not always updated at read time, and
if calls are stuck in the waiting state, isNewContainerNeeded()
needs to be a bit more aggressive when the wait queue grows.
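Extending the earlier stall-check sketch with that wait-queue signal
(still hypothetical names, reusing the stats struct from above):

// Treat a growing wait queue as an additional signal, since latency stats
// may lag at read time while callers pile up waiting for a slot.
func isNewContainerNeededV2(prev, cur stats, prevWaiting, waiting int) bool {
    stalled := prev == cur
    queueGrowing := waiting > prevWaiting
    return waiting > 0 && (stalled || queueGrowing)
}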
* Logs should support specifying a region when using an S3-compatible object store
* Use aws-sdk-go client for the S3-backed logstore
* Fix vendoring of aws-sdk-go dependencies
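For reference, a minimal sketch of building the aws-sdk-go (v1) S3 client
with a region and a custom endpoint; the function and parameter names are
assumptions, and S3ForcePathStyle is commonly required by S3-compatible
stores such as Minio:

package logs

import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

// newS3Client builds a client for an S3-compatible object store at the
// given region and endpoint.
func newS3Client(region, endpoint string) (*s3.S3, error) {
    sess, err := session.NewSession(&aws.Config{
        Region:           aws.String(region),
        Endpoint:         aws.String(endpoint),
        S3ForcePathStyle: aws.Bool(true),
    })
    if err != nil {
        return nil, err
    }
    return s3.New(sess), nil
}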