we were using the httputil.DumpRequest when there is a perfectly good
req.Write method hanging out in the stdlib, that even does the chunked thing
that a few people ran into if they don't provide a content length:
https://golang.org/pkg/net/http/#Request.Write -- so we shouldn't run into
that issue again. I hit this in testing and it was not very fun to debug, so
added a test that repro'd it on master and fixes it here. of course, adding a
content length works too. tested this and it appears to work pretty well, also
cleaned up the control flow a little bit in http protocol.
*) Stopped using latency previous/current stats, this
was not working as expected. Fresh starts usually have
these stats zero for a long time, and initial samples
are high due to downloads, caches, etc.
*) New state to track: containers that are idle. In other
words, containers that have an unused token in the slot
queue.
*) Removed latency counts since these are not used in
container start decision anymore. Simplifies logs.
*) isNewContainerNeeded() simplified to use idle count
to estimate effective waiters. Removed speculative
latency based logic and progress check comparison.
In agent, waitHot() delayed signalling compansates
for these changes. If the estimation may fail, but
this should correct itself in the next 200 msec
signal.
* fn: resource and slot cancel and broadcast improvements
*) Context argument does not wake up the waiters correctly upon
cancellation/timeout.
*) Avoid unnecessary broadcasts in slot and resource.
* fn: limit scope of context in resource/slot calls in agent
since we were sending a signal before checking if a slot was available, even
in the case of serial calls locally I was seeing 2 containers launch. if we
only send a signal after first checking if a slot is available, this goes
away. 1 usec should not be too offensive of an additional wait, all things
considered here.
* fn: cancellations in WaitAsyncResource
Added go context with cancel to wait async resource. Although
today, the only case for cancellation is shutdown, this cleans
up agent shutdown a little bit.
* fn: locked broadcast to avoid missed wake-ups
* fn: removed ctx arg to WaitAsyncResource and startDequeuer
This is confusing and unnecessary.
this was getting bloated with various contexts and spans and stats
administrivia that obfuscated what was going on a lot. this makes some helper
methods to shove most of that stuff into, and simplifies the context handling
around getting a slot by moving it inside of slot acquisition code. also
removed most uses of `call.Model()` -- I'll kill this thing some day, but if a
reason is needed, then the overhead of dynamic dispatch is unnecessary, we're
inside of the implementee for the agent, we don't want to use the interface
methods inside of that.
*) revert executor wait queue size comparison. This is too
aggresive and with stall check below, now unnecessary.
*) new container logic now checks if stats are constant, if
this is the case, then we assume the system is stalled (eg
running functions that take long time), this means we need
to make progress and spin up a new container.
* Change basic stats to use opentracing rather than Prometheus API directly
* Just ran gofmt
* Extract opentracing access for metrics to common/metrics.go
* Replace quotes strings with constants where possible
Latency stats are not always read-time updated and
if calls are stuck in waiting state, isNewContainerNeeded()
needs to be a bit more aggresive if the wait queue grows.
* Logs should support specifying region when using S3-compatible object store
* Use aws-sdk-go client for s3 backed logstore
* fixes vendor with aws-sdk-go dependencies
* add FN_LOG_DEST for logs, fixup init
* FN_LOG_DEST can point to a remote logging place (papertrail, whatever)
* FN_LOG_PREFIX can add a prefix onto each log line sent to FN_LOG_DEST
default remains stderr with no prefix. users need this to send to various
logging backends, though it could be done operationally, this is somewhat
simpler.
we were doing some configuration stuff inside of init() for some of the global
things. even though they're global, it's nice to keep them all in the normal
server init path.
we have had strange issues with the tracing setup, I tested the last repro of
this repeatedly and didn't have any luck reproducing it, though maybe it comes
back.
* add docs