previously we would retry infinitely up to the context with some backoff in
between. for hot functions, since we don't set any dead line on pulling or
creating the image, this means it would retry forever without making any
progress if e.g. the registry is inaccessable or any other temporary error
that isn't actually temporary. this adds a hard cap of 10 retries, which
gives approximately 13s if the ops take no time, still respecting the context
deadline enclosed.
the case where this was coming up is now tested for and was otherwise
confusing for users to debug, now it spits out an ECONNREFUSED with the
address of the registry, which should help users debug without having to poke
around fn logs (though I don't like this as an excuse, not all users will be
operators at some point in the near future, and this one makes sense)
closes#727
* return bad function http resp error
this was being thrown into the fn server logs but it's relatively easy to get
this to crop up if a function user forgets that they left a `println` laying
around that gets written to stdout, it garbles the http (or json, in its case)
output and they just see 'internal server error'. for certain clients i could
see that we really do want to keep this as 'internal server error' but for
things like e.g. docker image not authorized we're showing that in the
response, so this seems apt.
json likely needs the same treatment, will file a bug.
as always, my error messages are rarely helpful enough, help me please :)
closes#355
* add formatting directive
* fix up http error
* output bad jasons to user
closes#729
woo
* fn: hot container timer improvements
With this change, now we are allocating the timers
when the container starts and managing them via
stop/clear as needed, which should not only be more
efficient, but also easier to follow.
For example, previously, if eject time out was
set to 10 secs, this could have delayed idle timeout
up to 10 secs as well. It is also not necessary to do
any math for elapsed time.
Now consumers avoid any requeuing when startDequeuer() is cancelled.
This was triggering additional dequeue/requeue causing
containers to wake up spuriously. Also in startDequeuer(),
we no longer remove the item from the actual queue and
leave this to acquire/eject, which side steps issues related
with item landing in the channel, not consumed, etc.
closes#317
we could fiddle with this, but we need to at least bound these. this
accomplishes that. 1m is picked since that's our default max log size for the
time being per call, it also takes a little time to generate that many bytes
through logs, typically (i.e. without trying to). I tested with 0, which
spiked the i/o rate on my machine because it's constantly deleting the json
log file. I also tested with 1k and it was similar (for a task that generated
about 1k in logs quickly) -- in testing, this halved my throughput, whereas
using 1m did not change the throughput at all. trying the 'none' driver and
'syslog' driver weren't great, 'none' turns off all stderr and 'syslog' blocks
every log line (boo). anyway, this option seems to have no affect on the
output we get in 'attach', which is what we really care about (i.e. docker is
not logically capping this, just swapping out the log file).
using 1m for this, e.g. if we have 500 hot containers on a machine we have
potentially half a gig of worthless logs laying around. we don't need the
docker logs laying around at all really, but short of writing a storage driver
ourselves there don't seem to be too many better options. open to idears, but
this is likely to hold us over for some time.
* push validate/defaults into datastore
we weren't setting a timestamp in route insert when we needed to create an app
there. that whole thing isn't atomic, but this fixes the timestamp issue.
closes#738
seems like we should do similar with the FireBeforeX stuff too.
* fix tests
* app name validation was buggy, an upper cased letter failed. now it doesn't.
uses unicode now.
* removes duplicate errors for datastore and models validation that were used
interchangably but weren't.
* Rejig the build process
During a build, we check and rebuild any dependencies prior to
potentially using them.
Build:
- DIND (this only produces a new docker image, no local code changes)
- fnserver (built as part of the testing)
On master, if everything works, then we release the built artifacts,
if necessary:
- DIND (this pushes a docker image and a tag)
- fnserver (this builds the docker image and releases it, if necessary).
Fnserver is dealt with last by the release script: all previous steps
in CI use locally-run go tests rather than a docker file.
When a commit happens, we need to know (a) if we need to rebuild
a set of tools and artifacts (or whether we can continue to use
published ones); and (b) if we need to release new versions of
those tools, if all tests pass.
We do this by identifying the previous release tag on origin/master
(which is the release branch), then checking for changes between
that point at the current one.
Those changes may appear in various places in the tree: some simple
boolean rules work out whether the change means we need to rebuild
and rerelease.
* Make the fnproject/fnserver build use the latest dind
As docker bumps from 17.12.x, use whatever dind we just built.
* Use bash
* pipe swapparoo each slot
previously, we made a pair of pipes for stdin and stdout for each container,
and then handed them out to each call (slot) to use. this meant that multiple
calls could have a handle on the same stdin pipe and stdout pipe to read/write
to/from from fn's perspective and could mix input/output and get garbage. this
also meant that each was blocked on the previous' reads.
now we make a new pipe every time we get a slot, and swap it out with the
previous ones. calls are no longer blocked from fn's perspective, and we don't
have to worry about timing out dispatch for any hot format. there is still the
issue that if a function does not finish reading the input from the previous
task, from its perspective, and reads the next call's it can error out the
second call. with fn deadline we provide the necessary tools to skirt this,
but without some additional coordination am not sure this is a closable hole
with our current protocols since terminating a previous calls input requires
some protocol specific bytes to go in (json in particular is tricky). anyway,
from fn's side fixing pipes was definitely a hole, but this client hole is
still hanging out. there was an attempt to send an io.EOF but the issue is
that will shut down docker's read on the stdin pipe (and the container). poop.
this adds a test for this behavior, and makes sure 2 containers don't get
launched.
this also closes the response writer header race a little, but not entirely, I
think there's still a chance that we read a full response from a function and
get a timeout while we're changing the headers. I guess we need a thread safe
header bucket, otherwise we have to rely on timings (racy). thinking on it.
* fix stats mu race
* Improve deadline handling in streaming protocols
* Move special headers handling down to the protocols
* Adding function format documentation for JSON changes
* Add tests for request url and method in JSON protocol
* Fix protocol missing fn-specific info
* Fix import
* Add panic for something that should never happen