* http now buffers the entire request body from the container before copying
it to the response writer (and sets the content length). this is a level of
sad i don't feel comfortable talking about, but it is what it is.
* the json protocol was already buffering the entire body, so there wasn't any
reason for us to try to write this directly to the container stdin manually;
we needed to add a bufio.Writer around it anyway, since it was making too many
write(fd) syscalls the way it was. this is just easier overall and has the
same performance as http now in my tests, whereas previously this was 50%
slower [than http].
* add a buffer pool for http & json to share, so json doesn't create a new
buffer every stinkin request. we need to plumb down the content length so that
we can properly size the buffer for json (have to add the header size and
everything together), but it's probably faster than malloc(); punting on
properly sizing. a rough sketch of the pooling + buffered-writer idea is below
the list.
* json now sets the content length to the length of the body from the returned
json blurb from the container
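roughly the two ideas above, as a minimal sketch only; `bufPool` and `writeTo`
are hypothetical names and the content-length-based sizing is omitted, so
don't read this as fn's actual code:

```go
// a minimal sketch: reuse buffers across requests via sync.Pool and wrap the
// container's stdin in a bufio.Writer so we don't issue a write(fd) syscall
// per tiny Write call. names here are hypothetical, not fn's actual code.
package protocol

import (
	"bufio"
	"bytes"
	"io"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// writeTo buffers body into a pooled buffer, then flushes it to the
// container's stdin through a bufio.Writer in one (or few) syscalls.
func writeTo(containerStdin io.Writer, body io.Reader) error {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	if _, err := buf.ReadFrom(body); err != nil {
		return err
	}

	w := bufio.NewWriter(containerStdin)
	if _, err := w.Write(buf.Bytes()); err != nil {
		return err
	}
	return w.Flush() // one big write instead of many small ones
}
```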
this does not handle imposing a maximum size on the response returned from a
container, which we need to add, but that has been an open issue for some time
(specifically for json). we can impose this by wrapping the pipes, but there's
some discussion to be had: for json specifically we won't be able to just cut
off the output stream and use that (for http we can do this). anyway, filing a
ticket... a rough sketch of the pipe-wrapping idea follows.
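for reference, a minimal sketch of what wrapping the pipe with a cap could
look like; `maxResponseSize` and `copyCapped` are hypothetical names, not an
existing fn setting or function:

```go
// sketch only: cap how many bytes we will accept from the container's output.
package protocol

import (
	"errors"
	"io"
)

const maxResponseSize = 1 << 20 // hypothetical 1MB cap

var ErrResponseTooLarge = errors.New("function response too large")

// copyCapped copies at most maxResponseSize bytes from the container's stdout
// and errors if the container tried to write more. for http we could simply
// truncate instead; for json, truncating mid-object breaks the decoder, which
// is the open discussion mentioned above.
func copyCapped(dst io.Writer, containerStdout io.Reader) error {
	n, err := io.Copy(dst, io.LimitReader(containerStdout, maxResponseSize+1))
	if err != nil {
		return err
	}
	if n > maxResponseSize {
		return ErrResponseTooLarge
	}
	return nil
}
```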
closes#326 :(((((((
i would split this commit in two if i were a good dev.
the pprof stuff is really useful and this only samples when called. this is
pretty standard go service stuff. expvar is cool, too.
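for anyone unfamiliar, this is the stock way to hang pprof and expvar off an
http listener in go; fn wires it into its own mux, so take this as the general
pattern rather than fn's code:

```go
// expose pprof + expvar on a side port; profiles are only collected when a
// /debug/pprof endpoint is hit, so there is no steady-state overhead.
package main

import (
	"expvar"
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

var requests = expvar.NewInt("requests") // visible at /debug/vars

func main() {
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```

then `go tool pprof http://localhost:6060/debug/pprof/profile` grabs a 30s cpu
sample on demand, and `/debug/vars` dumps the expvars as json.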
the additional spannos have turned up some interesting tidbits... gonna slide
em in
1) in dind, prevent SIGINT from reaching dockerd. otherwise it kills
docker and prevents shutdown while the fn server is trying to stop.
2) as the init process, always reap child processes (rough sketch of both at
the end of this note).
we have been getting these from attach all this time and never needed them
anyway.
I ran cpu profiles of dockerd and this (json logs) was 90% of docker's cpu
usage. woot. turning them off will reduce i/o quite a bit, and we don't have
to worry about them taking up any disk space either.
from tests i get about a 50% speedup with these off. the hunt continues...
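for reference, a rough, linux-only sketch of the two dind ideas in (1) and (2)
above, using the generic process-group and pid-1 reap patterns; this is not
fn's actual dind entrypoint:

```go
// sketch: start dockerd in its own process group so a SIGINT aimed at the fn
// server's group doesn't kill it, and run a pid-1 style reap loop so exited
// children don't pile up as zombies.
package main

import (
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func reapChildren() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCHLD)
	for range sigs {
		for {
			var ws syscall.WaitStatus
			pid, err := syscall.Wait4(-1, &ws, syscall.WNOHANG, nil)
			if pid <= 0 || err != nil {
				break // nothing left to reap right now
			}
		}
	}
}

func main() {
	go reapChildren()

	dockerd := exec.Command("dockerd")
	// own process group: a SIGINT delivered to our group won't reach dockerd
	dockerd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := dockerd.Start(); err != nil {
		panic(err)
	}

	// block until asked to stop; the reap loop above also collects dockerd
	// when it exits, so we don't call dockerd.Wait() here. this is a stand-in
	// for the real entrypoint's shutdown handling.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM)
	<-stop
}
```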
This PR changes the instructions for running Prometheus and Grafana in Docker to use the `--add-host` parameter instead of the `--link` parameter, which is deprecated and which @shaunsmith has reported to sometimes not work.
The supplied command has been verified by @shaunsmith on Mac and Linux, and by me on Linux.
previously we would retry infinitely up to the context, with some backoff in
between. for hot functions, since we don't set any deadline on pulling or
creating the image, this meant it would retry forever without making any
progress if e.g. the registry is inaccessible, or on any other 'temporary'
error that isn't actually temporary. this adds a hard cap of 10 retries, which
gives approximately 13s if the ops take no time, while still respecting the
enclosing context deadline. roughly the shape of it is sketched below.
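a sketch only, with made-up backoff values and names (`retry`, `maxRetries`)
rather than the agent's actual ones:

```go
// bounded retry with backoff that still respects the enclosing context.
package agent

import (
	"context"
	"time"
)

const maxRetries = 10 // hard cap so we can't spin forever on a "temporary" error

func retry(ctx context.Context, op func(context.Context) error) error {
	var err error
	backoff := 100 * time.Millisecond
	for i := 0; i < maxRetries; i++ {
		if err = op(ctx); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // still respect the enclosing deadline
		case <-time.After(backoff):
		}
		backoff *= 2
	}
	return err // give up and surface the last error to the caller
}
```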
the case where this was coming up is now tested for; it was otherwise
confusing for users to debug. now it spits out an ECONNREFUSED with the
address of the registry, which should help users debug without having to poke
around the fn logs (though I don't like leaning on 'check the fn logs' as an
answer; not all users will be operators in the near future, and this error
makes sense to surface).
closes#727
* return bad function http resp error
this was only being thrown into the fn server logs, but it's relatively easy
to get it to crop up: if a function user forgets that they left a `println`
laying around that writes to stdout, it garbles the http (or json, in its
case) output and they just see 'internal server error'. for certain clients i
could see that we really do want to keep this as 'internal server error', but
for things like e.g. docker image not authorized we're already showing that in
the response, so this seems apt. a rough sketch of the idea is at the end of
this note.
json likely needs the same treatment, will file a bug.
as always, my error messages are rarely helpful enough, help me please :)
closes#355
* add formatting directive
* fix up http error
* output bad jasons to user
closes#729
woo
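re: the bad http resp error above, a rough sketch of the idea;
`readContainerResponse` and the error text are hypothetical stand-ins, not
fn's actual helpers:

```go
// on a parse failure, return an error the user can actually act on (e.g.
// "you printed junk to stdout") instead of a bare internal server error.
package protocol

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
)

// readContainerResponse parses the http response the container wrote to its
// stdout.
func readContainerResponse(stdout io.Reader, req *http.Request) (*http.Response, error) {
	resp, err := http.ReadResponse(bufio.NewReader(stdout), req)
	if err != nil {
		return nil, fmt.Errorf("invalid http response from function, "+
			"check that nothing else is writing to stdout: %v", err)
	}
	return resp, nil
}
```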
* fn: hot container timer improvements
With this change, we now allocate the timers when the
container starts and manage them via stop/clear as
needed, which should not only be more efficient, but
also easier to follow.
For example, previously, if the eject timeout was
set to 10 secs, it could have delayed the idle timeout
by up to 10 secs as well. It is also no longer necessary
to do any math on elapsed time. A rough sketch of the
stop/reset pattern is below.
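The stop/reset pattern, as a minimal sketch against the standard library
time.Timer; the struct and method names here are hypothetical, not fn's:

```go
// allocate per-container timers once, then stop/reset them instead of
// recomputing elapsed time on every touch.
package agent

import "time"

type hotContainer struct {
	idle  *time.Timer
	eject *time.Timer
}

// allocate the timers once, when the container starts.
func newHotContainer(idleTimeout, ejectTimeout time.Duration) *hotContainer {
	return &hotContainer{
		idle:  time.NewTimer(idleTimeout),
		eject: time.NewTimer(ejectTimeout),
	}
}

// on activity, stop and reset rather than doing any elapsed-time math.
func (c *hotContainer) touch(idleTimeout time.Duration) {
	if !c.idle.Stop() {
		select { // drain if it already fired so Reset behaves correctly
		case <-c.idle.C:
		default:
		}
	}
	c.idle.Reset(idleTimeout)
}
```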
Now consumers avoid any requeuing when startDequeuer() is cancelled.
This was triggering additional dequeue/requeue, causing
containers to wake up spuriously. Also, in startDequeuer(),
we no longer remove the item from the actual queue and
instead leave this to acquire/eject, which sidesteps issues related
to the item landing in the channel but not being consumed, etc.
closes#317
we could fiddle with this, but we need to at least bound these; this
accomplishes that. 1m is picked since that's our default max log size per call
for the time being, and it typically takes a little time to generate that many
bytes through logs (i.e. without trying to). I tested with 0, which spiked the
i/o rate on my machine because docker is constantly deleting the json log
file. I also tested with 1k and it was similar (for a task that generated
about 1k in logs quickly) -- in testing, this halved my throughput, whereas
using 1m did not change the throughput at all. trying the 'none' driver and
the 'syslog' driver wasn't great either: 'none' turns off all stderr and
'syslog' blocks on every log line (boo). anyway, this option seems to have no
effect on the output we get in 'attach', which is what we really care about
(i.e. docker is not logically capping this, just swapping out the log file).
using 1m for this means, e.g., if we have 500 hot containers on a machine we
potentially have half a gig of worthless logs laying around. we don't need the
docker logs laying around at all really, but short of writing a storage driver
ourselves there don't seem to be too many better options. open to idears, but
this is likely to hold us over for some time. the docker config this maps to
is sketched below.
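for reference, this corresponds roughly to the following docker host config
(sketch only, the surrounding container setup is elided and this is not a copy
of fn's driver code):

```go
// bound the per-container json log file; docker rotates (swaps out) the file,
// and the output we read via attach is unaffected.
package driver

import "github.com/docker/docker/api/types/container"

func hostConfig() *container.HostConfig {
	return &container.HostConfig{
		LogConfig: container.LogConfig{
			Type: "json-file",
			Config: map[string]string{
				"max-size": "1m",
			},
		},
	}
}
```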
* push validate/defaults into datastore
we weren't setting a timestamp in route insert when we needed to create an app
there. that whole thing isn't atomic, but this fixes the timestamp issue.
closes#738
seems like we should do something similar with the FireBeforeX stuff too.
* fix tests
* app name validation was buggy: an upper-cased letter failed. now it doesn't;
it uses the unicode package now (sketch after this list).
* removes duplicate errors for datastore and models validation that were used
interchangeably but weren't actually interchangeable.
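a minimal sketch of what the unicode-aware check looks like; the exact allowed
characters and the error text here are assumptions, not fn's actual rules:

```go
// validate names by rune class instead of an ascii-only pattern, so upper
// case and non-ascii letters pass.
package models

import (
	"errors"
	"unicode"
)

var ErrInvalidName = errors.New("name must contain only letters, digits, '_' or '-'")

func validName(name string) error {
	if name == "" {
		return ErrInvalidName
	}
	for _, r := range name {
		// unicode.IsLetter accepts upper case and non-ascii letters,
		// which an ascii-only lowercase check would reject
		if unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_' || r == '-' {
			continue
		}
		return ErrInvalidName
	}
	return nil
}
```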