Moving the timeout management of various docker operations
to the agent. This allows finer control over the timeout each operation
should use. For instance, for pause/unpause our tolerance
is very low to avoid resource issues. For docker remove,
a failure can potentially lead to agent failure,
so we wait up to 10 minutes.
For cookie create/prepare (which includes docker-pull)
we cap this at 10 minutes by default.
With the new UDS/FDK contract, the health check is now obsolete,
as containers advertise health through UDS availability.
* improve contribution guide
* add at least a stub security bug section; this will most likely serve as the
policy until we can improve it (TODO: we need an email address for this!)
* add info to CONTRIBUTING.md on creating helpful regular issues; this is
much the same info as the issue template
* coding style section. This was lacking and led to wishy-washy reviews. Now
we have an official reference in place to point at for 'do this this way,
please' rather than just random opinions. Everyone should read this if they
haven't! I have it bookmarked...
* info on writing useful commit messages and commit formatting. As with code
reviews, it's nice to have this in the contrib guide to reference when asking
people to do this, so that it's not just a one-off opinion
I tried to keep this fairly lax; the last thing I/we want is for the
contributing process to be overbearing. I do think the contribution guide
serves the dual purpose of enforcing best practices and helping people
maneuver the process to make it easier for all of us (them included).
Open to ideas. This is a convergence of a few guides from popular repos.
* update the "what's in core" section to reduce confusion
* get rid of old format stuff, utils usage, fix up for fdk2.0 interface
* pure agent format removal, TODO remove format field, fix up all tests
* work in progress
* fix agent tests
* start rolling through server tests
* tests compile, some failures
* remove json / content type detection on invoke/httptrigger, fix up tests
* remove hello, fixup system tests
the status checker test just hangs. It tests that the checker doesn't work,
so in principle it passes, but in practice it never completes; not worth chasing now
* fix migration
* meh
* silence dbhelper's complaints about unused dbhelpers
* move the fail status into the main thread, at least
* fix the status call to set FN_LISTENER
this also turns off the stdout/stderr blocking between calls, because
debugging is impossible without that output (short of syslog). Now that stdout
and stderr go to the same place (either host stderr or nowhere) and aren't used
for function output, this shouldn't be a big fuss
* remove stdin
* cleanup/remind: fix bug where the watcher would leak if the container dies first
* silence system-test logs until failure, fix datastore tests
postgres does weird things with constraints when renaming tables, so we took
the easy way out
system-tests were very noisy and made you download a CircleCI text file of
the logs; now they only print logs when they fail
* fix fdk-go dep for test image
* fix swagger and remove test about format
* update all the gopkg files
* add back FN_FORMAT for fdks that assert on it
* add useful error for functions that exit
this error is really confounding because containers can exit for all manner of
reasons; we're just guessing that this is the most likely cause for now, and
this error message should very likely change or be removed from the client
path anyway (context.Canceled wasn't all that useful either, but I'd been
hunting for this, so I fixed it). Added a test to avoid being publicly
shamed for 1-line commits (beware...).
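The FN_LISTENER change above concerns the FDK 2.0 contract, where the function container presumably learns which unix socket to listen on from the FN_LISTENER environment variable. The sketch assumes values of the form `unix:/path/to/lsnr.sock`; that format and the `listenerPath` helper are assumptions for illustration, not the FDK's actual code.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// listenerPath extracts the socket path from an FN_LISTENER value,
// assuming the form "unix:/path/to/lsnr.sock".
func listenerPath(v string) (string, error) {
	const prefix = "unix:"
	if !strings.HasPrefix(v, prefix) {
		return "", fmt.Errorf("FN_LISTENER %q: expected %q prefix", v, prefix)
	}
	path := strings.TrimPrefix(v, prefix)
	if path == "" {
		return "", fmt.Errorf("FN_LISTENER %q: empty socket path", v)
	}
	return path, nil
}

func main() {
	path, err := listenerPath(os.Getenv("FN_LISTENER"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("listening on", path)
}
```

Failing loudly on a malformed value matters here: a status call container started without FN_LISTENER would otherwise never open its socket and simply look unhealthy.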
Previously, the evictor did not perform an eviction
if the total cpu/mem of evictable containers was less
than the requested cpu/mem. With this change, we
try to perform evictions based on the cpu & mem actually needed,
as reported by the resource tracker.
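A minimal sketch of the difference, assuming the tracker reports a shortfall (requested minus currently free) and the evictor greedily picks idle containers until that shortfall is covered. The `container` type, `shortfall` and `planEvictions` are hypothetical names, not the agent's real API.

```go
package main

import "fmt"

// container is a hypothetical evictable idle container and its reservation.
type container struct {
	id    string
	cpu   uint64 // millicpus
	memMB uint64
}

// shortfall returns how much more is needed beyond what is free (never negative).
func shortfall(requested, free uint64) uint64 {
	if requested <= free {
		return 0
	}
	return requested - free
}

// planEvictions picks evictable containers until the needed cpu/mem is covered.
// It compares against the actual shortfall, not the full request, so it can
// succeed even when all evictable containers together hold less than the request.
func planEvictions(evictable []container, neededCPU, neededMem uint64) ([]string, bool) {
	var ids []string
	var cpu, mem uint64
	for _, c := range evictable {
		if cpu >= neededCPU && mem >= neededMem {
			break
		}
		ids = append(ids, c.id)
		cpu += c.cpu
		mem += c.memMB
	}
	return ids, cpu >= neededCPU && mem >= neededMem
}

func main() {
	needCPU := shortfall(1000, 600) // 400 millicpus missing
	needMem := shortfall(512, 384)  // 128 MB missing
	idle := []container{{"a", 250, 64}, {"b", 250, 128}}
	// Evictable total (500 mcpu) is below the 1000 mcpu request, yet covers the shortfall.
	ids, ok := planEvictions(idle, needCPU, needMem)
	fmt.Println(ids, ok) // [a b] true
}
```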
Currently, the default time format, time.RFC3339, is used, which doesn't include any
subsecond resolution information. This makes it hard to understand the
ordering of log messages when viewing them in a log aggregator such as
Kibana.
This change sets the TimestampFormat of the logrus JSONFormatter to
time.RFC3339Nano.
* the dispatch span now actually encloses dispatch and gives an accurate span
* turning a call into an http request can't fail unless it's our fault; if
tests don't catch this, we don't deserve money
* moved http req creation inside the dispatch goroutine
there's further work to do cleaning up dispatch... removing the old formats
will make this slightly clearer, so I'm waiting on that. This was bugging me
anyway after seeing something else, and it was easy to fix up.
Streaming docker events is useful, as we can record/capture some
asynchronous container events such as out-of-memory. For now,
we record these in opencensus/prometheus stats.
Default fn server keys should be minimal (empty), since not
all stats have an associated app name, fn id, etc.
API tags for requests should not include "status", as status is
part of the response.
If checkLaunch triggers evictions, it must wait
for these evictions to complete before returning.
Returning prematurely causes the hot launcher to
call checkLaunch again, which then receives an
out-of-capacity error, resulting in a 503.
This PR also improves the evictor: it now
provides a slice of channels to wait on while
evictions are taking place.
Eviction token deletion is performed *after*
the resource token is closed, to ensure that once an
eviction is done, its resources are also free.
This simplifies the resource tracker. Originally, we had logically
split the cpu/mem into two pools, with 20%
reserved specifically for sync calls to keep
async calls from dominating the system. However, the resource
tracker should not handle such call prioritization.
Given the improvements to the evictor, I think
we can get rid of this code in the resource tracker
for the time being.
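A minimal sketch of the simplified model, assuming a single shared cpu/mem budget with no sync-only reserve. The `pool` type and its methods are hypothetical. Note the trade-off: an async call may now take more than 80% of the pool, so any prioritization has to happen elsewhere (e.g. via eviction).

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a single shared cpu/mem budget used by both sync and async calls,
// with no separate 20% reserve for sync calls.
type pool struct {
	mu      sync.Mutex
	freeCPU uint64 // millicpus
	freeMem uint64 // MB
}

// tryReserve takes cpu/mem from the shared pool only if both fit.
func (p *pool) tryReserve(cpu, mem uint64) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if cpu > p.freeCPU || mem > p.freeMem {
		return false
	}
	p.freeCPU -= cpu
	p.freeMem -= mem
	return true
}

// release returns a reservation to the pool.
func (p *pool) release(cpu, mem uint64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.freeCPU += cpu
	p.freeMem += mem
}

func main() {
	p := &pool{freeCPU: 1000, freeMem: 1024}
	fmt.Println(p.tryReserve(900, 900)) // async call takes more than 80%: true
	fmt.Println(p.tryReserve(100, 124)) // sync call draws from the same pool: true
	fmt.Println(p.tryReserve(1, 1))     // pool exhausted: false
	p.release(900, 900)
	fmt.Println(p.tryReserve(1, 1)) // capacity back: true
}
```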