The container initialization phase holds resource tracker resources (a token) during potentially lengthy operations. For agent stability and liveness, this phase has to be evictable/cancelable and time bounded.
This change introduces a new system-wide environment setting to bound the time spent in the container initialization phase, which includes the docker-pull, docker-create, docker-attach, docker-start and UDS wait operations. The initialization period is also now considered evictable.
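A minimal sketch of the idea, not the actual agent code; the environment variable name and the step plumbing below are illustrative assumptions only:
```
// Sketch only: FN_MAX_INIT_TIMEOUT_MSECS is a hypothetical variable name.
package agent

import (
	"context"
	"os"
	"strconv"
	"time"
)

// initTimeout returns the bound for the container initialization phase,
// falling back to a default when the setting is unset or invalid.
func initTimeout() time.Duration {
	const def = 2 * time.Minute
	v := os.Getenv("FN_MAX_INIT_TIMEOUT_MSECS") // hypothetical name
	ms, err := strconv.Atoi(v)
	if err != nil || ms <= 0 {
		return def
	}
	return time.Duration(ms) * time.Millisecond
}

// initializeContainer runs pull/create/attach/start and the UDS wait under a
// single deadline so the whole phase is time bounded and cancelable.
func initializeContainer(ctx context.Context, steps ...func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, initTimeout())
	defer cancel()
	for _, step := range steps {
		if err := step(ctx); err != nil {
			return err
		}
	}
	return nil
}
```
Keeping every docker step and the UDS wait under one context means a single deadline covers the whole phase, matching the description above.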
* get rid of old format stuff, utils usage, fix up for fdk2.0 interface
* pure agent format removal, TODO remove format field, fix up all tests
* work in progress: things are currently broken
* fix agent tests
* start rolling through server tests
* tests compile, some failures
* remove json / content type detection on invoke/httptrigger, fix up tests
* remove hello, fixup system tests
the status checker test just hangs; it verifies that the check fails, so the check passes but the test itself never completes. Not worth fighting it right now.
* fix migration
* meh
* silence dbhelper warnings about unused dbhelpers
* move fail status into the main thread
* fix status call to have FN_LISTENER
This also turns off the stdout/stderr blocking between calls, because debugging is impossible without it (absent syslog). Now that stdout and stderr go to the same place (either host stderr or nowhere) and are not used for function output, this shouldn't be a big deal.
* remove stdin
* cleanup/reminder: fixed a bug where the watcher would leak if the container dies first
* silence system-test logs until fail, fix datastore tests
Postgres does odd things with constraints when renaming tables, so we took the easy way out.
The system tests were so noisy that CircleCI forced you to download the logs as a text file; they now only print output on failure.
* fix fdk-go dep for test image. fun
* fix swagger and remove test about format
* update all the gopkg files
* add back FN_FORMAT for FDKs that assert on it
* add useful error for functions that exit
This error is really confounding because containers can exit for all manner of reasons; we're just guessing that this is the most likely cause for now, and the error message should very likely change or be removed from the client path anyway (context.Canceled wasn't all that useful either, but anyway, I'd been hunting for this... so, found it). Added a test to avoid being publicly shamed for 1-line commits (beware...).
Largely a removal job; however, many tests, particularly system-level ones, relied on Routes. These have been migrated to use Fns.
* Add 410 response to swagger
* No app names in log tags
* Adding constraint in GetCall for FnID
* Adding test to check FnID is required on call
* Add fn_id to call selector
* Fix text in docker mem warning
* Correct buildConfig func name
* Test fix up
* Removing CPU setting from Agent test
CPU setting has been deprecated, but the code base is still riddled
with it. This just removes it from this layer. Really we need to
remove it from Call.
* Remove fn id check on calls
* Reintroduce fn id required on call
* Adding fnID to calls for execute test
* Correct setting of app id in middleware
* Removes root middleware's ability to redirect function invocations
* Add over sized test check
* Removing call fn id check
* fn: paused and evicted container stats
With this change, stats now report the paused state as well as incidents of container exit due to eviction.
* fn: update/document state transitions in state tracker
There's no case of a transition moving from done to waiting. This
must be deprecated behavior.
* fn: agent eviction revisited
Previously, the hot-container eviction logic used the number of waiters on cpu/mem resources to decide whether to evict a container. An ejection ticker would wake up its associated container every 1 sec to reassess system load based on the waiter count. However, this does not work for the non-blocking agent, since there are no waiters in non-blocking mode.
Background on blocking versus non-blocking agent:
*) A blocking agent holds a request until the request is serviced or the client times out. It assumes the request can eventually be serviced when idle containers eject themselves or busy containers finish their work.
*) Non-blocking mode tries to limit this wait time. However, the non-blocking agent has never been truly non-blocking; it simply means we only make a request wait if we take some action in the system. Non-blocking agents are configured with a much higher hot-poll frequency to make the system more responsive and to handle cases where a too-busy event is missed by the request. This is because the communication between the hot launcher and waiting requests is not 1-1 and is lossy if another request arrives for the same slot queue and receives the too-busy response before the original request does.
This change introduces an evictor where each hot container can register itself once it has been idle for more than 1 second. Upon registration, these idle containers become eligible for eviction.
In the hot container launcher, in non-blocking mode, before we attempt to emit a too-busy response, we now attempt an evict. If this is successful, then we wait some more. This could result in requests waiting longer than they used to, but only if a container was evicted. For blocking mode, the hot launcher uses the hot-poll period to assess whether a request has waited too long, and then eviction is triggered.
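A rough sketch of the evictor idea described above, under the assumption of a simple map keyed by container id; the real agent types and method names differ:
```
// Sketch only: idle hot containers register here, and the hot launcher tries
// to evict one before it would otherwise return a too-busy response.
package agent

import "sync"

type evictToken struct {
	evict chan struct{} // closed to tell the container to shut down
}

type evictor struct {
	mu   sync.Mutex
	idle map[string]*evictToken // keyed by a hypothetical container id
}

func newEvictor() *evictor {
	return &evictor{idle: make(map[string]*evictToken)}
}

// register is called by a container once it has been idle for the threshold
// (roughly 1 second per the description above). The returned channel fires
// if the container is chosen for eviction.
func (e *evictor) register(id string) <-chan struct{} {
	e.mu.Lock()
	defer e.mu.Unlock()
	tok := &evictToken{evict: make(chan struct{})}
	e.idle[id] = tok
	return tok.evict
}

// unregister is called when the container picks up work again or exits.
func (e *evictor) unregister(id string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	delete(e.idle, id)
}

// tryEvict picks any registered idle container and asks it to exit,
// returning true if an eviction was triggered.
func (e *evictor) tryEvict() bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	for id, tok := range e.idle {
		close(tok.evict)
		delete(e.idle, id)
		return true
	}
	return false
}
```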
* Initial support for invoking triggers
* dupe method
* tighten server constraints
* runner tests not working yet
* basic route tests passing
* post rebase fixes
* add hybrid support for trigger invoke and tests
* consolidate all hybrid evil into one place
* cleanup and make triggers unique by source
* fix oops with Agent
* linting
* review fixes
* fn: lb & pure-runner slot hash id communication
With this change, the LB can pre-calculate the slot hash key and pass it to runners. If the LB knows/calculates the slot hash ids, it can also make better estimates of which runner can successfully execute a call, especially once status messages from runners include a small summary of idle slots for a given slot hash id. (TODO)
* fn: fix mock test
* fn: size restricted tmpfs /tmp and read-only / support
*) read-only root FS support
*) removed CPUShares from the docker API; it was unused.
*) docker.Prepare() refactoring
*) added docker.configureTmpFs() for a size-limited tmpfs on /tmp (see the sketch below)
*) tmpfs size support in routes and the resource tracker
*) fix fn-test-utils to handle sparse files better in create file
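A small sketch of what configureTmpFs() might look like, assuming the fsouza/go-dockerclient HostConfig fields (ReadonlyRootfs, Tmpfs); the tmpfsSizeMB parameter name is illustrative:
```
package agent

import (
	"fmt"

	docker "github.com/fsouza/go-dockerclient"
)

// configureTmpFs mounts a size-limited tmpfs on /tmp and makes the root
// filesystem read-only, mirroring the options described above.
func configureTmpFs(hc *docker.HostConfig, tmpfsSizeMB int) {
	hc.ReadonlyRootfs = true
	if tmpfsSizeMB > 0 {
		if hc.Tmpfs == nil {
			hc.Tmpfs = map[string]string{}
		}
		hc.Tmpfs["/tmp"] = fmt.Sprintf("size=%dm", tmpfsSizeMB)
	}
}
```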
* test typo fix
* add user syslog writers to app
Users may now specify syslog URL(s) on apps, and all functions under that app will send their logs to them. The docs have more detail (swagger and operating/logging.md), please review those; I tried to implement to spec in some parts and improve on it in others, and I'm open to feedback on the format, there's a lot of liberty there.
Design-wise, I am looking to the future and ignoring cold containers: the connection overhead there would not be worth it, so this feature only works for hot functions, since we're killing cold anyway (even if a user can still just exit a hot container).
Syslog connections are opened against a container when it starts up, and the call id that is logged gets swapped out for each call that goes through the container; this cuts the cost of opening/closing connections significantly. There are buffers to accumulate logs until we get a `\n` to actually write a syslog line, and a buffer to save some bytes when writing the syslog formatting as well. Underneath, writers re-use the line writer in certain scenarios (the swapper). We could likely make this easier to set up, but opening the syslog connections per container seems worth it, and is a different path than the other function loggers we create when we make a call object. The Close() handling is a little tricky; not sure how to make it easier and keep the above benefits, open to ideas. (A rough sketch follows at the end of these notes.)
This does add another vector of 'limits' for stricter service operators to consider: one is how many syslog URLs a user can add to an app (infinite, at the moment), and the other is that, on the order of the number of containers per host, we could run out of connections in certain scenarios. There may be some utility in having multiple syslog sinks; it can help with debugging to send to another destination, or when a user is a client of someone else and both want the function logs (I've used it for exactly that in the past).
This also doesn't work behind a proxy, which I'm open to fixing, but as far as I can tell it will require a third-party dependency (we can pretty much borrow what docker does). This is mostly useful for those of us who work behind a proxy all the time, not really for end users.
There are some unit tests; integration tests for this don't sound very fun to maintain. I did test against Papertrail with each protocol and it works (and even times out if you're behind a proxy!).
closes #337
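A rough sketch of the per-container syslog plumbing described above, with invented type and method names; the real writer/swapper implementation is more involved (proper syslog framing, multiple sinks, TLS):
```
package agent

import (
	"bufio"
	"fmt"
	"net"
	"net/url"
	"sync"
	"time"
)

// syslogWriter holds one connection per container; callID is swapped for
// each call that runs in that container, so connections are reused.
type syslogWriter struct {
	mu     sync.Mutex
	conn   net.Conn
	buf    *bufio.Writer
	callID string
}

// dialSyslog opens the connection when the container starts up.
func dialSyslog(raw string) (*syslogWriter, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	conn, err := net.DialTimeout(u.Scheme, u.Host, 10*time.Second)
	if err != nil {
		return nil, err
	}
	return &syslogWriter{conn: conn, buf: bufio.NewWriter(conn)}, nil
}

// swap sets the call id used for subsequent lines.
func (w *syslogWriter) swap(callID string) {
	w.mu.Lock()
	w.callID = callID
	w.mu.Unlock()
}

// writeLine emits one newline-delimited log line with a minimal prefix;
// the real format follows the syslog spec more closely.
func (w *syslogWriter) writeLine(line string) error {
	w.mu.Lock()
	defer w.mu.Unlock()
	fmt.Fprintf(w.buf, "call_id=%s %s\n", w.callID, line)
	return w.buf.Flush()
}

func (w *syslogWriter) Close() error { return w.conn.Close() }
```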
* add trace to syslog dial
* fn: non-blocking resource tracker and notification
For some types of errors, we might want to notify the actual caller when the error is directly 1-1 tied to that request. If hotLauncher is triggered with a signaller, we now send back an error notification channel. This is passed to checkLaunch so that synchronous responses can be sent back to the caller that initiated the hot container launch.
This is useful if we want to run the agent in quick-fail mode, where instead of waiting for CPU/mem to become available, we prefer to fail fast so as not to hold up the caller.
To support this, non-blocking resource tracker options/functions are now available.
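A sketch of the back-communication channel described above; the signaller/checkLaunch names come from this description, but the signatures are invented for illustration:
```
package agent

import "errors"

var ErrTooBusy = errors.New("server too busy")

// signaller carries a per-request error channel so that launch failures
// (e.g. no CPU/mem in quick-fail mode) can be reported straight back to
// the caller that triggered the hot-container launch.
type signaller struct {
	errCh chan error
}

func (s *signaller) notify(err error) {
	select {
	case s.errCh <- err:
	default: // caller may already have gone away; don't block
	}
}

// checkLaunch decides whether a container can be launched; in non-blocking
// mode a resource shortfall is sent back synchronously instead of waiting.
func checkLaunch(haveResources bool, sig *signaller) {
	if !haveResources {
		sig.notify(ErrTooBusy)
		return
	}
	// ... launch the hot container ...
}
```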
* fn: test env var rename tweak
* fn: fixup merge
* fn: rebase test fix
* fn: merge fixup
* fn: test tweak down to 70MB for 128MB total
* fn: refactor token creation and use broadcast regardless
* fn: nb description
* fn: bugfix
When the annotations change, we need to get a different slot key so that we launch new containers with those annotations and let the old containers die off unused.
I started on a test for this, but changing every combination of each field in isolation is not much fun without reflection, and there's still only a subset of fields we're managing, so it would have put us in about the same spot as we are now.
There are alternative formulations of this, for instance see
https://www.reddit.com/r/golang/comments/5zctpf/unsafe_conversion_between_strings_and_byte_slices/
The problem manifested in the returned values from unsafeBytes occasionally
being broken. It's possible that by keeping a reference to the `a` parameter
alive, the original code would still work - however, this definitely seems like
a fix.
(A cast to `[]byte(a)` looks increasingly attractive, for all that it'll
perform small allocations and copies.)
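For reference, the safe formulation mentioned above is just the plain conversion, which copies and therefore never depends on the source string staying reachable:
```
package agent

// safeBytes trades a small allocation and copy for correctness; the unsafe
// variant avoids the copy but is only sound while the source string is
// provably kept alive.
func safeBytes(s string) []byte {
	return []byte(s)
}
```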
* App ID
* Clean-up
* Use ID or name to reference apps
* Can use app by name or ID
* Get rid of AppName for routes API and model
routes API is completely backwards-compatible
routes API accepts both app ID and name
* Get rid of AppName from calls API and model
* Fixing tests
* Get rid of AppName from logs API and model
* Restrict API to work with app names only
* Addressing review comments
* Fix for hybrid mode
* Fix rebase problems
* Addressing review comments
* Addressing review comments pt.2
* Fixing test issue
* Addressing review comments pt.3
* Updated docstring
* Adjust UpdateApp SQL implementation to work with app IDs instead of names
* Fixing tests
* fmt after rebase
* Make tests green again!
* Use GetAppByID wherever it is necessary
- adding new v2 endpoints to keep hybrid api/runner mode working
- extract CallBase from Call object to expose that to a user
(it doesn't include any app reference, as we do for all other API objects)
* Get rid of GetAppByName
* Adjusting server router setup
* Make hybrid work again
* Fix datastore tests
* Fixing tests
* Do not ignore app_id
* Resolve issues after rebase
* Updating test to make it work as it was
* Tabula rasa for migrations
* Adding calls API test
- we need to ensure we return "App not found" for a missing app and a missing call in the first place
- making the previous test work (requesting a missing call for an existing app)
* Make datastore tests work fine with correctly applied migrations
* Make CallFunction middleware work again
had to adjust its implementation to set app ID before proceeding
* The biggest rebase ever made
* Fix 8's migration
* Fix tests
* Fix hybrid client
* Fix tests problem
* Increment app ID migration version
* Fixing TestAppUpdate
* Fix rebase issues
* Addressing review comments
* Renew vendor
* Updated swagger doc per recommendations
While escape analysis didn't lie that the bytes underlying this string escaped to the heap, the reference to them died, leaving us with an undefined byte array underlying the string.
Sadly, this makes 4 allocs here (still down from 31), but only adds 100ns per op. I still don't get why 'buf' and 'byts' escape to the heap; I'm blaming faulty escape analysis.
this one is kind of impossible to write a test for.
Found this while benchmarking: I was getting weird behavior at the end of runs where calls didn't find a slot, ran a bisect against a known-good commit from a couple of weeks ago, and found that it was this. Voila. This could explain the variance in the benchmarks reported on Slack, too. Anyway, confirmed that this fixes the issue.
This comes up only minimally in profiling, but it was an itch I needed to scratch. It does 10x fewer allocations and is 3x faster (with 3x fewer bytes), and they're the small, painful kind of allocation. We're only reading these strings, so the uses of unsafe are fine (I think; audit me). The byte array we're casting to a string at the end is also heap allocated and does escape. I only count 2 allocations, but there are 3 (`hash.Sum` and `make([]string)`); using a pool of sha1 hash.Hash shaves 120 bytes and an alloc off, so it seems worth it (it's minimal). If we set a max size for config values with a constant we could avoid that allocation, and we could probably find a checksum package that doesn't use `hash.Hash`, which would speed things up a little (no dynamic dispatch, doesn't allocate in Sum), but there's not one I know of in the stdlib.
master:
```
✗: go test -run=yodawg -bench . -benchmem -benchtime 1s -cpuprofile cpu.out
goos: linux
goarch: amd64
pkg: github.com/fnproject/fn/api/agent
BenchmarkSlotKey 200000 6068 ns/op 696 B/op 31 allocs/op
PASS
ok github.com/fnproject/fn/api/agent 1.454s
```
now:
```
✗: go test -run=yodawg -bench . -benchmem -benchtime 1s -cpuprofile cpu.out
goos: linux
goarch: amd64
pkg: github.com/fnproject/fn/api/agent
BenchmarkSlotKey 1000000 1901 ns/op 168 B/op 3 allocs/op
PASS
ok github.com/fnproject/fn/api/agent 2.092s
```
Once we have versioned apps/routes we won't need to build a sha or sort configs, so this will get a lot faster.
Anyway, mostly for fun here.
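For illustration, a pool of sha1 hashers along the lines described above; the actual slot-key fields being hashed and the surrounding code are omitted:
```
package agent

import (
	"crypto/sha1"
	"encoding/hex"
	"hash"
	"sync"
)

// sha1Pool avoids allocating a fresh hasher per slot-key computation.
var sha1Pool = sync.Pool{New: func() interface{} { return sha1.New() }}

// slotKey hashes the (already sorted) inputs into a hex key, reusing a
// hasher from the pool.
func slotKey(parts ...string) string {
	h := sha1Pool.Get().(hash.Hash)
	defer func() {
		h.Reset()
		sha1Pool.Put(h)
	}()
	for _, p := range parts {
		h.Write([]byte(p))
	}
	return hex.EncodeToString(h.Sum(nil))
}
```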
* fn: hot container timer improvements
With this change, we now allocate the timers when the container starts and manage them via stop/reset as needed, which should be not only more efficient but also easier to follow (a sketch of the pattern appears at the end of these notes). For example, previously, if the eject timeout was set to 10 secs, it could have delayed the idle timeout by up to 10 secs as well. It is also no longer necessary to do any elapsed-time math.
Consumers now avoid any requeuing when startDequeuer() is cancelled; this was triggering additional dequeue/requeue cycles, causing containers to wake up spuriously. Also, in startDequeuer() we no longer remove the item from the actual queue, leaving that to acquire/eject, which sidesteps issues with an item landing in the channel but never being consumed, etc.
*) Stopped using previous/current latency stats; this was not working as expected. Fresh starts usually have these stats at zero for a long time, and initial samples are high due to downloads, caches, etc.
*) New state to track: containers that are idle, in other words, containers that have an unused token in the slot queue.
*) Removed latency counts since these are no longer used in the container start decision. Simplifies logs.
*) isNewContainerNeeded() simplified to use the idle count to estimate effective waiters. Removed the speculative latency-based logic and the progress check comparison.
In the agent, waitHot()'s delayed signalling compensates for these changes. The estimation may fail, but this should correct itself with the next 200 msec signal.
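A small sketch of the stop/reset timer pattern described at the top of these notes; the timer names and defaults are assumptions, not the agent's actual fields:
```
package agent

import "time"

type hotTimers struct {
	idle  *time.Timer // fires when the container has been idle too long
	eject *time.Timer // fires when the container becomes evictable
}

// newHotTimers allocates both timers once, when the container starts; they
// are then armed/stopped as state changes instead of recomputing elapsed time.
func newHotTimers(idleTimeout, ejectTimeout time.Duration) *hotTimers {
	t := &hotTimers{
		idle:  time.NewTimer(idleTimeout),
		eject: time.NewTimer(ejectTimeout),
	}
	// start stopped; callers arm them when the container goes idle
	stopTimer(t.idle)
	stopTimer(t.eject)
	return t
}

// stopTimer stops a timer and drains its channel so a later Reset cannot
// observe a stale expiration.
func stopTimer(t *time.Timer) {
	if !t.Stop() {
		select {
		case <-t.C:
		default:
		}
	}
}
```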
* fn: resource and slot cancel and broadcast improvements
*) Context argument does not wake up the waiters correctly upon
cancellation/timeout.
*) Avoid unnecessary broadcasts in slot and resource.
* fn: limit scope of context in resource/slot calls in agent
* fn: cancellations in WaitAsyncResource
Added a Go context with cancel to WaitAsyncResource. Although today the only case for cancellation is shutdown, this cleans up agent shutdown a little bit.
* fn: locked broadcast to avoid missed wake-ups
* fn: removed ctx arg to WaitAsyncResource and startDequeuer
These were confusing and unnecessary.
*) Revert the executor wait queue size comparison. It was too aggressive and, with the stall check below, is now unnecessary.
*) The new-container logic now checks whether the stats are constant; if so, we assume the system is stalled (e.g. running functions that take a long time), which means we need to make progress and spin up a new container (see the sketch below).
Latency stats are not always updated in real time, and if calls are stuck in the waiting state, isNewContainerNeeded() needs to be a bit more aggressive when the wait queue grows.
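A toy version of the stall check described above; the stat fields are made up and the real slot queue tracks more state than this:
```
package agent

type queueStats struct {
	waiters   int    // calls waiting for a slot
	completed uint64 // calls completed so far
}

// isNewContainerNeeded fires when calls are waiting and no progress has been
// made since the last sample, i.e. the stats look constant (stalled), or when
// the wait queue is growing.
func isNewContainerNeeded(prev, cur queueStats) bool {
	if cur.waiters == 0 {
		return false
	}
	stalled := cur.completed == prev.completed
	return stalled || cur.waiters > prev.waiters
}
```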
possible breakages:
* `FN_HEADER` vars on cold no longer get the `s/-/_/` rewrite -- this is so that cold functions (FDKs, specifically) can rebuild the headers exactly as they came in on the request; there's no guarantee that reversing with `s/_/-/` yields the original request header.
* app and route config no longer get the `s/-/_/` rewrite -- it seemed really odd to rewrite the user's config vars on these; we should just pass them to the env exactly as-is.
* headers no longer contain the environment vars (previously base config, app config, route config, `FN_PATH`, etc.); these are still available in the environment.
This gets rid of a lot of the code around headers, specifically the code that shoved everything into headers when constructing a call. Now we store the headers separately, add a few things like FN_CALL_ID to them, and build a separate 'config' to store on the call. I thought 'config' was more aptly named ('env' was confusing), though 'config' is now exactly what 'base_vars' was, i.e. only the things being put into the env. We weren't storing this field in the db, so this doesn't break anything unless there are messages in a queue from another version; I don't think we're there, and don't expect any breakage for anybody from the field name changes.
This makes the configuration story pretty straightforward: there are just two separate buckets of things. Cold just mashes them together into the env; hot containers put 'config' in the env and then the hot format can shove 'headers' in however it likes. This seems better than my last idea for making this easier but worse (RIP).
this means:
* headers no longer contain all vars; the set of base vars can only be found in the environment.
* headers are only the headers from the request + call_id, deadline, method, url
* for cold, we simply add the headers to the environment, prepending `FN_HEADER_` to them, BUT NOT upper-casing or applying `s/-/_/` (see the sketch at the end of this commit's notes)
* fixes an issue where async hot functions would end up with `Fn_header_`-prefixed headers
* removes the idea of 'base' vars and 'env'. This was a strange concept; now we just have 'config', which was base vars, and headers, which was base_env+headers, i.e. they are disjoint now.
* casing for all headers will lean toward the `My-Header` style, which should help with consistency. Notable exceptions for cold only are FN_CALL_ID, FN_METHOD, and FN_REQUEST_URL -- this is simply to avoid breakage; in either hot format they still appear as `Fn_call_id`.
* removes the FN_PARAM stuff
* updated the docs with this behavior
Weird things left: `Fn_call_id`, for example, isn't a correctly formatted HTTP header; it should likely be `Fn-Call-Id`, but I wanted to live to fight another day on this one, as it would add some breakage.
examples to be posted of each format below
closes #329
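A minimal illustration of the cold-header rule above (prefix with `FN_HEADER_`, keep the header name exactly as-is, no upper-casing or dash-to-underscore rewrite); the function name is hypothetical:
```
package agent

import "net/http"

// coldHeaderEnv turns request headers into env entries for a cold container.
// Multiple values are shown as separate entries here; the real code may
// join them differently.
func coldHeaderEnv(h http.Header) []string {
	var env []string
	for k, vs := range h {
		for _, v := range vs {
			// e.g. "FN_HEADER_My-Header=value"
			env = append(env, "FN_HEADER_"+k+"="+v)
		}
	}
	return env
}
```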
*) during shutdown, errors should be 503
*) new inactivity timeout for hot queues; we previously kept hot queues in memory forever
*) each hot queue now has a hot launcher to monitor and launch hot containers
*) consumers now create a consumer channel with startDequeuer() that can be cancelled via context
*) consumers now ping (signal) the hot launcher every 200 msecs until they get a slot (see the sketch below)
*) tests for slot queue & mgr
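A sketch of the consumer side described above: a context-cancellable dequeue channel plus a 200 msec ticker that keeps signalling the hot launcher until a slot arrives. Names are illustrative, not the agent's real API:
```
package agent

import (
	"context"
	"time"
)

type slot struct{}

// waitForSlot consumes from the dequeuer channel while pinging the launcher,
// so a missed too-busy/launch event cannot strand the request.
func waitForSlot(ctx context.Context, dequeued <-chan *slot, signalLauncher func()) (*slot, error) {
	tick := time.NewTicker(200 * time.Millisecond)
	defer tick.Stop()
	for {
		select {
		case s := <-dequeued:
			return s, nil
		case <-tick.C:
			signalLauncher()
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
}
```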
* fn: remove 100 msec sleep for hot containers
*) moved slot management to its own file
*) slots are now implemented with LIFO semantics; this is important since we do not want to round-robin hot containers, and idle hot containers should time out properly (see the sketch below).
*) each slot queue now stores a few basic stats, such as the average time a call spent in a given state, the number of running/launching containers, and the number of waiting calls in those states.
*) the first metrics in these basic stats are discarded to avoid initial docker pull/start spikes.
*) the agent now records/updates the slot queue state and how much time a call stayed in each state.
*) waitHotSlot() replaces the previous 100 msec wait logic: it sends a msg to the hot slot goroutine launchHot() and waits for a slot
*) launchHot() is now a goroutine that tracks containers in hot slots and determines whether a new container is needed based on slot queue stats.
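A tiny LIFO stack to illustrate the slot semantics above (the most recently freed container is reused first, so older idle containers can age out and time out); this is a sketch, not the real slot queue:
```
package agent

import "sync"

type hotSlot struct{}

type slotStack struct {
	mu    sync.Mutex
	slots []*hotSlot
}

// push returns a freed slot to the top of the stack.
func (q *slotStack) push(s *hotSlot) {
	q.mu.Lock()
	q.slots = append(q.slots, s)
	q.mu.Unlock()
}

// pop takes the most recently pushed slot (LIFO); ok is false when empty.
func (q *slotStack) pop() (*hotSlot, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	n := len(q.slots)
	if n == 0 {
		return nil, false
	}
	s := q.slots[n-1]
	q.slots = q.slots[:n-1]
	return s, true
}
```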