Commit Graph

68 Commits

Author SHA1 Message Date
Tolga Ceylan
fe2b9fb53d fn: cookie and driver api changes (#1312)
Now obsoleted driver.PrepareCookie() call handled image and
container creation. In agent, going forward we will need finer
grained control over the timeouts implied by the contexts.
For this reason, with this change, we split PrepareCookie()
into Validate/Pull/Create calls under Cookie interface.
2018-11-14 16:51:05 -08:00
Tolga Ceylan
8ee4c1098b fn: correct typo in docker command tag (#1311) 2018-11-14 11:38:48 -08:00
Eric Fode
90e39c8fd3 initial addition of the diskfree op (#1308)
* initial addition of the diskfree op

fixing up some typos

last of fmt errors

* fixed up some feedbacks
2018-11-14 09:22:07 -08:00
Tolga Ceylan
25afb2f478 fn: remove tini option & env variable (#1301) 2018-11-07 12:35:19 -08:00
Tolga Ceylan
975b780695 fn: tests for hung and bad docker repo during docker-pull (#1298)
* fn: tests for hung and bad docker repo during docker-pull
2018-11-05 16:01:42 -08:00
Tolga Ceylan
de9c2cbb63 fn: cleanup of docker timeouts and docker health check (#1292)
Moving the timeout management of various docker operations
to agent. This allows for finer control over what operation
should use. For instance, for pause/unpause our tolerance
is very low to avoid resource issues. For docker remove,
the consequences of failure will lead to potential agent
failure and therefore we wait up to 10 minute.
For cookie create/prepare (which includes docker-pull)
we cap this at 10 minutes by default.

With new UDS/FDK contract, health check is now obsoleted
as container advertise health using UDS availibility.
2018-11-01 14:22:47 -07:00
Reed Allman
e13a6fd029 death to format (#1281)
* get rid of old format stuff, utils usage, fix up for fdk2.0 interface

* pure agent format removal, TODO remove format field, fix up all tests

* shitter's clogged

* fix agent tests

* start rolling through server tests

* tests compile, some failures

* remove json / content type detection on invoke/httptrigger, fix up tests

* remove hello, fixup system tests

the fucking status checker test just hangs and it's testing that it doesn't
work so the test passes but the test doesn't pass fuck life it's not worth it

* fix migration

* meh

* make dbhelper shut up about dbhelpers not being used

* move fail status at least into main thread, jfc

* fix status call to have FN_LISTENER

also turns off the stdout/stderr blocking between calls, because it's
impossible to debug without that (without syslog), now that stdout and stderr
go to the same place (either to host stderr or nowhere) and isn't used for
function output this shouldn't be a big fuss really

* remove stdin

* cleanup/remind: fixed bug where watcher would leak if container dies first

* silence system-test logs until fail, fix datastore tests

postgres does weird things with constraints when renaming tables, took the
easy way out

system-tests were loud as fuck and made you download a circleci text file of
the logs, made them only yell when they goof

* fix fdk-go dep for test image. fun

* fix swagger and remove test about format

* update all the gopkg files

* add back FN_FORMAT for fdks that assert things. pfft

* add useful error for functions that exit

this error is really confounding because containers can exit for all manner of
reason, we're just guessing that this is the most likely cause for now, and
this error message should very likely change or be removed from the client
path anyway (context.Canceled wasn't all that useful either, but anyway, I'd
been hunting for this... so found it). added a test to avoid being publicly
shamed for 1 line commits (beware...).
2018-10-26 10:43:04 -07:00
Tolga Ceylan
29dcf0a791 fn: adding docker events to stats (#1262)
Streaming docker events is useful as we can record/capture some
asynchronous containers events such as out-of-memory. For now,
we record these in opencensus/prometheus stats.
2018-10-04 18:54:09 -07:00
Vijay Krishnan
b2f85b70ea Use registry auth token from Call extensions to pull images (#1228) 2018-09-20 13:57:41 -07:00
Reed Allman
3a82790d99 clean up hardcoded lsnr.sock refs, move iofs to /tmp (#1221)
* clean up hardcoded lsnr.sock refs

because what drivers.ContainerTask needs is another method, and we all know it

atoning for my sins the first time around. and yes, i refuse to use a cross
package exported constant (just think of the dep graphs)

* fix tests
2018-09-18 08:12:44 -07:00
Richard Connon
493790dbd2 Add tmpfs IOFS (#1212)
* Define an interface for IOFS handling. Add no-op and temporary directory implementations.

* Move IOFS stuff out into separate file, add basic tmpfs implementation for linux only

* Switch between directory and tmpfs based on platform and config

* Respect FN_IOFS_OPTS

* Make directory iofs default on all platforms

* At least try to clean up a bit on failure

* Add backout if IOFS creation fails

* Add comment about iofs.Close
2018-09-17 11:50:43 -07:00
Tom Coupland
d56a49b321 Remove V1 endpoints and Routes (#1210)
Largely a removal job, however many tests, particularly system level
ones relied on Routes. These have been migrated to use Fns.

* Add 410 response to swagger
* No app names in log tags
* Adding constraint in GetCall for FnID
* Adding test to check FnID is required on call
* Add fn_id to call selector
* Fix text in docker mem warning
* Correct buildConfig func name
* Test fix up
* Removing CPU setting from Agent test

CPU setting has been deprecated, but the code base is still riddled
with it. This just removes it from this layer. Really we need to
remove it from Call.

* Remove fn id check on calls
* Reintroduce fn id required on call
* Adding fnID to calls for execute test
* Correct setting of app id in middleware
* Removes root middlewares ability to redirect fun invocations
* Add over sized test check
* Removing call fn id check
2018-09-17 16:44:51 +01:00
Owen Cliffe
6567f6e8ef support configuration-based relative dirs (host and agent) for iofs (#1213)
* support configuration-based relative dirs (host and agent) for iofs mounts
* Send UDS requests as POST to <UDS>/call
2018-09-17 11:59:16 +01:00
Reed Allman
3a9c48b8a3 http-stream format (#1202)
* POC code for inotify UDS-io-socket

* http-stream format

introducing the `http-stream` format support in fn. there are many details for
this, none of which can be linked from github :( -- docs are coming (I could
even try to add some here?). this is kinda MVP-ish level, but does not
implement the remaining spec, ie 'headers' fixing up / invoke fixing up. the
thinking being we can land this to test fdks / cli with and start splitting
work up on top of this. all other formats work the same as previous (no
breakage, only new stuff)

with the cli you can set `format: http-stream` and deploy, and then invoke a
function via the `http-stream` format. this uses unix domain socket (uds) on
the container instead of previous stdin/stdout, and fdks will have to support
this in a new fashion (will see about getting docs on here). fdk-go works,
which is here: https://github.com/fnproject/fdk-go/pull/30 . the output looks
the same as an http format function when invoking a function. wahoo.

there's some amount of stuff we can clean up here, enumerated:

* the cleanup of the sock files is iffy, high pri here

* permissions are a pain in the ass and i punted on dealing with them. you can
run `sudo ./fnserver` if running locally, it may/may not work in dind(?) ootb

* no pipe usage at all (yay), still could reduce buffer usage around the pipe
behavior, we could clean this up potentially before removal (and tests)

* my brain can’t figure out if dispatchOldFormats changes pipe behavior, but
tests work

* i marked XXX to do some clean up which will follow soon… need this to test fdk
tho so meh, any thoughts on those marked would be appreciated however (1 less
decision for me). mostly happy w/ general shape/plumbing tho

* there are no tests atm, this is a tricky dance indeed. attempts were made.
need to futz with the permission stuff before committing to adding any tests
here, which I don't like either. also, need to get the fdk-go based test image
updated according to the fdk-go, and there's a dance there too. rumba time..

* delaying the big big cleanup until we have good enough fdk support to kill
all the other formats.

open to ideas on how to maneuver landing stuff...

* fix unmount

* see if the tests work on ci...

* add call id header

* fix up makefile

* add configurable iofs opts

* add format file describing http-stream contract

* rm some cruft

* default iofs to /tmp, remove mounting

out of the box fn we can't mount. /tmp will provide a memory backed fs for us
on most systems, this will be fine for local developing and this can be
configured to be wherever for anyone that wants to make things more difficult
for themselves.

also removes the mounting, this has to be done as root. we can't do this in
the oss fn (short of requesting root, but no). in the future, we may want to
have a knob here to have a function that can be configured in fn that allows
further configuration here. since we don't know what we need in this dept
really, not doing that yet (it may be the case that it could be done
operationally outside of fn, eg, but not if each directory needs to be
configured itself, which seems likely, anyway...)

* add WIP note just in case...
2018-09-14 10:59:12 +01:00
Tolga Ceylan
aabbe0fba5 fn: check context timeout when waiting for non-blocking attach (#1201)
* fn: check context timeout when waiting for non-blocking attach

With this change, we no longer allow docker client AttachToContainerNonBlocking
to block on Success channel more than our context deadline/timeout.

* fn: move nbio chan handling in attach to docker from docker-client
2018-09-12 13:01:51 -07:00
Tolga Ceylan
bb8436c3ee fn: docker driver stats/metrics for prometheus (#1197)
* fn: docker driver stats/metrics for prometheus
2018-09-10 13:35:50 -07:00
Reed Allman
7638b31e11 use tini to run every container (#1195)
fixes #1101

additional context:

* this was introduced in docker 1.13 (1/2017), we require docker 17.10
(10/2017), this should not have any issues dependency-wise, as `docker-init`
is in the docker install from that point in time. unless explicitly removed,
it should be in the dind container we use as well...
* the PR that introduced this to docker is
https://github.com/moby/moby/pull/26061 for additional context
* it may be wise to put this through some paces, if anybody has any...
interesting... function containers. the tests seem to work fine, however, and
this shouldn't be something users have to think about (?) at all, just
something that we are doing. this isn't the default in docker for
compatibility reasons, which is maybe a yellow flag but I am not sure tbh
2018-09-04 15:41:30 -07:00
Tolga Ceylan
ad011fde7f fn: introducing docker-syslog driver as default logger (#1189)
* fn: introducing docker-syslog driver as default logger

With this change, fn-agent prefers RFC2454 docker-syslog driver
for logging stdout/stderr from containers. The advantage
of this is to offload it to docker itself instead of
streaming stderr along with stdout, which gets multiplexed
through single connection via docker-API.

The change will need support from FDKs in order to log
correct call-id and supress '\n' that splits syslog lines.
2018-08-29 13:08:02 -07:00
Reed Allman
292f673747 Go1.11 (#1188)
* update circleci to go1.11

* update opencensus dep to build with go1.11

* fix up for new gofmt rules
2018-08-27 10:55:52 -07:00
Reed Allman
9cac4c8eea update fsouza to v1.2.0 (#1186)
* update fsouza to v1.2.0

* unwind timeouts on docker

previously, we were setting our own transport on the docker client, but this
does not work anymore as fsouza now needs to call this:
https://github.com/fsouza/go-dockerclient/blob/master/client_unix.go
which makes a platform dependent client. fsouza now also appears to make a
transport that modifies the default http client with some saner values for
things like max idle conns per host (they get reaped if idle 90s):
https://github.com/fsouza/go-dockerclient/blob/master/client.go#L1059
-- these settings are sane and were why we were doing this to begin with.

additionally, have removed our setting of timeout on the docker client for 2
minutes. this is a leftover relic of a bygone era from a time when we relied
on these timeouts to timeout higher level things, which now we're properly
timing out in the enclosing methods. so, they gone, this makes the docker
client a little less whacky now.
2018-08-24 11:36:02 -07:00
Tolga Ceylan
0105f8321e fn: stats view/distribution improvements (#1154)
* fn: stats view/distribution improvements

*) View latency distribution is now an argument
in view creation functions. This allows easier
override to set custom buckets. It is simplistic
and assumes all latency views would use the same
set, but in practice this is already the case.
*) Removed API view creation to main, this should not
be enabled for all node types. This is consistent with
the rest of the system.

* fn: Docker samples of cpu/mem/disk with specific buckets
2018-08-03 11:06:54 -07:00
Tolga Ceylan
2706323cec fn: tests for private repo auth and rename DOCKER_AUTH (#1134)
Renamed DOCKER_AUTH with FN_ prefix to clarify the purpose. Docker
does not use this variable.

New tests to clarify the repo/auth-config behavior.
2018-07-24 15:19:59 -07:00
Tolga Ceylan
cf37a21fab fn: cleanup of docker private registry code (#1130)
* fn: cleanup of docker private registry code

Start using URL parsed ServerAddress and its subdomains
for easier image ensure/pull in docker driver. Previous
code to lookup substrings was faulty without proper
URL parse and hostname tokenization. When searching
for a registry config, if image name does not contain
a registry and if there's a private registry configured,
then search for hub.docker.com and index.docker.io. This
is similar to previous code but with correct subdomain
matching.

* fn-dataplane: take port into account in auth configs
2018-07-24 02:15:25 +01:00
Tolga Ceylan
5dc5740a54 fn: runner status and docker load images (#1116)
* fn: runner status and docker load images

Introducing a function run for pure runner Status
calls. Previously, Status gRPC calls returned active
inflight request counts with the purpose of a simple
health checker. However this is not sufficient since
it does not show if agent or docker is healthy. With
this change, if pure runner is configured with a status
image, that image is executed through docker. The
call uses zero memory/cpu/tmpsize settings to ensure
resource tracker does not block it.

However, operators might not always have a docker
repository accessible/available for status image. Or
operators might not want the status to go over the
network. To allow such cases, and in general possibly
caching docker images, added a new environment variable
FN_DOCKER_LOAD_FILE. If this is set, fn-agent during
startup will load these images that were previously
saved with 'docker save' into docker.
2018-07-12 13:58:38 -07:00
Owen Cliffe
fff95e7992 Clean up/make consistent the APIs for registering core components, make Docker an optional component at compile time (#1111) 2018-07-07 10:37:19 +01:00
Reed Allman
51ff7caeb2 Bye bye openapi (#1081)
* add DateTime sans mgo

* change all uses of strfmt.DateTime to common.DateTime, remove test strfmt usage

* remove api tests, system-test dep on api test

multiple reasons to remove the api tests:

* awkward dependency with fn_go meant generating bindings on a branched fn to
vendor those to test new stuff. this is at a minimum not at all intuitive,
worth it, nor a fun way to spend the finite amount of time we have to live.
* api tests only tested a subset of functionality that the server/ api tests
already test, and we risk having tests where one tests some thing and the
other doesn't. let's not. we have too many test suites as it is, and these
pretty much only test that we updated the fn_go bindings, which is actually a
hassle as noted above and the cli will pretty quickly figure out anyway.
* fn_go relies on openapi, which relies on mgo, which is deprecated and we'd
like to remove as a dependency. openapi is a _huge_ dep built in a NIH
fashion, that cannot simply remove the mgo dep as users may be using it.
we've now stolen their date time and otherwise killed usage of it in fn core,
for fn_go it still exists but that's less of a problem.

* update deps

removals:

* easyjson
* mgo
* go-openapi
* mapstructure
* fn_go
* purell
* go-validator

also, had to lock docker. we shouldn't use docker on master anyway, they
strongly advise against that. had no luck with latest version rev, so i locked
it to what we were using before. until next time.

the rest is just playing dep roulette, those end up removing a ton tho

* fix exec test to work

* account for john le cache
2018-06-21 11:09:16 -07:00
Tolga Ceylan
e67d0e5f3f fn: Call extensions/overriding and more customization friendly docker driver (#1065)
In pure-runner and LB agent, service providers might want to set specific driver options.

For example, to add cpu-shares to functions, LB can add the information as extensions
to the Call and pass this via gRPC to runners. Runners then pick these extensions from
gRPC call and pass it to driver. Using a custom driver implementation, pure-runners can
process these extensions to modify docker.CreateContainerOptions.

To achieve this, LB agents can now be configured using a call overrider.

Pure-runners can be configured using a custom docker driver.

RunnerCall and Call interfaces both expose call extensions.

An example to demonstrate this is implemented in test/fn-system-tests/system_test.go
which registers a call overrider for LB agent as well as a simple custom docker driver.
In this example, LB agent adds a key-value to extensions and runners add this key-value
as an environment variable to the container.
2018-06-18 14:42:28 -07:00
Peter Jausovec
bd5150f1ac Extract register view functionality (#1056)
* WIP

* Create separate Register*Views functions that are called from main.
2018-06-12 17:24:21 +01:00
Owen Cliffe
1ad27f4f0d Inverting deps on SQL, Log and MQ plugins to make them optional dependencies of extended servers, Removing some dead code that brought in unused dependencies Filtering out some non-linux transitive deps. (#1057)
* initial Db helper split - make SQL and datastore packages optional

* abstracting log store

* break out DB, MQ and log drivers as extensions

* cleanup

* fewer deps

* fixing docker test

* hmm dbness

* updating db startup

* Consolidate all your extensions into one convenient package

* cleanup

* clean up dep constraints
2018-06-11 18:23:28 +01:00
Tolga Ceylan
f97b63f878 fn: fixup temp dir read/write permissions if tmp fs size is not set. (#1024)
When TmpFsSize is not set in a route, docker fails to create a /tmp
mount that is writable. Forcing docker to explicitly to this if
read-only root directory is enabled (default).
2018-06-01 10:49:07 -07:00
Tolga Ceylan
9584643142 fn: size restricted tmpfs /tmp and read-only / support (#1012)
* fn: size restricted tmpfs /tmp and read-only / support

*) read-only Root Fs Support
*) removed CPUShares from docker API. This was unused.
*) docker.Prepare() refactoring
*) added docker.configureTmpFs() for size limited tmpfs on /tmp
*) tmpfs size support in routes and resource tracker
*) fix fn-test-utils to handle sparse files better in create file

* test typo fix
2018-05-25 14:12:29 -07:00
Tolga Ceylan
8e440c835e fn: fixup undeterministic test (#986) 2018-05-10 08:08:10 -07:00
Tolga Ceylan
0f50537150 fn: allow specified docker networks in functions (#982)
* fn: allow specified docker networks in functions

If FN_DOCKER_NETWORK is specified with a list of
networks, then agent driver picks the least used
network to place functions on.

* add mutex comment
2018-05-09 12:24:15 -07:00
jan grant
91e58afa55 The opencensus API changes between 0.6.0 and 0.9.0 (#980)
We get some useful features in later versions; update so as to not
pin downstream consumers (extensions) to an older version.
2018-05-09 14:55:00 +01:00
Tolga Ceylan
584e4e75eb Experimental Pre-fork Pool: Recycle net ns (#890)
* fn: experimental prefork recycle and other improvements

*) Recycle and do not use same pool container again option.
*) Two state processing: initializing versus ready (start-kill).
*) Ready state is exempt from rate limiter.

* fn: experimental prefork pool multiple network support

In order to exceed 1023 container (bridge port) limit, add
multiple networks:

    for i in fn-net1 fn-net2 fn-net3 fn-net4
    do
            docker network create $i
    done

to Docker startup, (eg. dind preentry.sh), then provide this
to prefork pool using:

    export FN_EXPERIMENTAL_PREFORK_NETWORKS="fn-net1 fn-net2 fn-net3 fn-net4"

which should be able to spawn 1023 * 4 containers.

* fn: fixup tests for cfg move

* fn: add ipc and pid namespaces into prefork pooling

* fn: revert ipc and pid namespaces for now

Pid/Ipc opens up the function container to pause container.
2018-04-05 15:07:30 -07:00
Tolga Ceylan
369f2ea17c fn: experimental prefork tests should skip non Linux OSs (#904) 2018-03-28 14:40:51 -07:00
Justin Ko
9cb883ca68 Godoc fixes (#898)
Add some godoc comments for the api/agent package and some of its
subpackages.
2018-03-28 10:16:40 -07:00
Reed Allman
8af605cf3d update thrift, opencensus, others (#893)
* update thrift, opencensus, others

* stats: update to opencensus 0.6.0 view api
2018-03-26 15:43:49 -07:00
Tolga Ceylan
0addcb8911 fn: pre-fork pool for namespace/network speedup (#874)
* fn: pre-fork pool experimental implementation
2018-03-23 16:35:35 -07:00
Tolga Ceylan
cb61a678d9 fn: add storage opt size support (#860)
Added env FN_MAX_FS_SIZE_MB, which if defined and non-zero
is passed to docker as storage opt size. We do not validate
if this option is supported by docker currently. This is
because it's difficult to actually validate this since it
not only depends on storage driver and its backing filesystem,
but also the mount options used to mount that fs.
2018-03-14 15:47:34 -07:00
Tolga Ceylan
7177bf3923 fn: enable failing test back (#826)
* fn: enable failing test back

* fn: fortifying the stderr output

Modified limitWriter to discard excess data instead
of returning error, this is to allow stderr/stdout
pipes flowing to avoid head-of-line blocking or
data corruption in container stdout/stderr output stream.
2018-03-09 09:57:28 -08:00
Tolga Ceylan
0ef0118150 fn: wait for async attach with success channel (#810)
* fn: wait for async attach with success channel

* fn: debug logs in test.sh

* fn: circleci test output as artifact

* fn: docker attach non-blocking adjustments

* fn: remove retry from risky NB attach
2018-03-08 15:46:32 -08:00
Tolga Ceylan
7677aad450 fn: I/O related improvements (#809)
*) I/O protocol parse issues should shutdown the container as the container
goes to inconsistent state between calls. (eg. next call may receive previous
calls left overs.)
*) Move ghost read/write code into io_utils in common.
*) Clean unused error from docker Wait()
*) We can catch one case in JSON, if there's remaining unparsed data in
decoder buffer, we can shut the container
*) stdout/stderr when container is not handling a request are now blocked if freezer is also enabled.
*) if a fatal err is set for slot, we do not requeue it and proceed to shutdown
*) added a test function for a few cases with freezer strict behavior
2018-03-07 15:09:24 -08:00
Reed Allman
206aa3c203 opentracing -> opencensus (#802)
* update vendor directory, add go.opencensus.io

* update imports

* oops

* s/opentracing/opencensus/ & remove prometheus / zipkin stuff & remove old stats

* the dep train rides again

* fix gin build

* deps from last guy

* start in on the agent metrics

* she builds

* remove tags for now, cardinality error is fussing. subscribe instead of register

* update to patched version of opencensus to proceed for now TODO switch to a release

* meh

fix imports

* println debug the bad boys

* lace it with the tags

* update deps again

* fix all inconsistent cardinality errors

* add our own logger

* fix init

* fix oom measure

* remove bugged removal code

* fix s3 measures

* fix prom handler nil
2018-03-05 09:35:28 -08:00
Tolga Ceylan
a83f2cfbe8 fn: favor fn-test-utils over hello (to be decommissioned) (#761) 2018-02-28 17:44:13 -08:00
Tolga Ceylan
46fad7ef80 fn: plumb up I/O errors from docker wait (#798)
Reed Allman <rdallman10@gmail.com>'s I/O error fix.
2018-02-27 12:17:02 -08:00
Tolga Ceylan
8b65ae8f9a fn: add docker command info to retry when logging errors (#795) 2018-02-27 01:10:07 -08:00
Reed Allman
1a1250e5ea disable fail whale logs (#768)
we have been getting these from attach all this time and never needed these
anyway.

I ran cpu profiles of dockerd and this was 90% of docker cpu usage (json
logs). woot. this will reduce i/o quite a bit, and we don't have to worry
about them taking up any disk space either.

from tests i get about 50% speedup with these off. the hunt continues...
2018-02-13 17:45:11 -08:00
Reed Allman
f287ad274e support deeper / nesting of image names (#765)
closes #764
2018-02-13 11:26:28 -08:00
Reed Allman
cbfd659e7e cap docker retries to fixed number (#762)
previously we would retry infinitely up to the context with some backoff in
between. for hot functions, since we don't set any dead line on pulling or
creating the image, this means it would retry forever without making any
progress if e.g. the registry is inaccessable or any other temporary error
that isn't actually temporary.  this adds a hard cap of 10 retries, which
gives approximately 13s if the ops take no time, still respecting the context
deadline enclosed.

the case where this was coming up is now tested for and was otherwise
confusing for users to debug, now it spits out an ECONNREFUSED with the
address of the registry, which should help users debug without having to poke
around fn logs (though I don't like this as an excuse, not all users will be
operators at some point in the near future, and this one makes sense)

closes #727
2018-02-12 18:45:30 -08:00