fn-serverless

mirror of https://github.com/fnproject/fn.git synced 2022-10-28 21:29:17 +03:00

Author	SHA1	Message	Date
Tolga Ceylan	29dcf0a791	fn: adding docker events to stats (#1262) Streaming docker events is useful as we can record/capture some asynchronous container events such as out-of-memory. For now, we record these in opencensus/prometheus stats.	2018-10-04 18:54:09 -07:00
Vijay Krishnan	b2f85b70ea	Use registry auth token from Call extensions to pull images (#1228)	2018-09-20 13:57:41 -07:00
Reed Allman	3a82790d99	clean up hardcoded lsnr.sock refs, move iofs to /tmp (#1221) * clean up hardcoded lsnr.sock refs because what drivers.ContainerTask needs is another method, and we all know it. atoning for my sins the first time around. and yes, i refuse to use a cross-package exported constant (just think of the dep graphs) * fix tests	2018-09-18 08:12:44 -07:00
Richard Connon	493790dbd2	Add tmpfs IOFS (#1212) * Define an interface for IOFS handling. Add no-op and temporary directory implementations. * Move IOFS stuff out into separate file, add basic tmpfs implementation for linux only * Switch between directory and tmpfs based on platform and config * Respect FN_IOFS_OPTS * Make directory iofs default on all platforms * At least try to clean up a bit on failure * Add backout if IOFS creation fails * Add comment about iofs.Close	2018-09-17 11:50:43 -07:00
Tom Coupland	d56a49b321	Remove V1 endpoints and Routes (#1210) Largely a removal job; however, many tests, particularly system-level ones, relied on Routes. These have been migrated to use Fns. * Add 410 response to swagger * No app names in log tags * Adding constraint in GetCall for FnID * Adding test to check FnID is required on call * Add fn_id to call selector * Fix text in docker mem warning * Correct buildConfig func name * Test fix up * Removing CPU setting from Agent test. CPU setting has been deprecated, but the code base is still riddled with it. This just removes it from this layer. Really we need to remove it from Call. * Remove fn id check on calls * Reintroduce fn id required on call * Adding fnID to calls for execute test * Correct setting of app id in middleware * Removes root middleware's ability to redirect fn invocations * Add oversized test check * Removing call fn id check	2018-09-17 16:44:51 +01:00
Owen Cliffe	6567f6e8ef	support configuration-based relative dirs (host and agent) for iofs (#1213) * support configuration-based relative dirs (host and agent) for iofs mounts * Send UDS requests as POST to <UDS>/call	2018-09-17 11:59:16 +01:00
Reed Allman	3a9c48b8a3	http-stream format (#1202) * POC code for inotify UDS-io-socket * http-stream format introducing the `http-stream` format support in fn. there are many details for this, none of which can be linked from github :( -- docs are coming (I could even try to add some here?). this is kinda MVP-ish level, but does not implement the remaining spec, i.e. 'headers' fixing up / invoke fixing up. the thinking being we can land this to test fdks / cli with and start splitting work up on top of this. all other formats work the same as previous (no breakage, only new stuff). with the cli you can set `format: http-stream` and deploy, and then invoke a function via the `http-stream` format. this uses a unix domain socket (uds) on the container instead of the previous stdin/stdout, and fdks will have to support this in a new fashion (will see about getting docs on here). fdk-go works, which is here: https://github.com/fnproject/fdk-go/pull/30 . the output looks the same as an http format function when invoking a function. wahoo. there's some amount of stuff we can clean up here, enumerated: * the cleanup of the sock files is iffy, high pri here * permissions are a pain in the ass and i punted on dealing with them. you can run `sudo ./fnserver` if running locally, it may/may not work in dind(?) ootb * no pipe usage at all (yay), still could reduce buffer usage around the pipe behavior, we could clean this up potentially before removal (and tests) * my brain can’t figure out if dispatchOldFormats changes pipe behavior, but tests work * i marked XXX to do some clean up which will follow soon… need this to test fdk tho so meh, any thoughts on those marked would be appreciated however (1 less decision for me). mostly happy w/ general shape/plumbing tho * there are no tests atm, this is a tricky dance indeed. attempts were made. need to futz with the permission stuff before committing to adding any tests here, which I don't like either. also, need to get the fdk-go based test image updated according to the fdk-go, and there's a dance there too. rumba time.. * delaying the big big cleanup until we have good enough fdk support to kill all the other formats. open to ideas on how to maneuver landing stuff... * fix unmount * see if the tests work on ci... * add call id header * fix up makefile * add configurable iofs opts * add format file describing http-stream contract * rm some cruft * default iofs to /tmp, remove mounting. out of the box, fn can't mount. /tmp will provide a memory-backed fs for us on most systems, this will be fine for local development and this can be configured to be wherever for anyone that wants to make things more difficult for themselves. also removes the mounting, this has to be done as root. we can't do this in the oss fn (short of requesting root, but no). in the future, we may want a knob in fn that allows further configuration here. since we don't know what we need in this dept really, not doing that yet (it may be the case that it could be done operationally outside of fn, eg, but not if each directory needs to be configured itself, which seems likely, anyway...) * add WIP note just in case...	2018-09-14 10:59:12 +01:00
Tolga Ceylan	aabbe0fba5	fn: check context timeout when waiting for non-blocking attach (#1201) * fn: check context timeout when waiting for non-blocking attach With this change, we no longer allow docker client AttachToContainerNonBlocking to block on Success channel more than our context deadline/timeout. * fn: move nbio chan handling in attach to docker from docker-client	2018-09-12 13:01:51 -07:00
Tolga Ceylan	bb8436c3ee	fn: docker driver stats/metrics for prometheus (#1197) * fn: docker driver stats/metrics for prometheus	2018-09-10 13:35:50 -07:00
Reed Allman	7638b31e11	use tini to run every container (#1195) fixes #1101 additional context: * this was introduced in docker 1.13 (1/2017), we require docker 17.10 (10/2017), this should not have any issues dependency-wise, as `docker-init` is in the docker install from that point in time. unless explicitly removed, it should be in the dind container we use as well... * the PR that introduced this to docker is https://github.com/moby/moby/pull/26061 for additional context * it may be wise to put this through some paces, if anybody has any... interesting... function containers. the tests seem to work fine, however, and this shouldn't be something users have to think about (?) at all, just something that we are doing. this isn't the default in docker for compatibility reasons, which is maybe a yellow flag but I am not sure tbh	2018-09-04 15:41:30 -07:00
Tolga Ceylan	ad011fde7f	fn: introducing docker-syslog driver as default logger (#1189) * fn: introducing docker-syslog driver as default logger With this change, fn-agent prefers the RFC 5424 docker-syslog driver for logging stdout/stderr from containers. The advantage of this is to offload it to docker itself instead of streaming stderr along with stdout, which gets multiplexed through a single connection via docker-API. The change will need support from FDKs in order to log the correct call-id and suppress the '\n' that splits syslog lines.	2018-08-29 13:08:02 -07:00
Reed Allman	292f673747	Go1.11 (#1188) * update circleci to go1.11 * update opencensus dep to build with go1.11 * fix up for new gofmt rules	2018-08-27 10:55:52 -07:00
Reed Allman	9cac4c8eea	update fsouza to v1.2.0 (#1186) * update fsouza to v1.2.0 * unwind timeouts on docker. previously, we were setting our own transport on the docker client, but this does not work anymore as fsouza now needs to call this: https://github.com/fsouza/go-dockerclient/blob/master/client_unix.go which makes a platform-dependent client. fsouza now also appears to make a transport that modifies the default http client with some saner values for things like max idle conns per host (they get reaped if idle 90s): https://github.com/fsouza/go-dockerclient/blob/master/client.go#L1059 -- these settings are sane and were why we were doing this to begin with. additionally, have removed our 2 minute timeout on the docker client. this is a leftover relic of a bygone era from a time when we relied on these timeouts to timeout higher level things, which now we're properly timing out in the enclosing methods. so, they gone, this makes the docker client a little less wacky now.	2018-08-24 11:36:02 -07:00
Tolga Ceylan	0105f8321e	fn: stats view/distribution improvements (#1154) * fn: stats view/distribution improvements *) View latency distribution is now an argument in view creation functions. This allows easier override to set custom buckets. It is simplistic and assumes all latency views would use the same set, but in practice this is already the case. *) Moved API view creation to main; this should not be enabled for all node types. This is consistent with the rest of the system. * fn: Docker samples of cpu/mem/disk with specific buckets	2018-08-03 11:06:54 -07:00
Tolga Ceylan	2706323cec	fn: tests for private repo auth and rename DOCKER_AUTH (#1134) Renamed DOCKER_AUTH by adding an FN_ prefix to clarify its purpose. Docker does not use this variable. New tests to clarify the repo/auth-config behavior.	2018-07-24 15:19:59 -07:00
Tolga Ceylan	cf37a21fab	fn: cleanup of docker private registry code (#1130) * fn: cleanup of docker private registry code Start using the URL-parsed ServerAddress and its subdomains for easier image ensure/pull in docker driver. Previous code to look up substrings was faulty without proper URL parsing and hostname tokenization. When searching for a registry config, if image name does not contain a registry and if there's a private registry configured, then search for hub.docker.com and index.docker.io. This is similar to previous code but with correct subdomain matching. * fn-dataplane: take port into account in auth configs	2018-07-24 02:15:25 +01:00
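The registry lookup in #1130 can be sketched roughly as follows, assuming Docker's usual convention that an image's first path component names a registry only when it contains `.` or `:` or is `localhost`. The helper names and the direction of the subdomain match (image host equal to, or a subdomain of, the configured host) are assumptions; ports stay part of the host, in line with the dataplane note. Splitting only at the first `/` also keeps nested names such as `registry/a/b/c` intact.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// registryOf returns the registry (host[:port]) named by an image, or ""
// when the image has none. Following Docker's convention, the first path
// component is a registry only if it contains '.' or ':' or is "localhost".
func registryOf(image string) string {
	parts := strings.SplitN(image, "/", 2)
	if len(parts) < 2 {
		return ""
	}
	if h := parts[0]; strings.ContainsAny(h, ".:") || h == "localhost" {
		return h
	}
	return ""
}

// serverHost extracts host[:port] from an auth config ServerAddress such as
// "https://index.docker.io/v1/" or "registry.example.com:5000".
func serverHost(addr string) string {
	if !strings.Contains(addr, "://") {
		addr = "https://" + addr
	}
	u, err := url.Parse(addr)
	if err != nil {
		return ""
	}
	return u.Host
}

// hostMatches reports whether host equals want or is a subdomain of it.
// Ports stay attached, so "r.example.com:5000" never matches "r.example.com".
func hostMatches(host, want string) bool {
	return host == want || strings.HasSuffix(host, "."+want)
}

// findAuth returns the first configured server address matching the
// image's registry; images without a registry fall back to Docker Hub hosts.
func findAuth(image string, servers []string) (string, bool) {
	wanted := []string{registryOf(image)}
	if wanted[0] == "" {
		wanted = []string{"hub.docker.com", "index.docker.io"}
	}
	for _, w := range wanted {
		for _, s := range servers {
			if h := serverHost(s); h != "" && hostMatches(w, h) {
				return s, true
			}
		}
	}
	return "", false
}

func main() {
	servers := []string{"https://index.docker.io/v1/", "registry.example.com:5000"}
	fmt.Println(findAuth("myuser/hello:latest", servers))                // https://index.docker.io/v1/ true
	fmt.Println(findAuth("registry.example.com:5000/team/app", servers)) // registry.example.com:5000 true
}
```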
Tolga Ceylan	5dc5740a54	fn: runner status and docker load images (#1116) * fn: runner status and docker load images Introducing a function run for pure runner Status calls. Previously, Status gRPC calls returned active inflight request counts with the purpose of a simple health checker. However, this is not sufficient since it does not show if agent or docker is healthy. With this change, if pure runner is configured with a status image, that image is executed through docker. The call uses zero memory/cpu/tmpsize settings to ensure resource tracker does not block it. However, operators might not always have a docker repository accessible/available for status image. Or operators might not want the status to go over the network. To allow such cases, and in general possibly caching docker images, added a new environment variable FN_DOCKER_LOAD_FILE. If this is set, fn-agent during startup will load these images that were previously saved with 'docker save' into docker.	2018-07-12 13:58:38 -07:00
Owen Cliffe	fff95e7992	Clean up/make consistent the APIs for registering core components, make Docker an optional component at compile time (#1111)	2018-07-07 10:37:19 +01:00
Reed Allman	51ff7caeb2	Bye bye openapi (#1081) * add DateTime sans mgo * change all uses of strfmt.DateTime to common.DateTime, remove test strfmt usage * remove api tests, system-test dep on api test. multiple reasons to remove the api tests: * awkward dependency with fn_go meant generating bindings on a branched fn to vendor those to test new stuff. this is at a minimum not at all intuitive, worth it, nor a fun way to spend the finite amount of time we have to live. * api tests only tested a subset of functionality that the server/ api tests already test, and we risk having tests where one tests something and the other doesn't. let's not. we have too many test suites as it is, and these pretty much only test that we updated the fn_go bindings, which is actually a hassle as noted above and the cli will pretty quickly figure out anyway. * fn_go relies on openapi, which relies on mgo, which is deprecated and we'd like to remove as a dependency. openapi is a _huge_ dep built in a NIH fashion, that cannot simply remove the mgo dep as users may be using it. we've now stolen their date time and otherwise killed usage of it in fn core, for fn_go it still exists but that's less of a problem. * update deps removals: * easyjson * mgo * go-openapi * mapstructure * fn_go * purell * go-validator also, had to lock docker. we shouldn't use docker on master anyway, they strongly advise against that. had no luck with latest version rev, so i locked it to what we were using before. until next time. the rest is just playing dep roulette, those end up removing a ton tho * fix exec test to work * account for john le cache	2018-06-21 11:09:16 -07:00
Tolga Ceylan	e67d0e5f3f	fn: Call extensions/overriding and more customization friendly docker driver (#1065) In pure-runner and LB agent, service providers might want to set specific driver options. For example, to add cpu-shares to functions, LB can add the information as extensions to the Call and pass this via gRPC to runners. Runners then pick these extensions from the gRPC call and pass them to the driver. Using a custom driver implementation, pure-runners can process these extensions to modify docker.CreateContainerOptions. To achieve this, LB agents can now be configured using a call overrider. Pure-runners can be configured using a custom docker driver. RunnerCall and Call interfaces both expose call extensions. An example to demonstrate this is implemented in test/fn-system-tests/system_test.go which registers a call overrider for LB agent as well as a simple custom docker driver. In this example, LB agent adds a key-value to extensions and runners add this key-value as an environment variable to the container.	2018-06-18 14:42:28 -07:00
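To make the extension flow in #1065 concrete, here is a toy sketch in which an LB-side overrider adds a key-value pair to a call and a custom runner driver copies every extension into the container environment. `Call`, `CallOverrider`, `ContainerOptions`, and the `FOO` key are hypothetical stand-ins, not fn's or Docker's real types.

```go
package main

import "fmt"

// Call is a hypothetical, pared-down call that carries driver extensions.
type Call struct {
	Image      string
	Extensions map[string]string
}

// CallOverrider is a hypothetical hook an LB agent runs before sending a
// call to a runner.
type CallOverrider func(c *Call) error

// ContainerOptions stands in for the container-creation options a custom
// driver would adjust (e.g. docker.CreateContainerOptions).
type ContainerOptions struct {
	Image string
	Env   []string
}

// lbOverrider adds one key-value pair to the call's extensions.
func lbOverrider(c *Call) error {
	if c.Extensions == nil {
		c.Extensions = map[string]string{}
	}
	c.Extensions["FOO"] = "bar"
	return nil
}

// customCreate copies every extension into the container environment.
func customCreate(c *Call) ContainerOptions {
	opts := ContainerOptions{Image: c.Image}
	for k, v := range c.Extensions {
		opts.Env = append(opts.Env, k+"="+v)
	}
	return opts
}

func main() {
	call := &Call{Image: "example/func"}
	var override CallOverrider = lbOverrider
	if err := override(call); err != nil {
		panic(err)
	}
	fmt.Println(customCreate(call).Env) // [FOO=bar]
}
```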
Peter Jausovec	bd5150f1ac	Extract register view functionality (#1056) * WIP * Create separate Register*Views functions that are called from main.	2018-06-12 17:24:21 +01:00
Owen Cliffe	1ad27f4f0d	Inverting deps on SQL, Log and MQ plugins to make them optional dependencies of extended servers, Removing some dead code that brought in unused dependencies Filtering out some non-linux transitive deps. (#1057) * initial Db helper split - make SQL and datastore packages optional * abstracting log store * break out DB, MQ and log drivers as extensions * cleanup * fewer deps * fixing docker test * hmm dbness * updating db startup * Consolidate all your extensions into one convenient package * cleanup * clean up dep constraints	2018-06-11 18:23:28 +01:00
Tolga Ceylan	f97b63f878	fn: fixup temp dir read/write permissions if tmp fs size is not set. (#1024) When TmpFsSize is not set in a route, docker fails to create a /tmp mount that is writable. Forcing docker to explicitly do this if read-only root directory is enabled (default).	2018-06-01 10:49:07 -07:00
Tolga Ceylan	9584643142	fn: size restricted tmpfs /tmp and read-only / support (#1012) * fn: size restricted tmpfs /tmp and read-only / support *) read-only Root Fs Support *) removed CPUShares from docker API. This was unused. *) docker.Prepare() refactoring *) added docker.configureTmpFs() for size limited tmpfs on /tmp *) tmpfs size support in routes and resource tracker *) fix fn-test-utils to handle sparse files better in create file * test typo fix	2018-05-25 14:12:29 -07:00
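A small sketch of the /tmp handling from #1012 and the follow-up fix in #1024, assuming the size is given in MB and expressed as a tmpfs `size=` mount option. The struct only borrows Docker's `ReadonlyRootfs` and `Tmpfs` field names, and `setupTmp` is a hypothetical illustration, not the driver's actual function.

```go
package main

import "fmt"

// HostConfig is a hypothetical subset of Docker's host config; the two
// field names mirror Docker's ReadonlyRootfs and Tmpfs settings.
type HostConfig struct {
	ReadonlyRootfs bool
	Tmpfs          map[string]string // mount point -> tmpfs mount options
}

// setupTmp mounts a size-limited tmpfs on /tmp when a size is given, and a
// plain writable tmpfs when no size is set but the root filesystem is
// read-only (otherwise /tmp would be read-only too).
func setupTmp(hc *HostConfig, tmpFsSizeMB uint64, readOnlyRoot bool) {
	hc.ReadonlyRootfs = readOnlyRoot
	if tmpFsSizeMB == 0 && !readOnlyRoot {
		return // writable root fs, so /tmp is already writable
	}
	opts := "rw"
	if tmpFsSizeMB > 0 {
		opts += fmt.Sprintf(",size=%dm", tmpFsSizeMB)
	}
	if hc.Tmpfs == nil {
		hc.Tmpfs = map[string]string{}
	}
	hc.Tmpfs["/tmp"] = opts
}

func main() {
	var hc HostConfig
	setupTmp(&hc, 64, true)
	fmt.Println(hc.ReadonlyRootfs, hc.Tmpfs["/tmp"]) // true rw,size=64m
}
```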
Tolga Ceylan	8e440c835e	fn: fixup nondeterministic test (#986)	2018-05-10 08:08:10 -07:00
Tolga Ceylan	0f50537150	fn: allow specified docker networks in functions (#982) * fn: allow specified docker networks in functions If FN_DOCKER_NETWORK is specified with a list of networks, then agent driver picks the least used network to place functions on. * add mutex comment	2018-05-09 12:24:15 -07:00
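The network selection in #982 amounts to a mutex-guarded usage count per network. A minimal sketch, assuming FN_DOCKER_NETWORK holds a space-separated list (the separator is a guess) and that ties go to the first network listed; `networkPool` and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// networkPool tracks how many containers sit on each docker network. The
// mutex guards the counts because many calls acquire and release at once.
type networkPool struct {
	mu    sync.Mutex
	usage map[string]int
	order []string // configured order; the first listed wins ties
}

func newNetworkPool(env string) *networkPool {
	p := &networkPool{usage: map[string]int{}}
	for _, n := range strings.Fields(env) {
		if _, dup := p.usage[n]; !dup {
			p.usage[n] = 0
			p.order = append(p.order, n)
		}
	}
	return p
}

// acquire returns the least used network ("" if none configured) and
// counts one more container on it.
func (p *networkPool) acquire() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	best := ""
	for _, n := range p.order {
		if best == "" || p.usage[n] < p.usage[best] {
			best = n
		}
	}
	if best != "" {
		p.usage[best]++
	}
	return best
}

// release must be called when the container on network n goes away.
func (p *networkPool) release(n string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.usage[n] > 0 {
		p.usage[n]--
	}
}

func main() {
	pool := newNetworkPool("fn-net1 fn-net2")
	a, b, c := pool.acquire(), pool.acquire(), pool.acquire()
	fmt.Println(a, b, c) // fn-net1 fn-net2 fn-net1
	pool.release(a)
}
```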
jan grant	91e58afa55	The opencensus API changes between 0.6.0 and 0.9.0 (#980) We get some useful features in later versions; update so as to not pin downstream consumers (extensions) to an older version.	2018-05-09 14:55:00 +01:00
Tolga Ceylan	584e4e75eb	Experimental Pre-fork Pool: Recycle net ns (#890) * fn: experimental prefork recycle and other improvements *) Recycle and do not use same pool container again option. *) Two state processing: initializing versus ready (start-kill). *) Ready state is exempt from rate limiter. fn: experimental prefork pool multiple network support In order to exceed the 1023-container (bridge port) limit, add multiple networks: for i in fn-net1 fn-net2 fn-net3 fn-net4; do docker network create $i; done to Docker startup (eg. dind preentry.sh), then provide this to prefork pool using: export FN_EXPERIMENTAL_PREFORK_NETWORKS="fn-net1 fn-net2 fn-net3 fn-net4" which should be able to spawn 1023 * 4 containers. * fn: fixup tests for cfg move * fn: add ipc and pid namespaces into prefork pooling * fn: revert ipc and pid namespaces for now Pid/Ipc opens up the function container to the pause container.	2018-04-05 15:07:30 -07:00
Tolga Ceylan	369f2ea17c	fn: experimental prefork tests should skip non Linux OSs (#904)	2018-03-28 14:40:51 -07:00
Justin Ko	9cb883ca68	Godoc fixes (#898) Add some godoc comments for the api/agent package and some of its subpackages.	2018-03-28 10:16:40 -07:00
Reed Allman	8af605cf3d	update thrift, opencensus, others (#893) * update thrift, opencensus, others * stats: update to opencensus 0.6.0 view api	2018-03-26 15:43:49 -07:00
Tolga Ceylan	0addcb8911	fn: pre-fork pool for namespace/network speedup (#874) * fn: pre-fork pool experimental implementation	2018-03-23 16:35:35 -07:00
Tolga Ceylan	cb61a678d9	fn: add storage opt size support (#860) Added env FN_MAX_FS_SIZE_MB, which if defined and non-zero is passed to docker as storage opt size. We do not currently validate whether docker supports this option, because that is hard to check: it depends not only on the storage driver and its backing filesystem, but also on the mount options used to mount that fs.	2018-03-14 15:47:34 -07:00
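A rough sketch of how FN_MAX_FS_SIZE_MB might map onto Docker's `size` storage option. The `M` suffix on the value and the `storageOpts` helper are assumptions, and, as the commit notes, nothing here checks whether the storage driver actually supports the option. Taking `getenv` as a parameter keeps the function testable without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// storageOpts turns FN_MAX_FS_SIZE_MB into a storage-opt map for container
// creation, or nil when the variable is unset or zero. It does not check
// driver support for "size".
func storageOpts(getenv func(string) string) (map[string]string, error) {
	v := getenv("FN_MAX_FS_SIZE_MB")
	if v == "" {
		return nil, nil
	}
	mb, err := strconv.ParseUint(v, 10, 64)
	if err != nil {
		return nil, fmt.Errorf("FN_MAX_FS_SIZE_MB: %w", err)
	}
	if mb == 0 {
		return nil, nil
	}
	return map[string]string{"size": fmt.Sprintf("%dM", mb)}, nil
}

func main() {
	opts, err := storageOpts(os.Getenv)
	fmt.Println(opts, err)
}
```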
Tolga Ceylan	7177bf3923	fn: enable failing test back (#826) * fn: enable failing test back * fn: fortifying the stderr output Modified limitWriter to discard excess data instead of returning an error; this keeps stderr/stdout pipes flowing to avoid head-of-line blocking or data corruption in the container stdout/stderr output stream.	2018-03-09 09:57:28 -08:00
Tolga Ceylan	0ef0118150	fn: wait for async attach with success channel (#810) * fn: wait for async attach with success channel * fn: debug logs in test.sh * fn: circleci test output as artifact * fn: docker attach non-blocking adjustments * fn: remove retry from risky NB attach	2018-03-08 15:46:32 -08:00
Tolga Ceylan	7677aad450	fn: I/O related improvements (#809) *) I/O protocol parse issues should shut down the container, as the container goes into an inconsistent state between calls (eg. the next call may receive the previous call's leftovers). *) Move ghost read/write code into io_utils in common. *) Clean unused error from docker Wait() *) We can catch one case in JSON: if there's remaining unparsed data in the decoder buffer, we can shut the container *) stdout/stderr when container is not handling a request are now blocked if freezer is also enabled. *) if a fatal err is set for slot, we do not requeue it and proceed to shutdown *) added a test function for a few cases with freezer strict behavior	2018-03-07 15:09:24 -08:00
Reed Allman	206aa3c203	opentracing -> opencensus (#802) * update vendor directory, add go.opencensus.io * update imports * oops * s/opentracing/opencensus/ & remove prometheus / zipkin stuff & remove old stats * the dep train rides again * fix gin build * deps from last guy * start in on the agent metrics * she builds * remove tags for now, cardinality error is fussing. subscribe instead of register * update to patched version of opencensus to proceed for now TODO switch to a release * meh fix imports * println debug the bad boys * lace it with the tags * update deps again * fix all inconsistent cardinality errors * add our own logger * fix init * fix oom measure * remove bugged removal code * fix s3 measures * fix prom handler nil	2018-03-05 09:35:28 -08:00
Tolga Ceylan	a83f2cfbe8	fn: favor fn-test-utils over hello (to be decommissioned) (#761)	2018-02-28 17:44:13 -08:00
Tolga Ceylan	46fad7ef80	fn: plumb up I/O errors from docker wait (#798) Reed Allman <rdallman10@gmail.com>'s I/O error fix.	2018-02-27 12:17:02 -08:00
Tolga Ceylan	8b65ae8f9a	fn: add docker command info to retry when logging errors (#795)	2018-02-27 01:10:07 -08:00
Reed Allman	1a1250e5ea	disable fail whale logs (#768) we have been getting these from attach all this time and never needed these anyway. I ran cpu profiles of dockerd and this was 90% of docker cpu usage (json logs). woot. this will reduce i/o quite a bit, and we don't have to worry about them taking up any disk space either. from tests i get about 50% speedup with these off. the hunt continues...	2018-02-13 17:45:11 -08:00
Reed Allman	f287ad274e	support deeper / nesting of image names (#765) closes #764	2018-02-13 11:26:28 -08:00
Reed Allman	cbfd659e7e	cap docker retries to fixed number (#762) previously we would retry infinitely up to the context with some backoff in between. for hot functions, since we don't set any deadline on pulling or creating the image, this means it would retry forever without making any progress if e.g. the registry is inaccessible or any other temporary error that isn't actually temporary. this adds a hard cap of 10 retries, which gives approximately 13s if the ops take no time, still respecting the context deadline enclosed. the case where this was coming up is now tested for and was otherwise confusing for users to debug; now it spits out an ECONNREFUSED with the address of the registry, which should help users debug without having to poke around fn logs (though I don't like this as an excuse, not all users will be operators at some point in the near future, and this one makes sense) closes #727	2018-02-12 18:45:30 -08:00
Reed Allman	3ab49d4701	limit log size in containers (#748) closes #317. we could fiddle with this, but we need to at least bound these. this accomplishes that. 1m is picked since that's our default max log size for the time being per call, it also takes a little time to generate that many bytes through logs, typically (i.e. without trying to). I tested with 0, which spiked the i/o rate on my machine because it's constantly deleting the json log file. I also tested with 1k and it was similar (for a task that generated about 1k in logs quickly) -- in testing, this halved my throughput, whereas using 1m did not change the throughput at all. the 'none' and 'syslog' drivers weren't great, 'none' turns off all stderr and 'syslog' blocks every log line (boo). anyway, this option seems to have no effect on the output we get in 'attach', which is what we really care about (i.e. docker is not logically capping this, just swapping out the log file). using 1m for this, e.g. if we have 500 hot containers on a machine we have potentially half a gig of worthless logs lying around. we don't need the docker logs lying around at all really, but short of writing a storage driver ourselves there don't seem to be too many better options. open to idears, but this is likely to hold us over for some time.	2018-02-08 17:16:26 -08:00
Tolga Ceylan	f27d47f2dd	Idle Hot Container Freeze/Preempt Support (#733) * fn: freeze/unfreeze and eject idle under resource contention	2018-02-07 17:21:53 -08:00
Tolga Ceylan	dc4d90432b	fn: memory limit adjustments (#746) 1) limit kernel memory which was previously unlimited, using same limits as user memory for a unified approach. 2) disable swap memory for containers	2018-02-07 16:48:52 -08:00
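The two memory changes in #746 can be expressed with three Docker host-config fields, where setting `MemorySwap` equal to `Memory` leaves a container no swap. The struct below only borrows those field names; `limitsFor` is a hypothetical sketch, not the driver code.

```go
package main

import "fmt"

// MemoryLimits borrows three field names from Docker's host config
// (Memory, MemorySwap, KernelMemory); the struct itself is hypothetical.
type MemoryLimits struct {
	Memory       int64 // user memory cap, bytes
	MemorySwap   int64 // memory+swap cap; equal to Memory means no swap
	KernelMemory int64 // kernel memory cap, bytes
}

// limitsFor gives kernel memory the same cap as user memory and disables
// swap by setting MemorySwap equal to Memory.
func limitsFor(memMB uint64) MemoryLimits {
	b := int64(memMB) * 1024 * 1024
	return MemoryLimits{Memory: b, MemorySwap: b, KernelMemory: b}
}

func main() {
	fmt.Printf("%+v\n", limitsFor(128))
	// {Memory:134217728 MemorySwap:134217728 KernelMemory:134217728}
}
```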
Tolga Ceylan	ebc6657071	fn: docker version check2 (#744) 1) now required docker version is 17.06 2) enable circle ci latest docker install 3) docker driver & agent check minimum version before start	2018-02-06 16:16:40 -08:00
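One plausible shape for the minimum-version check in #744, comparing only the major and minor fields of strings such as `17.06.0-ce`. `atLeast` is a hypothetical helper; the real check may parse versions differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeast reports whether a Docker version string such as "17.06.0-ce"
// is at least minMajor.minMinor. Only the first two numeric fields are
// compared, and any "-suffix" is ignored.
func atLeast(version string, minMajor, minMinor int) (bool, error) {
	core := strings.SplitN(version, "-", 2)[0]
	fields := strings.Split(core, ".")
	if len(fields) < 2 {
		return false, fmt.Errorf("unparseable docker version %q", version)
	}
	major, err := strconv.Atoi(fields[0])
	if err != nil {
		return false, err
	}
	minor, err := strconv.Atoi(fields[1])
	if err != nil {
		return false, err
	}
	if major != minMajor {
		return major > minMajor, nil
	}
	return minor >= minMinor, nil
}

func main() {
	for _, v := range []string{"17.05.0-ce", "17.06.0-ce", "18.03.1-ce"} {
		ok, err := atLeast(v, 17, 6)
		fmt.Println(v, ok, err)
	}
}
```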
Tolga Ceylan	39b2cb2d9b	Cpu resources (#642) * fn: cpu quota implementation	2018-01-12 11:38:28 -08:00
Travis Reeder	3b9818bc58	Switch to dep from glide (#664)	2018-01-09 14:11:08 -08:00
Reed Allman	2ebc9c7480	hybrid mergy (#581) * so it begins * add clarification to /dequeue, change response to list to future proof * Specify that runner endpoints are also under /v1 * Add a flag to choose operation mode (node type). This is specified using the `FN_NODE_TYPE` environment variable. The default is the existing behaviour, where the server supports all operations (full API plus asynchronous and synchronous runners). The additional modes are: * API - the full API is available, but no functions are executed by the node. Async calls are placed into a message queue, and synchronous calls are not supported (invoking them results in an API error). * Runner - only the invocation/route API is present. Asynchronous and synchronous invocation requests are supported, but asynchronous requests are placed onto the message queue, so might be handled by another runner. * Add agent type and checks on Submit * Sketch of a factored out data access abstraction for api/runner agents * Fix tests, adding node/agent types to constructors * Add tests for full, API, and runner server modes. * Added atomic UpdateCall to datastore * adds in server side endpoints * Made ServerNodeType public because tests use it * Made ServerNodeType public because tests use it * fix test build * add hybrid runner client pretty simple go api client that covers surface area needed for hybrid, returning structs from models that the agent can use directly. not exactly sure where to put this, so put it in `/clients/hybrid` but maybe we should make `/api/runner/client` or something and shove it in there. want to get integration tests set up and use the real endpoints next and then wrap this up in the DataAccessLayer stuff. * gracefully handles errors from fn * handles backoff & retry on 500s * will add to existing spans for debuggo action * minor fixes * meh	2017-12-11 10:43:19 -08:00
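The FN_NODE_TYPE modes from #581 as a small Go sketch. The accepted spellings (`full`, `api`, `runner`, and empty for the default) and the helper names are assumptions based on the description above, not fn's parser.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeType mirrors the three modes described in #581; the accepted
// spellings of FN_NODE_TYPE below are assumptions.
type NodeType int

const (
	NodeTypeFull   NodeType = iota // full API plus sync and async runners
	NodeTypeAPI                    // full API, but executes no functions
	NodeTypeRunner                 // invocation/route API only
)

func parseNodeType(s string) (NodeType, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "", "full":
		return NodeTypeFull, nil // unset keeps the existing behaviour
	case "api":
		return NodeTypeAPI, nil
	case "runner":
		return NodeTypeRunner, nil
	}
	return NodeTypeFull, fmt.Errorf("unknown FN_NODE_TYPE %q", s)
}

// acceptsSync reports whether synchronous invocations are served; per the
// description, an API node answers them with an API error instead.
func (t NodeType) acceptsSync() bool { return t != NodeTypeAPI }

func main() {
	for _, s := range []string{"", "api", "runner", "bogus"} {
		t, err := parseNodeType(s)
		fmt.Printf("%q -> %d sync=%v err=%v\n", s, t, t.acceptsSync(), err)
	}
}
```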


61 Commits