fn-serverless

mirror of https://github.com/fnproject/fn.git synced 2022-10-28 21:29:17 +03:00

Author	SHA1	Message	Date
Tolga Ceylan	6eaf1578e6	fn: container initialization monitoring (#1288) The container initialization phase holds resource tracker resources (token) during lengthy operations. For agent stability/liveness, this phase has to be evictable/cancelable and time bounded. This change introduces a new system-wide environment setting to bound the time spent in the container initialization phase. This phase includes docker-pull, docker-create, docker-attach, docker-start and UDS wait operations. This initialization period is also now considered evictable.	2018-11-15 13:37:43 -08:00
Andrea Rosa	182db94fad	Feature/acksync response writer (#1267 ) This implements a "detached" mechanism to get an ack from the runner once it actually starts to run a function. In this scenario the response returned back is just a 202 if we placed the function in a specific time-frame. If we hit some errors or we fail to place the fn in time we return back different errors.	2018-11-09 10:25:43 -08:00
Tolga Ceylan	25afb2f478	fn: remove tini option & env variable (#1301 )	2018-11-07 12:35:19 -08:00
Tolga Ceylan	975b780695	fn: tests for hung and bad docker repo during docker-pull (#1298 ) * fn: tests for hung and bad docker repo during docker-pull	2018-11-05 16:01:42 -08:00
Tolga Ceylan	de9c2cbb63	fn: cleanup of docker timeouts and docker health check (#1292) Moving the timeout management of various docker operations to agent. This allows for finer control over what timeout each operation should use. For instance, for pause/unpause our tolerance is very low to avoid resource issues. For docker remove, a failure can lead to potential agent failure and therefore we wait up to 10 minutes. For cookie create/prepare (which includes docker-pull) we cap this at 10 minutes by default. With the new UDS/FDK contract, the health check is now obsolete, as containers advertise health using UDS availability.	2018-11-01 14:22:47 -07:00
Tolga Ceylan	a9bba2c3a8	fn: remove eviction timer to simplify eviction logic (#1223 ) We tie container pausing with evictions, where if a container is paused, then it is also eligible for eviction.	2018-09-18 15:20:39 -07:00
Reed Allman	3a82790d99	clean up hardcoded lsnr.sock refs, move iofs to /tmp (#1221 ) * clean up hardcoded lsnr.sock refs because what drivers.ContainerTask needs is another method, and we all know it atoning for my sins the first time around. and yes, i refuse to use a cross package exported constant (just think of the dep graphs) * fix tests	2018-09-18 08:12:44 -07:00
Richard Connon	493790dbd2	Add tmpfs IOFS (#1212 ) * Define an interface for IOFS handling. Add no-op and temporary directory implementations. * Move IOFS stuff out into separate file, add basic tmpfs implementation for linux only * Switch between directory and tmpfs based on platform and config * Respect FN_IOFS_OPTS * Make directory iofs default on all platforms * At least try to clean up a bit on failure * Add backout if IOFS creation fails * Add comment about iofs.Close	2018-09-17 11:50:43 -07:00
Owen Cliffe	6567f6e8ef	support configuration-based relative dirs (host and agent) for iofs (#1213 ) * support configuration-based relative dirs (host and agent) for iofs mounts * Send UDS requests as POST to <UDS>/call	2018-09-17 11:59:16 +01:00
Reed Allman	3a9c48b8a3	http-stream format (#1202 ) * POC code for inotify UDS-io-socket * http-stream format introducing the `http-stream` format support in fn. there are many details for this, none of which can be linked from github :( -- docs are coming (I could even try to add some here?). this is kinda MVP-ish level, but does not implement the remaining spec, ie 'headers' fixing up / invoke fixing up. the thinking being we can land this to test fdks / cli with and start splitting work up on top of this. all other formats work the same as previous (no breakage, only new stuff) with the cli you can set `format: http-stream` and deploy, and then invoke a function via the `http-stream` format. this uses unix domain socket (uds) on the container instead of previous stdin/stdout, and fdks will have to support this in a new fashion (will see about getting docs on here). fdk-go works, which is here: https://github.com/fnproject/fdk-go/pull/30 . the output looks the same as an http format function when invoking a function. wahoo. there's some amount of stuff we can clean up here, enumerated: * the cleanup of the sock files is iffy, high pri here * permissions are a pain in the ass and i punted on dealing with them. you can run `sudo ./fnserver` if running locally, it may/may not work in dind(?) ootb * no pipe usage at all (yay), still could reduce buffer usage around the pipe behavior, we could clean this up potentially before removal (and tests) * my brain can’t figure out if dispatchOldFormats changes pipe behavior, but tests work * i marked XXX to do some clean up which will follow soon… need this to test fdk tho so meh, any thoughts on those marked would be appreciated however (1 less decision for me). mostly happy w/ general shape/plumbing tho * there are no tests atm, this is a tricky dance indeed. attempts were made. need to futz with the permission stuff before committing to adding any tests here, which I don't like either. 
also, need to get the fdk-go based test image updated according to the fdk-go, and there's a dance there too. rumba time.. * delaying the big big cleanup until we have good enough fdk support to kill all the other formats. open to ideas on how to maneuver landing stuff... * fix unmount * see if the tests work on ci... * add call id header * fix up makefile * add configurable iofs opts * add format file describing http-stream contract * rm some cruft * default iofs to /tmp, remove mounting out of the box fn we can't mount. /tmp will provide a memory backed fs for us on most systems, this will be fine for local developing and this can be configured to be wherever for anyone that wants to make things more difficult for themselves. also removes the mounting, this has to be done as root. we can't do this in the oss fn (short of requesting root, but no). in the future, we may want to have a knob here to have a function that can be configured in fn that allows further configuration here. since we don't know what we need in this dept really, not doing that yet (it may be the case that it could be done operationally outside of fn, eg, but not if each directory needs to be configured itself, which seems likely, anyway...) * add WIP note just in case...	2018-09-14 10:59:12 +01:00
Tolga Ceylan	586d5c4735	fn: make call.End() blocking to reduce complexity (#1208) agent/lb-agent/runner roles execute call.End() in the background in some cases to reduce latency. With this change, we simplify this and switch to non-background execution of call.End(). This fixes hard-to-detect issues such as non-deterministic calculation of call.CompletedAt or incomplete Call.Stats in runners. Downstream projects, if impacted by the now-blocking call.End() latency, should take steps to handle this according to their requirements.	2018-09-13 11:28:11 +01:00
Reed Allman	7638b31e11	use tini to run every container (#1195 ) fixes #1101 additional context: * this was introduced in docker 1.13 (1/2017), we require docker 17.10 (10/2017), this should not have any issues dependency-wise, as `docker-init` is in the docker install from that point in time. unless explicitly removed, it should be in the dind container we use as well... * the PR that introduced this to docker is https://github.com/moby/moby/pull/26061 for additional context * it may be wise to put this through some paces, if anybody has any... interesting... function containers. the tests seem to work fine, however, and this shouldn't be something users have to think about (?) at all, just something that we are doing. this isn't the default in docker for compatibility reasons, which is maybe a yellow flag but I am not sure tbh	2018-09-04 15:41:30 -07:00
Reed Allman	a6d60551ab	disable user function logs at debug level config (#1179 )	2018-08-21 21:02:49 -07:00
Reed Allman	af94f3f8ac	move max_request_size from agent to server (#1145 ) moves the config option for max request size up to the front end, adds the env var for it there, adds a server test for it and removes it from agent. a request is either gonna come through the lb (before grpc) or to the server, we can handle limiting the request there at least now, which may be easier than having multiple layers of request body checking. this aligns with not making the agent as responsible for http behaviors (eventually, not at all once route is fully deprecated).	2018-07-31 08:58:47 -07:00
Reed Allman	409c104df3	make agent options/config pass lint checks (#1144 )	2018-07-30 16:04:27 -07:00
Tolga Ceylan	5dc5740a54	fn: runner status and docker load images (#1116 ) * fn: runner status and docker load images Introducing a function run for pure runner Status calls. Previously, Status gRPC calls returned active inflight request counts with the purpose of a simple health checker. However this is not sufficient since it does not show if agent or docker is healthy. With this change, if pure runner is configured with a status image, that image is executed through docker. The call uses zero memory/cpu/tmpsize settings to ensure resource tracker does not block it. However, operators might not always have a docker repository accessible/available for status image. Or operators might not want the status to go over the network. To allow such cases, and in general possibly caching docker images, added a new environment variable FN_DOCKER_LOAD_FILE. If this is set, fn-agent during startup will load these images that were previously saved with 'docker save' into docker.	2018-07-12 13:58:38 -07:00
Tolga Ceylan	d190167580	fn: read-only root fs becomes default (#1019 ) * fn: read-only root fs becomes default Set root fs as read-only by default. * fn: update doc for FN_DISABLE_READONLY_ROOTFS	2018-05-30 18:17:28 -07:00
Tolga Ceylan	9584643142	fn: size restricted tmpfs /tmp and read-only / support (#1012 ) * fn: size restricted tmpfs /tmp and read-only / support ) read-only Root Fs Support ) removed CPUShares from docker API. This was unused. ) docker.Prepare() refactoring ) added docker.configureTmpFs() for size limited tmpfs on /tmp ) tmpfs size support in routes and resource tracker ) fix fn-test-utils to handle sparse files better in create file * test typo fix	2018-05-25 14:12:29 -07:00
Tolga Ceylan	4ccde8897e	fn: lb and pure-runner with non-blocking agent (#989) * fn: lb and pure-runner with non-blocking agent ) Removed pure-runner capacity tracking code. This did not play well with internal agent resource tracker. ) In LB and runner gRPC comm, removed ACK. Now, upon TryCall, pure-runner quickly proceeds to call Submit. This is good since at this stage pure-runner already has all relevant data to initiate the call. ) Unless pure-runner emits a NACK, LB immediately streams http body to runners. ) For retriable requests added a CachedReader for http.Request Body. ) Idempotency/retry is similar to previous code. After initial success in Engagement, after attempting a TryCall, unless we receive NACK, we cannot retry that call. ) ch and naive places now wrap each TryExec with a cancellable context to clean up gRPC contexts quicker. * fn: err for simpler one-time read GetBody approach This allows for a more flexible approach since we let users define GetBody() to allow repetitive http body read. In default LB case, LB executes a one-time io.ReadAll and sets GetBody, which is detected by RunnerCall.RequestBody(). * fn: additional check for non-nil req.body * fn: attempt to override IO errors with ctx for TryExec * fn: system-tests log dest * fn: LB: EOF send handling * fn: logging for partial IO * fn: use buffer pool for IO storage in lb agent * fn: pure runner should use chunks for data msgs * fn: required config validations and pass APIErrors * fn: additional tests and gRPC proto simplification ) remove ACK/NACK messages as Finish message type works OK for this purpose. ) return resp in api tests for check for status code ) empty body json test in api tests for lb & pure-runner fn: buffer adjustments ) setRequestBody result handling correction ) switch to bytes.Reader for read-only safety ) io.EOF can be returned for non-nil Body in request. fn: clarify detection of 503 / Server Too Busy	2018-05-17 12:09:03 -07:00
Tolga Ceylan	eab85dfab0	fn: agent MaxRequestSize limit (#998) * fn: agent MaxRequestSize limit Currently, LimitRequestBody() exists to install an http request body size limit in http/gin server. For production environments, this is expected to be used. However, in agents we may need to verify/enforce these size limits, and being able to assert in case of missing limits is valuable. With this change, operators can define an agent env variable to limit this in addition to installing Gin/Http handler. http.MaxBytesReader is superior in some cases as it sets http headers (Connection: close) to guard against subsequent requests. However, NewClampReadCloser() is superior in other cases, where it can cleanly return an API error for this case alone (http.MaxBytesReader() does not return a clean error type for overflow case, which makes it difficult to use it without peeking into its implementation.) For lb agent, upcoming changes rely on such limits being enabled, and using gin/http handler (http.MaxBytesReader) makes such checks/safety validations difficult. * fn: read/write clamp code adjustment In case of overflows, opt for simple implementation of a partial write followed by return error.	2018-05-16 11:45:57 -07:00
Tolga Ceylan	0f50537150	fn: allow specified docker networks in functions (#982 ) * fn: allow specified docker networks in functions If FN_DOCKER_NETWORK is specified with a list of networks, then agent driver picks the least used network to place functions on. * add mutex comment	2018-05-09 12:24:15 -07:00
Tolga Ceylan	54ba49be65	fn: non-blocking resource tracker and notification (#841 ) * fn: non-blocking resource tracker and notification For some types of errors, we might want to notify the actual caller if the error is directly 1-1 tied to that request. If hotLauncher is triggered with signaller, then here we send a back communication error notification channel. This is passed to checkLaunch to send back synchronous responses to the caller that initiated this hot container launch. This is useful if we want to run the agent in quick fail mode, where instead of waiting for CPU/Mem to become available, we prefer to fail quick in order not to hold up the caller. To support this, non-blocking resource tracker option/functions are now available. * fn: test env var rename tweak * fn: fixup merge * fn: rebase test fix * fn: merge fixup * fn: test tweak down to 70MB for 128MB total * fn: refactor token creation and use broadcast regardless * fn: nb description * fn: bugfix	2018-04-24 21:59:33 -07:00
Tolga Ceylan	584e4e75eb	Experimental Pre-fork Pool: Recycle net ns (#890 ) * fn: experimental prefork recycle and other improvements ) Recycle and do not use same pool container again option. ) Two state processing: initializing versus ready (start-kill). ) Ready state is exempt from rate limiter. fn: experimental prefork pool multiple network support In order to exceed 1023 container (bridge port) limit, add multiple networks: for i in fn-net1 fn-net2 fn-net3 fn-net4 do docker network create $i done to Docker startup, (eg. dind preentry.sh), then provide this to prefork pool using: export FN_EXPERIMENTAL_PREFORK_NETWORKS="fn-net1 fn-net2 fn-net3 fn-net4" which should be able to spawn 1023 * 4 containers. * fn: fixup tests for cfg move * fn: add ipc and pid namespaces into prefork pooling * fn: revert ipc and pid namespaces for now Pid/Ipc opens up the function container to pause container.	2018-04-05 15:07:30 -07:00
Tolga Ceylan	81954bcf53	fn: perform call.End() after request is processed (#918 ) * fn: perform call.End() after request is processed call.End() performs several tasks in sequence; insert call, insert log, (todo) remove mq entry, fireAfterCall callback, etc. These currently add up to the request latency as return from agent.Submit() is blocked on these. We also haven't been able to apply any timeouts on these operations since they are handled during request processing and it is hard to come up with a strategy for it. Also the error cases (couldn't insert call or log) are not propagated to the caller. With this change, call.End() handling becomes asynchronous where we perform these tasks after the request is done. This improves latency and we no longer have to block the call on these operations. The changes will also free up the agent slot token more quickly and now we are no longer tied to hiccups in call.End(). Now, a timeout policy is also added to this which can be adjusted with an env variable. (default 10 minutes) This accentuates the fact that call/log/fireAfterCall are not completed when request is done. So, there's a window there where call is done, but call/log/fireAfterCall are not yet propagated. This was already the case especially for error cases. There's slight risk of accumulating call.End() operations in case of hiccups in these log/call/callback systems. * fn: address risk of overstacking of call.End() calls.	2018-04-05 14:42:12 -07:00
Tolga Ceylan	c58caee78d	fn: update minimum docker version required. (#916) Oracle Linux 7.4 backported versions still have issues with freezing/terminating containers. 17.10.0-ce seems like a reasonable lowest common denominator.	2018-04-04 16:43:30 -07:00
Tolga Ceylan	0addcb8911	fn: pre-fork pool for namespace/network speedup (#874 ) * fn: pre-fork pool experimental implementation	2018-03-23 16:35:35 -07:00
Tolga Ceylan	cb61a678d9	fn: add storage opt size support (#860 ) Added env FN_MAX_FS_SIZE_MB, which if defined and non-zero is passed to docker as storage opt size. We do not validate if this option is supported by docker currently. This is because it's difficult to actually validate this since it not only depends on storage driver and its backing filesystem, but also the mount options used to mount that fs.	2018-03-14 15:47:34 -07:00
Tolga Ceylan	74a51f3f88	fn: reorg agent config (#853 ) * fn: reorg agent config ) Moving constants in agent to agent config, which helps with testing, tuning. ) Added max total cpu & memory for testing & clamping max mem & cpu usage if needed. * fn: adjust PipeIO time * fn: for hot, cannot reliably test EndOfLogs in TestRouteRunnerExecution	2018-03-13 18:38:47 -07:00
Tolga Ceylan	7177bf3923	fn: enable failing test back (#826 ) * fn: enable failing test back * fn: fortifying the stderr output Modified limitWriter to discard excess data instead of returning error, this is to allow stderr/stdout pipes flowing to avoid head-of-line blocking or data corruption in container stdout/stderr output stream.	2018-03-09 09:57:28 -08:00
Tolga Ceylan	7677aad450	fn: I/O related improvements (#809 ) ) I/O protocol parse issues should shutdown the container as the container goes to inconsistent state between calls. (eg. next call may receive previous calls left overs.) ) Move ghost read/write code into io_utils in common. ) Clean unused error from docker Wait() ) We can catch one case in JSON, if there's remaining unparsed data in decoder buffer, we can shut the container ) stdout/stderr when container is not handling a request are now blocked if freezer is also enabled. ) if a fatal err is set for slot, we do not requeue it and proceed to shutdown *) added a test function for a few cases with freezer strict behavior	2018-03-07 15:09:24 -08:00
Tolga Ceylan	89a1fc7c72	Response size clamp (#786 ) ) Limit response http body or json response size to FN_MAX_RESPONSE_SIZE (default unlimited) ) If limits are exceeded 502 is returned with 'body too large' in the error message	2018-03-01 17:14:50 -08:00
Tolga Ceylan	320b766a6d	fn: introduce agent config and minor ghostreader tweak (#797 ) * fn: introduce agent config and minor ghostreader tweak TODO: move all constants/tweaks in agent to agent config. * fn: json convention	2018-02-27 12:17:13 -08:00

32 Commits