fn-serverless

mirror of https://github.com/fnproject/fn.git synced 2022-10-28 21:29:17 +03:00

Author	SHA1	Message	Date
Reed Allman	29fdbc9b49	disable pure runner logging (#1313 ) * disable pure runner logging there's a racey bug where the logger is being written to when it's closing, but this led to figuring out that we don't need the logger at all in pure runner really, the syslog thing isn't an in process fn thing and we don't need the logs from attach for anything further in pure runner. so this disables the logger at the docker level, to save sending the bytes back over the wire, this could be a nice little performance bump too. of course, with this, it means agents can be configured to not log debug or have logs to store at all, and not a lot of guards have been put on this for 'full' agent mode while it hangs on a cross feeling the breeze awaiting its demise - the default configuration remains the same, and no behavior changes in 'full' agent are here. it was a lot smoother to make the noop than to try to plumb in 'nil' for stdout/stderr, this has a lot lower risk of nil panic issues for the same effect, though it's not perfect relying on type casting, plumbing in an interface to check has the same issues (loss of interface adherence for any decorator), so this seems ok. defaulting to not having a logger was similarly painful, and ended up with this. but open to ideas. * replace usage of old null reader writer impl * make Read return io.EOF for io.Copy usage	2018-11-16 12:56:49 -06:00
Tolga Ceylan	c89f1e5f9c	fn: safer hand over between monitoring and main processing (#1316 ) In runHot(), it's safer to use a separate channel between the monitoring go-routine and the processing go-routine to handle cancellations triggered by the monitoring go-routine.	2018-11-15 16:57:16 -08:00
Tolga Ceylan	6eaf1578e6	fn: container initialization monitoring (#1288 ) Container initialization phase consumes resource tracker resources (token), during lengthy operations. In order for agent stability/liveness, this phase has to be evictable/cancelable and time bounded. With this change, introducing a new system wide environment setting to bound the time spent in container initialization phase. This phase includes docker-pull, docker-create, docker-attach, docker-start and UDS wait operations. This initialization period is also now considered evictable.	2018-11-15 13:37:43 -08:00
Tolga Ceylan	fe2b9fb53d	fn: cookie and driver api changes (#1312 ) Now obsoleted driver.PrepareCookie() call handled image and container creation. In agent, going forward we will need finer grained control over the timeouts implied by the contexts. For this reason, with this change, we split PrepareCookie() into Validate/Pull/Create calls under Cookie interface.	2018-11-14 16:51:05 -08:00
Tolga Ceylan	8ee4c1098b	fn: correct typo in docker command tag (#1311 )	2018-11-14 11:38:48 -08:00
Eric Fode	90e39c8fd3	initial addition of the diskfree op (#1308 ) * initial addition of the diskfree op fixing up some typos last of fmt errors * fixed up some feedbacks	2018-11-14 09:22:07 -08:00
Andrea Rosa	182db94fad	Feature/acksync response writer (#1267 ) This implements a "detached" mechanism to get an ack from the runner once it actually starts to run a function. In this scenario the response returned back is just a 202 if we placed the function in a specific time-frame. If we hit some errors or we fail to place the fn in time we return back different errors.	2018-11-09 10:25:43 -08:00
Tolga Ceylan	25afb2f478	fn: remove tini option & env variable (#1301 )	2018-11-07 12:35:19 -08:00
Tolga Ceylan	975b780695	fn: tests for hung and bad docker repo during docker-pull (#1298 ) * fn: tests for hung and bad docker repo during docker-pull	2018-11-05 16:01:42 -08:00
Tolga Ceylan	5415b2bc38	fn: move UDS client into container to keep runHot() simpler (#1297 )	2018-11-02 14:03:09 -07:00
Tolga Ceylan	ac17825a36	fn: add container state to eviction stats (#1296 )	2018-11-02 13:32:13 -07:00
Tolga Ceylan	de9c2cbb63	fn: cleanup of docker timeouts and docker health check (#1292 ) Moving the timeout management of various docker operations to agent. This allows for finer control over what operation should use. For instance, for pause/unpause our tolerance is very low to avoid resource issues. For docker remove, the consequences of failure will lead to potential agent failure and therefore we wait up to 10 minutes. For cookie create/prepare (which includes docker-pull) we cap this at 10 minutes by default. With new UDS/FDK contract, health check is now obsoleted as containers advertise health using UDS availability.	2018-11-01 14:22:47 -07:00
Tolga Ceylan	e227802512	fn: Remove error channel for container exits (#1287 ) The channel is unnecessary and unreliable since exits trigger I/O failure on UDS earlier than we detect the exit.	2018-10-30 12:11:23 -07:00
Reed Allman	e13a6fd029	death to format (#1281 ) * get rid of old format stuff, utils usage, fix up for fdk2.0 interface * pure agent format removal, TODO remove format field, fix up all tests * shitter's clogged * fix agent tests * start rolling through server tests * tests compile, some failures * remove json / content type detection on invoke/httptrigger, fix up tests * remove hello, fixup system tests the fucking status checker test just hangs and it's testing that it doesn't work so the test passes but the test doesn't pass fuck life it's not worth it * fix migration * meh * make dbhelper shut up about dbhelpers not being used * move fail status at least into main thread, jfc * fix status call to have FN_LISTENER also turns off the stdout/stderr blocking between calls, because it's impossible to debug without that (without syslog), now that stdout and stderr go to the same place (either to host stderr or nowhere) and isn't used for function output this shouldn't be a big fuss really * remove stdin * cleanup/remind: fixed bug where watcher would leak if container dies first * silence system-test logs until fail, fix datastore tests postgres does weird things with constraints when renaming tables, took the easy way out system-tests were loud as fuck and made you download a circleci text file of the logs, made them only yell when they goof * fix fdk-go dep for test image. fun * fix swagger and remove test about format * update all the gopkg files * add back FN_FORMAT for fdks that assert things. pfft * add useful error for functions that exit this error is really confounding because containers can exit for all manner of reason, we're just guessing that this is the most likely cause for now, and this error message should very likely change or be removed from the client path anyway (context.Canceled wasn't all that useful either, but anyway, I'd been hunting for this... so found it). added a test to avoid being publicly shamed for 1 line commits (beware...).	2018-10-26 10:43:04 -07:00
Tolga Ceylan	241d3fede1	fn: blocking mode should not emit 503 if can't evict (#1283 )	2018-10-25 12:17:26 -07:00
Tolga Ceylan	bf41789af2	fn: eviction resource correction (#1282 ) Previously evictor did not perform an eviction if total cpu/mem of evictable containers was less than requested cpu/mem. With this change, we try to perform evictions based on actual needed cpu & mem reported by resource tracker.	2018-10-25 11:10:19 +01:00
Tolga Ceylan	8fe1c9a07c	fn: reduce logging for evicted containers (#1276 ) Let's not log evicted containers which would be context canceled.	2018-10-18 15:10:15 -07:00
Tolga Ceylan	44e366d195	fn: add details to runner finish logging (#1271 ) Adding http-status/fn-http-status details in runner finish logger.	2018-10-15 12:15:08 -07:00
Tolga Ceylan	f10fab21bc	fn: fixup possible go-routine leak (#1265 )	2018-10-05 17:02:18 -07:00
Reed Allman	e6eec186d0	small tweaks to dispatch (#1264 ) * the dispatch span actually encloses dispatch and gives an accurate span now * turning a call into an http request can't fail unless it's our fault, if tests don't catch this, we don't deserve money * moved http req creation inside of dispatch goroutine there's further work to do cleaning up dispatch... removing the old formats will make this slightly more clear, waiting for that. this was bugging me anyway after seeing something else and was easy to fix up.	2018-10-05 16:32:01 -07:00
Tolga Ceylan	29dcf0a791	fn: adding docker events to stats (#1262 ) Streaming docker events is useful as we can record/capture some asynchronous containers events such as out-of-memory. For now, we record these in opencensus/prometheus stats.	2018-10-04 18:54:09 -07:00
Tolga Ceylan	f132bba3fb	fn: adding hot launcher eviction waiting (#1257 ) If checkLaunch triggers evictions, it must wait for these evictions to complete before returning. Premature returning from checkLaunch will cause checkLaunch to be called again by hot launcher. This causes checkLaunch to receive an out of capacity error and causes a 503. The evictor is also improved with this PR and it provides a slice of channels to wait on if evictions are taking place. Eviction token deletion is performed after resource token close to ensure that once an eviction is done, resource token is also free.	2018-10-01 16:16:29 -07:00
Tolga Ceylan	2e610a264a	fn: remove async+sync separation in resource tracker (#1254 ) This simplifies resource tracker. Originally, logically we had split the cpu/mem into two pools where a 20% was kept specifically for sync calls to avoid async calls dominating the system. However, resource tracker should not handle such call prioritization. Given the improvements to the evictor, I think we can get rid of this code in resource tracker for time being.	2018-10-01 10:46:32 -07:00
Dario Domizioli	4a862212a2	Limit connection pool size on UDS: we should only need one per container (#1252 ) Hopefully this reduces FD usage even further.	2018-09-28 11:07:31 -07:00
Tolga Ceylan	a256d96f1e	fn: keepalives timeout for UDS http-stream client (#1253 )	2018-09-28 10:59:22 -07:00
Dario Domizioli	5aabdae26a	Fix missing context on request sent through UDS (#1251 )	2018-09-28 14:54:05 +01:00
Richard Connon	8d8c7df569	Log failure to close fsnotify handle (#1250 )	2018-09-28 12:06:43 +01:00
Owen Cliffe	53d4be00ca	Add checks for unix socket destination to avoid FDK tricking agent into talking to non-relative dirs (#1247 ) * Add checks for unix socket destination to avoid leaking access to host OS * style, typos	2018-09-27 18:20:03 -07:00
Reed Allman	319e0af41c	we shouldn't log tokens, this shouldn't have been info either and was noisy (#1249 ) * we shouldn't log tokens, this shouldn't have been info either and was noisy * simplify logic too	2018-09-27 23:37:35 +01:00
Reed Allman	01b8e8679d	HTTP trigger http-stream tests (#1241 )	2018-09-26 13:25:48 +01:00
Tom Coupland	d454ff9aa4	Initial Refactor (#1234 ) * Initial Refactor Removing the repeated logic exposed some problems with the response writers. Currently, the trigger writer was overlaid on part of the header writing. The main invoke blog writing into the different levels of the overlays at different points in the logic. Instead, by extending the types and embedded structs, the writer is more transparent. So, at the end of the flow it goes over all the headers available and removes our prefixes. This lets the invoke logic just write to the top level. Going to continue after lunch to try and remove some of the layers and param passing. * Try and repeat concurrency failure * Nested FromHTTPFnRequest inside FromHTTPTriggerRequest * Consolidate buffer pooling logic * go fmt yourself * fix import	2018-09-24 12:20:30 +01:00
Tolga Ceylan	a994b57d9a	fn: freezer/evictor adjustments (#1233 ) *) removed faulty Idle state setter in runHot() since with UDS wait, we need to wait until we can determine if a container is idle. This is now moved to runHotReq(). *) evictor now more aggressive and no longer tied to pause timer/configuration. *) removed unnecessary optimization on timer=0 case for immediate pause.	2018-09-20 14:13:11 -07:00
Vijay Krishnan	b2f85b70ea	Use registry auth token from Call extensions to pull images (#1228 )	2018-09-20 13:57:41 -07:00
Owen Cliffe	d9b74cfd14	Gateway trigger support (#1225 ) * initial gateway trigger support * Pass Content-Type down to wrapped writer * Move req header setting * Adding call id to responses * add dupe Fn-Call-Id headers	2018-09-20 11:30:28 -07:00
Reed Allman	87e2562db9	Http stream invoke tests (#1231 ) * adds parity level of testing http-stream invoke the other formats had a gamut of tests, now http-stream does too. this makes obvious some of its behaviors. some things changed / can change now that we don't have pipes to worry about, the main one being that when containers blow up now the uds client will get an EOF/ECONNREFUSED instead of the pipe getting wedged up (allowing us to get the container error easily, previously). I made my best 50% effort to make a reasonable error for when this happens (similar to when http/json received garbage errors), open to ideas on verbiage / policy there. should be pretty straightforward. one thing to notice is that http/json/default don't return our fancy new Fn-Http-Status or Fn-Http-H headers... it's relatively easy to go add this to fdk-go just to test this, but for invoke I'm really not sure we care (?) and for the gateway, the output will be identical with the old formats bypassing the header decap. if anybody has any feelings, feel free to express them. * fix oomer up for new error * Adding http header stripping to agent Adding the header stripping into the agent, this should be low enough that all routes to fns get treated the same.	2018-09-20 18:52:20 +01:00
Reed Allman	485fa465a0	Stream test commence (#1224 ) * initial invoke testing this assures that Content-Type and Fn-Http-Status are set for an http-stream function. it took some fixing up of the test utils code for the plumbing to work, looking forward to deleting most stuff in fn-test-utils.go file around each format -- had to update fdk-go to latest for http-stream support. this only adds 1 test, since there's some machinery here, and would like to unblock working on the http gateway simultaneously while adding a full suite of invoke tests (this work can be parallelized)... i added debug logs back to the debugging output. turns out this is useful, but it can get noisy (only when things fail, hopefully). * fix oom tests?	2018-09-19 08:48:48 -07:00
Tolga Ceylan	a9bba2c3a8	fn: remove eviction timer to simplify eviction logic (#1223 ) We tie container pausing with evictions, where if a container is paused, then it is also eligible for eviction.	2018-09-18 15:20:39 -07:00
Reed Allman	3a82790d99	clean up hardcoded lsnr.sock refs, move iofs to /tmp (#1221 ) * clean up hardcoded lsnr.sock refs because what drivers.ContainerTask needs is another method, and we all know it atoning for my sins the first time around. and yes, i refuse to use a cross package exported constant (just think of the dep graphs) * fix tests	2018-09-18 08:12:44 -07:00
Tolga Ceylan	893ff1e6fc	fn: add missing dequeue in agent Submit (#1220 )	2018-09-17 17:58:12 -07:00
Richard Connon	493790dbd2	Add tmpfs IOFS (#1212 ) * Define an interface for IOFS handling. Add no-op and temporary directory implementations. * Move IOFS stuff out into separate file, add basic tmpfs implementation for linux only * Switch between directory and tmpfs based on platform and config * Respect FN_IOFS_OPTS * Make directory iofs default on all platforms * At least try to clean up a bit on failure * Add backout if IOFS creation fails * Add comment about iofs.Close	2018-09-17 11:50:43 -07:00
Tolga Ceylan	b0c93dbd82	fn: new agent resource tracker metrics (#1215 ) New metrics for agent resource tracker: CpuUsed, CpuAvail, MemUsed, MemAvail.	2018-09-17 10:31:17 -07:00
Tom Coupland	d56a49b321	Remove V1 endpoints and Routes (#1210 ) Largely a removal job, however many tests, particularly system level ones relied on Routes. These have been migrated to use Fns. * Add 410 response to swagger * No app names in log tags * Adding constraint in GetCall for FnID * Adding test to check FnID is required on call * Add fn_id to call selector * Fix text in docker mem warning * Correct buildConfig func name * Test fix up * Removing CPU setting from Agent test CPU setting has been deprecated, but the code base is still riddled with it. This just removes it from this layer. Really we need to remove it from Call. * Remove fn id check on calls * Reintroduce fn id required on call * Adding fnID to calls for execute test * Correct setting of app id in middleware * Removes root middlewares ability to redirect fun invocations * Add over sized test check * Removing call fn id check	2018-09-17 16:44:51 +01:00
Owen Cliffe	6567f6e8ef	support configuration-based relative dirs (host and agent) for iofs (#1213 ) * support configuration-based relative dirs (host and agent) for iofs mounts * Send UDS requests as POST to <UDS>/call	2018-09-17 11:59:16 +01:00
Tolga Ceylan	aa13a40168	fn: agent/lb/runner error handling adjustments (#1214 ) 1) Early call validation and return due to cpu/mem impossible to meet (eg. request cpu/mem larger than max-mem or max-cpu on server) now emits HTTP Bad Request (400) instead of 503. This case is most likely due to client/service configuration and/or validation issue. 2) 'failed' metric is now removed. 'failed' versus 'errors' were too confusing. 'errors' is now a catch all error case. 3) new 'canceled' counter for client side cancels. 4) 'server_busy' now covers more cases than it previously did.	2018-09-14 16:50:14 -07:00
Reed Allman	2b797a556a	update docs with pro tips for fdk http stream people (#1211 ) * update docs with pro tips for fdk http stream people * fix bug where container could die before uds wait we used to hang out for an hour. oopsie, thanks Owen	2018-09-14 16:54:18 +01:00
Reed Allman	3a9c48b8a3	http-stream format (#1202 ) * POC code for inotify UDS-io-socket * http-stream format introducing the `http-stream` format support in fn. there are many details for this, none of which can be linked from github :( -- docs are coming (I could even try to add some here?). this is kinda MVP-ish level, but does not implement the remaining spec, ie 'headers' fixing up / invoke fixing up. the thinking being we can land this to test fdks / cli with and start splitting work up on top of this. all other formats work the same as previous (no breakage, only new stuff) with the cli you can set `format: http-stream` and deploy, and then invoke a function via the `http-stream` format. this uses unix domain socket (uds) on the container instead of previous stdin/stdout, and fdks will have to support this in a new fashion (will see about getting docs on here). fdk-go works, which is here: https://github.com/fnproject/fdk-go/pull/30 . the output looks the same as an http format function when invoking a function. wahoo. there's some amount of stuff we can clean up here, enumerated: * the cleanup of the sock files is iffy, high pri here * permissions are a pain in the ass and i punted on dealing with them. you can run `sudo ./fnserver` if running locally, it may/may not work in dind(?) ootb * no pipe usage at all (yay), still could reduce buffer usage around the pipe behavior, we could clean this up potentially before removal (and tests) * my brain can’t figure out if dispatchOldFormats changes pipe behavior, but tests work * i marked XXX to do some clean up which will follow soon… need this to test fdk tho so meh, any thoughts on those marked would be appreciated however (1 less decision for me). mostly happy w/ general shape/plumbing tho * there are no tests atm, this is a tricky dance indeed. attempts were made. need to futz with the permission stuff before committing to adding any tests here, which I don't like either. 
also, need to get the fdk-go based test image updated according to the fdk-go, and there's a dance there too. rumba time.. * delaying the big big cleanup until we have good enough fdk support to kill all the other formats. open to ideas on how to maneuver landing stuff... * fix unmount * see if the tests work on ci... * add call id header * fix up makefile * add configurable iofs opts * add format file describing http-stream contract * rm some cruft * default iofs to /tmp, remove mounting out of the box fn we can't mount. /tmp will provide a memory backed fs for us on most systems, this will be fine for local developing and this can be configured to be wherever for anyone that wants to make things more difficult for themselves. also removes the mounting, this has to be done as root. we can't do this in the oss fn (short of requesting root, but no). in the future, we may want to have a knob here to have a function that can be configured in fn that allows further configuration here. since we don't know what we need in this dept really, not doing that yet (it may be the case that it could be done operationally outside of fn, eg, but not if each directory needs to be configured itself, which seems likely, anyway...) * add WIP note just in case...	2018-09-14 10:59:12 +01:00
Tolga Ceylan	4dcdb7d982	fn: paused and evicted container stats (#1209 ) * fn: paused and evicted container stats With this change, now stats reports paused state as well as incidents of container exit due to evictions. * fn: update/document state transitions in state tracker There's no case of a transition moving from done to waiting. This must be deprecated behavior.	2018-09-13 16:24:26 -07:00
Tolga Ceylan	586d5c4735	fn: make call.End() blocking to reduce complexity (#1208 ) agent/lb-agent/runner roles execute call.End() in the background in some cases to reduce latency. With this change, we simplify this and switch to non-background execution of call.End(). This fixes hard to detect issues such as non-deterministic calculation of call.CompletedAt or incomplete Call.Stats in runners. Downstream projects if impacted by the now blocking call.End() latency should take steps to handle this according to their requirements.	2018-09-13 11:28:11 +01:00
Tom Coupland	a0ccc4d7c4	Copy logs up to v2 endpoints (#1207 ) Copies the log endpoints up to the V2 endpoints, in a similar way to the call endpoints. The main change is to when logs are inserted into S3. The signature of the function has been changed to take the whole call object, rather than just the app and call id's. This allows the function to switch between calls for Routes and those for Fns. Obviously this switching can be removed when v1 is removed. In the sql implementation it inserts with both appID and fnID, this allows the two get's to work, and the down grade of the migration. When the v1 logs are removed, the appId can be dropped. The log fetch test and error messages have been changed to be FnID specific.	2018-09-13 10:30:10 +01:00
Tolga Ceylan	aabbe0fba5	fn: check context timeout when waiting for non-blocking attach (#1201 ) * fn: check context timeout when waiting for non-blocking attach With this change, we no longer allow docker client AttachToContainerNonBlocking to block on Success channel more than our context deadline/timeout. * fn: move nbio chan handling in attach to docker from docker-client	2018-09-12 13:01:51 -07:00


318 Commits