Previously, the evictor did not perform an eviction if the total cpu/mem of
evictable containers was less than the requested cpu/mem. With this change, we
try to perform evictions based on the actual needed cpu & mem reported by the
resource tracker.
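roughly, the idea (a sketch with hypothetical types and names, not the actual
agent internals):

```go
package agent

type evictable struct {
	id       string
	cpu, mem uint64
}

// evictForNeed walks the idle pool and accumulates evictions until the
// cpu/mem the resource tracker reported as needed is covered. It returns
// the ids to evict, or nil if the idle pool cannot cover the need.
func evictForNeed(idle []evictable, needCPU, needMem uint64) []string {
	var gotCPU, gotMem uint64
	var ids []string
	for _, c := range idle {
		if gotCPU >= needCPU && gotMem >= needMem {
			break // need already covered, stop evicting
		}
		gotCPU += c.cpu
		gotMem += c.mem
		ids = append(ids, c.id)
	}
	if gotCPU < needCPU || gotMem < needMem {
		return nil
	}
	return ids
}
```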
* the dispatch span actually encloses dispatch and gives an accurate span now
* turning a call into an http request can't fail unless it's our fault; if
tests don't catch this, we don't deserve money
* moved http req creation inside of dispatch goroutine
there's further work to do cleaning up dispatch... removing the old formats
will make this slightly clearer, waiting for that. this was bugging me anyway
after seeing something else and was easy to fix up.
If checkLaunch triggers evictions, it must wait
for these evictions to complete before returning.
Returning prematurely from checkLaunch causes it
to be called again by the hot launcher, which then
receives an out-of-capacity error and responds
with a 503.
The evictor is also improved in this PR: it now
provides a slice of channels to wait on while
evictions are taking place.
Eviction token deletion is performed *after* the
resource token close, to ensure that once an
eviction is done, the resource token is also free.
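a minimal sketch of the waiting side, assuming the evictor hands back a slice
of done-channels (names are illustrative, not the exact agent code):

```go
package agent

import "context"

// waitEvictions blocks on the channels the evictor returns so checkLaunch
// does not come back before the evicted containers' resource tokens are free.
func waitEvictions(ctx context.Context, done []chan struct{}) error {
	for _, ch := range done {
		select {
		case <-ch: // eviction complete; its resource token is released too
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}
```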
This simplifies the resource tracker. Originally, we had
logically split the cpu/mem into two pools, where 20%
was kept specifically for sync calls to avoid
async calls dominating the system. However, the resource
tracker should not handle such call prioritization.
Given the improvements to the evictor, I think
we can get rid of this code in the resource tracker
for the time being.
*) removed the faulty Idle state setter in runHot(): with
the UDS wait, we need to wait until we can determine whether a container
is idle. This is now moved to runHotReq().
*) evictor is now more aggressive and no longer tied to the pause
timer/configuration.
*) removed an unnecessary optimization of the timer=0 case for immediate
pause.
* adds parity-level testing for http-stream invoke
the other formats had a gamut of tests, now http-stream does too. this makes
some of its behaviors obvious. some things changed / can change now that we
don't have pipes to worry about, the main one being that when containers blow
up the uds client will now get an EOF/ECONNREFUSED instead of the pipe getting
wedged up (which previously let us get the container error easily). I made
my best 50% effort at a reasonable error for when this happens (similar
to when http/json received garbage), open to ideas on verbiage / policy
there.
should be pretty straightforward. one thing to notice is that
http/json/default don't return our fancy new Fn-Http-Status or Fn-Http-H
headers... it's relatively easy to go add this to fdk-go just to test it,
but for invoke I'm really not sure we care (?) and for the gateway the output
will be identical, with the old formats bypassing the header decap. if anybody
has any feelings, feel free to express them.
* fix oomer up for new error
* Adding http header stripping to agent
Adding the header stripping to the agent; this should be low enough in the
stack that all routes to fns get treated the same.
* clean up hardcoded lsnr.sock refs
because what drivers.ContainerTask needs is another method, and we all know it.
atoning for my sins the first time around. and yes, i refuse to use a
cross-package exported constant (just think of the dep graphs)
* fix tests
* Define an interface for IOFS handling. Add no-op and temporary directory implementations (a rough sketch follows this list).
* Move IOFS stuff out into separate file, add basic tmpfs implementation for linux only
* Switch between directory and tmpfs based on platform and config
* Respect FN_IOFS_OPTS
* Make directory iofs default on all platforms
* At least try to clean up a bit on failure
* Add backout if IOFS creation fails
* Add comment about iofs.Close
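a rough sketch of the shape of this (hypothetical interface, not the exact
one in agent):

```go
package agent

import (
	"io/ioutil"
	"os"
)

// iofs owns the directory the container's uds lives in and can tear it down.
type iofs interface {
	AgentPath() string // path the agent mounts/watches
	Close() error      // clean up whatever was created
}

type noopIOFS struct{}

func (noopIOFS) AgentPath() string { return "" }
func (noopIOFS) Close() error      { return nil }

type directoryIOFS struct{ dir string }

func (d *directoryIOFS) AgentPath() string { return d.dir }
func (d *directoryIOFS) Close() error      { return os.RemoveAll(d.dir) }

// newDirectoryIOFS creates a per-container temp dir under base.
func newDirectoryIOFS(base string) (iofs, error) {
	dir, err := ioutil.TempDir(base, "iofs")
	if err != nil {
		return nil, err
	}
	return &directoryIOFS{dir: dir}, nil
}
```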
Largely a removal job; however, many tests, particularly system-level
ones, relied on Routes. These have been migrated to use Fns.
* Add 410 response to swagger
* No app names in log tags
* Adding constraint in GetCall for FnID
* Adding test to check FnID is required on call
* Add fn_id to call selector
* Fix text in docker mem warning
* Correct buildConfig func name
* Test fix up
* Removing CPU setting from Agent test
CPU setting has been deprecated, but the code base is still riddled
with it. This just removes it from this layer. Really we need to
remove it from Call.
* Remove fn id check on calls
* Reintroduce fn id required on call
* Adding fnID to calls for execute test
* Correct setting of app id in middleware
* Removes root middleware's ability to redirect function invocations
* Add over sized test check
* Removing call fn id check
1) Early call validation and return due to cpu/mem that is impossible
to meet (e.g. requested cpu/mem larger than max-mem or max-cpu
on the server) now emits HTTP Bad Request (400) instead of 503.
This case is most likely due to a client/service configuration
and/or validation issue.
2) the 'failed' metric is now removed. 'failed' versus 'errors'
was too confusing; 'errors' is now a catch-all error case.
3) new 'canceled' counter for client-side cancels (see the sketch after this list).
4) 'server_busy' now covers more cases than it previously did.
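a minimal sketch of recording the new counter, assuming opencensus (which fn
uses for metrics); the measure name/description here are illustrative:

```go
package agent

import (
	"context"

	"go.opencensus.io/stats"
)

// canceledMeasure counts client-side cancels.
var canceledMeasure = stats.Int64("canceled", "client-side cancels", stats.UnitDimensionless)

func recordCanceled(ctx context.Context) {
	stats.Record(ctx, canceledMeasure.M(1))
}
```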
* update docs with pro tips for fdk http-stream people
* fix bug where a container could die before the uds wait
we used to hang out for an hour. oopsie, thanks Owen
* POC code for inotify UDS-io-socket
* http-stream format
introducing `http-stream` format support in fn. there are many details for
this, none of which can be linked from github :( -- docs are coming (I could
even try to add some here?). this is kinda MVP-ish level, and does not
implement the remaining spec, i.e. 'headers' fixing up / invoke fixing up; the
thinking being we can land this to test fdks / cli with and start splitting up
work on top of it. all other formats work the same as previously (no
breakage, only new stuff)
with the cli you can set `format: http-stream` and deploy, and then invoke a
function via the `http-stream` format. this uses a unix domain socket (uds) on
the container instead of the previous stdin/stdout, and fdks will have to support
this in a new fashion (will see about getting docs on here). fdk-go works,
which is here: https://github.com/fnproject/fdk-go/pull/30 . the output looks
the same as an http format function when invoking a function. wahoo.
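for the curious, speaking http over a uds from go looks roughly like this
(socket path illustrative, not the agent's actual wiring):

```go
package main

import (
	"context"
	"net"
	"net/http"
)

// udsClient returns an http.Client that always dials the given unix socket,
// ignoring the host portion of request URLs.
func udsClient(sockPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", sockPath)
			},
		},
	}
}

// usage (paths/urls illustrative):
//   resp, err := udsClient("/tmp/iofs/lsnr.sock").Get("http://localhost/call")
```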
there's some amount of stuff we can clean up here, enumerated:
* the cleanup of the sock files is iffy, high pri here
* permissions are a pain in the ass and i punted on dealing with them. you can
run `sudo ./fnserver` if running locally, it may/may not work in dind(?) ootb
* no pipe usage at all (yay), still could reduce buffer usage around the pipe
behavior, we could clean this up potentially before removal (and tests)
* my brain can't figure out if dispatchOldFormats changes pipe behavior, but
tests work
* i marked XXX to do some cleanup which will follow soon... need this to test
the fdk tho so meh; any thoughts on those marked would be appreciated however
(1 less decision for me). mostly happy w/ general shape/plumbing tho
* there are no tests atm, this is a tricky dance indeed. attempts were made.
need to futz with the permission stuff before committing to adding any tests
here, which I don't like either. also, need to get the fdk-go based test image
updated according to the fdk-go, and there's a dance there too. rumba time..
* delaying the big big cleanup until we have good enough fdk support to kill
all the other formats.
open to ideas on how to maneuver landing stuff...
* fix unmount
* see if the tests work on ci...
* add call id header
* fix up makefile
* add configurable iofs opts
* add format file describing http-stream contract
* rm some cruft
* default iofs to /tmp, remove mounting
out of the box, fn can't mount. /tmp will provide a memory-backed fs for us
on most systems; this will be fine for local development, and it can be
configured to be wherever for anyone who wants to make things more difficult
for themselves.
this also removes the mounting, which has to be done as root. we can't do that
in the oss fn (short of requesting root, but no). in the future, we may want a
knob here that allows further configuration in fn. since we don't really know
what we need in this dept, not doing that yet (it may be the case that it
could be done operationally outside of fn, e.g., but not if each directory
needs to be configured itself, which seems likely, anyway...)
* add WIP note just in case...
* fn: paused and evicted container stats
With this change, stats now report the paused state
as well as incidents of container exits due to eviction.
* fn: update/document state transitions in state tracker
There's no case of a transition moving from done to waiting; this
must be deprecated behavior.
The agent/lb-agent/runner roles execute call.End() in the background
in some cases to reduce latency. With this change, we simplify this
and switch to non-background execution of call.End(). This fixes
hard-to-detect issues such as non-deterministic calculation of
call.CompletedAt or incomplete Call.Stats in runners.
Downstream projects, if impacted by the now-blocking call.End()
latency, should take steps to handle this according to their requirements.
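the change in a nutshell (a sketch, not the literal diff; the minimal
interface here is assumed for illustration):

```go
package agent

import "context"

// call is a minimal stand-in for the agent's call type.
type call interface {
	End(ctx context.Context, err error) error
}

func endCall(ctx context.Context, c call, err error) error {
	// previously: go func() { _ = c.End(ctx, err) }()
	return c.End(ctx, err) // now blocking: deterministic CompletedAt, complete Stats
}
```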
fixes #1101
additional context:
* this was introduced in docker 1.13 (1/2017) and we require docker 17.10
(10/2017), so this should not have any issues dependency-wise, as `docker-init`
is in the docker install from that point in time. unless explicitly removed,
it should be in the dind container we use as well...
* the PR that introduced this to docker is
https://github.com/moby/moby/pull/26061 for additional context
* it may be wise to put this through some paces, if anybody has any...
interesting... function containers. the tests seem to work fine, however, and
this shouldn't be something users have to think about (?) at all, just
something that we are doing. this isn't the default in docker for
compatibility reasons, which is maybe a yellow flag but I am not sure tbh
* fn: introducing docker-syslog driver as default logger
With this change, fn-agent prefers the RFC5424 docker-syslog driver
for logging stdout/stderr from containers. The advantage
of this is offloading the work to docker itself, instead of
streaming stderr along with stdout, multiplexed
through a single connection via the docker API.
The change will need support from FDKs in order to log the
correct call-id and suppress the '\n' that splits syslog lines.
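configuring a container for the syslog log driver via the fsouza docker
client looks roughly like this (values illustrative; the agent wires its own
address/format):

```go
package main

import docker "github.com/fsouza/go-dockerclient"

// syslogLogConfig builds the per-container log driver config.
func syslogLogConfig(addr string) docker.LogConfig {
	return docker.LogConfig{
		Type: "syslog",
		Config: map[string]string{
			"syslog-address": addr,        // e.g. "udp://127.0.0.1:514"
			"syslog-format":  "rfc5424",   // structured format
			"tag":            "{{.Name}}", // template docker expands per container
		},
	}
}
```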
* fn: agent eviction revisited
Previously, the hot-container eviction logic used the
number of waiters on cpu/mem resources to decide whether to
evict a container. An eviction ticker would wake up
its associated container every 1 sec to reassess system
load based on the waiter count. However, this does not work
for the non-blocking agent, since there are no waiters in
non-blocking mode.
Background on blocking versus non-blocking agent:
*) The blocking agent holds a request until the
request is serviced or the client times out. It assumes
the request can eventually be serviced when idle
containers eject themselves or busy containers finish
their work.
*) Non-blocking mode tries to limit this wait time.
However, the non-blocking agent has never been truly
non-blocking; it simply means that we only
make a request wait if we take some action in
the system. Non-blocking agents are configured with
a much higher hot-poll frequency to make the system
more responsive, as well as to handle cases where a
too-busy event is missed by the request. This is because
the communication between the hot launcher and waiting
requests is not 1-1 and is lossy: another request may
arrive for the same slot queue and receive a
too-busy response before the original request does.
This introduces an evictor where each hot container can
register itself once it has been idle for more than 1 second.
Upon registration, these idle containers become eligible
for eviction.
In the hot container launcher, in non-blocking mode,
before we emit a too-busy response we now attempt an
evict. If this is successful, then we wait some more.
This can result in requests waiting longer than they
used to, but only if a container was evicted. In
blocking mode, the hot launcher uses the hot-poll
period to assess whether a request has waited too long,
and then triggers an eviction.
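a rough sketch of the evictor registry shape (hypothetical, simplified from
the description above):

```go
package agent

import "sync"

type evictor struct {
	mu   sync.Mutex
	idle map[string]chan struct{} // container id -> signal to self-destruct
}

func newEvictor() *evictor {
	return &evictor{idle: make(map[string]chan struct{})}
}

// RegisterIdle is called by a hot container once it has been idle >1s.
func (e *evictor) RegisterIdle(id string, kill chan struct{}) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.idle[id] = kill
}

// UnregisterIdle is called when the container picks up work again.
func (e *evictor) UnregisterIdle(id string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	delete(e.idle, id)
}

// Evict tells any registered idle container to exit; false means nothing
// was evictable (the launcher then falls back to a too-busy response).
func (e *evictor) Evict() bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	for id, kill := range e.idle {
		close(kill)
		delete(e.idle, id)
		return true
	}
	return false
}
```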
* fn: runner status and docker load images
Introducing a function run for pure runner Status
calls. Previously, Status gRPC calls returned active
in-flight request counts, with the purpose of being a simple
health checker. However, this is not sufficient, since
it does not show whether the agent or docker is healthy. With
this change, if the pure runner is configured with a status
image, that image is executed through docker. The
call uses zero memory/cpu/tmpsize settings to ensure
the resource tracker does not block it.
However, operators might not always have a docker
repository accessible/available for the status image, or
they might not want the status check to go over the
network. To allow such cases, and in general to allow
caching docker images, a new environment variable
FN_DOCKER_LOAD_FILE was added. If this is set, fn-agent
will, during startup, load the images (previously
saved with 'docker save') into docker.
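the startup load path, roughly (a sketch assuming the fsouza docker client fn
uses elsewhere):

```go
package main

import (
	"os"

	docker "github.com/fsouza/go-dockerclient"
)

// loadImagesFromFile reads a 'docker save' tarball named by
// FN_DOCKER_LOAD_FILE and loads it, so the status image is available
// without a registry pull.
func loadImagesFromFile(client *docker.Client) error {
	path := os.Getenv("FN_DOCKER_LOAD_FILE")
	if path == "" {
		return nil // nothing configured, skip
	}
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return client.LoadImage(docker.LoadImageOptions{InputStream: f})
}
```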
* Initial support for invoking triggers
* dupe method
* tighten server constraints
* runner tests not working yet
* basic route tests passing
* post rebase fixes
* add hybrid support for trigger invoke and tests
* consolidate all hybrid evil into one place
* cleanup and make triggers unique by source
* fix oops with Agent
* linting
* review fixes
* add DateTime sans mgo
* change all uses of strfmt.DateTime to common.DateTime, remove test strfmt usage
* remove api tests, system-test dep on api test
multiple reasons to remove the api tests:
* the awkward dependency on fn_go meant generating bindings on a branched fn
and vendoring those to test new stuff. this is at a minimum not at all
intuitive, not worth it, nor a fun way to spend the finite amount of time we
have to live.
* the api tests only tested a subset of functionality that the server/ api
tests already cover, and we risk having suites where one tests some thing and
the other doesn't. let's not. we have too many test suites as it is, and these
pretty much only test that we updated the fn_go bindings, which is actually a
hassle as noted above and which the cli will pretty quickly figure out anyway.
* fn_go relies on openapi, which relies on mgo, which is deprecated and which
we'd like to remove as a dependency. openapi is a _huge_ dep built in a NIH
fashion, and it cannot simply remove the mgo dep as users may be using it.
we've now stolen their date time and otherwise killed usage of it in fn core;
it still exists for fn_go, but that's less of a problem.
* update deps
removals:
* easyjson
* mgo
* go-openapi
* mapstructure
* fn_go
* purell
* go-validator
also, had to lock docker. we shouldn't use docker master anyway; they
strongly advise against that. had no luck with the latest version rev, so i
locked it to what we were using before. until next time.
the rest is just playing dep roulette; those end up removing a ton tho
* fix exec test to work
* account for john le cache
In pure-runner and LB agent, service providers might want to set specific driver options.
For example, to add cpu-shares to functions, the LB can add the information as extensions
to the Call and pass this via gRPC to the runners. Runners then pick these extensions off
the gRPC call and pass them to the driver. Using a custom driver implementation, pure-runners
can process these extensions to modify docker.CreateContainerOptions.
To achieve this, LB agents can now be configured using a call overrider.
Pure-runners can be configured using a custom docker driver.
The RunnerCall and Call interfaces both expose call extensions.
An example demonstrating this is implemented in test/fn-system-tests/system_test.go,
which registers a call overrider for the LB agent as well as a simple custom docker driver.
In this example, the LB agent adds a key-value to the extensions and the runners add this
key-value as an environment variable to the container.
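a sketch of the LB-side half; the overrider signature here is assumed from
the description above, not copied from the code:

```go
package main

import "github.com/fnproject/fn/api/models"

// cpuSharesOverrider stamps an extension key that a custom driver on the
// runner side can later turn into a container env var or docker option.
// Key/value are illustrative.
func cpuSharesOverrider(c *models.Call, exts map[string]string) (map[string]string, error) {
	if exts == nil {
		exts = map[string]string{}
	}
	exts["fn.cpu-shares"] = "512"
	return exts, nil
}
```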
* fn: user friendly timeout handling changes
The timeout setting in routes now means "maximum amount
of time a function can run in a container".
The total wait time for a given http request is now expected
to be handled by the client. As long as the client waits,
the LB, runner, or agent will search for resources to
schedule it.
* fn: size-restricted tmpfs /tmp and read-only / support
*) read-only root fs support
*) removed CPUShares from the docker API. This was unused.
*) docker.Prepare() refactoring
*) added docker.configureTmpFs() for a size-limited tmpfs on /tmp (sketch after this list)
*) tmpfs size support in routes and resource tracker
*) fix fn-test-utils to handle sparse files better in create file
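the host config knobs involved, roughly (a sketch; fn computes the tmpfs size
from the route setting):

```go
package main

import (
	"fmt"

	docker "github.com/fsouza/go-dockerclient"
)

// restrictedHostConfig gives a container a read-only / plus a size-limited
// tmpfs mounted at /tmp.
func restrictedHostConfig(tmpfsSizeMB uint64) *docker.HostConfig {
	return &docker.HostConfig{
		ReadonlyRootfs: true, // read-only /
		Tmpfs: map[string]string{
			"/tmp": fmt.Sprintf("size=%dm", tmpfsSizeMB), // size-limited tmpfs
		},
	}
}
```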
* test typo fix
* Implements graceful shutdown of agent.DataAccess and underlying Datastore/Logstore/MessageQueue
* adds tests for closing agent.DataAccess and Datastore
* add user syslog writers to app
users may now specify syslog url[s] on apps, and all functions under that app
will spew their logs out to them. the docs have more information around the
details; please review those (swagger and operating/logging.md). tried to
implement to spec in some parts and improve others, open to feedback on
format though, lots of liberty there.
design-decision-wise, I am looking to the future and ignoring cold containers.
the overhead of the connections there would not be worth it, so this feature
only works for hot functions, since we're killing cold anyway (even if a user
can just straight up exit a hot container).
syslog connections will be opened against a container when it starts up, and
then the call id that is logged gets swapped out for each call that goes
through the container; this cuts down significantly on the cost of
opening/closing connections. there are buffers to accumulate logs until we get
a `\n` to actually write a syslog line, and a buffer to save some bytes when
we're writing the syslog formatting as well. underneath, writers re-use the
line writer in certain scenarios (swapper); a sketch of that follows. we could
likely improve the ease of setting this up, but opening the syslog conns
against a container seems worth it, and it is a different path than the other
func loggers that we create when we make a call object. the Close() stuff is a
little tricky, not sure how to make it easier and keep the above benefits,
open to ideas.
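a rough sketch of the swapper idea (hypothetical type, heavily simplified):

```go
package agent

import (
	"fmt"
	"io"
	"sync"
)

// callIDSwapper keeps one long-lived syslog conn per container and stamps
// the current call id on each line, swapping the id between calls instead
// of redialing the syslog destination.
type callIDSwapper struct {
	mu     sync.Mutex
	callID string
	conn   io.Writer // long-lived syslog connection for this container
}

// SetCallID swaps in the id of the call currently using the container.
func (s *callIDSwapper) SetCallID(id string) {
	s.mu.Lock()
	s.callID = id
	s.mu.Unlock()
}

// Write emits one complete log line (callers buffer until '\n').
func (s *callIDSwapper) Write(line []byte) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, err := fmt.Fprintf(s.conn, "call_id=%s %s", s.callID, line); err != nil {
		return 0, err
	}
	return len(line), nil
}
```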
this does add another vector of 'limits' to consider for more strict service
operators: one being how many syslog urls a user can add to an app (infinite,
atm), and the other being that, on the order of the number of containers per
host, we could run out of connections in certain scenarios. there may be some
utility in having multiple syslog sinks to send to; it can help with debugging
at times to send to another destination, or if a user is a client w/ someone
and both want the function logs, e.g. (have used this for that in the past,
specifically).
this also doesn't work behind a proxy, which is something i'm open to fixing,
but afaict it will require a 3rd-party dependency (we can pretty much steal
what docker does). this is mostly of utility for those of us who work behind a
proxy all the time, not really for end users.
there are some unit tests. integration tests for this don't sound very fun to
maintain. I did test against papertrail with each protocol and it works (and
even times out if you're behind a proxy!).
closes #337
* add trace to syslog dial
* fn: allow specified docker networks in functions
If FN_DOCKER_NETWORK is specified with a list of
networks, then the agent driver picks the least-used
network to place functions on.
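least-used selection, roughly (hypothetical bookkeeping; the driver keeps
real counts as containers come and go):

```go
package main

import (
	"strings"
	"sync"
)

type networkPicker struct {
	mu     sync.Mutex
	counts map[string]int // network name -> containers currently on it
}

// newNetworkPicker parses the FN_DOCKER_NETWORK list (whitespace-separated).
func newNetworkPicker(env string) *networkPicker {
	p := &networkPicker{counts: map[string]int{}}
	for _, n := range strings.Fields(env) {
		p.counts[n] = 0
	}
	return p
}

// Pick returns the network with the fewest containers and bumps its count.
func (p *networkPicker) Pick() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	best, bestN := "", int(^uint(0)>>1) // max int
	for n, c := range p.counts {
		if c < bestN {
			best, bestN = n, c
		}
	}
	if best != "" {
		p.counts[best]++
	}
	return best
}
```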
* add mutex comment