fn-serverless

mirror of https://github.com/fnproject/fn.git synced 2022-10-28 21:29:17 +03:00

Author	SHA1	Message	Date
Srinidhi Chokkadi Puranik	bb84ed35de	Revert "safe responsewriter usage in TryExec (#1490 )" (#1522 ) This reverts commit `1fb78ed836`.	2019-07-04 09:51:06 +01:00
Richard Connon	8b32ba2697	Fix typos in Makefile from go mod migration (#1508 ) When we migrated from dep to mod we updated the .PHONY targets but not the actual target names in the Makefile. Fix this.	2019-06-10 08:16:53 -07:00
Reed Allman	e8931d28c8	remove unused extensions cruft (#1481 ) * changes v1 to v2 for adding an endpoint * removed the handler funcs for adding handlers onto eg /apps/:app_id/x, we don't have them for funcs or triggers, and they honestly seem useless as it's easy to build it with the ability to add a handler and to access the fnext datastore which we offer, as they were they are really expensive since they yank the app out of the db even if the operation may not even need it in that handler. so instead of adding for the rest, remove all of these. none of our example extensions, which aren't working at the moment, use these either (that's why I'm here anyway). * removes dead code helpers and references to app name url param which is no longer a thing. these were just hanging around bugging me when I ran into them, so killing them..	2019-06-04 10:38:46 +03:00
Reed Allman	1fb78ed836	safe responsewriter usage in TryExec (#1490 ) inside of TryExec we were writing directly to the response writer inside of a goroutine, but TryExec can timeout and then get called again to a different runner or even have the front end writing headers while TryExec is writing headers. one way to make this safe is to make a new response writer for TryExec to write the response into, and only after the goroutine handling the response has returned, from the TryExec goroutine we can copy the response back up as the caller will not call TryExec again until it has returned (this is seemingly part of the placer contract). unfortunately, we're already buffering the response writer in the front end, too - it's possible we can get rid of that but it may need further testing. this adds an optimization when copying the request body from the LB to a runner, since we're using request.GetBody() and returning a reader we are familiar with that happens to just wrap a buffer's bytes (which we just need multiple readers on, but the data doesn't change). anyway, this whole interaction is unfortunate but kind of necessary due to needing to maneuver into a protobuf, it seems like a worth it and somewhat ok abstraction wise optimization. additionally, this gets rid of passing the client response headers down into the agent for detached functions. we don't need these since detached functions are not responding with the functions response to the client, only a 202, this was leading to races around writing the headers in retries too, but this is just for posterity/correctness now. updated the makefile/system test script so that I could run these faster to repro, pretty handy, should add to other stuff too... closes #1484	2019-05-01 17:56:13 -07:00
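The buffering approach described in the commit message above (write into a private response writer inside the goroutine, then copy the result up to the real writer only after that goroutine has returned) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `bufferedResponse`, `tryExec`, and `demo` are hypothetical names, and the real `TryExec` has a different signature and sits behind the placer.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// bufferedResponse is a hypothetical private http.ResponseWriter that only
// records what the handler writes; nothing reaches the client from here.
type bufferedResponse struct {
	header http.Header
	status int
	body   bytes.Buffer
}

func newBufferedResponse() *bufferedResponse {
	return &bufferedResponse{header: make(http.Header), status: http.StatusOK}
}

func (b *bufferedResponse) Header() http.Header         { return b.header }
func (b *bufferedResponse) WriteHeader(code int)        { b.status = code }
func (b *bufferedResponse) Write(p []byte) (int, error) { return b.body.Write(p) }

// copyTo replays the recorded headers, status and body onto the real writer.
func (b *bufferedResponse) copyTo(w http.ResponseWriter) error {
	for k, vs := range b.header {
		for _, v := range vs {
			w.Header().Add(k, v)
		}
	}
	w.WriteHeader(b.status)
	_, err := w.Write(b.body.Bytes())
	return err
}

// tryExec (hypothetical) runs handler in a goroutine against a private
// buffer. The caller's writer is touched only after the goroutine has
// returned; on timeout it is never touched, so a retry can reuse it safely.
func tryExec(ctx context.Context, w http.ResponseWriter, handler func(http.ResponseWriter)) error {
	buf := newBufferedResponse()
	done := make(chan struct{})
	go func() {
		defer close(done)
		handler(buf)
	}()
	select {
	case <-done:
		return buf.copyTo(w)
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demo drives tryExec with a recorder standing in for the client connection.
func demo() (int, string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	rec := httptest.NewRecorder()
	err := tryExec(ctx, rec, func(w http.ResponseWriter) {
		w.Header().Set("Content-Type", "text/plain")
		w.WriteHeader(http.StatusAccepted)
		_, _ = w.Write([]byte("hello"))
	})
	return rec.Code, rec.Body.String(), err
}

func main() {
	code, body, err := demo()
	fmt.Println(code, body, err)
}
```

The cost of this design is an extra copy of the response, which the commit message itself notes is on top of buffering already done in the front end.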
Reed Allman	a0f92abdcf	remove logs/calls apis, async, and most of hybrid (#1458 ) * start from the top remove runner configuration mode * remove async, logs, calls... hybrid still has one use * add note * fix tests * remove all async verbiage / cruft * fix test * remove logs and calls from swagger * fix rebase * fix system tests * remove calls/logs from sql db adds migration and removes datastore methods * go mod tidy && go mod vendor * remove unused env vars * remove stale server options	2019-04-08 15:11:22 -07:00
Tomas Knappek	27c1814cee	Prevent in-built docker VOLUME commands (#1378 )	2019-02-05 12:01:49 -08:00
Reed Allman	d85fadb142	add gosec scanning to ci (#1349 ) gosec severity=medium passes, all severity=low errors are from unhandled errors, we have 107 of them. tbh it doesn't look worth it to me, but maybe there are a few assholes even itchier than mine out there. medium has some good stuff in it, and of course high makes sense if we're gonna do this at all. this adds some nosec annotations for some things like sql sprintfs where we know it's clean (we're constructing the strings with variables in them). fixed up other spots where we were sprinting without need. some stuff like filepath.Clean when opening a file from a variable, and file permissions, easy stuff... I can't get the CI build to shut up, but I can locally get it to be pretty quiet about imports and it just outputs the gosec output. fortunately, it still works as expected even when it's noisy. I got it to shut up by unsetting some of the go mod flags locally, but that doesn't seem to quite do it in circle, printed the env out and don't see them, so idk... i give up, this works closes #1303	2018-12-13 17:57:25 -08:00
Eric Fode	8de5aef09d	go modifyed (#1284 ) * go modified fiddling with vendor got rid of the vendor directory revendored but with the exact same versions of things maybe better added mods for the images revendored using `GOFLAGS` instead of repeating my self vendor everything to the exact same commit hash as before and fixed ugorji Delete Deproxy.toml empty file cleaned up some file cleaned up some cruft get rid of some unused packages and exclude some Microsoft packages added flags to the variables that get pushed into docker in the makefile It works I suppose added noop excluded what we did not want even less hacky reverted to a version that has not been mangled * get rid of my experiment	2018-11-07 11:10:22 -08:00
Reed Allman	e13a6fd029	death to format (#1281 ) * get rid of old format stuff, utils usage, fix up for fdk2.0 interface * pure agent format removal, TODO remove format field, fix up all tests * shitter's clogged * fix agent tests * start rolling through server tests * tests compile, some failures * remove json / content type detection on invoke/httptrigger, fix up tests * remove hello, fixup system tests the fucking status checker test just hangs and it's testing that it doesn't work so the test passes but the test doesn't pass fuck life it's not worth it * fix migration * meh * make dbhelper shut up about dbhelpers not being used * move fail status at least into main thread, jfc * fix status call to have FN_LISTENER also turns off the stdout/stderr blocking between calls, because it's impossible to debug without that (without syslog), now that stdout and stderr go to the same place (either to host stderr or nowhere) and isn't used for function output this shouldn't be a big fuss really * remove stdin * cleanup/remind: fixed bug where watcher would leak if container dies first * silence system-test logs until fail, fix datastore tests postgres does weird things with constraints when renaming tables, took the easy way out system-tests were loud as fuck and made you download a circleci text file of the logs, made them only yell when they goof * fix fdk-go dep for test image. fun * fix swagger and remove test about format * update all the gopkg files * add back FN_FORMAT for fdks that assert things. pfft * add useful error for functions that exit this error is really confounding because containers can exit for all manner of reason, we're just guessing that this is the most likely cause for now, and this error message should very likely change or be removed from the client path anyway (context.Canceled wasn't all that useful either, but anyway, I'd been hunting for this... so found it). added a test to avoid being publicly shamed for 1 line commits (beware...).	2018-10-26 10:43:04 -07:00
Reed Allman	01b8e8679d	HTTP trigger http-stream tests (#1241 )	2018-09-26 13:25:48 +01:00
Reed Allman	3a9c48b8a3	http-stream format (#1202 ) * POC code for inotify UDS-io-socket * http-stream format introducing the `http-stream` format support in fn. there are many details for this, none of which can be linked from github :( -- docs are coming (I could even try to add some here?). this is kinda MVP-ish level, but does not implement the remaining spec, ie 'headers' fixing up / invoke fixing up. the thinking being we can land this to test fdks / cli with and start splitting work up on top of this. all other formats work the same as previous (no breakage, only new stuff) with the cli you can set `format: http-stream` and deploy, and then invoke a function via the `http-stream` format. this uses unix domain socket (uds) on the container instead of previous stdin/stdout, and fdks will have to support this in a new fashion (will see about getting docs on here). fdk-go works, which is here: https://github.com/fnproject/fdk-go/pull/30 . the output looks the same as an http format function when invoking a function. wahoo. there's some amount of stuff we can clean up here, enumerated: * the cleanup of the sock files is iffy, high pri here * permissions are a pain in the ass and i punted on dealing with them. you can run `sudo ./fnserver` if running locally, it may/may not work in dind(?) ootb * no pipe usage at all (yay), still could reduce buffer usage around the pipe behavior, we could clean this up potentially before removal (and tests) * my brain can’t figure out if dispatchOldFormats changes pipe behavior, but tests work * i marked XXX to do some clean up which will follow soon… need this to test fdk tho so meh, any thoughts on those marked would be appreciated however (1 less decision for me). mostly happy w/ general shape/plumbing tho * there are no tests atm, this is a tricky dance indeed. attempts were made. need to futz with the permission stuff before committing to adding any tests here, which I don't like either. 
also, need to get the fdk-go based test image updated according to the fdk-go, and there's a dance there too. rumba time.. * delaying the big big cleanup until we have good enough fdk support to kill all the other formats. open to ideas on how to maneuver landing stuff... * fix unmount * see if the tests work on ci... * add call id header * fix up makefile * add configurable iofs opts * add format file describing http-stream contract * rm some cruft * default iofs to /tmp, remove mounting out of the box fn we can't mount. /tmp will provide a memory backed fs for us on most systems, this will be fine for local developing and this can be configured to be wherever for anyone that wants to make things more difficult for themselves. also removes the mounting, this has to be done as root. we can't do this in the oss fn (short of requesting root, but no). in the future, we may want to have a knob here to have a function that can be configured in fn that allows further configuration here. since we don't know what we need in this dept really, not doing that yet (it may be the case that it could be done operationally outside of fn, eg, but not if each directory needs to be configured itself, which seems likely, anyway...) * add WIP note just in case...	2018-09-14 10:59:12 +01:00
Tolga Ceylan	5dc5740a54	fn: runner status and docker load images (#1116 ) * fn: runner status and docker load images Introducing a function run for pure runner Status calls. Previously, Status gRPC calls returned active inflight request counts with the purpose of a simple health checker. However this is not sufficient since it does not show if agent or docker is healthy. With this change, if pure runner is configured with a status image, that image is executed through docker. The call uses zero memory/cpu/tmpsize settings to ensure resource tracker does not block it. However, operators might not always have a docker repository accessible/available for status image. Or operators might not want the status to go over the network. To allow such cases, and in general possibly caching docker images, added a new environment variable FN_DOCKER_LOAD_FILE. If this is set, fn-agent during startup will load these images that were previously saved with 'docker save' into docker.	2018-07-12 13:58:38 -07:00
Tolga Ceylan	f8d737dd46	fn: api-tests are decommissioned: cleanup Makefile (#1082 ) * fn: api-tests are decommissioned: cleanup Makefile * fn: increase mem in system-tests due to fn-test-utils image	2018-06-21 12:33:20 -07:00
Tolga Ceylan	c73d3f362e	fn: remove confusing parallelism in test scripts (#1079 ) * fn: remove confusing parallelism in test scripts ) Tests should be consistent when run from makefile versus running these test scripts from command line. Let go use GOMAXPROCS instead of hardcoded 4 cpus in Makefile. ) Moved docker pull for specific image versions into helpers scripts as well. Easier to maintain image version for tests in the same place. ) Minor Makefile cleanup: removed unused makefile targets. fn: git-diff rename limit increase	2018-06-20 13:49:31 -07:00
Tolga Ceylan	bd7f67a74a	fn: test scripts should use well defined ports (#1077 ) * fn: test scripts should use well defined ports Moved allocation of listener ports for mysql/minio/postgres to helper script with a list of service list names. * fn: makefile docker pull mysql version must match tests	2018-06-20 10:55:05 -07:00
Reed Allman	00c29b8bf3	datastore no longer implements logstore (#1013 ) * datastore no longer implements logstore the underlying implementation of our sql store implements both the datastore and the logstore interface, however going forward we are likely to encounter datastore implementers that would mock out the logstore interface and not use its methods - signalling a poor interface. this remedies that, now they are 2 completely separate things, which our sqlstore happens to implement both of. related to some recent changes around wrapping, this keeps the imposed metrics and validation wrapping of a servers logstore and datastore, just moving it into New instead of in the opts - this is so that a user can have the underlying datastore in order to set the logstore to it, since wrapping it in a validator/metrics would render it no longer a logstore implementer (i.e. validate datastore doesn't implement the logstore interface), we need to do this after setting the logstore to the datastore if one wasn't provided explicitly. * splits logstore and datastore metrics & validation logic * `make test` should be `make full-test` always. got rid of that so that nobody else has to wait for CI to blow up on them after the tests pass locally ever again. * fix new tests	2018-06-04 00:08:16 -07:00
Travis Reeder	999820d15b	Moves main into cmd dir. (#977 )	2018-05-09 10:52:52 +03:00
Justin Ko	2e0a22a3e0	Make sure to rebuild protobuf files during build (#908 ) I noticed that the Makefile rule to build protobuf files was listing a non-existent file and that the protobuf files were not getting rebuilt during builds.	2018-03-30 12:56:28 +01:00
Tolga Ceylan	0addcb8911	fn: pre-fork pool for namespace/network speedup (#874 ) * fn: pre-fork pool experimental implementation	2018-03-23 16:35:35 -07:00
Tolga Ceylan	b74db6762b	fn: remove pre go 1.10 optimization of build install (#859 ) This speeded up tests before go 1.10 improvements, but also clashed with fn cli tool, which gets installed as 'fn'.	2018-03-14 14:14:31 -07:00
Gerardo Viedma	8af57da7b2	Support load-balanced runner groups for multitenant compute isolation (#814 ) * Initial stab at the protocol * initial protocol sketch for node pool manager * Added http header frame as a message * Force the use of WithAgent variants when creating a server * adds grpc models for node pool manager plus go deps * Naming things is really hard * Merge (and optionally purge) details received by the NPM * WIP: starting to add the runner-side functionality of the new data plane * WIP: Basic startup of grpc server for pure runner. Needs proper certs. * Go fmt * Initial agent for LB nodes. * Agent implementation for LB nodes. * Pass keys and certs to LB node agent. * Remove accidentally left reference to env var. * Add env variables for certificate files * stub out the capacity and group membership server channels * implement server-side runner manager service * removes unused variable * fixes build error * splits up GetCall and GetLBGroupId * Change LB node agent to use TLS connection. * Encode call model as JSON to send to runner node. * Use hybrid client in LB node agent. This should provide access to get app and route information for the call from an API node. * More error handling on the pure runner side * Tentative fix for GetCall problem: set deadlines correctly when reserving slot * Connect loop for LB agent to runner nodes. * Extract runner connection function in LB agent. * drops committed capacity counts * Bugfix - end state tracker only in submit * Do logs properly * adds first pass of tracking capacity metrics in agent * maked memory capacity metric uint64 * maked memory capacity metric uint64 * removes use of old capacity field * adds remove capacity call * merges overwritten reconnect logic * First pass of a NPM Provide a service that talks to a (simulated) CP. 
- Receive incoming capacity assertions from LBs for LBGs - expire LB requests after a short period - ask the CP to add runners to a LBG - note runner set changes and readvertise - scale down by marking runners as "draining" - shut off draining runners after some cool-down period * add capacity update on schedule * Send periodic capcacity metrics Sending capcacity metrics to node pool manager * splits grpc and api interfaces for capacity manager * failure to advertise capacity shouldn't panic * Add some instructions for starting DP/CP parts. * Create the poolmanager server with TLS * Use logrus * Get npm compiling with cert fixups. * Fix: pure runner should not start async processing * brings runner, nulb and npm together * Add field to acknowledgment to record slot allocation latency; fix a bug too * iterating on pool manager locking issue * raises timeout of placement retry loop * Fix up NPM Improve logging Ensure that channels etc. are actually initialised in the structure creation! * Update the docs - runners GRPC port is 9120 * Bugfix: return runner pool accurately. * Double locking * Note purges as LBs stop talking to us * Get the purging of old LBs working. * Tweak: on restart, load runner set before making scaling decisions. * more agent synchronization improvements * Deal with teh CP pulling out active hosts from under us. * lock at lbgroup level * Send request and receive response from runner. * Add capacity check right before slot reservation * Pass the full Call into the receive loop. 
* Wait for the data from the runner before finishing * force runner list refresh every time * Don't init db and mq for pure runners * adds shutdown of npm * fixes broken log line * Extract an interface for the Predictor used by the NPM * purge drained connections from npm * Refactor of the LB agent into the agent package * removes capacitytest wip * Fix undefined err issue * updating README for poolmanager set up * ues retrying dial for lb to npm connections * Rename lb_calls to lb_agent now that all functionality is there * Use the right deadline and errors in LBAgent * Make stream error flag per-call rather than global otherwise the whole runner is damaged by one call dropping * abstracting gRPCNodePool * Make stream error flag per-call rather than global otherwise the whole runner is damaged by one call dropping * Add some init checks for LB and pure runner nodes * adding some useful debug * Fix default db and mq for lb node * removes unreachable code, fixes typo * Use datastore as logstore in API nodes. This fixes a bug caused by trying to insert logs into a nil logstore. It was nil because it wasn't being set for API nodes. * creates placement abstraction and moves capacity APIs to NodePool * removed TODO, added logging * Dial reconnections for LB <-> runners LB grpc connections to runners are established using a backoff stategy in event of reconnections, this allows to let the LB up even in case one of the runners go away and reconnect to it as soon as it is back. * Add a status call to the Runner protocol Stub at the moment. To be used for things like draindown, health checks. * Remove comment. * makes assign/release capacity lockless * Fix hanging issue in lb agent when connections drop * Add the CH hash from fnlb Select this with FN_PLACER=ch when launching the LB. 
* small improvement for locking on reloadLBGmembership * Stabilise the list of Runenrs returned by NodePool The NodePoolManager makes some attempt to keep the list of runner nodes advertised as stable as possible. Let's preserve this effort in the client side. The main point of this is to attempt to keep the same runner at the same inxed in the []Runner returned by NodePool.Runners(lbgid); the ch algorithm likes it when this is the case. * Factor out a generator function for the Runners so that mocks can be injected * temporarily allow lbgroup to be specified in HTTP header, while we sort out changes to the model * fixes bug with nil runners * Initial work for mocking things in tests * fix for anonymouse go routine error * fixing lb_test to compile * Refactor: internal objects for gRPCNodePool are now injectable, with defaults for the real world case * Make GRPC port configurable, fix weird handling of web port too * unit test reload Members * check on runner creation failure * adding nullRunner in case of failure during runner creation * Refactored capacity advertisements/aggregations. Made grpc advertisement post asynchronous and non-blocking. * make capacityEntry private * Change the runner gRPC bind address. This uses the existing `whoAmI` function, so that the gRPC server works when the runner is running on a different host. * Add support for multiple fixed runners to pool mgr * Added harness for dataplane system tests, minor refactors * Add Dockerfiles for components, along with docs. * Doc fix: second runner needs a different name. 
* Let us have three runners in system tests, why not * The first system test running a function in API/LB/PureRunner mode * Add unit test for Advertiser logic * Fix issue with Pure Runner not sending the last data frame * use config in models.Call as a temporary mechanism to override lb group ID * make gofmt happy * Updates documentation for how to configure lb groups for an app/route * small refactor unit test * Factor NodePool into its own package * Lots of fixes to Pure Runner - concurrency woes with errors and cancellations * New dataplane with static runnerpool (#813) Added static node pool as default implementation * moved nullRunner to grpc package * remove duplication in README * fix go vet issues * Fix server initialisation in api tests * Tiny logging changes in pool manager. Using `WithError` instead of `Errorf` when appropriate. * Change some log levels in the pure runner * fixing readme * moves multitenant compute documentation * adds introduction to multitenant readme * Proper triggering of system tests in makefile * Fix insructions about starting up the components * Change db file for system tests to avoid contention in parallel tests * fixes revisions from merge * Fix merge issue with handling of reserved slot * renaming nulb to lb in the doc and images folder * better TryExec sleep logic clean shutdown In this change we implement a better way to deal with the sleep inside the for loop during the attempt for placing a call. Plus we added a clean way to shutdown the connections with external component when we shut down the server. * System_test mysql port set mysql port for system test to a different value to the one set for the api tests to avoid conflicts as they can run in parallel. 
* change the container name for system-test * removes flaky test TestRouteRunnerExecution pending resolution by issue #796 * amend remove_containers to remove new added containers * Rework capacity reservation logic at a higher level for now * LB agent implements Submit rather than delegating. * Fix go vet linting errors * Changed a couple of error levels * Fix formatting * removes commmented out test * adds snappy to vendor directory * updates Gopkg and vendor directories, removing snappy and addhing siphash * wait for db containers to come up before starting the tests * make system tests start API node on 8085 to avoid port conflict with api_tests * avoid port conflicts with api_test.sh which are run in parallel * fixes postgres port conflict and issue with removal of old containers * Remove spurious println	2018-03-08 14:45:19 -08:00
Tolga Ceylan	0bdd0b45a7	fn: remove fnlb from Makefile image list (#774 )	2018-02-14 20:38:06 -08:00
Tolga Ceylan	fdf5a67f6f	fn: error image is now deprecated (#737 ) Please use fn-test-utils instead for testing.	2018-02-05 11:12:27 -08:00
Tolga Ceylan	6b5486c699	fn: sleeper image is now deprecated (#736 ) Please use fn-test-utils instead for testing.	2018-02-05 10:01:11 -08:00
jan grant	025e598c4b	Selective releasing (#708 ) * Rejig the build process During a build, we check and rebuild any dependencies prior to potentially using them. Build: - DIND (this only produces a new docker image, no local code changes) - fnserver (built as part of the testing) On master, if everything works, then we release the built artifacts, if necessary: - DIND (this pushes a docker image and a tag) - fnserver (this builds the docker image and releases it, if necessary). Fnserver is dealt with last by the release script: all previous steps in CI use locally-run go tests rather than a docker file. When a commit happens, we need to know (a) if we need to rebuild a set of tools and artifacts (or whether we can continue to use published ones); and (b) if we need to release new versions of those tools, if all tests pass. We do this by identifying the previous release tag on origin/master (which is the release branch), then checking for changes between that point at the current one. Those changes may appear in various places in the tree: some simple boolean rules work out whether the change means we need to rebuild and rerelease. * Make the fnproject/fnserver build use the latest dind As docker bumps from 17.12.x, use whatever dind we just built. * Use bash	2018-02-01 12:43:43 +00:00
jan grant	d85e6bd61b	Ensure we have the latest fnproject/dind (#687 ) If we need to reissue fnproject/dind:17.12 (which fnproject/fnserver is based upon) then let's make sure we're using the latest one when cutting a release. To ensure we don't accidentally use stale images lying around in the docker cache (there probably shouldn't be any), call make clear-images before running the build.	2018-01-19 10:02:25 -08:00
Travis Reeder	3b9818bc58	Switch to dep from glide (#664 )	2018-01-09 14:11:08 -08:00
Denis Makogon	4bb0744853	Use alpine images to make tests take less time (#629 ) * Use retry func while trying to ping SQL datastore - implements retry func specifically for SQL datastore ping - fmt fixes - using sqlx.Db.PingContext instead of sqlx.Db.Ping - propogate context to SQL datastore * Use alpine images to make tests take less time * use PG alpine * use Minio alpine * no official alpine distro for MySQL, uhhh :( * install swagger tool instead of docker image * use retry func to confirm that datastore is okay before running tests * Store swagger tool at Fn during CI time somehow it's a problem to put binary to ${GOPATH}/bin * Adjust swagger tool reference path * Revert minio image * Use amd64/alpine-based swagger tool image for API spec validation * Cleanup	2018-01-02 14:56:38 -06:00
Denis Makogon	9d6f0b2a05	Speed up API tests (#624 ) * Adjust API tests internal API * Refactor API tests to take less time - sqlite: tests 15s, overall time: 1m - mysql: tests 15s, overall time: 59s * Use retry func to survive in faulty places * Use retry func while trying to ping SQL datastore - implements retry func specifically for SQL datastore ping - fmt fixes - using sqlx.Db.PingContext instead of sqlx.Db.Ping - propogate context to SQL datastore * Simplify TestCanCauseTimeout retry loop * Call retry with sane timeout * Fix TestOversizedLog, use retry func * Increase number of attempts 2 test cases are really faulty in CI, so they need a lot more time to finish. * Increase TestCanCauseTimeout timeout * Use retry at TestMultiLog to speed it up * Use retry at TestCanWriteLogs to speed it up * Use retry at TestGetCallsSuccess to speed it up * Use retry at TestCanGetAsyncState to speed it up * Use retry at TestListCallsSuccess to speed it up * Remove sleep calls * Remove dup test case * Cleaup Calls API test * Build API tests binary once This patch lets CI to build API tests binary once and reuse that whenever it needs it * Swap API tests checks * Build API test binary by default dirty fix for CircleCI * Use retry func to determine if datastore is alive in tests * go install should also reduce build time * Fix rebase issues	2018-01-02 13:29:49 -06:00
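The "retry func" used for the SQL datastore ping in the two commits above is not shown in the log. A minimal sketch of the idea, assuming a fixed delay between attempts and a context for cancellation, might look like the following. The names `retry` and `demoPing` are hypothetical, and the fake `ping` closure stands in for a real call such as `db.PingContext(ctx)` on a `database/sql` or sqlx handle.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retry (hypothetical helper) calls fn until it succeeds, the attempts are
// exhausted, or ctx is cancelled. It returns the last error seen.
func retry(ctx context.Context, attempts int, delay time.Duration, fn func(context.Context) error) error {
	if attempts <= 0 {
		return errors.New("retry: attempts must be positive")
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(ctx); err == nil {
			return nil
		}
		// Wait before the next attempt, but give up early on cancellation.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
		}
	}
	return err
}

// demoPing simulates a datastore that becomes reachable on the third ping.
func demoPing() (int, error) {
	calls := 0
	ping := func(ctx context.Context) error {
		calls++
		if calls < 3 {
			return errors.New("datastore not ready")
		}
		return nil
	}
	err := retry(context.Background(), 5, 10*time.Millisecond, ping)
	return calls, err
}

func main() {
	calls, err := demoPing()
	fmt.Println("ping attempts:", calls, "error:", err)
}
```

Polling with a bounded retry like this is what lets the tests drop fixed `sleep` calls: the loop returns as soon as the datastore answers instead of always waiting the worst-case time.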
Tolga Ceylan	d329e0ef5b	fn: circleci and makefile adjustments (#625 ) * fn: circleci and makefile adjustments ) Moved more tasks into Makefile to allow for parallelism and dependency checks. ) Added cpu count in circleci make invocations for parallelism * fn: typo sqlite => sqlite3 * fn: removed unnecessary make pull & install	2017-12-23 10:12:18 -06:00
Denis Makogon	5c68a88599	Fn-prefix everything (#545 ) * Fn-prefix everything Closes: #492 * Global replacement * missed one fn_	2017-11-29 17:50:24 -08:00
Travis Reeder	ab18e467fa	updates functions -> fnserver (#516 ) * updates functions -> fn-server and fnlb -> fn-lb * changed to fnserver and fnlb	2017-11-17 15:53:44 -08:00
Travis Reeder	1ba8620035	FIx release issue	2017-11-17 11:44:52 -08:00
Travis Reeder	96cfc9f5c1	Update json (#463 ) * wip * wip * Added more fields to JSON and added blank line between objects. * Update tests. * wip * Updated to represent recent discussions. * Fixed up the json test * More docs * Changed from blank line to bracket, newline, open bracket. * Blank line added back, easier for delimiting.	2017-11-16 09:59:13 -08:00
Reed Allman	8a59654582	go vet yourself (#397 ) go vet caught some nifty bugs. so fixed those here, and also made it so that we vet everything from now on since the robots seem to do a better job of vetting than we have managed to. also adds gofmt check to circle. could move this to the test.sh script (didn't want a script calling a script, because $reasons) and it's nice and isolated in its own little land as it is. side note, changed the script so it runs in 100ms instead of 3s, i think find is a lot faster than go list. attempted some minor cleanup of various scripts	2017-10-06 08:42:33 -07:00
James Jeffrey	c7f3066c75	Update references remove refs to treeder oracle funcy (#376 ) * Remove lots of refs to iron and funcy oracle etc.. * more ref replacements * Replacing more refs. Treeder * Use Fn not FN	2017-09-29 16:22:15 -07:00
Reed Allman	caba9e0ec6	more strict configuration of routes * idle_timeout max of 1h * timeout max of 120s for sync, 1h for async * max memory of 8GB * do full route validation before call invocation * ensure that idle_timeout >= timeout we are now doing validation of updating route inside of the database transaction, which is what we should have been doing all along really. we need this behavior to ensure that the idle timeout is longer than the timeout, among other benefits (like not updating the most recent version of the existing struct and overwriting previous updates, yay). since we have this, we can get rid of the weird skipZero behavior on validate too and validate the real deal holyfield. validating the route before making the call is handy so that we don't do weird things like run a func that wants to use 300GB of RAM and run for 3 weeks. closes #192 closes #344 closes #162	2017-09-21 04:04:34 -07:00
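The limits listed in the commit message above can be expressed as a single validation function. The sketch below is illustrative only: the `Route` struct, its field names, the `"sync"`/`"async"` type strings, and the choice of seconds and megabytes as units are assumptions, not the project's actual model. Only the upper bounds and the `idle_timeout >= timeout` rule come from the commit message.

```go
package main

import "fmt"

// Upper bounds taken from the commit message; units are assumed.
const (
	maxSyncTimeout  = 120      // seconds
	maxAsyncTimeout = 3600     // seconds (1h)
	maxIdleTimeout  = 3600     // seconds (1h)
	maxMemoryMB     = 8 * 1024 // 8GB expressed in MB
)

// Route is a hypothetical, trimmed-down route model for illustration.
type Route struct {
	Type        string // "sync" or "async" (assumed values)
	Timeout     int32  // seconds
	IdleTimeout int32  // seconds
	Memory      uint64 // MB
}

// Validate checks the full route, as would be done before a call is invoked
// or inside the update transaction.
func (r Route) Validate() error {
	maxTimeout := int32(maxSyncTimeout)
	if r.Type == "async" {
		maxTimeout = maxAsyncTimeout
	}
	switch {
	case r.Timeout > maxTimeout:
		return fmt.Errorf("timeout %ds exceeds max %ds for %s routes", r.Timeout, maxTimeout, r.Type)
	case r.IdleTimeout > maxIdleTimeout:
		return fmt.Errorf("idle_timeout %ds exceeds max %ds", r.IdleTimeout, maxIdleTimeout)
	case r.Memory > maxMemoryMB:
		return fmt.Errorf("memory %dMB exceeds max %dMB", r.Memory, maxMemoryMB)
	case r.IdleTimeout < r.Timeout:
		return fmt.Errorf("idle_timeout %ds must be >= timeout %ds", r.IdleTimeout, r.Timeout)
	}
	return nil
}

func main() {
	ok := Route{Type: "sync", Timeout: 30, IdleTimeout: 60, Memory: 128}
	bad := Route{Type: "sync", Timeout: 60, IdleTimeout: 30, Memory: 128}
	fmt.Println(ok.Validate())
	fmt.Println(bad.Validate())
}
```

Running the check on the merged route inside the database transaction, as the commit describes, is what makes the cross-field rule enforceable: the comparison sees the stored value and the updated value together rather than a partial update.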
Reed Allman	71a88a991c	hang the runner, agent=new sheriff (#270 ) * fix docker build this is trivially incorrect since glide doesn't actually provide reproducible builds. the idea is to build with the deps that we have checked into git, so that we actually know what code is executing so that we might debug it... all for multi stage build instead of what we had, but adding the glide step is wrong. i added a loud warning so as to discourage this behavior in the future. * hang the runner, agent=new sheriff tl;dr agent is now runner, with a hopefully saner api the general idea is get rid of all the various 'task' structs now, change our terminology to only be 'calls' now, push a lot of the http construction of a call into the agent, allow calls to mutate their state around their execution easily and to simplify the number of code paths, channels and context timeouts in something [hopefully] easy to understand. this introduces the idea of 'slots' which are either hot or cold and are separate from reserving memory (memory is denominated in 'tokens' now). a 'slot' is essentially a container that is ready for execution of a call, be it hot or cold (it just means different things based on hotness). taking a look into Submit should make these relatively easy to grok. sorry, things were pretty broken especially wrt timings. I tried to keep good notes (maybe too good), to highlight stuff so that we don't make the same mistakes again (history repeating itself blah blah quote). even now, there is lots of work to do :) I encourage just reading the agent.go code, Submit is really simple and there's a description of how the whole thing works at the head of the file (after TODOs). call.go contains code for constructing calls, as well as Start / End (small atm). I did some amount of code massaging to try to make things simple / straightforward / fit reasonable mental model, but as always am open to critique (the more negative the better) as I'm just one guy and wth do i know... 
----------------------------------------------------------------------------- below enumerates a number of changes as briefly as possible (heh..): models.Call all the things removes models.Task as models.Call is now what it previously was. models.FnCall is now rid of in favor of models.Call, despite the datastore only storing a few fields of it [for now]. we should probably store entire calls in the db, since app & route configurations can change at any given moment, it would be nice to see the parameters of each call (costs db space, obviously). this removes the endpoints for getting & deleting messages, we were just looping back to localhost to call the MQ (wtf? this was for iron integration i think) and just calls the MQ. changes the name of the FnLog to LogStore, confusing cause there's also a `FuncLogger` which uses the Logstore (punting). removes other `Fn` prefixed structs (redundant naming convention). removes some unused and/or weird structs (IDStatus, CompleteTime) updates the swagger makes the db methods consistent to use 'Call' nomenclature. remove runner nuisances: * push down registry stuff to docker driver * remove Environment / Stats stuff of yore * remove unused writers (now in FuncLogger) * remove 2 of the task types, old hot stuff, runner, etc fixes ram available calculation on startup to not always be 300GB (helps a lot on a laptop!) format for DOCKER_AUTH env now is not a list but a map (there are no docs, would prefer to get rid of this altogether anyway). the ~/.docker/cfg expected format is unchanged. removes arbitrary task queue, if a machine is out of ram we can probably just time out without queueing... (can open separate discussion) in any case the old one didn't really account well for hot tasks, it just lined everyone up in the task queue if there wasn't a place to run hot and then timed them out [even if a slot became free]. removes HEADER_ prefixing on any headers in the request to a invoke a call. 
(this was inconsistent with cli for test anyway) removes TASK_ID header sent in to hot only (this is a dupe of FN_CALL_ID, which has not been removed) now user functions can reply directly to the client. this means that for cold containers if they write to stdout it will send a 200 + headers. for hot containers, the user can reply directly to the client from the container, i.e. with its preferred status code / headers (vs. always getting a 200). the dispatch itself is a little http specific atm, i think we can add an interchange format but the current version is easily extended to add json for now, separate discussion. this eliminates a lot of the request/response rewriting and buffering we were doing (yey). now Dispatch ONLY does input and output, vs. managing the call timeout and having access to a call's fields. cache is pushed down into agent now instead of in the front end, would like to push it down to the datastore actually but it's here for now anyway. cache delete functions removed (b/c fn is distributed anyway?). added app caching, should help with latency. in general, a lot of server/runner.go got pushed down into the agent. i think it will be useful in testing to be able to construct calls without having to invoke http handlers + async also needs to construct calls without a handler. safe shutdown actually works now for everything (leaked / didn't wait on certain things before) now we're waiting for hot slots to open up while we're attempting to get ram to launch a container if we didn't find any hot slots to run the call in immediately. we can change this policy really easily now (no more channel jungle; still some channels). also looking for somewhere else to go while the container is launching now. slots now get sent _out_ of a container, vs. a container receiving calls, which makes this kind of policy easier to implement. 
this fixes a number of bugs around things like trying to execute calls against containers that have not and may never start and trying to launch a bazillion containers when there are no free containers. the driver api underwent some changes to make this possible (relatively minimal, added Wait). the easiest way to think about this is that allocating ram has moved 'up' instead of just wrapping launching containers, so that we can select on a channel trying to find ram. not dispatching hot calls to containers that died anymore either... the timeout is now started at the beginning of Submit, rather than Dispatch or the container itself having to manage the call timeout, which was an inaccurate way of doing things since finding a slot / allocating ram / pulling image can all take a non-trivial (timeout amount, even!) amount of time. this makes for much more reasonable response times from fn under load, there's still a little TODO about handling cold+timeout container removal response times but it's much improved. if call.Start is called with < call.timeout/2 time left, then the call will not be executed and return a timeout. we can discuss. this makes async play _a lot_ nicer, specifically. for large timeouts / 2 makes less sense. env is no longer getting upper cased (admittedly, this can look a little weird now). our whole route.Config/app.Config/env/headers stuff probably deserves a whole discussion... sync output no longer has the call id in json if there's an error / timeout. we could add this back to signify that it's _us_ writing these but this was out of place. FN_CALL_ID is still shipped out to get the id for sync calls, and async [server] output remains unchanged. async logs are now an entire raw http request (so that a user can write a 400 or something from their hot async container) async hot now 'just works' cold sync calls can now reply to the client before container removal, which shaves a lot of latency off of those (still eat start). 
still need to figure out async removal if timeout or something. ----------------------------------------------------------------------------- i've located a number of bugs that were generally inherited, and also added a number of TODOs in the head of the agent.go file according to robustness we probably need to add. this is at least at parity with the previous implementation, to my knowledge (hopefully/likely a good bit ahead). I can memorialize these to github quickly enough, not that anybody searches before adding bugs anyway (sigh). the big thing to work on next imo is async being a lot more robust, specifically to survive fn server failures / network issues. thanks for review (gulp)	2017-09-05 20:32:51 +03:00
Travis Reeder	f559acd7ed	Renamed a bunch of images to use fnproject org. (#239 ) * Renamed a bunch of images to use fnproject org. * Multi-stage build for Docker. * Added tmp vendor dirs to gitignore. * Run docker-build at beginning of test.	2017-08-23 22:43:53 +03:00
Denis Makogon	bb8f12ece9	Fixing tests and CI file	2017-07-31 21:14:11 +03:00
Travis Reeder	48e3781d5e	Rename to GitHub (#3 ) * circle * Rename to github and fn->cli * Rename to github and fn->cli	2017-07-26 10:50:19 -07:00
Denis Makogon	5b41fe2dc7	Improving API tests	2017-07-25 10:29:20 -07:00
Travis Reeder	053c7cb0e6	Added gomega and updated deps.	2017-07-17 13:01:00 -07:00
Reed Allman	4e52c595d2	merge datastores into sqlx package replace default bolt option with sqlite3 option. the story here is that we just need a working out of the box solution, and sqlite3 is just fine for that (actually, likely better than bolt). with sqlite3 supplanting bolt, we mostly have sql databases. so remove redis and then we just have one package that has a `sql` implementation of the `models.Datastore` and lean on sqlx to do query rewriting. this does mean queries have to be formed a certain way and likely have to be ANSI-SQL (no special features) but we weren't using them anyway and our base api is basically done and we can easily extend this api as needed to only implement certain methods in certain backends if we need to get cute. * remove bolt & redis datastores (can still use as mqs) * make sql queries work on all 3 (maybe?) * remove bolt log store and use sqlite3 * shove the FnLog shit into the datastore shit for now (free pg/mysql logs... just for demos, etc, not prod) * fix up the docs to remove bolt references * add sqlite3, sqlx dep * fix up tests & mock stuff, make validator less insane * remove put & get in datastore layer as nobody is using. this passes tests which at least seem like they test all the different backends. if we trust our tests then this seems to work great. (tests `make docker-test-run-with-*` work now too)	2017-07-07 01:30:02 -07:00
Denis Makogon	adf61c77be	Full stack tests	2017-07-05 12:38:09 -07:00
James	8a3edb8309	All of the changes for func logs	2017-06-19 11:38:11 -07:00
Travis Reeder	9a8ff408b5	Fixes scary output on docker startup.	2017-06-15 15:48:34 -07:00
Shaun Smith	a31bbdc676	Added -e NO_PROXY and -e HTTP_PROXY to docker-run to fix docker failure to connect to host unix.sock	2017-06-13 11:15:21 -07:00
James Jeffrey	79f1dab007	Deploy sh	2017-06-09 13:42:59 -07:00
James Jeffrey	c7a5bae587	Merge branch 'chad-gitlab-url-change' into 'master' Chad gitlab url change See merge request !28	2017-05-30 11:34:22 -07:00

79 Commits