fn-serverless

mirror of https://github.com/fnproject/fn.git synced 2022-10-28 21:29:17 +03:00

Author	SHA1	Message	Date
Reed Allman	e637f9736e	back the lb with a db for scale. now we can run multiple lbs in the same 'cluster' and they will all point to the same nodes. all lb nodes are not guaranteed to have the same set of functions nodes to route to at any point in time since each lb node will perform its own health checks independently, but they will all be backed by the same list from the db to health check at least. in cases where there will be more than a few lbs we can rethink this strategy, we mostly need to back the lbs with a db so that they persist nodes and remain fault tolerant in that sense. the strategy of independent health checks is useful to reduce thrashing the db during network partitions between lb and fn pairs. it would be nice to have gossip health checking to reduce network traffic, but this works too, and we'll need to seed any gossip protocol with a list from a db anyway. db_url is the same format as what functions takes. i don't have env vars set up for fnlb right now (low hanging fruit), the flag is `-db`, it defaults to in-memory sqlite3 so nodes will be forgotten between reboots. used the sqlx stuff, decided not to put the lb stuff in the datastore stuff as this was easy enough to just add here to get the sugar, and avoid bloating the datastore interface. the tables won't collide, so can just use same pg/mysql as what the fn servers are running in prod even, db load is low from lb (1 call every 1s per lb). i need to add some tests, touch testing worked as expected.	2017-07-07 07:45:17 -07:00
Reed Allman	9cc5fb8784	remove traces of iron	2017-06-28 21:25:56 -07:00
Reed Allman	cc8194d015	add docs for running docker local/not, operation of api, usage w/ cli, remove docker-run from makefile (impossibru)	2017-06-28 20:41:16 -07:00
Reed Allman	68a79eb7b8	left align	2017-06-28 20:41:16 -07:00
Reed Allman	bcd9f1253e	adds docker & release stuff for fnlb	2017-06-28 20:41:16 -07:00
Reed Allman	398ecc388e	move the lb stuff around in lego form. this structure should allow us to keep the consistent hash code and just use consistent hashing on a subset of nodes, then in order to satisfy the oracle service stuff in functions-service we can just implement a different "Grouper" that does vm allocation and whatever other magic we need to manage nodes and poop out sets of nodes based on tenant id / func. for the suga... see main.go and proxy.go, the rest is basically renaming / moving stuff (not easy to follow changes, nature of the beast). the only 'issues' i can think of is that down in the ch stuff (or Router) we will need a back channel to tell the 'Grouper' to add a node (i.e. all nodes for that shard are currently loaded) which isn't great and also the grouper has no way of knowing that a node in the given set may not be being used anymore. still thinking about how to couple those two. basically don't want to have to just copy that consistent hash code but after munging with stuff i'm almost at 'fuck it' level and maybe it's worth it to just copy and hack it up in functions-service for what we need. we'll also need to have different key funcs for groupers and routers eventually (grouper wants tenant id, router needs tenant id + router). anyway, open to any ideas, i haven't come up with anything great. feedback on interface would be great. after this can plumb the datastore stuff into the allGrouper pretty easily	2017-06-10 15:21:23 -07:00
Reed Allman	75c5e83936	adds wait time based scaling across nodes. this works by having the functions server kick back a FXLB-WAIT header on every request with the wait time for that function to start. the lb then keeps track, on a per node+function basis, of an ewma of the last 10 requests' wait times (to reduce jitter). now that we don't have max concurrency it's actually pretty challenging to get the wait time stuff to tick. i expect in the near future we will be throttling functions on a given node in order to induce this, but that is for another day as that code needs a lot of reworking. i tested this by introducing some arbitrary throttling (not checked in) and load spreads over nodes correctly (see images). we will also need to play with the intervals we want to use, as if you have a func with 50ms run time then basically 10 of those will rev up another node (this was before removing max_c, with max_c=1) but in any event this wires in the basic plumbing. * make docs great again. renamed lb dir to fnlb * added wait time to dashboard * wires in a ready channel to await the first pull for hot images to count in the wait time (should be otherwise useful). future: TODO rework lb code api to be pluggable + wire in data store; TODO toss out first data point containing pull to not jump onto another node immediately (maybe this is actually a good thing?)	2017-06-09 16:30:34 -07:00

7 Commits
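
The Grouper/Router split described in 398ecc388e can be sketched roughly as follows. The interface shapes and method names here are assumptions for illustration, not the ones in the repository, and the router uses rendezvous (highest random weight) hashing as a compact stand-in for the consistent hash code the commit refers to.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"sort"
)

// Grouper returns the set of nodes a request may be routed to
// (hypothetical interface; the commit keys this on tenant id / func).
type Grouper interface {
	List(key string) ([]string, error)
}

// Router picks one node out of the set a Grouper returned
// (hypothetical interface).
type Router interface {
	Route(nodes []string, key string) (string, error)
}

// allGrouper ignores the key and returns every known node.
type allGrouper struct{ nodes []string }

func (g allGrouper) List(string) ([]string, error) {
	out := append([]string(nil), g.nodes...)
	sort.Strings(out) // stable order so ties break deterministically
	return out, nil
}

// hashRouter picks the node with the highest hash score for the key.
type hashRouter struct{}

func score(node, key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(node))
	h.Write([]byte{0}) // separator so "ab"+"c" differs from "a"+"bc"
	h.Write([]byte(key))
	return h.Sum64()
}

func (hashRouter) Route(nodes []string, key string) (string, error) {
	if len(nodes) == 0 {
		return "", errors.New("no nodes available")
	}
	best, bestScore := nodes[0], score(nodes[0], key)
	for _, n := range nodes[1:] {
		if s := score(n, key); s > bestScore {
			best, bestScore = n, s
		}
	}
	return best, nil
}

// route composes the two stages: group first, then route within the group.
func route(g Grouper, r Router, groupKey, routeKey string) (string, error) {
	nodes, err := g.List(groupKey)
	if err != nil {
		return "", err
	}
	return r.Route(nodes, routeKey)
}

func main() {
	g := allGrouper{nodes: []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"}}
	node, err := route(g, hashRouter{}, "tenant-1", "tenant-1/hello")
	if err != nil {
		fmt.Println("route error:", err)
		return
	}
	fmt.Println("routed to", node)
}
```

Keeping the two stages behind separate interfaces means a different Grouper (for example one that allocates nodes per tenant) could be swapped in without touching the hashing logic, which is the motivation the commit message gives.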