Commit Graph

12 Commits

Author SHA1 Message Date
jan grant
8fc4bdcf97 Fnlb/k8s grouper (#563)
* WIP: add k8s grouper

- This shares a great deal of behaviour with allGrouper. Once it's
tested, refactor it to share as much as possible.

- Glide hell. Checked in the yaml and lock files, but a `glide i -v`
will be required to bring vendor/ up to date. Will address once this
is ready.

* Update README. Make the watch tracking work.

(To follow: add the junk that was pulled in via the glide update.)

* Vendor updates.

* go fmt

* Use the allGrouper with a k8s-backed DBStore instead.

This is much tidier :-)

* Fix up go vet
2017-12-06 10:45:27 -08:00
Tolga Ceylan
104babf26d fn: fnlb: fix tests if behind http proxy (#529)
*) use localhost to avoid HTTP proxy interception
2017-11-21 10:43:45 -08:00
Tolga Ceylan
4edda24703 fn: fnlb: default health state for new nodes (#508)
* fn: fnlb: default health state for new nodes

*) Any new node is now in the unknown state by default.
*) One health check is required for a node in the unknown state to
   move into the healthy/unhealthy states; after that, the regular
   interval and thresholds apply.
*) add() no longer runs a health check, as this is now handled
   by the new logic.

This means that during restarts fnlb will run one health check
immediately so that nodes switch to a healthy/unhealthy state,
ensuring a speedy start while guarding against routing traffic to
unhealthy servers. After this initial check, nodes are subject to
the regular interval and thresholds.

* *) style fixes
2017-11-16 15:35:33 -08:00
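The health-state rules from #508 (together with the healthy threshold added in #493) can be sketched as a small state machine. This is a minimal sketch, not fnlb's actual code: the names healthState, nodeHealth and observe are hypothetical, and the unhealthy threshold of 3 used in main is an assumed value.

```go
package main

import "fmt"

// healthState is a hypothetical model of a node's health as seen by the LB.
type healthState int

const (
	stateUnknown healthState = iota // zero value: every new node starts here
	stateHealthy
	stateUnhealthy
)

func (s healthState) String() string {
	switch s {
	case stateHealthy:
		return "healthy"
	case stateUnhealthy:
		return "unhealthy"
	default:
		return "unknown"
	}
}

// nodeHealth tracks consecutive check results for one node (hypothetical).
type nodeHealth struct {
	state     healthState
	successes int // consecutive successful checks
	failures  int // consecutive failed checks
}

// observe records one health check result and applies the transition rules:
// an unknown node switches after a single check; after that, a node only
// changes state once the healthy or unhealthy threshold is reached by
// consecutive results.
func (n *nodeHealth) observe(ok bool, healthyThreshold, unhealthyThreshold int) {
	if ok {
		n.successes++
		n.failures = 0
	} else {
		n.failures++
		n.successes = 0
	}

	switch {
	case n.state == stateUnknown:
		if ok {
			n.state = stateHealthy
		} else {
			n.state = stateUnhealthy
		}
	case ok && n.successes >= healthyThreshold:
		n.state = stateHealthy
	case !ok && n.failures >= unhealthyThreshold:
		n.state = stateUnhealthy
	}
}

func main() {
	n := &nodeHealth{} // new node, unknown state
	n.observe(false, 1, 3)
	fmt.Println(n.state) // unhealthy: one check is enough to leave unknown
	n.observe(true, 1, 3)
	fmt.Println(n.state) // healthy: healthy threshold is 1
	n.observe(false, 1, 3)
	fmt.Println(n.state) // healthy: 1 failure is below the threshold of 3
}
```

Treating the unknown state separately is what lets a restarted LB classify nodes after one round of checks instead of waiting out a full threshold window.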
Tolga Ceylan
657afd5838 fn: fnlb: enhancements and new grouper tests (#493)
* fn: fnlb: enhancements and new grouper tests

*) added healthy threshold (default: 1)
*) grouper is now using configured hcEndpoint for version checks
*) grouper now logs when servers switch between healthy/unhealthy status
*) moved DB code out of grouper
*) run health check immediately at start (don't wait until hcInterval)
*) optional shutdown timeout (default: 0) & mgmt port (default: 8081)
*) hot path List() in grouper now uses atomic ptr Load
*) consistent router: moved closure to a new function
*) bugfix: version parsing from fn servers should not panic fnlb
*) bugfix: servers removed from the DB stayed in the healthy list
*) bugfix: if the DB was down, the health checker stopped monitoring
*) basic new grouper tests (server add/rm/healthy/unhealthy)
2017-11-16 11:35:30 -08:00
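The "hot path List() in grouper now uses atomic ptr Load" change above follows a read-copy-update pattern: writers build a new slice and publish it atomically, so readers on the hot path never take a lock. A minimal sketch, assuming a grouper that tracks healthy nodes as a []string; the type, its add/remove methods and the node addresses are hypothetical, and it uses sync/atomic's Value rather than whatever pointer type fnlb actually used.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// grouper is a hypothetical stand-in: the health checker publishes immutable
// snapshots of healthy nodes, and List() reads them with one atomic load.
type grouper struct {
	mu      sync.Mutex   // serializes writers doing read-copy-update
	healthy atomic.Value // always holds a []string snapshot
}

func newGrouper() *grouper {
	g := &grouper{}
	g.healthy.Store([]string{})
	return g
}

// List returns the current snapshot without locking. Callers must not modify it.
func (g *grouper) List() []string {
	return g.healthy.Load().([]string)
}

// add copies the current snapshot, appends node and publishes the copy.
func (g *grouper) add(node string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	old := g.List()
	next := make([]string, 0, len(old)+1)
	next = append(next, old...)
	next = append(next, node)
	g.healthy.Store(next)
}

// remove publishes a new snapshot without node; old snapshots stay intact.
func (g *grouper) remove(node string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	old := g.List()
	next := make([]string, 0, len(old))
	for _, n := range old {
		if n != node {
			next = append(next, n)
		}
	}
	g.healthy.Store(next)
}

func main() {
	g := newGrouper()
	g.add("10.0.0.1:8080")
	g.add("10.0.0.2:8080")
	before := g.List()
	g.remove("10.0.0.1:8080")
	fmt.Println(before, g.List()) // [10.0.0.1:8080 10.0.0.2:8080] [10.0.0.2:8080]
}
```

The mutex only matters for writers, which read the old snapshot before publishing a new one; a reader holding an older snapshot keeps a consistent view until its next List() call.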
Reed Allman
8a59654582 go vet yourself (#397)
go vet caught some nifty bugs, so i fixed those here, and also made it so that
we vet everything from now on, since the robots seem to do a better job of
vetting than we have managed to.

also adds a gofmt check to circle. could move this to the test.sh script (didn't
want a script calling a script, because $reasons), and it's nice and isolated
in its own little land as it is. side note: changed the script so it runs in
100ms instead of 3s; i think find is a lot faster than go list.

attempted some minor cleanup of various scripts
2017-10-06 08:42:33 -07:00
Denis Makogon
6ac579f296 Formatting issues
Aren't we running go-fmt.sh in CI?
2017-09-06 21:48:28 +03:00
Travis Reeder
d7bf64bf66 Big dependency update, all-lowercase sirupsen import paths for all dependencies. 2017-08-23 19:52:56 -07:00
Reed Allman
b533350855 add traces to the lb
also fixed the broken version checking stuff so this works again
2017-08-02 13:51:10 -07:00
Denis Makogon
da0ab23f63 Adding per-node version check
The version check happens at startup and every time a new node is added via the API.

Implements: #153
2017-08-01 20:37:54 +03:00
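A per-node version check like the one above, combined with the later #493 fix that version parsing from fn servers must not panic fnlb, might look like the following. This sketch assumes fn servers report a plain MAJOR.MINOR.PATCH string and that nodes below a minimum version are rejected; parseVersion, versionAtLeast and the version strings are all hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion parses a hypothetical "MAJOR.MINOR.PATCH" string.
// Malformed input yields an error rather than a panic.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	parts := strings.Split(strings.TrimSpace(s), ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("malformed version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, fmt.Errorf("malformed version %q", s)
		}
		v[i] = n
	}
	return v, nil
}

// versionAtLeast reports whether node's version is >= min, comparing
// major, minor and patch numerically in that order.
func versionAtLeast(node, min string) (bool, error) {
	nv, err := parseVersion(node)
	if err != nil {
		return false, err
	}
	mv, err := parseVersion(min)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if nv[i] != mv[i] {
			return nv[i] > mv[i], nil
		}
	}
	return true, nil // equal versions pass
}

func main() {
	ok, err := versionAtLeast("0.3.12", "0.3.2")
	fmt.Println(ok, err) // true <nil>
	_, err = versionAtLeast("garbage", "0.3.2")
	fmt.Println(err != nil) // true
}
```

Comparing components numerically matters here: a plain string comparison would rank "0.3.12" below "0.3.2".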
Reed Allman
e637f9736e back the lb with a db for scale
now we can run multiple lbs in the same 'cluster' and they will all point to
the same nodes. lb nodes are not guaranteed to all have the same set of
functions nodes to route to at any point in time, since each lb node
performs its own health checks independently, but they will at least all
health check the same list from the db. in cases where there will be more
than a few lbs we can rethink this strategy; we mostly need to back the lbs
with a db so that they persist nodes and remain fault tolerant in that
sense. the strategy of independent health checks is useful to reduce thrashing
the db during network partitions between lb and fn pairs. it would be nice to
have gossip health checking to reduce network traffic, but this works too, and
we'll need to seed any gossip protocol with a list from a db anyway.

db_url uses the same format that functions takes. i don't have env vars set
up for fnlb right now (low hanging fruit); the flag is `-db`, and it defaults to
in-memory sqlite3, so nodes will be forgotten between reboots. i used the sqlx
stuff and decided not to put the lb stuff in the datastore stuff, as it was easy
enough to just add it here to get the sugar and avoid bloating the datastore
interface. the tables won't collide, so you can even use the same pg/mysql that
the fn servers are running in prod; db load from the lb is low (1 call every 1s
per lb).

i need to add some tests, touch testing worked as expected.
2017-07-07 07:45:17 -07:00
Reed Allman
bcd9f1253e adds docker & release stuff for fnlb 2017-06-28 20:41:16 -07:00
Reed Allman
398ecc388e move the lb stuff around in lego form
this structure should allow us to keep the consistent hash code and just use
consistent hashing on a subset of nodes; then, in order to satisfy the oracle
service stuff in functions-service, we can just implement a different "Grouper"
that does vm allocation and whatever other magic we need to manage nodes and
poop out sets of nodes based on tenant id / func.

for the suga... see main.go and proxy.go, the rest is basically renaming /
moving stuff (not easy to follow changes, nature of the beast).

the only 'issues' i can think of are that down in the ch stuff (or Router) we
will need a back channel to tell the 'Grouper' to add a node (i.e. all nodes for
that shard are currently loaded), which isn't great, and also that the grouper
has no way of knowing that a node in the given set may no longer be in use.
still thinking about how to couple those two. basically i don't want to have to
just copy that consistent hash code, but after munging with stuff i'm almost at
'fuck it' level and maybe it's worth it to just copy and hack it up in
functions-service for what we need. we'll also need to have different key
funcs for groupers and routers eventually (grouper wants tenant id, router
needs tenant id + router). anyway, open to any ideas; i haven't come up with
anything great. feedback on the interface would be great.

after this we can plumb the datastore stuff into the allGrouper pretty easily.
2017-06-10 15:21:23 -07:00
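The Grouper/Router split described in "move the lb stuff around in lego form" can be sketched as a Grouper that returns a subset of nodes for one key and a router that consistently hashes a second key onto that subset. This is a minimal sketch: the interface, allGrouper's shape, the router type and the key strings are hypothetical, and it uses jump consistent hashing over an FNV-1a hash, which may differ from the ch code in fnlb.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
)

// Grouper returns the set of nodes eligible for a key (hypothetical interface).
type Grouper interface {
	List(key string) []string
}

// allGrouper ignores the key and returns every node.
type allGrouper struct{ nodes []string }

func (a *allGrouper) List(string) []string { return a.nodes }

// jumpHash maps key to a bucket in [0, numBuckets) using jump consistent
// hashing (Lamping & Veach). numBuckets must be > 0.
func jumpHash(key uint64, numBuckets int) int {
	var b, j int64 = -1, 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int(b)
}

// router consistently hashes a route key onto the grouper's subset.
type router struct{ g Grouper }

func (r *router) Route(groupKey, routeKey string) (string, error) {
	nodes := r.g.List(groupKey) // e.g. keyed by tenant id
	if len(nodes) == 0 {
		return "", errors.New("no nodes available")
	}
	h := fnv.New64a()
	h.Write([]byte(routeKey)) // e.g. tenant id + route
	return nodes[jumpHash(h.Sum64(), len(nodes))], nil
}

func main() {
	r := &router{g: &allGrouper{nodes: []string{"fn-1:8080", "fn-2:8080", "fn-3:8080"}}}
	node, err := r.Route("tenant-a", "tenant-a/hello")
	fmt.Println(node, err)
}
```

Because jump hashing only moves a key when a node is appended to the end of the list (and then only onto the new node), routes stay mostly stable as a grouper's subset grows; this is why swapping in a different Grouper does not require touching the hashing code.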