
51 Commits

Author SHA1 Message Date
Reed Allman
75c5e83936 adds wait time based scaling across nodes
this works by having the functions server return an FXLB-WAIT header on every
response, carrying the wait time for that function to start. the lb then keeps,
on a per node+function basis, an ewma of the last 10 requests' wait times (to
reduce jitter). now that we don't have max concurrency it's actually pretty
challenging to get the wait time stuff to tick. i expect in the near future we
will be throttling functions on a given node in order to induce this, but that
is for another day as that code needs a lot of reworking. i tested this by
introducing some arbitrary throttling (not checked in), and load spreads over
nodes correctly (see images). we will also need to tune the intervals we use:
with a func that has a 50ms run time, basically 10 of those will rev up another
node (this was before removing max_c, with max_c=1). in any event, this wires
in the basic plumbing.

* make docs great again. renamed lb dir to fnlb
* added wait time to dashboard
* wires in a ready channel that awaits the first pull for hot images, so the
pull counts toward the wait time (should be useful otherwise, too)

future:
TODO rework lb code api to be pluggable + wire in data store
TODO toss out first data point containing pull to not jump onto another node
immediately (maybe this is actually a good thing?)
2017-06-09 16:30:34 -07:00
Reed Allman
9edacae928 clean up hotf(x) concurrency, rm max c
this patch gets rid of max concurrency for functions altogether, as discussed,
since it will be challenging to support across functions nodes. as a result of
doing so, the previous version of functions would fall over when offered 1000
functions, so there was some work needed in order to push this through.
further work is necessary as docker basically falls over when trying to start
too many containers at the same time, and with this patch essentially every
function can scale infinitely. it seems like we could add some kind of
adaptive restrictions based on task run length and configured wait time so
that fast-running functions line up to run in a hot container instead of
all creating new hot containers.

this patch takes a first cut at whacking out some of the insanity that was the
previous concurrency model, which was problematic in that it limited
concurrency significantly across all functions: every task went through the
same unbuffered channel, which could block all functions if the channel was
not drained fast enough (it's not apparent that this was impossible in the
previous implementation). in any event, each request already has a goroutine,
so there's no reason not to use it. it's not too hard to wrap a map in a lock,
and i'm not sure what the benefits of the channel approach were (added
insanity?). in effect this is marginally easier to understand and less insane
(marginally). after getting rid of max c, this adds a blocking mechanism for
the first invocation of any function so that all other hot functions wait for
the first one to finish, avoiding a herd issue (it was making docker die...);
this could be slightly improved, but works in a pinch. reduced memory usage by
no longer keeping redundant maps of htfnsvrs and task.Requests (by a factor of
2!). cleaned up some of the protocol stuff, which needs further cleanup.
anyway, it's a first cut. i have another patch that rewrites all of it but was
getting into rabbit hole territory; happy to oblige if anybody else has
trouble understanding this rat's nest of channels. there is a good bit of work
left to make this prod ready (regardless of removing max c).

a warning: this will break the db schemas. i didn't put in the effort to add
migrations since this isn't deployed anywhere in prod...

TODO need to clean out the htfnmgr bucket with LRU
TODO need to clean up runner interface
TODO need to unify the task running paths across protocols
TODO need to move the ram checking stuff into worker for noted reasons
TODO need better elasticity of hot f(x) containers
2017-06-05 20:04:13 -07:00
Chad Arimura
49d397293b global url replace 2017-05-29 17:10:47 -07:00
Travis Reeder
69f0201818 Some small cleanup to docs. 2017-05-26 18:54:26 +00:00
James
e4bb04887e Rewrite imports to use the forked files on gitlab instead of github's. 2017-05-16 11:06:32 -07:00
Travis Reeder
4b9bba352d Rename location. 2017-05-15 11:00:15 -07:00
Travis Reeder
d0ca2f9228 Moved runner into this repo, update dep files and now builds. 2017-04-21 07:42:42 -07:00
Travis Reeder
615ae5c36f Mass s&r: iron-io -> kumokit 2017-04-19 09:49:12 -06:00
Travis Reeder
10f3178ae9 Switching to new dep tool (#616)
* making things work

* #506 - Add ability to login to a private docker registry

* Rolling back "make things work" to test them out more.

* Rolling back "make things work" to test them out more.

* credentials from docker/config.json if ENV is missing

* should get docker auth info just in the init

* update glide lock

* update glide

* Switched to new go dep tool, glide is too frikin annoying.

* Updated circle builds to use dep

* Added GOPATH/bin to path.

* Added GOPATH/bin to path.

* Using regular make test, instead of docker one (not sure why it was using the docker one?).
2017-04-07 11:22:08 -07:00
C Cirello
c48bd95fa6 server: stats endpoint (#468)
fixes #389
2017-01-03 21:39:29 +01:00
C Cirello
1dc3145045 functions: upgrade runner to latest (#434)
* functions: upgrade runner

* functions: update to latest runner

Supersedes and fixes #433
2016-12-14 00:10:24 +01:00
C Cirello
0cdd1db3e1 functions: fix goroutine leak in runner (#394)
* functions: fix goroutine leak in runner

* functions: ensure taskQueue is consumed after context cancellation
2016-12-06 16:11:06 +01:00
C Cirello
ac0044f7d9 functions: hot containers (#332)
* functions: modify datastore to accommodate hot containers support

* functions: protocol between functions and hot containers

* functions: add hot containers clockwork

* fn: add hot containers support
2016-11-28 15:45:35 -02:00
Pedro Nasser
867eb4b176 Changes on function/metric loggers (#343)
* initial fix logger

* fix DefaultFuncLogger

* fix runner and tests

* reverting: sending async task stdout to func logger
2016-11-27 16:36:40 -02:00
C Cirello
9d06b6e687 functions: common concurrency stream for sync and async (#314)
* functions: add bounded concurrency

* functions: plug runners to sync and async interfaces

* functions: update documentation about the new env var

* functions: fix test flakiness

* functions: the runner is self-regulated, no need to set a number of runners

* functions: push the execution to the background on incoming requests

* functions: ensure async tasks are always on

* functions: add prioritization to tasks consumption

Ensure that Sync tasks are consumed before Async tasks. Also fixes
termination race problems for free.

* functions: remove stale comments

* functions: improve mem availability calculation

* functions: parallel run for async tasks

* functions: check for memory availability before pulling async task

* functions: comment about rnr.hasAvailableMemory and sync.Cond

* functions: implement memory check for async runners using Cond vars

* functions: code grooming

- remove unnecessary goroutines
- fix stale docs
- reorganize import group

* Revert "functions: implement memory check for async runners using Cond vars"

This reverts commit 922e64032201a177c03ce6a46240925e3d35430d.

* Revert "functions: comment about rnr.hasAvailableMemory and sync.Cond"

This reverts commit 49ad7d52d341f12da9603b1a1df9d145871f0e0a.

* functions: set a minimum memory availability for sync

* functions: simplify the implementation by removing the priority queue

* functions: code grooming

- code deduplication
- review waitgroups Waits
2016-11-18 18:23:26 +01:00
Carlos C
d5fb1afda7 Revert "Assert License (#224)"
This reverts commit a61c4dab78.
2016-11-06 09:25:12 -08:00
C Cirello
a61c4dab78 Assert License (#224)
* license: assert license for Go code
* license: add in shell scripts
* license: assert license for Ruby code
* license: assert license to individual cases
* license: assert license to Dockerfile
2016-11-05 23:33:07 +01:00
Nikhil Marathe
1397899358 Fix max memory on non-linux machines and memory decrement after failures.
* Always decrement memory even if task preparation or execution fails.

* Fall back to max 2GB memory on non-Linux. 300GB is ridiculous.

* Simplify loop
2016-10-31 17:33:04 -07:00
Travis Reeder
41c06644d9 Docs related to running in production. (#174)
* Fixed up api.md, removed Titan references.

* Adding more documentation on running in production.

* Update deps for ironmq.
2016-10-17 11:31:58 -07:00
Seif Lotfy سيف لطفي
064d597b60 Fix runner changes (#135)
* Upgrade iron-io/runner to 165c16a9

* fix support for Stdin to work
2016-10-07 21:17:40 +02:00
Pedro Nasser
52f78eb601 fix runner changes (#132)
Fix runner changes
2016-10-07 18:49:16 +02:00
Seif Lotfy سيف لطفي
52cab30056 Change PAYLOAD input to STDIN (#111)
* change to iron-io/runner dependency
* Fix runner dependency
* Change PAYLOAD input to STDIN, fixes #40
2016-10-06 18:44:58 -03:00
Seif Lotfy سيف لطفي
b7bf73f5d2 Makefile (#122)
* Update Readme and add Makefile
* Skip stale tests (in wait for stdin support)

* Revert "Skip stale tests (in wait for stdin support)"

This reverts commit 228da3776503f40ca53df70a79a9e4a9c73fd8b5.
2016-10-06 20:46:29 +02:00
C Cirello
3ca137a01c Upgrade to Go 1.7 (#128)
* Upgrade to stdlib context package
* Modernized syntax
2016-10-06 20:10:00 +02:00
Seif Lotfy سيف لطفي
fbcec6bf40 Depend on iron-io/runner instead of iron-io/worker (#124) 2016-10-05 20:42:12 +02:00
Henrique Chehad
ba3f0b360b removed "reserved_memory" metric 2016-09-21 22:35:07 -03:00
Henrique Chehad
6b910d0b75 added wait time total and reserved memory metrics 2016-09-21 20:25:37 -03:00
Henrique Chehad
06294b4b77 updated worker repository ref 2016-09-19 20:41:35 -03:00
Pedro Nasser
b867b20cfd fix sleep time 2016-09-17 12:11:04 -03:00
Pedro Nasser
853c8b4534 prevent zero memory requirement 2016-09-13 23:44:16 -03:00
Pedro Nasser
688a6a0718 invalid method 2016-09-13 23:40:30 -03:00
Pedro Nasser
89a4092dc1 merge with master 2016-09-13 23:25:06 -03:00
Pedro Nasser
da1746dc97 improvements 2016-09-13 23:22:00 -03:00
Pedro Nasser
e6d0079051 add IGNORE_MEMORY 2016-09-12 14:44:11 -03:00
Pedro Nasser
a98b7e25d0 metric logger 2016-09-12 11:46:21 -03:00
Pedro Nasser
81a3394317 add queue.full and timeout count 2016-09-09 01:03:27 -03:00
Pedro Nasser
5d50721db1 add initial queue to runner 2016-09-09 00:54:00 -03:00
Henrique Chehad
615b421dfa migrated EnsureUsableImage to EnsureImageExists 2016-08-30 11:08:54 -03:00
Pedro Nasser
6a2e9b29be update titan, other deps and minor changes 2016-08-24 16:11:21 -03:00
Henrique Chehad
148d52c890 updates after runner factored 2016-08-22 19:17:58 -03:00
Pedro Nasser
8b0d0f1e13 refactor runner 2016-08-21 19:40:08 -03:00
Evan Shaw
6a369eb23a Update deps; add container label for logs 2016-08-16 15:19:16 +12:00
Pedro Nasser
06c3bb3949 added env REQUEST_URL 2016-08-15 23:37:30 -03:00
Travis Reeder
40e1ebd434 Updates to fix against Titan changes and what not. 2016-08-07 14:10:31 -04:00
Pedro Nasser
4fa9380ecc added request uid 2016-08-05 19:04:17 -03:00
Pedro Nasser
905f085c23 clean unnecessary code 2016-07-31 17:28:17 -03:00
Pedro Nasser
d7f1ea037b multiple adjustments
- renamed WrapperJob (not exported anymore)
- removed need for temp log file
- not using titan models
- using gin.Context as runner context
2016-07-28 17:45:57 -03:00
Pedro Nasser
a92dffb3fc changing titan from API to interface 2016-07-28 01:02:22 -03:00
Pedro Nasser
14cc57fd9c refactor runner using titan 2016-07-24 17:46:08 -03:00
Pedro Nasser
5a13e2c0cc improv api, datastore, postgres, runner 2016-07-21 21:18:02 -03:00