Commit Graph

20 Commits

Author SHA1 Message Date
Reed Allman
30f3c45dbc update vendor/ dir to latest w/o heroku, moby
had to lock a lot of things in place
2017-08-03 03:52:14 -07:00
Reed Allman
780791da1c replace heroku registry client w/ docker one
the docker one actually sucks to use, as apparent here. had to add a lot of
shit just for it to work, but the heroku registry client has gone stale and is
causing issues with our dependencies. didn't expect this to be that hard, but
it's over now. good thing we aren't even using this code ^_^. we need to at
some point start limiting image sizes and remember this not being fun to
figure out to begin with, so anyway, now this works and we don't have to
wrestle with different versions of docker in deps (hopefully) anymore.
2017-08-03 03:50:55 -07:00
Reed Allman
3ff28163db fix task memory
prior to this patch we were allowing 256MB for every function run, just
because that was the default for the docker driver and we were not using the
memory field on any given route configuration. this fixes that, now docker
containers will get the correct memory limit passed into the container from
the route. the default is still 128.

there is also an env var now, `MEMORY_MB` that is set on each function call,
see the linked issue below for rationale.

closes #186

ran the given function code from #186, and now i only see allocations up to
32MB before the function is killed. yay.

notes:

there is no max for memory. for open source fn i'm not sure we want to
cap it, really. in the services repo we probably should add a cap before prod.
since we don't know any given fn server's ram, we can't try to make sure the
setting on any given route is something that can even be run.

remove envconfig & bytefmt

this updates the glide.yaml file to remove the unused deps, but trying to
install fresh is broken atm so i couldn't remove from vendor/, going to fix
separately (next update we just won't get these). also changed the skip dir to
be the cli dir now that its name has changed (related to brokenness).

fix how ram slots were being allocated. integer division is significantly
slower than subtraction.
2017-08-02 19:09:16 -07:00
Denis Makogon
49fe3eb11a Fixing FMT errors
Do we run go-fmt in CI?
2017-07-31 21:14:11 +03:00
Travis Reeder
48e3781d5e Rename to GitHub (#3)
* circle

* Rename to github and fn->cli

*  Rename to github and fn->cli
2017-07-26 10:50:19 -07:00
Reed Allman
dc5e67b6d2 add opentracing spans for metrics 2017-07-25 08:55:22 -07:00
Reed Allman
c215dcf5dd remove docker inspect container
we had the inspect container here for 3 reasons:

1) get exit code
2) see if container is still running (debugging madness)
3) see if docker thinks it was an OOM

1) is something wait returns, but due to 2) and 3) we just delayed it until
inspection

2) was really just for debugging since we had 3)

3) seems unnecessary. to me, an OOM is an OOM is an OOM. so why have a whole
docker inspect call just to find out? (we could move this down, since it's a
sad path, and make the call only when necessary, but are we really getting any
value from this distinction anyway? i've never ran into it, myself)

inspect was actually causing tasks to time out, since the call to inspect
could put us over our task timeout, even though our container ran to
completion. we could have fixed this by checking the context earlier, but we
don't really need inspect either, which will reduce the docker calls we make,
which will make more unicorn puppers. now tasks should have more 'true'
timeouts.

tried to boy scout, but tracing patch also cleans this block up too.
2017-07-24 13:37:29 -07:00
Reed Allman
afcec04c24 remove the nanny
we finally graduated high school and can make our own ramen

we no longer need this since fn appears to have no concept of canceling tasks
through an api we need to watch, and the context is plumbed if the request is
canceled. since tasks are short, we may never need to do cancellation of
running tasks like we had with iron worker. this was an added docker call
that's unnecessary since we are doing force removal of the container at the
end anyway.
2017-07-24 11:56:58 -07:00
Reed Allman
c0aed2fbb0 mask errors in api response, log real error
we had this _almost_ right, in that we were trying, but we weren't masking the
error from the user response for any error we don't intend to show. this also
adds a stack trace from any internal server errors, so that we might be able
to track them down in the future (looking at you, 'context deadline
exceeded'). in addition, this adds a new `models.APIError` interface which all
of the errors in `models` now implement, and can be caught easily / added to
easily.

the front end now does no status rewriting based on api errors, now when we
get a non-nil error we can call `handleResponse(c, err)` with it and if it's a
proper error, return it to the user with the right status code, otherwise log
a stack trace and return `internal server error`. this cleans up a lot of the
front end code.

also rewrites start task ctx deadline exceeded as timeout. with iw we had
async tasks so we could start the clock later and it didn't matter, but now
with sync tasks time out sometimes just making docker calls, and we want the
task status to show up as timed out. we may want to just catch all this above
in addition to this, but this seems like the right thing to do.

remove squishing together errors. this was weird, now we return the first
error for the purposes of using the new err interface.

removed a lot of 5xx errors that really should have been 4xx errors. changed
some of the 400 errors to 409 errors, since they are from sending in
conflicting info and not a malformed request.

removed unused errors / useless errors (many were used for logging, and didn't
provide any context. now with stack traces we don't need context as much in
the logs).
2017-07-14 03:44:16 -07:00
Travis Reeder
e56ac42bc2 Using ctx logger in more places to get more context in the logs - ie: call_id 2017-07-10 16:13:51 -07:00
Reed Allman
75c5e83936 adds wait time based scaling across nodes
this works by having every request from the functions server kick back a
FXLB-WAIT header on every request with the wait time for that function to
start. the lb then keeps track on a per node+function basis an ewma of the
last 10 request's wait times (to reduce jitter).  now that we don't have max
concurrency it's actually pretty challenging to get the wait time stuff to
tick. i expect in the near future we will be throttling functions on a given
node in order to induce this, but that is for another day as that code needs a
lot of reworking. i tested this by introducing some arbitrary throttling (not
checked in) and load spreads over nodes correctly (see images). we will also
need to play with the intervals we want to use, as if you have a func with
50ms run time then basically 10 of those will rev up another node (this was
before removing max_c, with max_c=1) but in any event this wires in the basic
plumbing.

* make docs great again. renamed lb dir to fnlb
* added wait time to dashboard
* wires in a ready channel to await the first pull for hot images to count in
the wait time (should be otherwise useful)

future:
TODO rework lb code api to be pluggable + wire in data store
TODO toss out first data point containing pull to not jump onto another node
immediately (maybe this is actually a good thing?)
2017-06-09 16:30:34 -07:00
Denis Makogon
3f065ce6bf [Feature] Function status 2017-06-06 14:12:50 -07:00
James Jeffrey
c7a5bae587 Merge branch 'chad-gitlab-url-change' into 'master'
Chad gitlab url change

See merge request !28
2017-05-30 11:34:22 -07:00
Denis Makogon
31b4ac4516 Address broken tests 2017-05-30 08:50:53 -07:00
Chad Arimura
49d397293b global url replace 2017-05-29 17:10:47 -07:00
Travis Reeder
9cc12b4b12 Remove iron... 2017-05-18 18:59:34 +00:00
James
e4bb04887e Rewrite imports to use forks files on gitlab not use githubs. 2017-05-16 11:06:32 -07:00
Travis Reeder
7cfd7d413f Fixed up build and updated dependencies. 2017-05-15 15:40:36 -07:00
Travis Reeder
4b9bba352d Rename location. 2017-05-15 11:00:15 -07:00
Travis Reeder
d0ca2f9228 Moved runner into this repo, update dep files and now builds. 2017-04-21 07:42:42 -07:00