Commit Graph

16 Commits

Author SHA1 Message Date
Reed Allman
dc5e67b6d2 add opentracing spans for metrics 2017-07-25 08:55:22 -07:00
Reed Allman
c215dcf5dd remove docker inspect container
we had the inspect container here for 3 reasons:

1) get exit code
2) see if container is still running (debugging madness)
3) see if docker thinks it was an OOM

1) is something wait already returns, but because of 2) and 3) we deferred reading it until
the inspect call

2) was really just for debugging since we had 3)

3) seems unnecessary. to me, an OOM is an OOM is an OOM. so why have a whole
docker inspect call just to find out? (we could move this down, since it's a
sad path, and make the call only when necessary, but are we really getting any
value from this distinction anyway? i've never run into it myself)

inspect was actually causing tasks to time out, since the call to inspect
could put us over our task timeout even though our container ran to
completion. we could have fixed this by checking the context earlier, but we
don't really need inspect either, and dropping it reduces the docker calls we
make, which will make more unicorn puppers. now tasks should have more 'true'
timeouts.

tried to boy scout, but the tracing patch cleans this block up too.
2017-07-24 13:37:29 -07:00
Reed Allman
afcec04c24 remove the nanny
we finally graduated high school and can make our own ramen

we no longer need this since fn appears to have no concept of canceling tasks
through an api we need to watch, and the context is already plumbed through, so
it is canceled if the request is. since tasks are short, we may never need to
cancel running tasks like we did with iron worker. this was an extra docker
call that's unnecessary since we force-remove the container at the end anyway.
2017-07-24 11:56:58 -07:00
Reed Allman
c0aed2fbb0 mask errors in api response, log real error
we had this _almost_ right, in that we were trying, but we weren't masking the
error in the user response for any error we don't intend to show. this also
adds a stack trace for any internal server errors, so that we might be able
to track them down in the future (looking at you, 'context deadline
exceeded'). in addition, this adds a new `models.APIError` interface, which all
of the errors in `models` now implement, so they are easy to catch and new ones
are easy to add.

the front end no longer rewrites statuses based on api errors. now when we get a
non-nil error we can call `handleResponse(c, err)` with it: if it's a proper
error, it is returned to the user with the right status code; otherwise we log
a stack trace and return `internal server error`. this cleans up a lot of the
front end code.

also rewrites a ctx deadline exceeded during task start as a timeout. with iw we
had async tasks, so we could start the clock later and it didn't matter, but now
sync tasks sometimes time out just making docker calls, and we want the task
status to show up as timed out. we may also want to catch all this further up,
but this seems like the right thing to do.

remove squishing errors together. this was weird; now we return the first
error so it works with the new err interface.
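Returning the first error, sketched with a hypothetical route validator (`validateRoute` and the sentinel errors are illustrative): because the caller receives an actual error value rather than several messages joined into one string, it can still be matched against the error interface.

```go
package main

import "errors"

var (
	errMissingName  = errors.New("missing name")  // hypothetical
	errMissingImage = errors.New("missing image") // hypothetical
)

// firstError returns the first non-nil error, or nil if there is none.
func firstError(errs ...error) error {
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

// validateRoute checks every field but reports only the first problem.
func validateRoute(name, image string) error {
	var errs []error
	if name == "" {
		errs = append(errs, errMissingName)
	}
	if image == "" {
		errs = append(errs, errMissingImage)
	}
	return firstError(errs...)
}
```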

removed a lot of 5xx errors that really should have been 4xx errors. changed
some of the 400 errors to 409 errors, since they are from sending in
conflicting info and not a malformed request.
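A sketch of the 400/409 split, assuming "conflicting info" covers cases like creating an app whose name already exists (`createAppStatus` and the in-memory map are hypothetical): a malformed request gets 400, while a well-formed one that clashes with stored state gets 409.

```go
package main

import "net/http"

// createAppStatus picks a status for an app-creation request.
func createAppStatus(existing map[string]bool, name string) int {
	if name == "" {
		return http.StatusBadRequest // malformed: required field missing
	}
	if existing[name] {
		return http.StatusConflict // well-formed, but conflicts with stored state
	}
	existing[name] = true
	return http.StatusCreated
}
```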

removed unused / useless errors (many were used for logging and didn't
provide any context; now with stack traces we don't need as much context in
the logs).
2017-07-14 03:44:16 -07:00
Travis Reeder
e56ac42bc2 Using ctx logger in more places to get more context in the logs, e.g. call_id 2017-07-10 16:13:51 -07:00
James Jeffrey
81e39b210d Add go fmt 2017-07-07 10:14:08 -07:00
Reed Allman
75c5e83936 adds wait time based scaling across nodes
this works by having the functions server kick back an FXLB-WAIT header on
every response with the wait time for that function to start. the lb then
keeps track, on a per node+function basis, of an ewma of the last 10 requests'
wait times (to reduce jitter). now that we don't have max concurrency, it's
actually pretty challenging to get the wait time stuff to tick. i expect in the
near future we will be throttling functions on a given node in order to induce
this, but that is for another day as that code needs a lot of reworking. i
tested this by introducing some arbitrary throttling (not checked in), and load
spreads over nodes correctly (see images). we will also need to play with the
intervals we want to use: if you have a func with a 50ms run time, then
basically 10 of those will rev up another node (this was before removing max_c,
with max_c=1). in any event, this wires in the basic plumbing.

* make docs great again. renamed lb dir to fnlb
* added wait time to dashboard
* wires in a ready channel to await the first pull for hot images to count in
the wait time (should be otherwise useful)

future:
TODO rework lb code api to be pluggable + wire in data store
TODO toss out the first data point (which includes the image pull) so we don't
jump onto another node immediately (maybe this is actually a good thing?)
2017-06-09 16:30:34 -07:00
Denis Makogon
3f065ce6bf [Feature] Function status 2017-06-06 14:12:50 -07:00
James Jeffrey
c7a5bae587 Merge branch 'chad-gitlab-url-change' into 'master'
Chad gitlab url change

See merge request !28
2017-05-30 11:34:22 -07:00
Denis Makogon
31b4ac4516 Address broken tests 2017-05-30 08:50:53 -07:00
Chad Arimura
49d397293b global url replace 2017-05-29 17:10:47 -07:00
Travis Reeder
9cc12b4b12 Remove iron... 2017-05-18 18:59:34 +00:00
James
e4bb04887e Rewrite imports to use the forks on gitlab instead of github. 2017-05-16 11:06:32 -07:00
Travis Reeder
7cfd7d413f Fixed up build and updated dependencies. 2017-05-15 15:40:36 -07:00
Travis Reeder
4b9bba352d Rename location. 2017-05-15 11:00:15 -07:00
Travis Reeder
d0ca2f9228 Moved runner into this repo, update dep files and now builds. 2017-04-21 07:42:42 -07:00