Commit Graph

1247 Commits

Author SHA1 Message Date
CI
4e989b0789 fnserver: 0.3.316 release [skip ci] 2018-01-31 12:32:20 +00:00
Dario Domizioli
e753732bd8 Hot protocols improvements (for 662) (#724)
* Improve deadline handling in streaming protocols

* Move special headers handling down to the protocols

* Adding function format documentation for JSON changes

* Add tests for request url and method in JSON protocol

* Fix  protocol missing fn-specific info

* Fix import

* Add panic for something that should never happen
2018-01-31 12:26:43 +00:00
CI
02c8aa1998 fnserver: 0.3.315 release [skip ci] 2018-01-31 12:13:33 +00:00
Dario Domizioli
e2dad00a83 Add simple test for calling several hot functions in parallel (#675)
* Add test for calling several hot functions in parallel
2018-01-31 12:08:05 +00:00
CI
5f8736500d fnserver: 0.3.314 release [skip ci] 2018-01-26 20:26:48 +00:00
Tolga Ceylan
97d78c584b fn: better slot/container/request state tracking (#719)
* fn: better slot/container/request state tracking
2018-01-26 12:21:11 -08:00
CI
a7223437df fnserver: 0.3.313 release [skip ci] 2018-01-24 17:36:52 +00:00
CI
cc4c157d56 fnserver: 0.3.312 release [skip ci] 2018-01-24 15:43:41 +00:00
CI
665997078e fnserver: 0.3.311 release [skip ci] 2018-01-24 03:57:41 +00:00
Reed Allman
bbd50a0e02 additional ctx spans / maid service (#716)
* add spans to async

* clean up / add spans to agent

* there were a few methods which had multiple contexts which existed in the same
scope (this doesn't end well, usually), flattened those out.
* loop bound context cancels now rely on defer (also was brittle)
* runHot had a lot of ctx shuffling, flattened that.
* added some additional spans in certain paths for added granularity
* linked up the hot launcher / run hot / wait hot to _a_ root span; the first
2 are FollowsFrom spans, but at least we can see the source of these and also
see the containers launched over a hot launcher's lifetime

I left a TODO around the FollowsFrom because OpenCensus doesn't, at least at the
moment, appear to have any notion of FollowsFrom, and it was an extra OpenTracing
method (we have to get the span out, start a new span with the option, then
add it to the context... some shuffling required). anyway, I was on the fence
about adding it, at least.

* resource waiters need to manage their own goroutine lifecycle

* if we get an impossible memory request, bail instead of infinite loop

* handle timeout slippery case

* still sucks, but hotLauncher doesn't leak anything, not even the time.After timers

* simplify GetResourceToken

GetCall can guard against tasks whose resource requests are impossible to
allocate entering the system, by erroring instead of doling them out. this makes
the GetResourceToken logic more straightforward for callers, who now simply have
the contract that they will never get a token if they let a task that can't run
into the agent (but GetCall guards against this, and there's a test for it).

sorry, I was going to make this only do that, but when I went to fix up the
tests, my last patch went haywire so I fixed that too. this also at least
tries to simplify the hotLaunch loop, which no longer leaks time.After timers
(they were long, and with the signaller there were many of them; I got a stack
trace :). this breaks out the bottom half of the logic, which checks whether we
need to launch, into its own function, and handles the cleanup duties only in
the caller instead of in 2 different select statements. played with this a
bit; no doubt further cleaning could be done, but this _seems_ better.

* fix vet

* add units to exported method contract docs

* oops
2018-01-23 19:52:22 -08:00
CI
ccd95b6f72 fnserver: 0.3.310 release [skip ci] 2018-01-24 03:35:45 +00:00
Tolga Ceylan
ee59361bda fn: added server too busy stats (#717) 2018-01-23 19:30:01 -08:00
CI
6873ed1fc1 fnserver: 0.3.309 release [skip ci] 2018-01-23 21:20:57 +00:00
CI
5fc01d6974 fnserver: 0.3.308 release [skip ci] 2018-01-22 22:23:01 +00:00
CI
2fa754ff48 fnserver: 0.3.307 release [skip ci] 2018-01-22 20:09:10 +00:00
CI
81e015d008 fnserver: 0.3.306 release [skip ci] 2018-01-22 19:48:28 +00:00
Reed Allman
bae13d6c29 fix the http protocol dumper (#705)
we were using httputil.DumpRequest when there is a perfectly good
req.Write method hanging out in the stdlib, which even handles chunked encoding,
the case a few people ran into when they didn't provide a content length:
https://golang.org/pkg/net/http/#Request.Write -- so we shouldn't run into
that issue again. I hit this in testing and it was not very fun to debug, so I
added a test that reproduces it on master and passes here. of course, adding a
content length works too. tested this and it appears to work pretty well; also
cleaned up the control flow a little bit in the http protocol.
2018-01-22 11:41:04 -08:00
CI
4d7c951f76 fnserver: 0.3.305 release [skip ci] 2018-01-22 17:02:09 +00:00
Nigel Deakin
e1df053de9 Change timedout to timeouts (#709) 2018-01-22 16:55:30 +00:00
CI
5c7a21b59e fnserver: 0.3.304 release [skip ci] 2018-01-19 20:43:29 +00:00
Tolga Ceylan
8c31e47c01 fn: agent slot improvements (#704)
*) Stopped using latency previous/current stats; this
was not working as expected. Fresh starts usually have
these stats at zero for a long time, and initial samples
are high due to downloads, caches, etc.

*) New state to track: containers that are idle. In other
words, containers that have an unused token in the slot
queue.

*) Removed latency counts since these are not used in
container start decision anymore. Simplifies logs.

*) isNewContainerNeeded() simplified to use idle count
to estimate effective waiters. Removed speculative
latency based logic and progress check comparison.
In agent, waitHot() delayed signalling compensates
for these changes. The estimation may fail, but
this should correct itself at the next 200 msec
signal.
2018-01-19 12:35:52 -08:00
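The simplified isNewContainerNeeded() described above estimates "effective waiters" from the idle count. Below is a minimal sketch of that idea, assuming each idle container (one holding an unused token in the slot queue) will absorb one waiting call. The `slotStats` type and its fields are hypothetical, not fn's actual structures. A momentary misestimate is what the commit expects the periodic 200 msec re-signal to correct.

```go
package main

import "fmt"

// slotStats is a hypothetical per-function snapshot; the names are
// illustrative, not fn's actual types.
type slotStats struct {
	waiters int // calls blocked waiting for a hot container slot
	idle    int // containers holding an unused token in the slot queue
}

// isNewContainerNeeded sketches the idle-count heuristic: every idle
// container is expected to absorb one waiter, so only the remainder
// ("effective waiters") justifies launching another container.
func isNewContainerNeeded(s slotStats) bool {
	effectiveWaiters := s.waiters - s.idle
	return effectiveWaiters > 0
}

func main() {
	fmt.Println(isNewContainerNeeded(slotStats{waiters: 3, idle: 1})) // true
	fmt.Println(isNewContainerNeeded(slotStats{waiters: 2, idle: 2})) // false
}
```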
CI
31fd1276fb fnserver: 0.3.303 release [skip ci] 2018-01-19 18:09:40 +00:00
CI
1549534c3c fnserver: 0.3.302 release [skip ci] 2018-01-19 03:45:21 +00:00
Tolga Ceylan
2f0de2b574 fn: resource and slot cancel and broadcast improvements (#696)
* fn: resource and slot cancel and broadcast improvements

*) Context argument does not wake up the waiters correctly upon
cancellation/timeout.
*) Avoid unnecessary broadcasts in slot and resource.

* fn: limit scope of context in resource/slot calls in agent
2018-01-18 13:43:56 -08:00
Reed Allman
c9e995292c if a slot is available, don't launch more (#701)
since we were sending a signal before checking if a slot was available, even
in the case of serial calls locally I was seeing 2 containers launch. if we
only send a signal after first checking if a slot is available, this goes
away. 1 usec should not be too offensive of an additional wait, all things
considered here.
2018-01-18 13:19:25 -08:00
CI
3e2debae07 fnserver: 0.3.301 release [skip ci] 2018-01-18 00:16:12 +00:00
Tolga Ceylan
5a7778a656 fn: cancellations in WaitAsyncResource (#694)
* fn: cancellations in WaitAsyncResource

Added go context with cancel to wait async resource. Although
today, the only case for cancellation is shutdown, this cleans
up agent shutdown a little bit.

* fn: locked broadcast to avoid missed wake-ups

* fn: removed ctx arg to WaitAsyncResource and startDequeuer

This is confusing and unnecessary.
2018-01-17 16:08:54 -08:00
CI
65592c9d26 fnserver: 0.3.300 release [skip ci] 2018-01-17 15:23:53 +00:00
CI
88072eba6e fnserver: 0.3.299 release [skip ci] 2018-01-16 22:52:35 +00:00
CI
f0aeb815b6 fnserver: 0.3.298 release [skip ci] 2018-01-15 16:42:49 +00:00
CI
02768f6539 fnserver: 0.3.297 release [skip ci] 2018-01-15 14:56:00 +00:00
Nigel Deakin
8bf26efa29 Add new Prom metrics fn_timeout and fn_errors (#679)
* Add new Prom metric fn_timedout

* Add new Prometheus metric fn_errors

* Tidy up variable name

* Add new Prometheus metric fn_errors

* gofmt
2018-01-15 14:49:33 +00:00
CI
ea97bea22e fnserver: 0.3.296 release [skip ci] 2018-01-15 10:48:20 +00:00
Gerardo Viedma
966ce58525 Use new metrics API for s3 log metrics (#680)
* use new metrics API for histogram metrics

* Avoid creating an extra tracing span

* use new metrics api for histograms

* fix minor formatting issue
2018-01-15 10:09:03 +00:00
CI
9678ca4735 fnserver: 0.3.295 release [skip ci] 2018-01-12 22:56:37 +00:00
CI
23b267b208 fnserver: 0.3.294 release [skip ci] 2018-01-12 22:03:11 +00:00
Reed Allman
0bde666395 clean up agent.Submit (#681)
this was getting bloated with various contexts and spans and stats
administrivia that obscured what was going on a lot. this adds some helper
methods to shove most of that stuff into, and simplifies the context handling
around getting a slot by moving it inside of the slot acquisition code. also
removed most uses of `call.Model()` -- I'll kill this thing some day, but if a
reason is needed: the overhead of dynamic dispatch is unnecessary, since we're
inside the agent's implementation and don't want to use the interface methods
inside of that.
2018-01-12 13:56:17 -08:00
CI
342460f242 fnserver: 0.3.293 release [skip ci] 2018-01-12 19:45:33 +00:00
Tolga Ceylan
39b2cb2d9b Cpu resources (#642)
* fn: cpu quota implementation
2018-01-12 11:38:28 -08:00
CI
5d500124e0 fnserver: 0.3.292 release [skip ci] 2018-01-12 00:07:38 +00:00
Tolga Ceylan
1c8029e4f1 fn: more tests for hot container launch logic (#678) 2018-01-11 16:00:37 -08:00
CI
7693c6f9d1 fnserver: 0.3.291 release [skip ci] 2018-01-11 22:16:17 +00:00
Tolga Ceylan
db159e595f fn: new container launch adjustments (#677)
*) revert executor wait queue size comparison. This is too
   aggressive and, with the stall check below, now unnecessary.
*) new container logic now checks if stats are constant; if
   this is the case, then we assume the system is stalled (e.g.
   running functions that take a long time), which means we need
   to make progress and spin up a new container.
2018-01-11 14:09:21 -08:00
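The stall check above treats unchanged stats as a sign that the system is stuck, for example when every container is busy with a long-running function. Below is a minimal sketch of such a check that compares consecutive snapshots. The `statsSnapshot` counters and the `stallDetector` type are hypothetical; the commit does not say which stats fn compares.

```go
package main

import "fmt"

// statsSnapshot is a hypothetical pair of per-function counters.
type statsSnapshot struct {
	started   uint64 // calls that got a container
	completed uint64 // calls that finished
}

// stallDetector remembers the previous snapshot. If nothing has changed
// since the last check while calls are still waiting, it assumes the
// system is stalled and reports that a new container should be launched.
type stallDetector struct {
	last statsSnapshot
	seen bool
}

func (d *stallDetector) needLaunch(cur statsSnapshot, waiters int) bool {
	stalled := d.seen && cur == d.last
	d.last, d.seen = cur, true
	return waiters > 0 && stalled
}

func main() {
	d := &stallDetector{}
	snap := statsSnapshot{started: 4, completed: 2}
	fmt.Println(d.needLaunch(snap, 3)) // false: first sample, nothing to compare
	fmt.Println(d.needLaunch(snap, 3)) // true: no progress while calls wait
}
```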
CI
25e8ce34b7 fnserver: 0.3.290 release [skip ci] 2018-01-11 19:20:58 +00:00
CI
6c2cfa155a fnserver: 0.3.289 release [skip ci] 2018-01-11 17:41:33 +00:00
Nigel Deakin
ac2bfd3462 Change basic stats to use opentracing rather than Prometheus API (#671)
* Change basic stats to use opentracing rather than Prometheus API directly

* Just ran gofmt

* Extract opentracing access for metrics to common/metrics.go

* Replace quoted strings with constants where possible
2018-01-11 17:34:51 +00:00
CI
9b6cdd8009 fnserver: 0.3.288 release [skip ci] 2018-01-11 01:02:43 +00:00
CI
b19e147c2f fnserver: 0.3.287 release [skip ci] 2018-01-10 22:21:02 +00:00
Tolga Ceylan
7c91b98a72 fn: hot container launcher adjustment (#673)
Latency stats are not always read-time updated and
if calls are stuck in waiting state, isNewContainerNeeded()
needs to be a bit more aggressive if the wait queue grows.
2018-01-10 14:14:19 -08:00
CI
797e4c65c0 fnserver: 0.3.286 release [skip ci] 2018-01-10 21:22:00 +00:00