3393 Commits

Author SHA1 Message Date
CI
d55e01ab7d fnserver: v0.3.749 release [skip ci] v0.3.749 2019-12-19 00:01:15 +00:00
Tolga Ceylan
0500ad5cfc fn: handle go-lang time.Now() skew (#1572)
Even for CLOCK_MONOTONIC, NTP adjustments can be made for advancing
the clock forward. When reporting metrics, let's handle this as
callLatency zero (in other words, execution latency is almost the
same as overall latency).
2019-12-18 15:53:30 -08:00
CI
06fbb7f491 fnserver: v0.3.748 release [skip ci] v0.3.748 2019-12-09 01:26:38 +00:00
Krister Johansen
0017e0e67e Let runner clients specify a timeout at creation. (#1569)
Add a new function, NewgRPCRunnerWithTimeout, that allows the caller to
define the grpc connection timeout.  Modify NewgRPCRunner to be
implemented in terms of NewgRPCRunnerWithTimeout, passing its existing
default timeout to the new function.

Addition of this function was necessary because the timeout here is
rather aggressive.  A test tool that leverages the existing
NewgRPCRunner was moved into an environment where its test target was in
a different region from the origin, which required a higher timeout to
cope with the test endpoint being physically farther away.
2019-12-08 17:19:15 -08:00
CI
a2974dc882 fnserver: v0.3.747 release [skip ci] v0.3.747 2019-11-19 06:58:46 +00:00
Dhiraj Mutreja
c93b01eae2 Revert client to use the older(v1) Status method. (#1567) 2019-11-18 22:27:04 -08:00
CI
9bf2c07590 fnserver: v0.3.746 release [skip ci] v0.3.746 2019-11-04 18:29:34 +00:00
Tolga Ceylan
850508a9bd fn: naive placer rr fixup with tests (#1566)
If naive placer is not instantiated per call/runner group (aka LBG),
then the rr index will not trigger round-robin behavior since
the index is initialized and stored in the placer configuration.

With this PR, moving rr index to per RunnerPool.Runners() inner loop
to ensure a round robin within that set. Each time we fetch a set,
since the set might be different, we reset our rr index. This means
we rr within that set once, then randomly start from another node
for the next RunnerPool.Runners() iteration. In busy systems,
no significant behavior change is expected (except the removal of
atomic operations with respect to performance), but in idle systems
round robin behavior should be more observable and simple to follow and can
reduce same hit cases for the given RunnerPool.Runners().

In addition, introducing naive placer tests to ensure we observe
this behavior.
2019-11-04 10:21:13 -08:00
CI
02445ef54d fnserver: v0.3.745 release [skip ci] v0.3.745 2019-11-04 17:19:34 +00:00
Tim Langford
d71d071d35 Update 'RetryAllBackoff' to always update the 'emptyPoolCountMeasure' even if the invocation is going to 'fail fast'. (#1558) 2019-11-04 09:11:39 -08:00
CI
39fa1be6d3 fnserver: v0.3.744 release [skip ci] v0.3.744 2019-10-30 21:21:40 +00:00
Tolga Ceylan
bddf5ec2f3 fn: LB runner client header processing fix (#1565)
When LB is processing http headers from flat array representation
in gRPC, it should use http.Header.Add() to grow http headers to
handle header keys with multiple values. Set() overrides the
previous entries.
2019-10-30 14:14:21 -07:00
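The difference between Add() and Set() can be seen in a short sketch. The `kv` type and `toHeader` function are hypothetical stand-ins for the flat gRPC representation; only `http.Header`, `Add`, and `Set` come from the Go standard library.

```go
package main

import (
	"fmt"
	"net/http"
)

// kv is a hypothetical flat key/value pair, standing in for the array
// representation carried over gRPC.
type kv struct {
	Key, Value string
}

// toHeader rebuilds an http.Header from flat pairs. Add appends to any
// existing values for the key, so repeated keys are preserved.
func toHeader(pairs []kv) http.Header {
	h := make(http.Header)
	for _, p := range pairs {
		h.Add(p.Key, p.Value)
	}
	return h
}

func main() {
	pairs := []kv{{"Set-Cookie", "a=1"}, {"Set-Cookie", "b=2"}}

	fmt.Println(len(toHeader(pairs)["Set-Cookie"])) // both values kept

	// For contrast: Set replaces earlier values, leaving only the last one.
	h := make(http.Header)
	for _, p := range pairs {
		h.Set(p.Key, p.Value)
	}
	fmt.Println(len(h["Set-Cookie"]))
}
```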
CI
6502947958 fnserver: v0.3.743 release [skip ci] v0.3.743 2019-10-30 20:24:00 +00:00
Michael J Williams
119848f4c8 Add Windows CLI install to Quickstart (#1564) 2019-10-30 14:15:44 -06:00
CI
cad827b8f3 fnserver: v0.3.742 release [skip ci] v0.3.742 2019-10-30 18:05:26 +00:00
Tolga Ceylan
79afc63cfa fn: log container/image latencies in call finished logger (#1563)
Adding useful metrics to the existing logger in logCallFinish:

*) execution duration - fn exec duration
*) scheduler duration - general runner scheduler wait
*) image wait duration - latency due to image pulling
*) container preparation duration - initial setup for containers (e.g. tmpfs)
*) container creation duration - latency due to container create/attach/start
*) container init duration - latency in FDK init, UDS wait, etc.
2019-10-30 10:57:35 -07:00
CI
da49be605b fnserver: v0.3.741 release [skip ci] v0.3.741 2019-10-25 17:44:30 +00:00
Owen Strain
e352d449c3 Decrease log level of failure to find runners message to WARN (#1562) 2019-10-25 10:36:30 -07:00
CI
52db27788e fnserver: v0.3.740 release [skip ci] v0.3.740 2019-10-21 22:16:53 +00:00
Rohit Kumar
b15d697f0e add Fn-Fdk-Version to metrics (#1560)
* add fn fdk version to default tags in api response metrics and fields in api response logging

* add how to build fn docker image, and specify it to fn start command
2019-10-21 15:08:18 -07:00
CI
484e39d3ba fnserver: v0.3.739 release [skip ci] v0.3.739 2019-10-18 04:50:17 +00:00
Dhiraj Mutreja
a6dfa2aff7 Add status v2 (#1556)
Added the ability to supply input to status checker
2019-10-17 21:42:24 -07:00
CI
86a808dd21 fnserver: v0.3.738 release [skip ci] v0.3.738 2019-10-17 23:54:46 +00:00
Tolga Ceylan
77e7798a54 fn: container before/after callback error handling (#1559)
If container before/after callbacks result in errors, the container
should not continue as these errors usually are not recoverable.
2019-10-17 16:47:08 -07:00
CI
d29a62f3e4 fnserver: v0.3.737 release [skip ci] v0.3.737 2019-10-03 09:19:58 +00:00
Tim Langford
10b3265a62 Prevent long timeout when invoking a function and the user has misconfigured the environment. (#1543)
Updates the RetryAllBackOff function to fail early when a request to obtain Runners cannot succeed due to a human Fn misconfiguration error. This prevents a user from having to wait for the maximum timeout when the system is misconfigured.
2019-10-03 10:12:07 +01:00
CI
34c1ada12f fnserver: v0.3.736 release [skip ci] v0.3.736 2019-09-30 16:21:56 +00:00
Shreya Garge
f118aa8d2a request limit exceeded added to user errors (#1555) 2019-09-30 17:14:26 +01:00
CI
b1e8adbd13 fnserver: v0.3.735 release [skip ci] v0.3.735 2019-09-27 17:23:12 +00:00
Dhiraj Mutreja
e686ac5c3b Refactor pure_runner.go (#1554)
* Refactor pure_runner.go
  - Extract type status_tracker to status_tracker.go
  - Add initial set of tests for status_tracker

Closes #1553
2019-09-27 10:15:08 -07:00
CI
093a205269 fnserver: v0.3.734 release [skip ci] v0.3.734 2019-09-25 17:58:45 +00:00
Tolga Ceylan
08895e1ac1 fn: slot error retention is suboptimal (#1550)
* fn: agent slot error handling improvements

* fn: slot error retention is suboptimal

Current fn agent (aka runner) tries to communicate
all container start errors to clients. In order to
achieve this, the errors are retained and could be
delivered to clients that did not spawn that container.
This is OK as it tries to be transparent to callers
instead of suppressing or hiding errors. However,
these errors are retained without a time bound which
is not ideal. An old request could trigger an error
and this error can be sent to a client at a much later time.

With this change, the error retention semantics change.
If an error occurs during container start and the client
which triggered the container is no longer present, we
log and discard the result.

Broadly, any error that occurs when a request is not 1-1
bound to a container is logged and discarded.

For testing these scenarios, a debug option is added
to agent to allow passing request id that triggered
the spawn of the container as an environment variable.
2019-09-25 10:50:35 -07:00
CI
7bb5c8eb05 fnserver: v0.3.733 release [skip ci] v0.3.733 2019-09-25 17:21:41 +00:00
Tolga Ceylan
7e1a5a61c1 fn: simplify CPU/Memory acquisition to avoid race condition (#1551)
Before this change, agent used GetResourceToken() calls
for both non-blocking and blocking mode. However, for
nested calls such as in agent checkLaunch(), a token might
already be queued via go-routine in getResourceTokenNBChan().
If the consuming code in checkLaunch() runs faster, then
it could place another GetResourceToken() call while the
active token is not yet closed. This can result in momentarily
double cpu/mem allocation resulting in 503 or excess wait.

With this PR, resource allocation blocking and non-blocking
interface is changed and ResourceToken is directly returned
without a channel.
2019-09-25 10:10:10 -07:00
CI
37c2c1aedc fnserver: v0.3.732 release [skip ci] v0.3.732 2019-09-23 15:20:40 +00:00
Michael J Williams
adf7528a19 Small update for homebrew install. Now consistent in all repos (#1547) 2019-09-23 09:12:50 -06:00
CI
4ccad68c80 fnserver: v0.3.731 release [skip ci] v0.3.731 2019-08-27 10:53:09 +00:00
Shreya Garge
7d0c8cd420 linear and log scale histogram bucket generators (#1542)
* linear and log scale histogram bucket generators

* moved histogram bucket generators to stats_utils package
2019-08-27 11:45:47 +01:00
CI
b5d9dad5ee fnserver: v0.3.730 release [skip ci] v0.3.730 2019-08-22 23:02:54 +00:00
Tolga Ceylan
bd1c435ceb fn: metrics cleanup (#1544)
Duration units don't need msec conversion. For consistency
across metrics, we currently do not do any conversion for execution
duration, for example. Newer metrics (ctrPrepTime, imagePullWaitTime,
ctrCreateTime, initStartTime) can follow the same convention.

Remove legacy runner reported duration calculations.
2019-08-22 15:55:32 -07:00
CI
0bff197dcd fnserver: v0.3.729 release [skip ci] v0.3.729 2019-08-15 18:08:24 +00:00
Rtvik Sriram Bharadwaj
b20aef2a37 remote runner support for trace exporters (#1538) (#1540)
* opencensus attribs for remote runner (#1538)

adding span attrib support for remote runner metrics
adding ochttp filter to group invoke calls in server.go
changing up the grpc callfinished for display on jaeger
changed call.go, including imgpull time (if applicable)
span annotation in runner_client
add imagepulltime in protobuf
Adding span.setstatus to some spans
Moving dockerwait to mcall

* adding ctr metrics to rpcs (#1538)
Adding container create metric to grpc
adding metrics for ctr preparation duration, creation duration, init start, using atomic.load and store to avoid concurrent thread faults
adding atomic store for stats generated in runhot function
excluding / path in webserver, removing ochttp for adminserver
adding spandata in runner client to handle new proto data
2019-08-15 11:00:14 -07:00
CI
fb32eae480 fnserver: v0.3.728 release [skip ci] v0.3.728 2019-07-31 18:34:04 +00:00
Krister Johansen
e96cc07f98 Enforce more rigorous handling of input data and pipes. (#1536)
Catches and generates function errors for two new cases.  The first
occurs due to a function/FDK error.  If the function closes the read end
of the pipe that the hostagent uses to write data before the hostagent
has finished writing the data, generate an error.  This ensures that
any premature close of the input stream is detected and handled by the
hostagent.

Second, catch any cases where the function attempts to respond before
reading all of the input data from the stream.  This is safe because we
already enforce a maximum upper bound on the request body, so a function
or fdk will not have to read for an unbounded amount of time to consume
the outstanding data.  Since the container contract enforces HTTP-like
semantics, and HTTP expects the server side to wait for the response
body to arrive before responding, this is not unreasonable.  If the
function or fdk attempts to write before the end of the input stream has
been processed by the hostagent, return a different function error
indicating that a premature write has been detected.
2019-07-31 11:26:21 -07:00
CI
6cf4799410 fnserver: v0.3.727 release [skip ci] v0.3.727 2019-07-18 18:30:30 +00:00
Tolga Ceylan
7b6c3057a0 fn: switch to extensions for slot queue auth token (#1535) 2019-07-18 21:22:38 +03:00
CI
193ce7d889 fnserver: v0.3.726 release [skip ci] v0.3.726 2019-07-18 15:41:14 +00:00
Tolga Ceylan
dd0149f789 fn: provide update method for hot slot auther (#1534) 2019-07-18 18:23:02 +03:00
CI
f1de7d7a07 fnserver: v0.3.725 release [skip ci] v0.3.725 2019-07-17 10:30:29 +00:00
Andrea Rosa
e4c5cffb16 Add SyslogUnavailable as new codified error (#1532)
Add SyslogUnavailable as new codified error

We codified the SyslogUnavailable error into a public error exposed by
the models package
2019-07-17 11:23:03 +01:00