1946 Commits

Author SHA1 Message Date
Tolga Ceylan
f826233b84 fn: log streamer gRPC interface for pure runner (#1437)
* fn: log streamer gRPC interface for pure runner

New gRPC interface to stream back logs, with an empty implementation.

* remove compartment id
2019-03-20 22:10:52 -07:00
CI
695bfbc676 fnserver: 0.3.675 release [skip ci] 2019-03-15 17:55:20 +00:00
Andrea Rosa
387c2099d5 Don't log stack trace at the top handler error if we have gRPC errors (#1424)
* Don't log stacktrace when we get gRPC errors

In the top error handler we log the stack trace if we get back a non-API error. gRPC errors also fall into that category, but there is no point in logging a stack trace for a gRPC error, since such errors can originate on the server side or from temporary network blips.
In this change we skip the stack trace log in the presence of gRPC errors, and we also lower some log levels from Error to Info for retryable errors.
2019-03-15 17:46:41 +00:00
CI
2f0b4d8f0b fnserver: 0.3.674 release [skip ci] 2019-03-15 00:03:07 +00:00
Tolga Ceylan
27a1c79232 fn: add repository pull tests (#1436)
A fake repository that returns 429 once for each HTTP GET. This
verifies our expectations of docker behavior when the upstream
returns 429.
2019-03-14 16:55:09 -07:00
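The 429 fixture described above can be sketched with Go's `net/http/httptest`. This is a minimal illustration, not the test added in #1436: `newFlakyRegistry` and `getStatus` are hypothetical names, and the server simply answers the first GET to each path with 429 and every later GET to that path with 200.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// newFlakyRegistry (hypothetical) answers the first GET for each path with
// 429 Too Many Requests and every later GET for that path with 200 OK.
func newFlakyRegistry() *httptest.Server {
	var mu sync.Mutex
	seen := make(map[string]bool)
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		mu.Lock()
		first := !seen[r.URL.Path]
		seen[r.URL.Path] = true
		mu.Unlock()
		if first {
			w.Header().Set("Retry-After", "1")
			w.WriteHeader(http.StatusTooManyRequests)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

// getStatus performs a GET and returns only the status code.
func getStatus(url string) (int, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	srv := newFlakyRegistry()
	defer srv.Close()
	for i := 0; i < 2; i++ {
		code, err := getStatus(srv.URL + "/v2/busybox/manifests/latest")
		fmt.Println(code, err)
	}
}
```

A client under test would then be expected to back off and retry after the first response rather than fail outright.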
CI
a1b3046491 fnserver: 0.3.673 release [skip ci] 2019-03-14 21:16:14 +00:00
Tolga Ceylan
96377f23da fn: docker pull retry / backoff options (#1434)
* fn: docker pull retry / backoff options

Introducing SetPullImageRetryPolicy() for docker driver to allow
customizations of docker pull behavior.

Replacing old backoff code with a more formal exponential backoff
with policy options.
2019-03-14 14:08:13 -07:00
CI
99a17649db fnserver: 0.3.672 release [skip ci] 2019-03-14 00:28:07 +00:00
Tolga Ceylan
b774e7a459 fn: introducing a new image pull layer (#1433)
* fn: introducing a new image pull layer

Because docker internally generates unnecessary HTTP traffic,
this PR introduces a new image pull layer. This allows us to
serialize docker pulls of the same image using a simple
active-transfer model with a list of listeners.

The timeout behavior is slightly different when multiple listeners
are waiting: the timeout from the first listener is emitted to all
listeners. However, since the docker-pull timeout is globally
configured, the overall timeout behavior is essentially the same.
2019-03-13 17:20:14 -07:00
CI
c938172fc7 fnserver: 0.3.671 release [skip ci] 2019-03-13 21:47:44 +00:00
Srinidhi Chokkadi Puranik
cafbf463d5 Add API to configure runner (#1432)
We need the ability to configure a remote runner. The data we pass is implementation specific, so we keep it generic enough. The pure_runner implementation dumps the data into a file. The file path is configurable.
2019-03-13 14:39:52 -07:00
CI
e018231212 fnserver: 0.3.670 release [skip ci] 2019-03-08 09:39:51 +00:00
Andrea Rosa
77d97d7def Pass Request to CallOverrider (#1430)
With this change CallOverrider gets the request as its first parameter.
We know that we want to get rid of it, but there are cases where
CallOverrider is currently the only place where we can customize a
call before it gets executed, until we find a solution for the problem
reported in issue #1426.

Having access to the original request can be handy in
CallOverrider.
2019-03-08 09:31:08 +00:00
CI
67abf38b48 fnserver: 0.3.669 release [skip ci] 2019-03-04 21:37:56 +00:00
Tolga Ceylan
30f5489d68 fn: make evictor less aggressive (#1422)
* fn: make evictor less aggressive

Experiments in load testing revealed that evicting
containers that are still starting is detrimental, since it
increases churn and acts as a feedback mechanism that
drives up docker API rates. Starting and deleting
a container are among the most expensive docker API
calls, and traffic arriving on a busy/loaded server
can trigger a flood of container create actions that
get canceled by evictions. The previous code tried
to avoid this by tracking the original request that
triggered the container start, but based on the
data this does not seem enough to avoid a flurry
of evictions. In other words, in a busy system, it
is not that rare for the original request to be
quickly serviced by another container.

With this change, we restrict evictability of
a container to Idle state exclusively. This makes
the backpressure (503) more likely since it
allows starting containers to initialize.

A race condition that occasionally caused busy
containers to be evicted is also fixed. With this fix,
the listeners are unblocked as if the busy container
had really been evicted, but the busy container itself
simply refreshes its eviction token and discards the
old, evicted token.

In the future, based on empirical data, we may consider
introducing evictions for slow docker pulls.
2019-03-04 13:29:19 -08:00
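The Idle-only rule can be expressed as a small predicate. The state names and `pickVictims` below are hypothetical; the point is that Starting and Busy containers are never chosen, so when nothing is Idle the caller falls back to backpressure (503) instead of cancelling a container start.

```go
package main

import "fmt"

// containerState is an illustrative lifecycle state, not fn's actual type.
type containerState int

const (
	stateStarting containerState = iota
	stateIdle
	stateBusy
)

// evictable: only Idle containers may be evicted.
func (s containerState) evictable() bool { return s == stateIdle }

// pickVictims returns the indexes of up to n evictable containers.
func pickVictims(states []containerState, n int) []int {
	var out []int
	for i, s := range states {
		if len(out) == n {
			break
		}
		if s.evictable() {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	states := []containerState{stateStarting, stateBusy, stateIdle}
	if v := pickVictims(states, 1); len(v) == 0 {
		fmt.Println("nothing evictable: reply 503 and let starting containers finish")
	} else {
		fmt.Println("evict container", v[0])
	}
}
```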
CI
767d5d3d39 fnserver: 0.3.668 release [skip ci] 2019-03-01 01:02:20 +00:00
Andrea Rosa
4f5537c9d7 Adding callOpts to lbAgent (#1423)
In the lbAgent at the moment it is not possible to add additional call
options other than the ones supplied to the GetCall.
This change adds a new WithLBCallOptions function to add that
capability to the lbAgent.
2019-02-28 16:53:58 -08:00
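`WithLBCallOptions` follows Go's functional-options pattern. In this sketch `lbAgent`, `call`, `CallOpt`, `LBAgentOption`, `newLBAgent` and `withHeader` are simplified stand-ins, not fn's real types: agent-wide options are stored once and applied to every call built by `GetCall`, before the per-call options.

```go
package main

import "fmt"

// Simplified stand-ins for the agent and call types.
type call struct{ headers map[string]string }

type CallOpt func(c *call) error

type lbAgent struct{ callOpts []CallOpt }

type LBAgentOption func(a *lbAgent) error

// WithLBCallOptions stores options to apply to every call the agent creates.
func WithLBCallOptions(opts ...CallOpt) LBAgentOption {
	return func(a *lbAgent) error {
		a.callOpts = append(a.callOpts, opts...)
		return nil
	}
}

func newLBAgent(opts ...LBAgentOption) (*lbAgent, error) {
	a := &lbAgent{}
	for _, o := range opts {
		if err := o(a); err != nil {
			return nil, err
		}
	}
	return a, nil
}

// GetCall applies agent-wide options first, then the per-call ones.
func (a *lbAgent) GetCall(opts ...CallOpt) (*call, error) {
	c := &call{headers: map[string]string{}}
	all := append(append([]CallOpt{}, a.callOpts...), opts...)
	for _, o := range all {
		if err := o(c); err != nil {
			return nil, err
		}
	}
	return c, nil
}

// withHeader is a hypothetical example option.
func withHeader(k, v string) CallOpt {
	return func(c *call) error { c.headers[k] = v; return nil }
}

func main() {
	a, err := newLBAgent(WithLBCallOptions(withHeader("X-Region", "example")))
	if err != nil {
		fmt.Println(err)
		return
	}
	c, err := a.GetCall(withHeader("X-Request", "1"))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(c.headers)
}
```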
CI
a7459ee0fc fnserver: 0.3.667 release [skip ci] 2019-02-28 09:37:14 +00:00
Andrea Rosa
20e551b033 Send RunnerMsg_ResultStart message (#1414)
* Send RunnerMsg_ResultStart message

This change adds a call to a function to send the RunnerMsg_ResultStart
message during the enqueueCallResponse function.
The RunnerMsg_ResultStart contains any headers set by the function and
the status code.
This fixes the case where we don't send custom headers if a function
doesn't return a body.

Fixes #1413
2019-02-28 09:28:40 +00:00
CI
197ce419ca fnserver: 0.3.666 release [skip ci] 2019-02-25 21:40:32 +00:00
Reed Allman
a36188bdcb don't log db password (#1420)
closes #1419
2019-02-25 13:30:56 -08:00
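One way to keep credentials out of logs is to log a redacted form of the DB URL. This sketch uses `url.URL.Redacted` from Go's standard library (Go 1.15 and later), which postdates this 2019 commit, so it is not necessarily how #1420 fixed the leak; `safeDBURL` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// safeDBURL returns the DB URL with any password replaced by "xxxxx",
// or a placeholder if the string does not parse as a URL.
func safeDBURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return "<unparseable db url>" // never fall back to logging the raw string
	}
	return u.Redacted()
}

func main() {
	fmt.Println(safeDBURL("postgres://fn:secret@db:5432/funcs?sslmode=disable"))
}
```

DSNs that are not valid URLs, such as MySQL's `tcp(host:port)` form, may fail to parse; the helper then logs a placeholder rather than the raw string.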
CI
16e4e2d7ab fnserver: 0.3.665 release [skip ci] 2019-02-22 19:41:09 +00:00
CI
c834c053f7 fnserver: 0.3.664 release [skip ci] 2019-02-22 19:17:56 +00:00
Tolga Ceylan
7612ec2651 fn: set headers before write header in http response (#1415) 2019-02-22 11:08:30 -08:00
CI
3e4bd6b5aa fnserver: 0.3.663 release [skip ci] 2019-02-15 18:25:05 +00:00
Tomas Knappek
1dca2b5219 Handle syslog error as a user error (#1402) 2019-02-15 10:17:09 -08:00
CI
63e9faed06 fnserver: 0.3.662 release [skip ci] 2019-02-14 20:42:34 +00:00
Tolga Ceylan
da6631b867 fn: build fixup (#1407) 2019-02-14 12:34:33 -08:00
Tolga Ceylan
73778de0ae fn: clean docker retry logic and legacy code (#1401)
* fn: clean docker retry logic and legacy code

Cleanup of docker retry error handling logic. The code
likely dates from older, less stable docker versions. Instead,
we would like to expose any underlying issues and fix them
as necessary.

Examination of logs/events from the last 3-4 weeks does not
show any occurrence of retries/errors; on the contrary, the
retry logic was causing more issues (e.g. retrying docker-pull on 5xx).
2019-02-14 11:53:13 -08:00
Reed Allman
09977fc156 plumb docker auther to the depths (#1396)
* plumb docker auther to the depths

this patch accomplishes a few things wrt the docker auth interface used for
configuring credentials to be used for pull:

* adds context for timing out the docker auth call
* no longer call docker auth before validating an image. we were using this to
validate that a user still has access to an image that we have locally. this
policy itself could be TBD, but we're being flagged for hitting registries too
hard, and runners have bounded lifetimes on the order of hours, so we decided
this is ok. now we only call the docker auth function when pulling an image, and
its time is included in the pull timeout we've so delicately plumbed
* adds the ability for agents to add call configurations to every call created via
GetCall; this ought to help fn extenders configure calls and puts us on
the path of getting rid of the call overrider (and is enough to be able to
replace our current one that is causing issues)
* adds an option to configure calls to have a docker auth function they
provide, so that fn extenders can plumb this thing into here without having to
fork the driver (which would be absurd!)
* some lints...

* fix tests?

* add the image to docker auth

this can come in handy for a provided docker auth function to get credentials
differently based on registry version, registry host, etc and it makes sense
in all contexts of usage I think here
2019-02-14 09:42:40 -08:00
CI
6c43ac0d58 fnserver: 0.3.661 release [skip ci] 2019-02-05 20:09:46 +00:00
Tomas Knappek
27c1814cee Prevent in-built docker VOLUME commands (#1378) 2019-02-05 12:01:49 -08:00
CI
6792511403 fnserver: 0.3.660 release [skip ci] 2019-01-31 23:15:19 +00:00
Tolga Ceylan
2e4396e253 fn: status health checker stats update (#1393)
Switch all tags to tri-state to reduce confusion and add
network state.
2019-01-31 15:07:15 -08:00
CI
10c018baef fnserver: 0.3.659 release [skip ci] 2019-01-31 20:18:39 +00:00
Tolga Ceylan
a104bf313e fn: track network availability for runner status checks (#1383) 2019-01-31 11:51:19 -08:00
CI
2f5f3ee14b fnserver: 0.3.658 release [skip ci] 2019-01-30 19:02:23 +00:00
Tolga Ceylan
6bf4da0faa fn: docker-pull are user errors (#1391)
Classify docker pull errors as user errors and
log with info log level.
2019-01-30 10:53:49 -08:00
Tolga Ceylan
67634b104b fn: remove old test related images (#1390)
Removing hello-world, fnproject/hello images from tests. Replacing
these with busybox which we use in the project.

In drivers.ContainerTask, Timeout is obsolete code left over from cold
containers, which we no longer have.
2019-01-25 14:49:38 -08:00
CI
fcd058707c fnserver: 0.3.657 release [skip ci] 2019-01-25 00:14:36 +00:00
CI
d20dbdde6e fnserver: 0.3.656 release [skip ci] 2019-01-24 21:44:05 +00:00
Tolga Ceylan
882fd1d723 fn: retriable error handling in runner client (#1384)
During gRPC communication, we previously assumed that
if sending the Try message failed in TryExec(), the call could always
be retried. However, this is not robust, as we cannot assume
no data was written to the wire. With this change, before we
conclude that the call can be retried, we also check for the
Unavailable error code.
2019-01-24 13:35:10 -08:00
CI
2e7f829c4e fnserver: 0.3.655 release [skip ci] 2019-01-24 18:29:31 +00:00
CI
68d6d96254 fnserver: 0.3.654 release [skip ci] 2019-01-23 16:42:21 +00:00
CI
6b8268e19b fnserver: 0.3.653 release [skip ci] 2019-01-22 21:12:28 +00:00
Tolga Ceylan
49c8c98de3 fn: startup container cleanup timeout (#1382)
ListContainers call which is asynchronously spawned
during docker driver start needs to have a reasonable
timeout and should be retried if timeout expires.
2019-01-22 13:04:40 -08:00
CI
6e92377bf1 fnserver: 0.3.652 release [skip ci] 2019-01-17 22:42:06 +00:00
Tolga Ceylan
2539a86652 fn: driver cookie interface changes for finer control (#1380)
* fn: driver cookie interface changes for finer control

Users of the driver cookie, as seen in agent.go, require specific
timeouts in the contexts passed to the different operations. For
example, the cookie.PullImage() context shows the intent to apply a
timeout/duration to the pull operation. To clarify this
and avoid tricky timeout-related issues, we remove the implicit
docker-inspect in cookie.PullImage(). In addition, we separate
out the authentication step, since the Auther interface calls unknown
code.

Additional changes to the cookie implementation clarify the
inspected image, created container and authentication
configuration: each of the AuthImage(), ValidateImage()
and CreateContainer() steps initializes the corresponding
pointer object in the cookie.

The changes now require a ValidateImage() call if PullImage()
is attempted. In all cases, an AuthImage() call is required
before ValidateImage().

An example usage is as follows:

        cookie, err := driver.CreateCookie(ctx, task)
        err = cookie.AuthImage(ctx)
        pull, err := cookie.ValidateImage()
        if pull {
                err = cookie.PullImage(ctx)
                pull, err = cookie.ValidateImage()
        }

        err = cookie.CreateContainer(ctx)
        waiter := cookie.Run(ctx)
        waiter.Wait(ctx)

Error handling is omitted for clarity in the example above.
2019-01-17 14:30:41 -08:00
CI
46922908d5 fnserver: 0.3.651 release [skip ci] 2019-01-17 14:06:13 +00:00
Gerardo Viedma
dffbde28b4 Rework call placement/finish logs to take advantage of user blame tag (#1381)
* Improve Call finished logging and remove extraneous 'Failed during call placement' line
2019-01-17 13:58:14 +00:00