Commit Graph

19 Commits

Author SHA1 Message Date
Tolga Ceylan
e67d0e5f3f fn: Call extensions/overriding and more customization friendly docker driver (#1065)
In pure-runner and LB agent, service providers might want to set specific driver options.

For example, to add cpu-shares to functions, LB can add the information as extensions
to the Call and pass this via gRPC to runners. Runners then pick these extensions from
gRPC call and pass it to driver. Using a custom driver implementation, pure-runners can
process these extensions to modify docker.CreateContainerOptions.

To achieve this, LB agents can now be configured using a call overrider.

Pure-runners can be configured using a custom docker driver.

RunnerCall and Call interfaces both expose call extensions.

An example to demonstrate this is implemented in test/fn-system-tests/system_test.go
which registers a call overrider for LB agent as well as a simple custom docker driver.
In this example, LB agent adds a key-value to extensions and runners add this key-value
as an environment variable to the container.
2018-06-18 14:42:28 -07:00
Andrea Rosa
e637661ea2 Adding a way to inject a request ID (#1046)
* Adding a way to inject a request ID

It is very useful to associate a request ID to each incoming request,
this change allows to provide a function to do that via Server Option.
The change comes with a default function which will generate a new
request ID. The request ID is put in the request context along with a
common logger which always logs the request-id

We add gRPC interceptors to the server so it can get the request ID out
of the gRPC metadata and put it in the common logger stored in the
context so as all the log lines using the common logger from the context
will have the request ID logged
2018-06-14 10:40:55 +01:00
Owen Cliffe
c6abc8bf64 Use context logging more to ensure context vars are present in log lines (#1039) 2018-06-06 15:14:29 +01:00
Tolga Ceylan
4af53025d8 fn: lb-agent: Initial TryCall result can be retriable. (#1035)
Before this change, we assumed data may end up in a container
once we placed a TryCall() and if gRPC send failed, we did not
retry. However, a send failure cannot result in data in a
container, since only upon successful receipt of a TryCall can
pure-runner schedule a call into a container. Here we trust
gRPC and if gRPC layer says it could not send a msg, then
the receiver did not receive it.
2018-06-05 14:41:13 -07:00
Tolga Ceylan
1cd5894f41 fn: LB agent: reduce 'Too Busy' error logs (#1033)
With this PR, runner client translates too busy errors
from gRPC session and runner itself into Fn error type.
Placers now ignore this error message to reduce unnecessary
logging.
2018-06-04 12:16:00 -07:00
Tolga Ceylan
7261ddedcc fn: LB agent: EOF from runner is normal in nack cases (#1032) 2018-06-04 12:10:00 -07:00
Tolga Ceylan
7f1d14d21f fn: slot hash id must be utf8 in gRPC (#1016) 2018-05-29 16:26:43 -07:00
Tolga Ceylan
74a5379dec fn: lb & pure-runner slot hash id communication (#1007)
* fn: lb & pure-runner slot hash id communication

With this change, LB can pre-calculate the slot hash
key and pass it to runners. If LB knows/calculates
the slot hash ids, then it can also make better
estimates on which runner can successfully execute
it especially when status messages from runner
include a small summary of idle slots for a given
slot hash id. (TODO)

* fn: fix mock test
2018-05-25 14:12:48 -07:00
Tolga Ceylan
77086ecc24 fn: lb-agent & runner gRPC updates (#1005)
Breaking changes:

*) Removed unused ACK/NACK definitions
*) Extended Finished messages with error code/str
2018-05-17 15:02:15 -07:00
Tolga Ceylan
7cf8e2a61d fn: pure-runner time out while waiting TryCall (#1006)
This should return a retriable error code 503.
2018-05-17 15:00:50 -07:00
Tolga Ceylan
4ccde8897e fn: lb and pure-runner with non-blocking agent (#989)
* fn: lb and pure-runner with non-blocking agent

*) Removed pure-runner capacity tracking code. This did
not play well with internal agent resource tracker.
*) In LB and runner gRPC comm, removed ACK. Now,
upon TryCall, pure-runner quickly proceeds to call
Submit. This is good since at this stage pure-runner
already has all relevant data to initiate the call.
*) Unless pure-runner emits a NACK, LB immediately
streams http body to runners.
*) For retriable requests added a CachedReader for
http.Request Body.
*) Idempotenty/retry is similar to previous code.
After initial success in Engament, after attempting
a TryCall, unless we receive NACK, we cannot retry
that call.
*) ch and naive places now wraps each TryExec with
a cancellable context to clean up gRPC contexts
quicker.

* fn: err for simpler one-time read GetBody approach

This allows for a more flexible approach since we let
users to define GetBody() to allow repetitive http body
read. In default LB case, LB executes a one-time io.ReadAll
and sets of GetBody, which is detected by RunnerCall.RequestBody().

* fn: additional check for non-nil req.body

* fn: attempt to override IO errors with ctx for TryExec

* fn: system-tests log dest

* fn: LB: EOF send handling

* fn: logging for partial IO

* fn: use buffer pool for IO storage in lb agent

* fn: pure runner should use chunks for data msgs

* fn: required config validations and pass APIErrors

* fn: additional tests and gRPC proto simplification

*) remove ACK/NACK messages as Finish message type works
OK for this purpose.
*) return resp in api tests for check for status code
*) empty body json test in api tests for lb & pure-runner

* fn: buffer adjustments

*) setRequestBody result handling correction
*) switch to bytes.Reader for read-only safety
*) io.EOF can be returned for non-nil Body in request.

* fn: clarify detection of 503 / Server Too Busy
2018-05-17 12:09:03 -07:00
Tolga Ceylan
c0ee3ce736 fn: locked mutex while blocked on I/O considered harmful (#935)
* fn: mutex while waiting I/O considered harmful

*) Removed hold mutex while wait I/O cases these
included possible disk I/O and network I/O.

*) Error/Context Close/Shutdown semantics changed since
the context timeout and comments were misleading. Close
always waits for pending gRPC session to complete.
Context usage here was merely 'wait up to x secs to
report an error' which only logs the error anyway.
Instead, the runner can log the error. And context
still can be passed around perhaps for future opencensus
instrumentation.
2018-04-13 11:23:29 -07:00
Tolga Ceylan
623aeb35b2 fn: common.WaitGroup improvements (#940)
* fn: common.WaitGroup improvements

*) Split the API into AddSession/DoneSession
*) Only wake up listeners when session count reaches zero.

* fn: WaitGroup go-routine blast test

* fn: test fix and rebase fixup
2018-04-12 16:21:13 -07:00
Tolga Ceylan
e53d23afc9 fn: sync.WaitGroup replacement common.WaitGroup (#937)
* fn: sync.WaitGroup replacement common.WaitGroup

agent/lb_agent/pure_runner has been incorrectly using
sync.WaitGroup semantics. Switching these components to
use the new common.WaitGroup() that provides a few handy
functionality for common graceful shutdown cases.

From https://golang.org/pkg/sync/#WaitGroup,
    "Note that calls with a positive delta that occur when the counter
     is zero must happen before a Wait. Calls with a negative delta,
     or calls with a positive delta that start when the counter is
     greater than zero, may happen at any time. Typically this means
     the calls to Add should execute before the statement creating
     the goroutine or other event to be waited for. If a WaitGroup
     is reused to wait for several independent sets of events,
     new Add calls must happen after all previous Wait calls have
     returned."

HandleCallEnd introduces some complexity to the shutdowns, but this
is currently handled by AddSession(2) initially and letting the
HandleCallEnd() when to decrement by -1 in addition to decrement -1 in
Submit().

lb_agent shutdown sequence and particularly timeouts with runner pool
needs another look/revision, but this is outside of the scope of this
commit.

* fn: lb-agent wg share

* fn: no need to +2 in Submit with defer.

Removed defer since handleCallEnd already has
this responsibility.
2018-04-12 11:33:01 -07:00
Tolga Ceylan
9b86e3626e fn: avoid go-routine leak (#934) 2018-04-11 12:11:08 -07:00
jan grant
88074a42c0 Bugfix/grpc consume eof (#912)
* GRPC streams end with an EOF

The client should ensure that the final packet is followed by a GRPC
EOF. This has the benefit of permitting the client code to clean up resources.

* Don't require an entire HTTP request in RunnerCall

TryExec needs a handle on an incoming ReadCloser containing the body
of a request; however, everything else will already have been extracted
from the HTTP request in the case of lbAgent use.

(The point of this change is to simplify the interface for other uses.)

* Return error from GRPC layer explicitly

As per review
2018-04-03 15:04:21 +01:00
Gerardo Viedma
348bbaf36b support runner TLS certificates with specified certificate Common Names (#900)
* support runner TLS certificates with specified certificate Common Names

* removes duplicate constant

* run in insecure mode by default but expose ability to create tls-secured runner pools programmatically

* fixes runner tests to use new tls interfaces
2018-03-28 13:57:15 +01:00
Gerardo Viedma
1cae6f988e Make PKI data and RunnerFactory public objects (#865)
* Make PKI data and RunnerFactory public objects

* removes unnecessary nullRunner object

* renames secure factory to point out MTLS
2018-03-16 15:40:58 +00:00
Gerardo Viedma
73ae77614c Moves out node pool manager behind an extension using runner pool abstraction (Part 2) (#862)
* Move out node-pool manager and replace it with RunnerPool extension

* adds extension points for runner pools in load-balanced mode

* adds error to return values in RunnerPool and Runner interfaces

* Implements runner pool contract with context-aware shutdown

* fixes issue with range

* fixes tests to use runner abstraction

* adds empty test file as a workaround for build requiring go source files in top-level package

* removes flappy timeout test

* update docs to reflect runner pool setup

* refactors system tests to use runner abstraction

* removes poolmanager

* moves runner interfaces from models to api/runnerpool package

* Adds a second runner to pool docs example

* explicitly check for request spillover to second runner in test

* moves runner pool package name for system tests

* renames runner pool pointer variable for consistency

* pass model json to runner

* automatically cast to http.ResponseWriter in load-balanced call case

* allow overriding of server RunnerPool via a programmatic ServerOption

* fixes return type of ResponseWriter in test

* move Placer interface to runnerpool package

* moves hash-based placer out of open source project

* removes siphash from Gopkg.lock
2018-03-16 13:46:21 +00:00