With this PR, runner client translates too busy errors
from gRPC session and runner itself into Fn error type.
Placers now ignore this error message to reduce unnecessary
logging.
* fn: user friendly timeout handling changes
Timeout setting in routes now means "maximum amount
of time a function can run in a container".
Total wait time for a given http request is now expected
to be handled by the client. As long as the client waits,
the LB, runner or agents will search for resources to
schedule it.
* fn: lb & pure-runner slot hash id communication
With this change, LB can pre-calculate the slot hash
key and pass it to runners. If LB knows/calculates
the slot hash ids, then it can also make better
estimates on which runner can successfully execute
it especially when status messages from runner
include a small summary of idle slots for a given
slot hash id. (TODO)
* fn: fix mock test
* fn: lb and pure-runner with non-blocking agent
*) Removed pure-runner capacity tracking code. This did
not play well with internal agent resource tracker.
*) In LB and runner gRPC comm, removed ACK. Now,
upon TryCall, pure-runner quickly proceeds to call
Submit. This is good since at this stage pure-runner
already has all relevant data to initiate the call.
*) Unless pure-runner emits a NACK, LB immediately
streams http body to runners.
*) For retriable requests added a CachedReader for
http.Request Body.
*) Idempotenty/retry is similar to previous code.
After initial success in Engament, after attempting
a TryCall, unless we receive NACK, we cannot retry
that call.
*) ch and naive places now wraps each TryExec with
a cancellable context to clean up gRPC contexts
quicker.
* fn: err for simpler one-time read GetBody approach
This allows for a more flexible approach since we let
users to define GetBody() to allow repetitive http body
read. In default LB case, LB executes a one-time io.ReadAll
and sets of GetBody, which is detected by RunnerCall.RequestBody().
* fn: additional check for non-nil req.body
* fn: attempt to override IO errors with ctx for TryExec
* fn: system-tests log dest
* fn: LB: EOF send handling
* fn: logging for partial IO
* fn: use buffer pool for IO storage in lb agent
* fn: pure runner should use chunks for data msgs
* fn: required config validations and pass APIErrors
* fn: additional tests and gRPC proto simplification
*) remove ACK/NACK messages as Finish message type works
OK for this purpose.
*) return resp in api tests for check for status code
*) empty body json test in api tests for lb & pure-runner
* fn: buffer adjustments
*) setRequestBody result handling correction
*) switch to bytes.Reader for read-only safety
*) io.EOF can be returned for non-nil Body in request.
* fn: clarify detection of 503 / Server Too Busy
* fn: mutex while waiting I/O considered harmful
*) Removed hold mutex while wait I/O cases these
included possible disk I/O and network I/O.
*) Error/Context Close/Shutdown semantics changed since
the context timeout and comments were misleading. Close
always waits for pending gRPC session to complete.
Context usage here was merely 'wait up to x secs to
report an error' which only logs the error anyway.
Instead, the runner can log the error. And context
still can be passed around perhaps for future opencensus
instrumentation.
* fn: reduce lbagent and agent dependency
lbagent and agent code is too dependent. This causes
any changed in agent to break lbagent. In reality, for
LB there should be no delegated agent. Splitting these
two will cause some code duplication, but it reduces
dependency and complexity (eg. agent without docker)
* fn: post rebase fixup
* fn: runner/runnercall should use lbDeadline
* fn: fixup ln agent test
* fn: remove agent create option for common.WaitGroup
* GRPC streams end with an EOF
The client should ensure that the final packet is followed by a GRPC
EOF. This has the benefit of permitting the client code to clean up resources.
* Don't require an entire HTTP request in RunnerCall
TryExec needs a handle on an incoming ReadCloser containing the body
of a request; however, everything else will already have been extracted
from the HTTP request in the case of lbAgent use.
(The point of this change is to simplify the interface for other uses.)
* Return error from GRPC layer explicitly
As per review
* support runner TLS certificates with specified certificate Common Names
* removes duplicate constant
* run in insecure mode by default but expose ability to create tls-secured runner pools programmatically
* fixes runner tests to use new tls interfaces
* Move out node-pool manager and replace it with RunnerPool extension
* adds extension points for runner pools in load-balanced mode
* adds error to return values in RunnerPool and Runner interfaces
* Implements runner pool contract with context-aware shutdown
* fixes issue with range
* fixes tests to use runner abstraction
* adds empty test file as a workaround for build requiring go source files in top-level package
* removes flappy timeout test
* update docs to reflect runner pool setup
* refactors system tests to use runner abstraction
* removes poolmanager
* moves runner interfaces from models to api/runnerpool package
* Adds a second runner to pool docs example
* explicitly check for request spillover to second runner in test
* moves runner pool package name for system tests
* renames runner pool pointer variable for consistency
* pass model json to runner
* automatically cast to http.ResponseWriter in load-balanced call case
* allow overriding of server RunnerPool via a programmatic ServerOption
* fixes return type of ResponseWriter in test
* move Placer interface to runnerpool package
* moves hash-based placer out of open source project
* removes siphash from Gopkg.lock