Commit Graph

15 Commits

Author SHA1 Message Date
Tolga Ceylan
9f29d824d6 fn: New timeout for LB Placer (#1137)
* fn: New timeout for LB Placer

Previously, LB Placers worked hard as long as
client contexts allowed for. Adding a Placer
config setting to bound this by 360 seconds by
default.

The new timeout is not accounted during actual
function execution and only applies to the amount
of wait time in Placers when the call is not
being executed.
2018-07-26 10:19:25 -07:00
Tolga Ceylan
5dc5740a54 fn: runner status and docker load images (#1116)
* fn: runner status and docker load images

Introducing a function run for pure runner Status
calls. Previously, Status gRPC calls returned active
inflight request counts with the purpose of a simple
health checker. However this is not sufficient since
it does not show if agent or docker is healthy. With
this change, if pure runner is configured with a status
image, that image is executed through docker. The
call uses zero memory/cpu/tmpsize settings to ensure
resource tracker does not block it.

However, operators might not always have a docker
repository accessible/available for status image. Or
operators might not want the status to go over the
network. To allow such cases, and in general possibly
caching docker images, added a new environment variable
FN_DOCKER_LOAD_FILE. If this is set, fn-agent during
startup will load these images that were previously
saved with 'docker save' into docker.
2018-07-12 13:58:38 -07:00
Tolga Ceylan
e67d0e5f3f fn: Call extensions/overriding and more customization friendly docker driver (#1065)
In pure-runner and LB agent, service providers might want to set specific driver options.

For example, to add cpu-shares to functions, LB can add the information as extensions
to the Call and pass this via gRPC to runners. Runners then pick these extensions from
gRPC call and pass it to driver. Using a custom driver implementation, pure-runners can
process these extensions to modify docker.CreateContainerOptions.

To achieve this, LB agents can now be configured using a call overrider.

Pure-runners can be configured using a custom docker driver.

RunnerCall and Call interfaces both expose call extensions.

An example to demonstrate this is implemented in test/fn-system-tests/system_test.go
which registers a call overrider for LB agent as well as a simple custom docker driver.
In this example, LB agent adds a key-value to extensions and runners add this key-value
as an environment variable to the container.
2018-06-18 14:42:28 -07:00
Tolga Ceylan
a57907eed0 fn: user friendly timeout handling changes (#1021)
* fn: user friendly timeout handling changes

Timeout setting in routes now means "maximum amount
of time a function can run in a container".

Total wait time for a given http request is now expected
to be handled by the client. As long as the client waits,
the LB, runner or agents will search for resources to
schedule it.
2018-06-01 13:18:13 -07:00
Tolga Ceylan
74a5379dec fn: lb & pure-runner slot hash id communication (#1007)
* fn: lb & pure-runner slot hash id communication

With this change, LB can pre-calculate the slot hash
key and pass it to runners. If LB knows/calculates
the slot hash ids, then it can also make better
estimates on which runner can successfully execute
it especially when status messages from runner
include a small summary of idle slots for a given
slot hash id. (TODO)

* fn: fix mock test
2018-05-25 14:12:48 -07:00
Tolga Ceylan
f0f9a6d945 fn: LB ch and naive fixes (#942)
* fn: LB ch and naive fixes

*) Naive is now a naive RR algorithm.
*) Both now checks for ctx/timeout in each attempt.

* fn: test fix
2018-05-07 11:50:16 -07:00
Tolga Ceylan
c0ee3ce736 fn: locked mutex while blocked on I/O considered harmful (#935)
* fn: mutex while waiting I/O considered harmful

*) Removed hold mutex while wait I/O cases these
included possible disk I/O and network I/O.

*) Error/Context Close/Shutdown semantics changed since
the context timeout and comments were misleading. Close
always waits for pending gRPC session to complete.
Context usage here was merely 'wait up to x secs to
report an error' which only logs the error anyway.
Instead, the runner can log the error. And context
still can be passed around perhaps for future opencensus
instrumentation.
2018-04-13 11:23:29 -07:00
Tolga Ceylan
e47d55056a fn: reduce lbagent and agent dependency (#938)
* fn: reduce lbagent and agent dependency

lbagent and agent code is too dependent. This causes
any changed in agent to break lbagent. In reality, for
LB there should be no delegated agent. Splitting these
two will cause some code duplication, but it reduces
dependency and complexity (eg. agent without docker)

* fn: post rebase fixup

* fn: runner/runnercall should use lbDeadline

* fn: fixup ln agent test

* fn: remove agent create option for common.WaitGroup
2018-04-12 15:51:58 -07:00
Tolga Ceylan
600dce5b2c fn: lb_agent_test tweak (#931)
Time.Sleep() blocking fixes in placers (naive and ch) improves
some of the timing in processing, therefore reducing the max
calls settings in mock runner pool.
2018-04-10 18:28:47 -07:00
Tolga Ceylan
e7658db822 Move ch ring placement back from old FnLB. (#930)
* fn: bring back CH ring placer into FN repo based on original FnLB
* fn: move placement code into runnerpool directory
2018-04-10 17:26:24 -07:00
jan grant
88074a42c0 Bugfix/grpc consume eof (#912)
* GRPC streams end with an EOF

The client should ensure that the final packet is followed by a GRPC
EOF. This has the benefit of permitting the client code to clean up resources.

* Don't require an entire HTTP request in RunnerCall

TryExec needs a handle on an incoming ReadCloser containing the body
of a request; however, everything else will already have been extracted
from the HTTP request in the case of lbAgent use.

(The point of this change is to simplify the interface for other uses.)

* Return error from GRPC layer explicitly

As per review
2018-04-03 15:04:21 +01:00
Gerardo Viedma
348bbaf36b support runner TLS certificates with specified certificate Common Names (#900)
* support runner TLS certificates with specified certificate Common Names

* removes duplicate constant

* run in insecure mode by default but expose ability to create tls-secured runner pools programmatically

* fixes runner tests to use new tls interfaces
2018-03-28 13:57:15 +01:00
Gerardo Viedma
101236f7d8 Remove npm remnants (#882)
* create an Annotation map of the right size to avoid resizing

* removes all references to deprecated nodepool manager
2018-03-23 10:29:32 +00:00
Gerardo Viedma
1cae6f988e Make PKI data and RunnerFactory public objects (#865)
* Make PKI data and RunnerFactory public objects

* removes unnecessary nullRunner object

* renames secure factory to point out MTLS
2018-03-16 15:40:58 +00:00
Gerardo Viedma
73ae77614c Moves out node pool manager behind an extension using runner pool abstraction (Part 2) (#862)
* Move out node-pool manager and replace it with RunnerPool extension

* adds extension points for runner pools in load-balanced mode

* adds error to return values in RunnerPool and Runner interfaces

* Implements runner pool contract with context-aware shutdown

* fixes issue with range

* fixes tests to use runner abstraction

* adds empty test file as a workaround for build requiring go source files in top-level package

* removes flappy timeout test

* update docs to reflect runner pool setup

* refactors system tests to use runner abstraction

* removes poolmanager

* moves runner interfaces from models to api/runnerpool package

* Adds a second runner to pool docs example

* explicitly check for request spillover to second runner in test

* moves runner pool package name for system tests

* renames runner pool pointer variable for consistency

* pass model json to runner

* automatically cast to http.ResponseWriter in load-balanced call case

* allow overriding of server RunnerPool via a programmatic ServerOption

* fixes return type of ResponseWriter in test

* move Placer interface to runnerpool package

* moves hash-based placer out of open source project

* removes siphash from Gopkg.lock
2018-03-16 13:46:21 +00:00