fn-serverless

mirror of https://github.com/fnproject/fn.git synced 2022-10-28 21:29:17 +03:00

Author	SHA1	Message	Date
Tolga Ceylan	f57571fb3a	fn: SSL config adjustments (#1160 ) SSL related FN_NODE_CERT (and related) settings are not very clear today. Removing this in favor of a simple map of tls.Config objects. Three keys are provided for this map: TLSGRPCServer TLSAdminServer TLSWebServer which correspond to server TLS settings for the associated services. Operators/implementers can further add more keys to the map and add their own TLS config.	2018-08-06 20:57:03 -07:00
Tolga Ceylan	fc71208063	fn: add context info to logger passed to DialWithBackoff (#1133 )	2018-07-23 13:05:30 -07:00
Tolga Ceylan	db7cbf73e2	fn: add requests received/handled in Status responses (#1132 ) This is useful as additional data to inflight requests. Callers can determine request arrival and processing rate.	2018-07-20 16:00:02 -07:00
Tolga Ceylan	564db4e9d2	fn: Status should expose if data was served from cache. (#1123 ) This is useful in scenarios where gRPC client might want to reliably observe/report the status latency metrics and remove any possible duplicates. If the status query was served from cache, then these latencies show last execution latency.	2018-07-13 17:35:00 -07:00
Tolga Ceylan	5dc5740a54	fn: runner status and docker load images (#1116 ) * fn: runner status and docker load images Introducing a function run for pure runner Status calls. Previously, Status gRPC calls returned active inflight request counts with the purpose of a simple health checker. However this is not sufficient since it does not show if agent or docker is healthy. With this change, if pure runner is configured with a status image, that image is executed through docker. The call uses zero memory/cpu/tmpsize settings to ensure resource tracker does not block it. However, operators might not always have a docker repository accessible/available for status image. Or operators might not want the status to go over the network. To allow such cases, and in general possibly caching docker images, added a new environment variable FN_DOCKER_LOAD_FILE. If this is set, fn-agent during startup will load these images that were previously saved with 'docker save' into docker.	2018-07-12 13:58:38 -07:00
Tolga Ceylan	317de18e6b	fn: lb-agent: Add Runner Scheduler/Execution Stats (#1107 ) LB agent reports lb placer latency. It should also report how long it took for the runner to initiate the call as well as execution time inside the container if the runner has accepted (committed) to the call.	2018-07-02 17:15:43 -07:00
Tolga Ceylan	e67d0e5f3f	fn: Call extensions/overriding and more customization friendly docker driver (#1065 ) In pure-runner and LB agent, service providers might want to set specific driver options. For example, to add cpu-shares to functions, LB can add the information as extensions to the Call and pass this via gRPC to runners. Runners then pick these extensions from gRPC call and pass it to driver. Using a custom driver implementation, pure-runners can process these extensions to modify docker.CreateContainerOptions. To achieve this, LB agents can now be configured using a call overrider. Pure-runners can be configured using a custom docker driver. RunnerCall and Call interfaces both expose call extensions. An example to demonstrate this is implemented in test/fn-system-tests/system_test.go which registers a call overrider for LB agent as well as a simple custom docker driver. In this example, LB agent adds a key-value to extensions and runners add this key-value as an environment variable to the container.	2018-06-18 14:42:28 -07:00
Andrea Rosa	e637661ea2	Adding a way to inject a request ID (#1046 ) * Adding a way to inject a request ID It is very useful to associate a request ID with each incoming request. This change allows a function that does so to be provided via a Server Option. The change comes with a default function which generates a new request ID. The request ID is put in the request context along with a common logger which always logs the request ID. We add gRPC interceptors to the server so it can get the request ID out of the gRPC metadata and put it in the common logger stored in the context, so that all log lines using the common logger from the context have the request ID logged.	2018-06-14 10:40:55 +01:00
Owen Cliffe	c6abc8bf64	Use context logging more to ensure context vars are present in log lines (#1039 )	2018-06-06 15:14:29 +01:00
Tolga Ceylan	4af53025d8	fn: lb-agent: Initial TryCall result can be retriable. (#1035 ) Before this change, we assumed data may end up in a container once we placed a TryCall() and if gRPC send failed, we did not retry. However, a send failure cannot result in data in a container, since only upon successful receipt of a TryCall can pure-runner schedule a call into a container. Here we trust gRPC and if gRPC layer says it could not send a msg, then the receiver did not receive it.	2018-06-05 14:41:13 -07:00
Tolga Ceylan	1cd5894f41	fn: LB agent: reduce 'Too Busy' error logs (#1033 ) With this PR, runner client translates too busy errors from gRPC session and runner itself into Fn error type. Placers now ignore this error message to reduce unnecessary logging.	2018-06-04 12:16:00 -07:00
Tolga Ceylan	7261ddedcc	fn: LB agent: EOF from runner is normal in nack cases (#1032 )	2018-06-04 12:10:00 -07:00
Tolga Ceylan	7f1d14d21f	fn: slot hash id must be utf8 in gRPC (#1016 )	2018-05-29 16:26:43 -07:00
Tolga Ceylan	74a5379dec	fn: lb & pure-runner slot hash id communication (#1007 ) * fn: lb & pure-runner slot hash id communication With this change, LB can pre-calculate the slot hash key and pass it to runners. If LB knows/calculates the slot hash ids, then it can also make better estimates on which runner can successfully execute it especially when status messages from runner include a small summary of idle slots for a given slot hash id. (TODO) * fn: fix mock test	2018-05-25 14:12:48 -07:00
Tolga Ceylan	77086ecc24	fn: lb-agent & runner gRPC updates (#1005 ) Breaking changes: - Removed unused ACK/NACK definitions - Extended Finished messages with error code/str	2018-05-17 15:02:15 -07:00
Tolga Ceylan	7cf8e2a61d	fn: pure-runner time out while waiting TryCall (#1006 ) This should return a retriable error code 503.	2018-05-17 15:00:50 -07:00
Tolga Ceylan	4ccde8897e	fn: lb and pure-runner with non-blocking agent (#989 ) * fn: lb and pure-runner with non-blocking agent - Removed pure-runner capacity tracking code. This did not play well with internal agent resource tracker. - In LB and runner gRPC comm, removed ACK. Now, upon TryCall, pure-runner quickly proceeds to call Submit. This is good since at this stage pure-runner already has all relevant data to initiate the call. - Unless pure-runner emits a NACK, LB immediately streams http body to runners. - For retriable requests added a CachedReader for http.Request Body. - Idempotency/retry is similar to previous code. After initial success in Engagement, after attempting a TryCall, unless we receive NACK, we cannot retry that call. - ch and naive placers now wrap each TryExec with a cancellable context to clean up gRPC contexts quicker. * fn: err for simpler one-time read GetBody approach This allows for a more flexible approach since we let users define GetBody() to allow repeated reads of the http body. In default LB case, LB executes a one-time io.ReadAll and sets GetBody, which is detected by RunnerCall.RequestBody(). * fn: additional check for non-nil req.body * fn: attempt to override IO errors with ctx for TryExec * fn: system-tests log dest * fn: LB: EOF send handling * fn: logging for partial IO * fn: use buffer pool for IO storage in lb agent * fn: pure runner should use chunks for data msgs * fn: required config validations and pass APIErrors * fn: additional tests and gRPC proto simplification - remove ACK/NACK messages as Finish message type works OK for this purpose. - return resp in api tests to check the status code - empty body json test in api tests for lb & pure-runner fn: buffer adjustments - setRequestBody result handling correction - switch to bytes.Reader for read-only safety - io.EOF can be returned for non-nil Body in request. fn: clarify detection of 503 / Server Too Busy	2018-05-17 12:09:03 -07:00
Tolga Ceylan	c0ee3ce736	fn: locked mutex while blocked on I/O considered harmful (#935 ) * fn: mutex while waiting I/O considered harmful - Removed cases where a mutex was held while waiting on I/O; these included possible disk I/O and network I/O. - Error/Context Close/Shutdown semantics changed since the context timeout and comments were misleading. Close always waits for pending gRPC session to complete. Context usage here was merely 'wait up to x secs to report an error' which only logs the error anyway. Instead, the runner can log the error. And context still can be passed around perhaps for future opencensus instrumentation.	2018-04-13 11:23:29 -07:00
Tolga Ceylan	623aeb35b2	fn: common.WaitGroup improvements (#940 ) * fn: common.WaitGroup improvements - Split the API into AddSession/DoneSession - Only wake up listeners when session count reaches zero. * fn: WaitGroup go-routine blast test * fn: test fix and rebase fixup	2018-04-12 16:21:13 -07:00
Tolga Ceylan	e53d23afc9	fn: sync.WaitGroup replacement common.WaitGroup (#937 ) * fn: sync.WaitGroup replacement common.WaitGroup agent/lb_agent/pure_runner has been incorrectly using sync.WaitGroup semantics. Switching these components to use the new common.WaitGroup() that provides some handy functionality for common graceful shutdown cases. From https://golang.org/pkg/sync/#WaitGroup, "Note that calls with a positive delta that occur when the counter is zero must happen before a Wait. Calls with a negative delta, or calls with a positive delta that start when the counter is greater than zero, may happen at any time. Typically this means the calls to Add should execute before the statement creating the goroutine or other event to be waited for. If a WaitGroup is reused to wait for several independent sets of events, new Add calls must happen after all previous Wait calls have returned." HandleCallEnd introduces some complexity to the shutdowns, but this is currently handled by calling AddSession(2) initially and letting HandleCallEnd() decide when to decrement by 1, in addition to the decrement by 1 in Submit(). lb_agent shutdown sequence and particularly timeouts with runner pool need another look/revision, but this is outside of the scope of this commit. * fn: lb-agent wg share * fn: no need to +2 in Submit with defer. Removed defer since handleCallEnd already has this responsibility.	2018-04-12 11:33:01 -07:00
Tolga Ceylan	9b86e3626e	fn: avoid go-routine leak (#934 )	2018-04-11 12:11:08 -07:00
jan grant	88074a42c0	Bugfix/grpc consume eof (#912 ) * GRPC streams end with an EOF The client should ensure that the final packet is followed by a GRPC EOF. This has the benefit of permitting the client code to clean up resources. * Don't require an entire HTTP request in RunnerCall TryExec needs a handle on an incoming ReadCloser containing the body of a request; however, everything else will already have been extracted from the HTTP request in the case of lbAgent use. (The point of this change is to simplify the interface for other uses.) * Return error from GRPC layer explicitly As per review	2018-04-03 15:04:21 +01:00
Gerardo Viedma	348bbaf36b	support runner TLS certificates with specified certificate Common Names (#900 ) * support runner TLS certificates with specified certificate Common Names * removes duplicate constant * run in insecure mode by default but expose ability to create tls-secured runner pools programmatically * fixes runner tests to use new tls interfaces	2018-03-28 13:57:15 +01:00
Gerardo Viedma	1cae6f988e	Make PKI data and RunnerFactory public objects (#865 ) * Make PKI data and RunnerFactory public objects * removes unnecessary nullRunner object * renames secure factory to point out MTLS	2018-03-16 15:40:58 +00:00
Gerardo Viedma	73ae77614c	Moves out node pool manager behind an extension using runner pool abstraction (Part 2) (#862 ) * Move out node-pool manager and replace it with RunnerPool extension * adds extension points for runner pools in load-balanced mode * adds error to return values in RunnerPool and Runner interfaces * Implements runner pool contract with context-aware shutdown * fixes issue with range * fixes tests to use runner abstraction * adds empty test file as a workaround for build requiring go source files in top-level package * removes flappy timeout test * update docs to reflect runner pool setup * refactors system tests to use runner abstraction * removes poolmanager * moves runner interfaces from models to api/runnerpool package * Adds a second runner to pool docs example * explicitly check for request spillover to second runner in test * moves runner pool package name for system tests * renames runner pool pointer variable for consistency * pass model json to runner * automatically cast to http.ResponseWriter in load-balanced call case * allow overriding of server RunnerPool via a programmatic ServerOption * fixes return type of ResponseWriter in test * move Placer interface to runnerpool package * moves hash-based placer out of open source project * removes siphash from Gopkg.lock	2018-03-16 13:46:21 +00:00

25 Commits