274 Commits

Author SHA1 Message Date
Reed Allman
2b797a556a update docs with pro tips for fdk http stream people (#1211)
* update docs with pro tips for fdk http stream people

* fix bug where container could die before uds wait

we used to hang out for an hour. oopsie, thanks Owen
2018-09-14 16:54:18 +01:00
Reed Allman
3a9c48b8a3 http-stream format (#1202)
* POC code for inotify UDS-io-socket

* http-stream format

introducing the `http-stream` format support in fn. there are many details for
this, none of which can be linked from github :( -- docs are coming (I could
even try to add some here?). this is kinda MVP-ish level, but does not
implement the remaining spec, ie 'headers' fixing up / invoke fixing up. the
thinking being we can land this to test fdks / cli with and start splitting
work up on top of this. all other formats work the same as previous (no
breakage, only new stuff)

with the cli you can set `format: http-stream` and deploy, and then invoke a
function via the `http-stream` format. this uses unix domain socket (uds) on
the container instead of previous stdin/stdout, and fdks will have to support
this in a new fashion (will see about getting docs on here). fdk-go works,
which is here: https://github.com/fnproject/fdk-go/pull/30 . the output looks
the same as an http format function when invoking a function. wahoo.

there's some amount of stuff we can clean up here, enumerated:

* the cleanup of the sock files is iffy, high pri here

* permissions are a pain in the ass and i punted on dealing with them. you can
run `sudo ./fnserver` if running locally, it may/may not work in dind(?) ootb

* no pipe usage at all (yay), still could reduce buffer usage around the pipe
behavior, we could clean this up potentially before removal (and tests)

* my brain can’t figure out if dispatchOldFormats changes pipe behavior, but
tests work

* i marked XXX to do some clean up which will follow soon… need this to test fdk
tho so meh, any thoughts on those marked would be appreciated however (1 less
decision for me). mostly happy w/ general shape/plumbing tho

* there are no tests atm, this is a tricky dance indeed. attempts were made.
need to futz with the permission stuff before committing to adding any tests
here, which I don't like either. also, need to get the fdk-go based test image
updated according to the fdk-go, and there's a dance there too. rumba time..

* delaying the big big cleanup until we have good enough fdk support to kill
all the other formats.

open to ideas on how to maneuver landing stuff...

* fix unmount

* see if the tests work on ci...

* add call id header

* fix up makefile

* add configurable iofs opts

* add format file describing http-stream contract

* rm some cruft

* default iofs to /tmp, remove mounting

out of the box fn we can't mount. /tmp will provide a memory backed fs for us
on most systems, this will be fine for local developing and this can be
configured to be wherever for anyone that wants to make things more difficult
for themselves.

also removes the mounting, this has to be done as root. we can't do this in
the oss fn (short of requesting root, but no). in the future, we may want to
have a knob here to have a function that can be configured in fn that allows
further configuration here. since we don't know what we need in this dept
really, not doing that yet (it may be the case that it could be done
operationally outside of fn, eg, but not if each directory needs to be
configured itself, which seems likely, anyway...)

* add WIP note just in case...
2018-09-14 10:59:12 +01:00
Tolga Ceylan
4dcdb7d982 fn: paused and evicted container stats (#1209)
* fn: paused and evicted container stats

With this change, now stats reports paused state
as well as incidents of container exit due to evictions.

* fn: update/document state transitions in state tracker

There's no case of a transition moving from done to waiting. This
must be deprecated behavior.
2018-09-13 16:24:26 -07:00
Tolga Ceylan
586d5c4735 fn: make call.End() blocking to reduce complexity (#1208)
agent/lb-agent/runner roles execute call.End() in the background
in some cases to reduce latency. With this change, we simplify this
and switch to non-background execution of call.End(). This fixes
hard to detect issues such as non-deterministic calculation of
call.CompletedAt or incomplete Call.Stats in runners.

Downstream projects if impacted by the now blocking call.End()
latency should take steps to handle this according to their requirements.
2018-09-13 11:28:11 +01:00
Tom Coupland
a0ccc4d7c4 Copy logs up to v2 endpoints (#1207)
Copies the log endpoints up to the V2 endpoints, in a similar way to
the call endpoints.

The main change is to when logs are inserted into S3. The signature of
the function has been changed to take the whole call object, rather
than just the app and call IDs. This allows the function to switch
between calls for Routes and those for Fns. Obviously this switching
can be removed when v1 is removed.

In the sql implementation it inserts with both appID and fnID; this
allows the two gets to work, as well as the downgrade of the
migration. When the v1 logs are removed, the appID can be dropped.

The log fetch test and error messages have been changed to be FnID specific.
2018-09-13 10:30:10 +01:00
Tolga Ceylan
aabbe0fba5 fn: check context timeout when waiting for non-blocking attach (#1201)
* fn: check context timeout when waiting for non-blocking attach

With this change, we no longer allow docker client AttachToContainerNonBlocking
to block on Success channel more than our context deadline/timeout.

* fn: move nbio chan handling in attach to docker from docker-client
2018-09-12 13:01:51 -07:00
Tolga Ceylan
6226af933a fn: slot metrics/stats should be in stats/metrics removing logging (#1200)
Slot stats are too noisy. These should be (or shortly will be) in
metrics/stats/tracing.
2018-09-10 16:30:25 -07:00
Tolga Ceylan
bb8436c3ee fn: docker driver stats/metrics for prometheus (#1197)
* fn: docker driver stats/metrics for prometheus
2018-09-10 13:35:50 -07:00
Gerardo Viedma
0e01f3e547 Gracefully handles client request cancelations, instead of treating them as server errors (#1194)
* Gracefully handles client request cancelations, instead of logging them as a 500 error

* adds runner_addr to runner client logs
2018-09-05 07:53:48 +01:00
Reed Allman
7638b31e11 use tini to run every container (#1195)
fixes #1101

additional context:

* this was introduced in docker 1.13 (1/2017), we require docker 17.10
(10/2017), this should not have any issues dependency-wise, as `docker-init`
is in the docker install from that point in time. unless explicitly removed,
it should be in the dind container we use as well...
* the PR that introduced this to docker is
https://github.com/moby/moby/pull/26061 for additional context
* it may be wise to put this through some paces, if anybody has any...
interesting... function containers. the tests seem to work fine, however, and
this shouldn't be something users have to think about (?) at all, just
something that we are doing. this isn't the default in docker for
compatibility reasons, which is maybe a yellow flag but I am not sure tbh
2018-09-04 15:41:30 -07:00
Tolga Ceylan
ad011fde7f fn: introducing docker-syslog driver as default logger (#1189)
* fn: introducing docker-syslog driver as default logger

With this change, fn-agent prefers RFC5424 docker-syslog driver
for logging stdout/stderr from containers. The advantage
of this is to offload it to docker itself instead of
streaming stderr along with stdout, which gets multiplexed
through single connection via docker-API.

The change will need support from FDKs in order to log
correct call-id and suppress '\n' that splits syslog lines.
2018-08-29 13:08:02 -07:00
Gerardo Viedma
802832436c Sets FN_PATH in models.Call for fn invoke requests (#1192) 2018-08-29 12:58:39 +01:00
Reed Allman
292f673747 Go1.11 (#1188)
* update circleci to go1.11

* update opencensus dep to build with go1.11

* fix up for new gofmt rules
2018-08-27 10:55:52 -07:00
Reed Allman
9cac4c8eea update fsouza to v1.2.0 (#1186)
* update fsouza to v1.2.0

* unwind timeouts on docker

previously, we were setting our own transport on the docker client, but this
does not work anymore as fsouza now needs to call this:
https://github.com/fsouza/go-dockerclient/blob/master/client_unix.go
which makes a platform dependent client. fsouza now also appears to make a
transport that modifies the default http client with some saner values for
things like max idle conns per host (they get reaped if idle 90s):
https://github.com/fsouza/go-dockerclient/blob/master/client.go#L1059
-- these settings are sane and were why we were doing this to begin with.

additionally, have removed our setting of timeout on the docker client for 2
minutes. this is a leftover relic of a bygone era from a time when we relied
on these timeouts to timeout higher level things, which now we're properly
timing out in the enclosing methods. so, they gone, this makes the docker
client a little less whacky now.
2018-08-24 11:36:02 -07:00
Reed Allman
a6d60551ab disable user function logs at debug level config (#1179) 2018-08-21 21:02:49 -07:00
Tom Coupland
79a7308a17 Adding Fn invoke endpoint that works just like triggers endpoint (#1168) 2018-08-13 10:01:52 +01:00
Peter Jausovec
35408ac949 Change the syslog format to use app_name instead of app_id (#1166)
* Add AppName to the models.Call, so we can include it in the syslog

* Replace the app_id with app_name
2018-08-09 12:06:19 -07:00
Tolga Ceylan
f57571fb3a fn: SSL config adjustments (#1160)
SSL related FN_NODE_CERT (and related) settings are
not very clear today. Removing this in favor of a
simple map of tls.Config objects. Three keys are
provided for this map:

TLSGRPCServer
TLSAdminServer
TLSWebServer

which correspond to server TLS settings for the
associated services.

Operators/implementers can further add more
keys to the map and add their own TLS config.
2018-08-06 20:57:03 -07:00
Tolga Ceylan
b6aeae3680 fn: moving opencensus distribution buckets out of agent (#1158)
Users can best pick the proper range for their operating
environment. Default cmd/fnserver uses some sensible
defaults.
2018-08-06 10:48:52 -07:00
Tolga Ceylan
b524a94651 fn: fix math error in calculating msecs in container states (#1157) 2018-08-03 17:25:01 -07:00
Owen Cliffe
c3a46f9452 Use sha256 for slot token (#1155) 2018-08-03 19:07:28 +01:00
Tolga Ceylan
0105f8321e fn: stats view/distribution improvements (#1154)
* fn: stats view/distribution improvements

*) View latency distribution is now an argument
in view creation functions. This allows easier
override to set custom buckets. It is simplistic
and assumes all latency views would use the same
set, but in practice this is already the case.
*) Moved API view creation to main, this should not
be enabled for all node types. This is consistent with
the rest of the system.

* fn: Docker samples of cpu/mem/disk with specific buckets
2018-08-03 11:06:54 -07:00
Reed Allman
af94f3f8ac move max_request_size from agent to server (#1145)
moves the config option for max request size up to the front end, adds the env
var for it there, adds a server test for it and removes it from agent. a
request is either gonna come through the lb (before grpc) or to the server, we
can handle limiting the request there at least now, which may be easier than
having multiple layers of request body checking. this aligns with not making
the agent as responsible for http behaviors (eventually, not at all once route
is fully deprecated).
2018-07-31 08:58:47 -07:00
Reed Allman
409c104df3 make agent options/config pass lint checks (#1144) 2018-07-30 16:04:27 -07:00
Tolga Ceylan
9f29d824d6 fn: New timeout for LB Placer (#1137)
* fn: New timeout for LB Placer

Previously, LB Placers worked hard as long as
client contexts allowed for. Adding a Placer
config setting to bound this by 360 seconds by
default.

The new timeout is not accounted during actual
function execution and only applies to the amount
of wait time in Placers when the call is not
being executed.
2018-07-26 10:19:25 -07:00
Tolga Ceylan
2706323cec fn: tests for private repo auth and rename DOCKER_AUTH (#1134)
Renamed DOCKER_AUTH with FN_ prefix to clarify the purpose. Docker
does not use this variable.

New tests to clarify the repo/auth-config behavior.
2018-07-24 15:19:59 -07:00
Tolga Ceylan
cf37a21fab fn: cleanup of docker private registry code (#1130)
* fn: cleanup of docker private registry code

Start using URL parsed ServerAddress and its subdomains
for easier image ensure/pull in docker driver. Previous
code to lookup substrings was faulty without proper
URL parse and hostname tokenization. When searching
for a registry config, if image name does not contain
a registry and if there's a private registry configured,
then search for hub.docker.com and index.docker.io. This
is similar to previous code but with correct subdomain
matching.

* fn-dataplane: take port into account in auth configs
2018-07-24 02:15:25 +01:00
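The hostname matching described in cf37a21fab relies on parsing the configured ServerAddress as a URL rather than searching for substrings. A minimal sketch of that approach is below; `registryHost` and `imageRegistry` are hypothetical helpers, and the example registry names are made up.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// registryHost extracts host[:port] from a configured server address,
// accepting values with or without a scheme.
func registryHost(serverAddress string) (string, error) {
	if !strings.Contains(serverAddress, "://") {
		serverAddress = "https://" + serverAddress
	}
	u, err := url.Parse(serverAddress)
	if err != nil {
		return "", err
	}
	return u.Host, nil
}

// imageRegistry returns the registry part of an image name, or "" when the
// name carries no registry (the first path component has no '.' or ':' and
// is not "localhost").
func imageRegistry(image string) string {
	i := strings.IndexByte(image, '/')
	if i < 0 {
		return ""
	}
	first := image[:i]
	if strings.ContainsAny(first, ".:") || first == "localhost" {
		return first
	}
	return ""
}

func main() {
	host, err := registryHost("https://my.registry.example:5000/v1/")
	fmt.Println(host, err)
	fmt.Println(imageRegistry("my.registry.example:5000/team/img:1") == host)
	fmt.Println(imageRegistry("fnproject/fn-test-utils") == "")
}
```

An image with no registry part would then be matched against the default Docker Hub hostnames named in the commit message.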
Tolga Ceylan
fc71208063 fn: add context to logger passed to DialWithBackoff (#1133) 2018-07-23 13:05:30 -07:00
Tolga Ceylan
db7cbf73e2 fn: add requests received/handled in Status responses (#1132)
This is useful as additional data to inflight requests.
Callers can determine request arrival and processing
rate.
2018-07-20 16:00:02 -07:00
Tolga Ceylan
1258baeb7f fn: agent eviction revisited (#1131)
* fn: agent eviction revisited

Previously, the hot-container eviction logic used the
number of waiters of cpu/mem resources to decide to
evict a container. An ejection ticker used to wake up
its associated container every 1 sec to reassess system
load based on waiter count. However, this does not work
for non-blocking agent since there are no waiters for
non-blocking mode.

Background on blocking versus non-blocking agent:
    *) Blocking agent holds a request until
    the request is serviced or client times out. It assumes
    the request can be eventually serviced when idle
    containers eject themselves or busy containers finish
    their work.
    *) Non-blocking mode tries to limit this wait time.
    However non-blocking agent has never been truly
    non-blocking. This simply means that we only
    make a request wait if we take some action in
    the system. Non-blocking agents are configured with
    a much higher hot poll frequency to make the system
    more responsive as well as to handle cases where a
    too-busy event is missed by the request. This is because
    the communication between hot-launcher and waiting
    requests is not 1-1 and is lossy if another request
    arrives for the same slot queue and receives a
    too-busy response before the original request.

Introducing an evictor where each hot container can
register itself, if it is idle for more than 1 second.
Upon registration, these idle containers become eligible
for eviction.

In hot container launcher, in non-blocking mode,
before we attempt to emit a too-busy response, now
we attempt an evict. If this is successful, then
we wait some more. This could result in requests
waiting for more than they used to only if a
container was evicted. For blocking-mode, the
hot launcher uses hot-poll period to assess if
a request has waited for too long, then eviction
is triggered.
2018-07-19 15:04:15 -07:00
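The evictor from 1258baeb7f can be pictured as a small registry of idle containers that the launcher asks to give one up before answering too-busy. The sketch below is an assumption-laden illustration: the `evictor` type and its `Register`, `Unregister` and `EvictOne` methods are hypothetical names, and the real implementation may choose victims differently.

```go
package main

import (
	"fmt"
	"sync"
)

// evictor tracks idle containers that are eligible for eviction.
type evictor struct {
	mu   sync.Mutex
	idle map[string]chan struct{}
}

func newEvictor() *evictor {
	return &evictor{idle: make(map[string]chan struct{})}
}

// Register marks a container as idle and returns a channel that is closed
// if the container gets evicted.
func (e *evictor) Register(id string) <-chan struct{} {
	e.mu.Lock()
	defer e.mu.Unlock()
	ch := make(chan struct{})
	e.idle[id] = ch
	return ch
}

// Unregister removes a container that became busy again.
func (e *evictor) Unregister(id string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	delete(e.idle, id)
}

// EvictOne signals one registered container to exit and reports whether
// any container was available to evict.
func (e *evictor) EvictOne() bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	for id, ch := range e.idle {
		delete(e.idle, id)
		close(ch)
		return true
	}
	return false
}

func main() {
	ev := newEvictor()
	evicted := ev.Register("container-1")

	// Launcher side: try an eviction before replying too-busy.
	fmt.Println(ev.EvictOne())
	<-evicted // the idle container observes the signal and shuts down
	fmt.Println(ev.EvictOne())
}
```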
Tolga Ceylan
e9d5221e15 fn: Status gRPC call timeout handling (#1125)
Status calls should not directly use client
gRPC context deadlines/timeouts during Status
execution. Status should allow plenty of time
for the scheduler agent and docker to run and
emit useful error information.

Setting this timeout to 60 seconds, which should
surface disk I/O, docker, etc. issues.
2018-07-16 18:33:23 -07:00
Tolga Ceylan
564db4e9d2 fn: Status should expose if data was served from cache. (#1123)
This is useful in scenarios where gRPC client might want
to reliably observe/report the status latency metrics
and remove any possible duplicates. If the status query
was served from cache, then these latencies show last
execution latency.
2018-07-13 17:35:00 -07:00
Tolga Ceylan
5dc5740a54 fn: runner status and docker load images (#1116)
* fn: runner status and docker load images

Introducing a function run for pure runner Status
calls. Previously, Status gRPC calls returned active
inflight request counts with the purpose of a simple
health checker. However this is not sufficient since
it does not show if agent or docker is healthy. With
this change, if pure runner is configured with a status
image, that image is executed through docker. The
call uses zero memory/cpu/tmpsize settings to ensure
resource tracker does not block it.

However, operators might not always have a docker
repository accessible/available for status image. Or
operators might not want the status to go over the
network. To allow such cases, and in general possibly
caching docker images, added a new environment variable
FN_DOCKER_LOAD_FILE. If this is set, fn-agent during
startup will load these images that were previously
saved with 'docker save' into docker.
2018-07-12 13:58:38 -07:00
Owen Cliffe
fff95e7992 Clean up/make consistent the APIs for registering core components, make Docker an optional component at compile time (#1111) 2018-07-07 10:37:19 +01:00
Owen Cliffe
b8b544ed25 HTTP Triggers hookup (#1086)
* Initial support for invoking triggers

* dupe method

* tighten server constraints

* runner tests not working yet

* basic route tests passing

* post rebase fixes

* add hybrid support for trigger invoke and tests

* consolidate all hybrid evil into one place

* cleanup and make triggers unique by source

* fix oops with Agent

* linting

* review fixes
2018-07-05 12:56:07 -05:00
Tolga Ceylan
300fcd7d92 fn: applications should be aware of reserved writable space (#1083)
Similar to FN_MEMORY, we pass FN_TMPSIZE to function config.
2018-07-03 16:04:48 -07:00
Tolga Ceylan
317de18e6b fn: lb-agent: Add Runner Scheduler/Execution Stats (#1107)
LB agent reports lb placer latency. It should also report
how long it took for the runner to initiate the call as
well as execution time inside the container if the runner
has accepted (committed) to the call.
2018-07-02 17:15:43 -07:00
Tom Coupland
3ebff051a4 Add support for Function and Trigger domain objects (#1060)
Vast commit, includes:

 * Introduces the Trigger domain entity.
 * Introduces the Fns domain entity.
 * V2 of the API for interacting with the new entities in swaggerv2.yml
 * Adds v2 end points for Apps to support PUT updates.
 * Rewrites the datastore level tests into a new pattern.
 * V2 routes use entity ID over name as the path parameter.
2018-06-25 15:37:06 +01:00
Reed Allman
51ff7caeb2 Bye bye openapi (#1081)
* add DateTime sans mgo

* change all uses of strfmt.DateTime to common.DateTime, remove test strfmt usage

* remove api tests, system-test dep on api test

multiple reasons to remove the api tests:

* awkward dependency with fn_go meant generating bindings on a branched fn to
vendor those to test new stuff. this is at a minimum not at all intuitive,
worth it, nor a fun way to spend the finite amount of time we have to live.
* api tests only tested a subset of functionality that the server/ api tests
already test, and we risk having tests where one tests some thing and the
other doesn't. let's not. we have too many test suites as it is, and these
pretty much only test that we updated the fn_go bindings, which is actually a
hassle as noted above and the cli will pretty quickly figure out anyway.
* fn_go relies on openapi, which relies on mgo, which is deprecated and we'd
like to remove as a dependency. openapi is a _huge_ dep built in a NIH
fashion, that cannot simply remove the mgo dep as users may be using it.
we've now stolen their date time and otherwise killed usage of it in fn core,
for fn_go it still exists but that's less of a problem.

* update deps

removals:

* easyjson
* mgo
* go-openapi
* mapstructure
* fn_go
* purell
* go-validator

also, had to lock docker. we shouldn't use docker on master anyway, they
strongly advise against that. had no luck with latest version rev, so i locked
it to what we were using before. until next time.

the rest is just playing dep roulette, those end up removing a ton tho

* fix exec test to work

* account for john le cache
2018-06-21 11:09:16 -07:00
Tolga Ceylan
881a0ba1db fn: agent call overrider (#1080)
Similar to LB Agent call overrider, this PR adds Agent overrider
for Agents to modify/analyze a Call/Extensions during GetCall().
2018-06-20 16:21:09 -07:00
Tolga Ceylan
e67d0e5f3f fn: Call extensions/overriding and more customization friendly docker driver (#1065)
In pure-runner and LB agent, service providers might want to set specific driver options.

For example, to add cpu-shares to functions, LB can add the information as extensions
to the Call and pass this via gRPC to runners. Runners then pick these extensions from
gRPC call and pass it to driver. Using a custom driver implementation, pure-runners can
process these extensions to modify docker.CreateContainerOptions.

To achieve this, LB agents can now be configured using a call overrider.

Pure-runners can be configured using a custom docker driver.

RunnerCall and Call interfaces both expose call extensions.

An example to demonstrate this is implemented in test/fn-system-tests/system_test.go
which registers a call overrider for LB agent as well as a simple custom docker driver.
In this example, LB agent adds a key-value to extensions and runners add this key-value
as an environment variable to the container.
2018-06-18 14:42:28 -07:00
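The example in e67d0e5f3f has the LB agent attach a key-value extension to the call and the runner expose it to the container as an environment variable. The runner-side step could be sketched like this; `envWithExtensions` is a hypothetical helper and `EXAMPLE_KEY` a made-up extension name, not the actual driver code.

```go
package main

import (
	"fmt"
	"sort"
)

// envWithExtensions returns a copy of env with every extension appended as
// KEY=value. Keys are sorted so the result is deterministic.
func envWithExtensions(env []string, extensions map[string]string) []string {
	keys := make([]string, 0, len(extensions))
	for k := range extensions {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	out := append([]string(nil), env...)
	for _, k := range keys {
		out = append(out, k+"="+extensions[k])
	}
	return out
}

func main() {
	base := []string{"FN_MEMORY=128"}
	ext := map[string]string{"EXAMPLE_KEY": "example-value"} // set by the LB agent in this sketch
	fmt.Println(envWithExtensions(base, ext))
}
```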
Andrea Rosa
e637661ea2 Adding a way to inject a request ID (#1046)
* Adding a way to inject a request ID

It is very useful to associate a request ID with each incoming request;
this change allows providing a function to do that via a Server Option.
The change comes with a default function which will generate a new
request ID. The request ID is put in the request context along with a
common logger which always logs the request ID.

We add gRPC interceptors to the server so it can get the request ID out
of the gRPC metadata and put it in the common logger stored in the
context, so that all the log lines using the common logger from the context
will have the request ID logged.
2018-06-14 10:40:55 +01:00
Peter Jausovec
bd5150f1ac Extract register view functionality (#1056)
* WIP

* Create separate Register*Views functions that are called from main.
2018-06-12 17:24:21 +01:00
Owen Cliffe
1ad27f4f0d Inverting deps on SQL, Log and MQ plugins to make them optional dependencies of extended servers; removing some dead code that brought in unused dependencies; filtering out some non-Linux transitive deps (#1057)
* initial Db helper split - make SQL and datastore packages optional

* abstracting log store

* break out DB, MQ and log drivers as extensions

* cleanup

* fewer deps

* fixing docker test

* hmm dbness

* updating db startup

* Consolidate all your extensions into one convenient package

* cleanup

* clean up dep constraints
2018-06-11 18:23:28 +01:00
Tolga Ceylan
fce1e54746 fn: remove dead code in static pool (#1052)
Static pool is oriented for testing/basic usage and
as its name implies it is a static pool. Therefore,
removing unnecessary/dead code.
2018-06-08 15:57:06 -07:00
Tolga Ceylan
8f969918bd fn: removing unused/dead code (#1051) 2018-06-08 15:51:19 -07:00
Tolga Ceylan
4fcb52f69d fn: MaxTotalCPU and MaxTotalMemory in non-Linux systems (#1043)
Non-Linux systems skip some of memory/cpu determination
code in resource tracker. But config settings to cap
these are used in tests, so they must not be ignored.

With this change, we apply these config settings even
on non-Linux systems.

Memory allocation code is also now same in non-Linux
systems, but default is raised to 2GB from 1.5GB.
2018-06-06 14:50:21 -07:00
Owen Cliffe
c6abc8bf64 Use context logging more to ensure context vars are present in log lines (#1039) 2018-06-06 15:14:29 +01:00
Tolga Ceylan
4af53025d8 fn: lb-agent: Initial TryCall result can be retriable. (#1035)
Before this change, we assumed data may end up in a container
once we placed a TryCall() and if gRPC send failed, we did not
retry. However, a send failure cannot result in data in a
container, since only upon successful receipt of a TryCall can
pure-runner schedule a call into a container. Here we trust
gRPC and if gRPC layer says it could not send a msg, then
the receiver did not receive it.
2018-06-05 14:41:13 -07:00
Andrea Rosa
c2c295ffb3 Add a LBAgent constructor which accepts AgentConfig (#1037)
In some cases it could be useful to pass Agent configurations to the
LBAgent constructor; this small change adds a new constructor which
accepts an agent configuration as an additional parameter.
2018-06-05 13:59:43 -07:00