79 Commits

Author SHA1 Message Date
Srinidhi Chokkadi Puranik
bb84ed35de Revert "safe responsewriter usage in TryExec (#1490)" (#1522)
This reverts commit 1fb78ed836.
2019-07-04 09:51:06 +01:00
Richard Connon
8b32ba2697 Fix typos in Makefile from go mod migration (#1508)
When we migrated from dep to mod we updated the .PHONY targets but not
the actual target names in the Makefile. Fix this.
2019-06-10 08:16:53 -07:00
Reed Allman
e8931d28c8 remove unused extensions cruft (#1481)
* changes v1 to v2 for adding an endpoint
* removed the handler funcs for adding handlers onto eg /apps/:app_id/x, we
don't have them for funcs or triggers, and they honestly seem useless as it's
easy to build it with the ability to add a handler and to access the fnext
datastore which we offer, as they were they are really expensive since they
yank the app out of the db even if the operation may not even need it in that
handler. so instead of adding for the rest, remove all of these. none of our
example extensions, which aren't working at the moment, use these either
(that's why I'm here anyway).
* removes dead code helpers and references to app name url param which is no
longer a thing. these were just hanging around bugging me when I ran into
them, so killing them..
2019-06-04 10:38:46 +03:00
Reed Allman
1fb78ed836 safe responsewriter usage in TryExec (#1490)
inside of TryExec we were writing directly to the response writer inside of a
goroutine, but TryExec can timeout and then get called again to a different
runner or even have the front end writing headers while TryExec is writing
headers.

one way to make this safe is to make a new response writer for TryExec to
write the response into, and only after the goroutine handling the response
has returned, from the TryExec goroutine we can copy the response back up as
the caller will not call TryExec again until it has returned (this is
seemingly part of the placer contract). unfortunately, we're already buffering
the response writer in the front end, too - it's possible we can get rid of
that but it may need further testing.

this adds an optimization when copying the request body from the LB to a
runner, since we're using request.GetBody() and returning a reader we
are familiar with that happens to just wrap a buffer's bytes (which we just
need multiple readers on, but the data doesn't change). anyway, this whole
interaction is unfortunate but kind of necessary due to needing to maneuver
into a protobuf, it seems like a worth it and somewhat ok abstraction wise
optimization.

additionally, this gets rid of passing the client response headers down into
the agent for detached functions. we don't need these since detached functions
are not responding with the functions response to the client, only a 202, this
was leading to races around writing the headers in retries too, but this is
just for posterity/correctness now.

updated the makefile/system test script so that I could run these faster to
repro, pretty handy, should add to other stuff too...

closes #1484
2019-05-01 17:56:13 -07:00
Reed Allman
a0f92abdcf remove logs/calls apis, async, and most of hybrid (#1458)
* start from the top

remove runner configuration mode

* remove async, logs, calls... hybrid still has one use

* add note

* fix tests

* remove all async verbiage / cruft

* fix test

* remove logs and calls from swagger

* fix rebase

* fix system tests

* remove calls/logs from sql db

adds migration and removes datastore methods

* go mod tidy && go mod vendor

* remove unused env vars

* remove stale server options
2019-04-08 15:11:22 -07:00
Tomas Knappek
27c1814cee Prevent in-built docker VOLUME commands (#1378) 2019-02-05 12:01:49 -08:00
Reed Allman
d85fadb142 add gosec scanning to ci (#1349)
gosec severity=medium passes, all severity=low errors are from unhandled
errors, we have 107 of them. tbh it doesn't look worth it to me, but maybe
there are a few assholes even itchier than mine out there. medium has some
good stuff in it, and of course high makes sense if we're gonna do this at
all.

this adds some nosec annotations for some things like sql sprintfs where we
know it's clean (we're constructing the strings with variables in them). fixed
up other spots where we were sprinting without need.

some stuff like filepath.Clean when opening a file from a variable, and file
permissions, easy stuff...

I can't get the CI build to shut up, but I can locally get it to be pretty
quiet about imports and it just outputs the gosec output. fortunately, it
still works as expected even when it's noisy. I got it to shut up by unsetting
some of the go mod flags locally, but that doesn't seem to quite do it in
circle, printed the env out and don't see them, so idk... i give up, this
works

closes #1303
2018-12-13 17:57:25 -08:00
Eric Fode
8de5aef09d go modifyed (#1284)
* go modified

fiddling with vendor

got rid of the vendor directory

revendored but with the exact same versions of things

maybe better

added mods for the images

revendored

using `GOFLAGS` instead of repeating my self

vendor everything to the exact same commit hash as before

and fixed ugorji

Delete Deproxy.toml

empty file

cleaned up some file

cleaned up some cruft

get rid of some unused packages and exclude some Microsoft packages

added flags to the variables that get pushed into docker in the makefile

It works I suppose

added noop

excluded what we did not want

even less hacky

reverted to a version that has not been mangled

* get rid of my experiment
2018-11-07 11:10:22 -08:00
Reed Allman
e13a6fd029 death to format (#1281)
* get rid of old format stuff, utils usage, fix up for fdk2.0 interface

* pure agent format removal, TODO remove format field, fix up all tests

* shitter's clogged

* fix agent tests

* start rolling through server tests

* tests compile, some failures

* remove json / content type detection on invoke/httptrigger, fix up tests

* remove hello, fixup system tests

the fucking status checker test just hangs and it's testing that it doesn't
work so the test passes but the test doesn't pass fuck life it's not worth it

* fix migration

* meh

* make dbhelper shut up about dbhelpers not being used

* move fail status at least into main thread, jfc

* fix status call to have FN_LISTENER

also turns off the stdout/stderr blocking between calls, because it's
impossible to debug without that (without syslog), now that stdout and stderr
go to the same place (either to host stderr or nowhere) and isn't used for
function output this shouldn't be a big fuss really

* remove stdin

* cleanup/remind: fixed bug where watcher would leak if container dies first

* silence system-test logs until fail, fix datastore tests

postgres does weird things with constraints when renaming tables, took the
easy way out

system-tests were loud as fuck and made you download a circleci text file of
the logs, made them only yell when they goof

* fix fdk-go dep for test image. fun

* fix swagger and remove test about format

* update all the gopkg files

* add back FN_FORMAT for fdks that assert things. pfft

* add useful error for functions that exit

this error is really confounding because containers can exit for all manner of
reason, we're just guessing that this is the most likely cause for now, and
this error message should very likely change or be removed from the client
path anyway (context.Canceled wasn't all that useful either, but anyway, I'd
been hunting for this... so found it). added a test to avoid being publicly
shamed for 1 line commits (beware...).
2018-10-26 10:43:04 -07:00
Reed Allman
01b8e8679d HTTP trigger http-stream tests (#1241) 2018-09-26 13:25:48 +01:00
Reed Allman
3a9c48b8a3 http-stream format (#1202)
* POC code for inotify UDS-io-socket

* http-stream format

introducing the `http-stream` format support in fn. there are many details for
this, none of which can be linked from github :( -- docs are coming (I could
even try to add some here?). this is kinda MVP-ish level, but does not
implement the remaining spec, ie 'headers' fixing up / invoke fixing up. the
thinking being we can land this to test fdks / cli with and start splitting
work up on top of this. all other formats work the same as previous (no
breakage, only new stuff)

with the cli you can set `format: http-stream` and deploy, and then invoke a
function via the `http-stream` format. this uses unix domain socket (uds) on
the container instead of previous stdin/stdout, and fdks will have to support
this in a new fashion (will see about getting docs on here). fdk-go works,
which is here: https://github.com/fnproject/fdk-go/pull/30 . the output looks
the same as an http format function when invoking a function. wahoo.

there's some amount of stuff we can clean up here, enumerated:

* the cleanup of the sock files is iffy, high pri here

* permissions are a pain in the ass and i punted on dealing with them. you can
run `sudo ./fnserver` if running locally, it may/may not work in dind(?) ootb

* no pipe usage at all (yay), still could reduce buffer usage around the pipe
behavior, we could clean this up potentially before removal (and tests)

* my brain can’t figure out if dispatchOldFormats changes pipe behavior, but
tests work

* i marked XXX to do some clean up which will follow soon… need this to test fdk
tho so meh, any thoughts on those marked would be appreciated however (1 less
decision for me). mostly happy w/ general shape/plumbing tho

* there are no tests atm, this is a tricky dance indeed. attempts were made.
need to futz with the permission stuff before committing to adding any tests
here, which I don't like either. also, need to get the fdk-go based test image
updated according to the fdk-go, and there's a dance there too. rumba time..

* delaying the big big cleanup until we have good enough fdk support to kill
all the other formats.

open to ideas on how to maneuver landing stuff...

* fix unmount

* see if the tests work on ci...

* add call id header

* fix up makefile

* add configurable iofs opts

* add format file describing http-stream contract

* rm some cruft

* default iofs to /tmp, remove mounting

out of the box fn we can't mount. /tmp will provide a memory backed fs for us
on most systems, this will be fine for local developing and this can be
configured to be wherever for anyone that wants to make things more difficult
for themselves.

also removes the mounting, this has to be done as root. we can't do this in
the oss fn (short of requesting root, but no). in the future, we may want to
have a knob here to have a function that can be configured in fn that allows
further configuration here. since we don't know what we need in this dept
really, not doing that yet (it may be the case that it could be done
operationally outside of fn, eg, but not if each directory needs to be
configured itself, which seems likely, anyway...)

* add WIP note just in case...
2018-09-14 10:59:12 +01:00
Tolga Ceylan
5dc5740a54 fn: runner status and docker load images (#1116)
* fn: runner status and docker load images

Introducing a function run for pure runner Status
calls. Previously, Status gRPC calls returned active
inflight request counts with the purpose of a simple
health checker. However this is not sufficient since
it does not show if agent or docker is healthy. With
this change, if pure runner is configured with a status
image, that image is executed through docker. The
call uses zero memory/cpu/tmpsize settings to ensure
resource tracker does not block it.

However, operators might not always have a docker
repository accessible/available for status image. Or
operators might not want the status to go over the
network. To allow such cases, and in general possibly
caching docker images, added a new environment variable
FN_DOCKER_LOAD_FILE. If this is set, fn-agent during
startup will load these images that were previously
saved with 'docker save' into docker.
2018-07-12 13:58:38 -07:00
Tolga Ceylan
f8d737dd46 fn: api-tests are decommissioned: cleanup Makefile (#1082)
* fn: api-tests are decommissioned: cleanup Makefile

* fn: increase mem in system-tests due to fn-test-utils image
2018-06-21 12:33:20 -07:00
Tolga Ceylan
c73d3f362e fn: remove confusing parallelism in test scripts (#1079)
* fn: remove confusing parallelism in test scripts

*) Tests should be consistent when run from makefile versus
running these test scripts from command line. Let go use
GOMAXPROCS instead of hardcoded 4 cpus in Makefile.
*) Moved docker pull for specific image versions into
helpers scripts as well. Easier to maintain image version
for tests in the same place.
*) Minor Makefile cleanup: removed unused makefile targets.

* fn: git-diff rename limit increase
2018-06-20 13:49:31 -07:00
Tolga Ceylan
bd7f67a74a fn: test scripts should use well defined ports (#1077)
* fn: test scripts should use well defined ports

Moved allocation of listener ports for mysql/minio/postgres
to helper script with a list of service list names.

* fn: makefile docker pull mysql version must match tests
2018-06-20 10:55:05 -07:00
Reed Allman
00c29b8bf3 datastore no longer implements logstore (#1013)
* datastore no longer implements logstore

the underlying implementation of our sql store implements both the datastore
and the logstore interface, however going forward we are likely to encounter
datastore implementers that would mock out the logstore interface and not use
its methods - signalling a poor interface. this remedies that, now they are 2
completely separate things, which our sqlstore happens to implement both of.

related to some recent changes around wrapping, this keeps the imposed metrics
and validation wrapping of a servers logstore and datastore, just moving it
into New instead of in the opts - this is so that a user can have the
underlying datastore in order to set the logstore to it, since wrapping it in
a validator/metrics would render it no longer a logstore implementer (i.e.
validate datastore doesn't implement the logstore interface), we need to do
this after setting the logstore to the datastore if one wasn't provided
explicitly.
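To illustrate the ordering problem described above, here is a small sketch with made-up, heavily reduced interfaces (the real `Datastore` and `Logstore` have many more methods, and `newServer` is a hypothetical stand-in for New). It shows why the logstore has to be taken from the raw datastore before the datastore is wrapped: the wrapper only exposes datastore methods.

```go
package main

import (
	"errors"
	"fmt"
)

// Datastore and Logstore are tiny stand-ins for the real interfaces.
type Datastore interface {
	GetApp(id string) (string, error)
}

type Logstore interface {
	InsertLog(callID, log string) error
}

// sqlStore happens to implement both interfaces.
type sqlStore struct{ logs map[string]string }

func (s *sqlStore) GetApp(id string) (string, error) { return "app-" + id, nil }
func (s *sqlStore) InsertLog(callID, log string) error {
	s.logs[callID] = log
	return nil
}

// validator wraps a Datastore. Embedding the interface promotes only the
// Datastore methods, so a validator is not a Logstore.
type validator struct{ Datastore }

func (v validator) GetApp(id string) (string, error) {
	if id == "" {
		return "", errors.New("missing app id")
	}
	return v.Datastore.GetApp(id)
}

// newServer falls back to the datastore as logstore when none was given,
// and only then applies the wrapping.
func newServer(ds Datastore, ls Logstore) (Datastore, Logstore) {
	if ls == nil {
		if l, ok := ds.(Logstore); ok {
			ls = l
		}
	}
	return validator{ds}, ls
}

func main() {
	store := &sqlStore{logs: map[string]string{}}
	ds, ls := newServer(store, nil)
	_, wrappedIsLogstore := ds.(Logstore)
	fmt.Println(ls != nil, wrappedIsLogstore)
}
```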

* splits logstore and datastore metrics & validation logic
* `make test` should be `make full-test` always. got rid of that so that
nobody else has to wait for CI to blow up on them after the tests pass locally
ever again.

* fix new tests
2018-06-04 00:08:16 -07:00
Travis Reeder
999820d15b Moves main into cmd dir. (#977) 2018-05-09 10:52:52 +03:00
Justin Ko
2e0a22a3e0 Make sure to rebuild protobuf files during build (#908)
I noticed that the Makefile rule to build protobuf files was listing a
non-existent file and that the protobuf files were not getting rebuilt
during builds.
2018-03-30 12:56:28 +01:00
Tolga Ceylan
0addcb8911 fn: pre-fork pool for namespace/network speedup (#874)
* fn: pre-fork pool experimental implementation
2018-03-23 16:35:35 -07:00
Tolga Ceylan
b74db6762b fn: remove pre go 1.10 optimization of build install (#859)
This speeded up tests before go 1.10 improvements, but
also clashed with fn cli tool, which gets installed as 'fn'.
2018-03-14 14:14:31 -07:00
Gerardo Viedma
8af57da7b2 Support load-balanced runner groups for multitenant compute isolation (#814)
* Initial stab at the protocol

* initial protocol sketch for node pool manager

* Added http header frame as a message

* Force the use of WithAgent variants when creating a server

* adds grpc models for node pool manager plus go deps

* Naming things is really hard

* Merge (and optionally purge) details received by the NPM

* WIP: starting to add the runner-side functionality of the new data plane

* WIP: Basic startup of grpc server for pure runner. Needs proper certs.

* Go fmt

* Initial agent for LB nodes.

* Agent implementation for LB nodes.

* Pass keys and certs to LB node agent.

* Remove accidentally left reference to env var.

* Add env variables for certificate files

* stub out the capacity and group membership server channels

* implement server-side runner manager service

* removes unused variable

* fixes build error

* splits up GetCall and GetLBGroupId

* Change LB node agent to use TLS connection.

* Encode call model as JSON to send to runner node.

* Use hybrid client in LB node agent.

This should provide access to get app and route information for the call
from an API node.

* More error handling on the pure runner side

* Tentative fix for GetCall problem: set deadlines correctly when reserving slot

* Connect loop for LB agent to runner nodes.

* Extract runner connection function in LB agent.

* drops committed capacity counts

* Bugfix - end state tracker only in submit

* Do logs properly

* adds first pass of tracking capacity metrics in agent

* made memory capacity metric uint64

* made memory capacity metric uint64

* removes use of old capacity field

* adds remove capacity call

* merges overwritten reconnect logic

* First pass of a NPM

Provide a service that talks to a (simulated) CP.

- Receive incoming capacity assertions from LBs for LBGs
- expire LB requests after a short period
- ask the CP to add runners to a LBG
- note runner set changes and readvertise
- scale down by marking runners as "draining"
- shut off draining runners after some cool-down period

* add capacity update on schedule

* Send periodic capacity metrics

Sending capacity metrics to node pool manager

* splits grpc and api interfaces for capacity manager

* failure to advertise capacity shouldn't panic

* Add some instructions for starting DP/CP parts.

* Create the poolmanager server with TLS

* Use logrus

* Get npm compiling with cert fixups.

* Fix: pure runner should not start async processing

* brings runner, nulb and npm together

* Add field to acknowledgment to record slot allocation latency; fix a bug too

* iterating on pool manager locking issue

* raises timeout of placement retry loop

* Fix up NPM

Improve logging

Ensure that channels etc. are actually initialised in the structure
creation!

* Update the docs - runners GRPC port is 9120

* Bugfix: return runner pool accurately.

* Double locking

* Note purges as LBs stop talking to us

* Get the purging of old LBs working.

* Tweak: on restart, load runner set before making scaling decisions.

* more agent synchronization improvements

* Deal with the CP pulling out active hosts from under us.

* lock at lbgroup level

* Send request and receive response from runner.

* Add capacity check right before slot reservation

* Pass the full Call into the receive loop.

* Wait for the data from the runner before finishing

* force runner list refresh every time

* Don't init db and mq for pure runners

* adds shutdown of npm

* fixes broken log line

* Extract an interface for the Predictor used by the NPM

* purge drained connections from npm

* Refactor of the LB agent into the agent package

* removes capacitytest wip

* Fix undefined err issue

* updating README for poolmanager set up

* use retrying dial for lb to npm connections

* Rename lb_calls to lb_agent now that all functionality is there

* Use the right deadline and errors in LBAgent

* Make stream error flag per-call rather than global otherwise the whole runner is damaged by one call dropping

* abstracting gRPCNodePool

* Make stream error flag per-call rather than global otherwise the whole runner is damaged by one call dropping

* Add some init checks for LB and pure runner nodes

* adding some useful debug

* Fix default db and mq for lb node

* removes unreachable code, fixes typo

* Use datastore as logstore in API nodes.

This fixes a bug caused by trying to insert logs into a nil logstore. It
was nil because it wasn't being set for API nodes.

* creates placement abstraction and moves capacity APIs to NodePool

* removed TODO, added logging

* Dial reconnections for LB <-> runners

LB grpc connections to runners are established using a backoff strategy
in the event of reconnections. This lets the LB stay up even if one of the
runners goes away, and reconnect to it as soon as it is back.

* Add a status call to the Runner protocol

Stub at the moment. To be used for things like draindown, health checks.

* Remove comment.

* makes assign/release capacity lockless

* Fix hanging issue in lb agent when connections drop

* Add the CH hash from fnlb

Select this with FN_PLACER=ch when launching the LB.

* small improvement for locking on reloadLBGmembership

* Stabilise the list of Runners returned by NodePool

The NodePoolManager makes some attempt to keep the list of runner nodes advertised as
stable as possible. Let's preserve this effort in the client side. The main point of this
is to attempt to keep the same runner at the same index in the []Runner returned by
NodePool.Runners(lbgid); the ch algorithm likes it when this is the case.

* Factor out a generator function for the Runners so that mocks can be injected

* temporarily allow lbgroup to be specified in HTTP header, while we sort out changes to the model

* fixes bug with nil runners

* Initial work for mocking things in tests

* fix for anonymous go routine error

* fixing lb_test to compile

* Refactor: internal objects for gRPCNodePool are now injectable, with defaults for the real world case

* Make GRPC port configurable, fix weird handling of web port too

* unit test reload Members

* check on runner creation failure

* adding nullRunner in case of failure during runner creation

* Refactored capacity advertisements/aggregations. Made grpc advertisement post asynchronous and non-blocking.

* make capacityEntry private

* Change the runner gRPC bind address.

This uses the existing `whoAmI` function, so that the gRPC server works
when the runner is running on a different host.

* Add support for multiple fixed runners to pool mgr

* Added harness for dataplane system tests, minor refactors

* Add Dockerfiles for components, along with docs.

* Doc fix: second runner needs a different name.

* Let us have three runners in system tests, why not

* The first system test running a function in API/LB/PureRunner mode

* Add unit test for Advertiser logic

* Fix issue with Pure Runner not sending the last data frame

* use config in models.Call as a temporary mechanism to override lb group ID

* make gofmt happy

* Updates documentation for how to configure lb groups for an app/route

* small refactor unit test

* Factor NodePool into its own package

* Lots of fixes to Pure Runner - concurrency woes with errors and cancellations

* New dataplane with static runnerpool (#813)

Added static node pool as default implementation

* moved nullRunner to grpc package

* remove duplication in README

* fix go vet issues

* Fix server initialisation in api tests

* Tiny logging changes in pool manager.

Using `WithError` instead of `Errorf` when appropriate.

* Change some log levels in the pure runner

* fixing readme

* moves multitenant compute documentation

* adds introduction to multitenant readme

* Proper triggering of system tests in makefile

* Fix instructions about starting up the components

* Change db file for system tests to avoid contention in parallel tests

* fixes revisions from merge

* Fix merge issue with handling of reserved slot

* renaming nulb to lb in the doc and images folder

* better TryExec sleep logic clean shutdown

In this change we implement a better way to deal with the sleep inside
the for loop during the attempt to place a call.
We also added a clean way to shut down the connections to external
components when we shut down the server.

* System_test mysql port

set mysql port for system test to a different value to the one set for
the api tests to avoid conflicts as they can run in parallel.

* change the container name for system-test

* removes flaky test TestRouteRunnerExecution pending resolution by issue #796

* amend remove_containers to remove new added containers

* Rework capacity reservation logic at a higher level for now

* LB agent implements Submit rather than delegating.

* Fix go vet linting errors

* Changed a couple of error levels

* Fix formatting

* removes commented out test

* adds snappy to vendor directory

* updates Gopkg and vendor directories, removing snappy and adding siphash

* wait for db containers to come up before starting the tests

* make system tests start API node on 8085 to avoid port conflict with api_tests

* avoid port conflicts with api_test.sh which are run in parallel

* fixes postgres port conflict and issue with removal of old containers

* Remove spurious println
2018-03-08 14:45:19 -08:00
Tolga Ceylan
0bdd0b45a7 fn: remove fnlb from Makefile image list (#774) 2018-02-14 20:38:06 -08:00
Tolga Ceylan
fdf5a67f6f fn: error image is now deprecated (#737)
Please use fn-test-utils instead for testing.
2018-02-05 11:12:27 -08:00
Tolga Ceylan
6b5486c699 fn: sleeper image is now deprecated (#736)
Please use fn-test-utils instead for testing.
2018-02-05 10:01:11 -08:00
jan grant
025e598c4b Selective releasing (#708)
* Rejig the build process

During a build, we check and rebuild any dependencies prior to
potentially using them.

Build:
- DIND (this only produces a new docker image, no local code changes)
- fnserver (built as part of the testing)

On master, if everything works, then we release the built artifacts,
if necessary:
- DIND (this pushes a docker image and a tag)
- fnserver (this builds the docker image and releases it, if necessary).

Fnserver is dealt with last by the release script: all previous steps
in CI use locally-run go tests rather than a docker file.

When a commit happens, we need to know (a) if we need to rebuild
a set of tools and artifacts (or whether we can continue to use
published ones); and (b) if we need to release new versions of
those tools, if all tests pass.

We do this by identifying the previous release tag on origin/master
(which is the release branch), then checking for changes between
that point and the current one.

Those changes may appear in various places in the tree: some simple
boolean rules work out whether the change means we need to rebuild
and rerelease.
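As an illustration of the "simple boolean rules" idea, here is a sketch that maps a list of changed paths to rebuild decisions. Everything specific in it is an assumption: the `plan` type, `planFor`, and the path prefixes are hypothetical and not taken from the actual release scripts. The changed paths could come from something like `git diff --name-only <previous-release-tag>..HEAD`.

```go
package main

import (
	"fmt"
	"strings"
)

// plan records which artifacts need rebuilding (and releasing if tests pass).
type plan struct {
	Dind     bool
	Fnserver bool
}

// planFor applies illustrative rules to each changed path:
//   - documentation changes need nothing,
//   - changes under the (assumed) dind image directory rebuild dind and,
//     because fnserver is based on dind, fnserver too,
//   - anything else rebuilds fnserver only.
func planFor(changed []string) plan {
	var p plan
	for _, path := range changed {
		switch {
		case strings.HasPrefix(path, "docs/") || strings.HasSuffix(path, ".md"):
			// documentation only, nothing to rebuild
		case strings.HasPrefix(path, "images/dind/"):
			p.Dind = true
			p.Fnserver = true
		default:
			p.Fnserver = true
		}
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", planFor([]string{"docs/usage.md", "api/server/server.go"}))
	fmt.Printf("%+v\n", planFor([]string{"images/dind/Dockerfile"}))
}
```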

* Make the fnproject/fnserver build use the latest dind

As docker bumps from 17.12.x, use whatever dind we just built.

* Use bash
2018-02-01 12:43:43 +00:00
jan grant
d85e6bd61b Ensure we have the latest fnproject/dind (#687)
If we need to reissue fnproject/dind:17.12 (which fnproject/fnserver
is based upon) then let's make sure we're using the latest one
when cutting a release.

To ensure we don't accidentally use stale images lying around in
the docker cache (there probably shouldn't be *any*), call
    make clear-images
before running the build.
2018-01-19 10:02:25 -08:00
Travis Reeder
3b9818bc58 Switch to dep from glide (#664) 2018-01-09 14:11:08 -08:00
Denis Makogon
4bb0744853 Use alpine images to make tests take less time (#629)
* Use retry func while trying to ping SQL datastore

 - implements retry func specifically for SQL datastore ping
 - fmt fixes
 - using sqlx.Db.PingContext instead of sqlx.Db.Ping
 - propagate context to SQL datastore

* Use alpine images to make tests take less time

 * use PG alpine
 * use Minio alpine
 * no official alpine distro for MySQL, uhhh :(
 * install swagger tool instead of docker image
 * use retry func to confirm that datastore is okay before running tests

* Store swagger tool at Fn during CI time

 somehow it's a problem to put binary to ${GOPATH}/bin

* Adjust swagger tool reference path

* Revert minio image

* Use amd64/alpine-based swagger tool image for API spec validation

* Cleanup
2018-01-02 14:56:38 -06:00
Denis Makogon
9d6f0b2a05 Speed up API tests (#624)
* Adjust API tests internal API

* Refactor API tests to take less time

 - sqlite: tests 15s, overall time: 1m
 - mysql: tests 15s, overall time: 59s

* Use retry func to survive in faulty places

* Use retry func while trying to ping SQL datastore

 - implements retry func specifically for SQL datastore ping
 - fmt fixes
 - using sqlx.Db.PingContext instead of sqlx.Db.Ping
 - propagate context to SQL datastore

* Simplify TestCanCauseTimeout retry loop

* Call retry with sane timeout

* Fix TestOversizedLog, use retry func

* Increase number of attempts

 2 test cases are really faulty in CI, so they need a lot more time to finish.

* Increase TestCanCauseTimeout timeout

* Use retry at TestMultiLog to speed it up

* Use retry at TestCanWriteLogs to speed it up

* Use retry at TestGetCallsSuccess to speed it up

* Use retry at TestCanGetAsyncState to speed it up

* Use retry at TestListCallsSuccess to speed it up

* Remove sleep calls

* Remove dup test case

* Cleanup Calls API test

* Build API tests binary once

 This patch lets CI build the API tests binary once and reuse it whenever it is needed

* Swap API tests checks

* Build API test binary by default

 dirty fix for CircleCI

* Use retry func to determine if datastore is alive in tests

* go install should also reduce build time

* Fix rebase issues
2018-01-02 13:29:49 -06:00
Tolga Ceylan
d329e0ef5b fn: circleci and makefile adjustments (#625)
* fn: circleci and makefile adjustments

*) Moved more tasks into Makefile to allow for
parallelism and dependency checks.
*) Added cpu count in circleci make invocations
for parallelism

* fn: typo sqlite => sqlite3

* fn: removed unnecessary make pull & install
2017-12-23 10:12:18 -06:00
Denis Makogon
5c68a88599 Fn-prefix everything (#545)
* Fn-prefix everything

Closes: #492

* Global replacement

* missed one fn_
2017-11-29 17:50:24 -08:00
Travis Reeder
ab18e467fa updates functions -> fnserver (#516)
* updates functions -> fn-server and fnlb -> fn-lb

* changed to fnserver and fnlb
2017-11-17 15:53:44 -08:00
Travis Reeder
1ba8620035 Fix release issue 2017-11-17 11:44:52 -08:00
Travis Reeder
96cfc9f5c1 Update json (#463)
* wip

* wip

* Added more fields to JSON and added blank line between objects.

* Update tests.

* wip

* Updated to represent recent discussions.

* Fixed up the json test

* More docs

* Changed from blank line to bracket, newline, open bracket.

* Blank line added back, easier for delimiting.
2017-11-16 09:59:13 -08:00
Reed Allman
8a59654582 go vet yourself (#397)
go vet caught some nifty bugs. so fixed those here, and also made it so that
we vet everything from now on since the robots seem to do a better job of
vetting than we have managed to.

also adds gofmt check to circle. could move this to the test.sh script (didn't
want a script calling a script, because $reasons) and it's nice and isolated
in its own little land as it is. side note, changed the script so it runs in
100ms instead of 3s, i think find is a lot faster than go list.

attempted some minor cleanup of various scripts
2017-10-06 08:42:33 -07:00
James Jeffrey
c7f3066c75 Update references remove refs to treeder oracle funcy (#376)
* Remove lots of refs to iron and funcy oracle etc..

* more ref replacements

* Replacing more refs. Treeder

* Use Fn not FN
2017-09-29 16:22:15 -07:00
Reed Allman
caba9e0ec6 more strict configuration of routes
* idle_timeout max of 1h
* timeout max of 120s for sync, 1h for async
* max memory of 8GB
* do full route validation before call invocation
* ensure that idle_timeout >= timeout

we are now doing validation of updating route inside of the database
transaction, which is what we should have been doing all along really.
we need this behavior to ensure that the idle timeout is longer than the
timeout, among other benefits (like not updating the most recent version of
the existing struct and overwriting previous updates, yay). since we have
this, we can get rid of the weird skipZero behavior on validate too and
validate the real deal holyfield.

validating the route before making the call is handy so that we don't do weird
things like run a func that wants to use 300GB of RAM and run for 3 weeks.

closes #192
closes #344
closes #162
2017-09-21 04:04:34 -07:00
Reed Allman
71a88a991c hang the runner, agent=new sheriff (#270)
* fix docker build

this is trivially incorrect since glide doesn't actually provide reproducible
builds. the idea is to build with the deps that we have checked into git, so
that we actually know what code is executing so that we might debug it...

all for multi stage build instead of what we had, but adding the glide step is
wrong. i added a loud warning so as to discourage this behavior in the future.

* hang the runner, agent=new sheriff

tl;dr agent is now runner, with a hopefully saner api

the general idea is get rid of all the various 'task' structs now, change our
terminology to only be 'calls' now, push a lot of the http construction of a
call into the agent, allow calls to mutate their state around their execution
easily and to simplify the number of code paths, channels and context timeouts
in something [hopefully] easy to understand.

this introduces the idea of 'slots' which are either hot or cold and are
separate from reserving memory (memory is denominated in 'tokens' now).
a 'slot' is essentially a container that is ready for execution of a call, be
it hot or cold (it just means different things based on hotness). taking a
look into Submit should make these relatively easy to grok.

sorry, things were pretty broken especially wrt timings. I tried to keep good
notes (maybe too good), to highlight stuff so that we don't make the same
mistakes again (history repeating itself blah blah quote). even now, there is
lots of work to do :)

I encourage just reading the agent.go code, Submit is really simple and
there's a description of how the whole thing works at the head of the file
(after TODOs). call.go contains code for constructing calls, as well as Start
/ End (small atm). I did some amount of code massaging to try to make things
simple / straightforward / fit reasonable mental model, but as always am open
to critique (the more negative the better) as I'm just one guy and wth do i
know...

-----------------------------------------------------------------------------

below enumerates a number of changes as briefly as possible (heh..):

models.Call all the things

removes models.Task as models.Call is now what it previously was.
models.FnCall is now removed in favor of models.Call, despite the datastore
only storing a few fields of it [for now]. we should probably store entire
calls in the db, since app & route configurations can change at any given
moment, it would be nice to see the parameters of each call (costs db space,
obviously).

this removes the endpoints for getting & deleting messages, we were just
looping back to localhost to call the MQ (wtf? this was for iron integration i
think) and now just calls the MQ directly.

changes the name of the FnLog to LogStore, confusing cause there's also a
`FuncLogger` which uses the LogStore (punting). removes other `Fn` prefixed
structs (redundant naming convention).

removes some unused and/or weird structs (IDStatus, CompleteTime)

updates the swagger

makes the db methods consistent to use 'Call' nomenclature.

remove runner nuisances:

* push down registry stuff to docker driver
* remove Environment / Stats stuff of yore
* remove unused writers (now in FuncLogger)
* remove 2 of the task types, old hot stuff, runner, etc

fixes ram available calculation on startup to not always be 300GB (helps a lot
on a laptop!)

format for DOCKER_AUTH env now is not a list but a map (there are no docs,
would prefer to get rid of this altogether anyway). the ~/.docker/cfg expected
format is unchanged.

removes arbitrary task queue, if a machine is out of ram we can probably just
time out without queueing... (can open separate discussion) in any case the
old one didn't really account well for hot tasks, it just lined everyone up in
the task queue if there wasn't a place to run hot and then timed them out
[even if a slot became free].

removes HEADER_ prefixing on any headers in the request to invoke a call.
(this was inconsistent with cli for test anyway)

removes TASK_ID header sent in to hot only (this is a dupe of FN_CALL_ID,
which has not been removed)

now user functions can reply directly to the client. this means that for
cold containers if they write to stdout it will send a 200 + headers. for
hot containers, the user can reply directly to the client from the container,
i.e. with its preferred status code / headers (vs. always getting a 200).
the dispatch itself is a little http specific atm, i think we can add an
interchange format but the current version is easily extended to add json for
now, separate discussion. this eliminates a lot of the request/response
rewriting and buffering we were doing (yey). now Dispatch ONLY does input and
output, vs. managing the call timeout and having access to a call's fields.

cache is pushed down into agent now instead of in the front end, would like to
push it down to the datastore actually but it's here for now anyway. cache
delete functions removed (b/c fn is distributed anyway?). added app caching,
should help with latency.

in general, a lot of server/runner.go got pushed down into the agent. i think
it will be useful in testing to be able to construct calls without having to
invoke http handlers + async also needs to construct calls without a handler.

safe shutdown actually works now for everything (leaked / didn't wait on
certain things before)

now we're waiting for hot slots to open up while we're attempting to get ram
to launch a container if we didn't find any hot slots to run the call in
immediately. we can change this policy really easily now (no more channel
jungle; still some channels). also looking for somewhere else to go while the
container is launching now. slots now get sent _out_ of a container, vs.
a container receiving calls, which makes this kind of policy easier to
implement. this fixes a number of bugs around things like trying to execute
calls against containers that have not and may never start and trying to
launch a bazillion containers when there are no free containers. the driver api
underwent some changes to make this possible (relatively minimal, added Wait).
the easiest way to think about this is that allocating ram has moved 'up'
instead of just wrapping launching containers, so that we can select on a
channel trying to find ram.

not dispatching hot calls to containers that died anymore either...

the timeout is now started at the beginning of Submit, rather than Dispatch or
the container itself having to manage the call timeout, which was an
inaccurate way of doing things since finding a slot / allocating ram / pulling
image can all take a non-trivial (timeout amount, even!) amount of time. this
makes for much more reasonable response times from fn under load, there's
still a little TODO about handling cold+timeout container removal response
times but it's much improved.

if call.Start is called with < call.timeout/2 time left, then the call will
not be executed and return a timeout. we can discuss. this makes async play
_a lot_ nicer, specifically. for large timeouts / 2 makes less sense.

env is no longer getting upper cased (admittedly, this can look a little weird
now). our whole route.Config/app.Config/env/headers stuff probably deserves a
whole discussion...

sync output no longer has the call id in json if there's an error / timeout.
we could add this back to signify that it's _us_ writing these but this was
out of place. FN_CALL_ID is still shipped out to get the id for sync calls,
and async [server] output remains unchanged.

async logs are now an entire raw http request (so that a user can write a 400
or something from their hot async container)

async hot now 'just works'

cold sync calls can now reply to the client before container removal, which
shaves a lot of latency off of those (still eat start). still need to figure
out async removal if timeout or something.

-----------------------------------------------------------------------------

i've located a number of bugs that were generally inherited, and also added
a number of TODOs in the head of the agent.go file covering robustness we
probably need to add. this is at least at parity with the previous
implementation, to my knowledge (hopefully/likely a good bit ahead). I can
memorialize these to github quickly enough, not that anybody searches before
adding bugs anyway (sigh).

the big thing to work on next imo is async being a lot more robust,
specifically to survive fn server failures / network issues.

thanks for review (gulp)
2017-09-05 20:32:51 +03:00
Travis Reeder
f559acd7ed Renamed a bunch of images to use fnproject org. (#239)
* Renamed a bunch of images to use fnproject org.

* Multi-stage build for Docker.

* Added tmp vendor dirs to gitignore.

* Run docker-build at beginning of test.
2017-08-23 22:43:53 +03:00
Denis Makogon
bb8f12ece9 Fixing tests and CI file 2017-07-31 21:14:11 +03:00
Travis Reeder
48e3781d5e Rename to GitHub (#3)
* circle

* Rename to github and fn->cli

* Rename to github and fn->cli
2017-07-26 10:50:19 -07:00
Denis Makogon
5b41fe2dc7 Improving API tests 2017-07-25 10:29:20 -07:00
Travis Reeder
053c7cb0e6 Added gomega and updated deps. 2017-07-17 13:01:00 -07:00
Reed Allman
4e52c595d2 merge datastores into sqlx package
replace default bolt option with sqlite3 option. the story here is that we
just need a working out of the box solution, and sqlite3 is just fine for that
(actually, likely better than bolt).

with sqlite3 supplanting bolt, we mostly have sql databases. so remove redis
and then we just have one package that has a `sql` implementation of the
`models.Datastore` and lean on sqlx to do query rewriting. this does mean
queries have to be formed a certain way and likely have to be ANSI-SQL (no
special features) but we weren't using them anyway and our base api is
basically done and we can easily extend this api as needed to only implement
certain methods in certain backends if we need to get cute.

* remove bolt & redis datastores (can still use as mqs)
* make sql queries work on all 3 (maybe?)
* remove bolt log store and use sqlite3
* shove the FnLog shit into the datastore shit for now (free pg/mysql logs...
just for demos, etc, not prod)
* fix up the docs to remove bolt references
* add sqlite3, sqlx dep
* fix up tests & mock stuff, make validator less insane
* remove put & get in datastore layer as nobody is using.

this passes tests which at least seem like they test all the different
backends. if we trust our tests then this seems to work great. (tests `make
docker-test-run-with-*` work now too)
2017-07-07 01:30:02 -07:00
Denis Makogon
adf61c77be Full stack tests 2017-07-05 12:38:09 -07:00
James
8a3edb8309 All of the changes for func logs 2017-06-19 11:38:11 -07:00
Travis Reeder
9a8ff408b5 Fixes scary output on docker startup. 2017-06-15 15:48:34 -07:00
Shaun Smith
a31bbdc676 Added -e NO_PROXY and -e HTTP_PROXY to docker-run to fix docker failure to connect to host unix.sock 2017-06-13 11:15:21 -07:00
James Jeffrey
79f1dab007 Deploy sh 2017-06-09 13:42:59 -07:00
James Jeffrey
c7a5bae587 Merge branch 'chad-gitlab-url-change' into 'master'
Chad gitlab url change

See merge request !28
2017-05-30 11:34:22 -07:00