* Docker compose file for simpler dev env
* Updating readme, adding UI to compose
* Define dependencies between services
* Prometheus. Grafana
* Link Prometheus to Grafana service
* Addressing review comments
* Linking compose doc to common table of content
* add error to call model
closes#331
previously, for async this error was being masked completely even if it was
something useful like the image not existing. for sync, the error was returned
in the http request but now it's also being stored. this error itself can
cover a lot of landscape, it could be an error in getting a slot, pulling an
image, running a container, among other things. anyway, no longer being
masked. we can likely improve it in certain cases we run into in the future,
but it's open ended at the moment and not being masked like some errors in
sync http request returns (503 non-models.APIError) for now.
* tucks in callTrigger stuff to keep api clean
* adds swagger
* adds migration
* adds tests for datastore and agent to ensure behavior
* pull images before tests are ran
* gofmt migrations file
* add per call stats field as histogram
this will add a histogram of up to 240 data points of call data, produced
every second, stored at the end of a call invocation in the db. the same
metrics are also still shipped to prometheus (prometheus has the
not-potentially-reduced version). for the API reference, see the updates to
the swagger spec, this is just added onto the get call endpoint.
this does not add any extra db calls and the field for stats in call is a json
blob, which is easily modified to add / omit future fields. this is just
tacked on to the call we're making to InsertCall, and expect this to add very
little overhead; we are bounding the set to be relatively small, planning to
clean out the db of calls periodically, functions will generally be short, and
the same code used at a previous firm did not cause a notable db size increase
with production workload that is worse, wrt histogram size (I checked). the
code changes are really small aside from changing to strfmt.DateTime,
adding a migration and implementing sql.Valuer; needed to slightly modify the
swap function so that we can safely read `call.Stats` field to upload at end.
with the full histogram in hand, we can compute max/min/average/median/growth
rate/bernoulli distributions/whatever very easily in a UI or tooling. in
particular, this data is easily chartable [for a UI], which is beneficial.
* adds swagger spec of api update to calls endpoint
* adds migration for call.stats field
* adds call.stats field to sql queries
* change swapping of hot logger to exec, so we know that call.Stats is no
longer being modified after `exec` [in call.End]
* throws out docker stats between function invocations in hot functions (no
call to store them on, we could change this later for debug; they're in prom)
* tested in tests and API
closes#19
* add format of ints to swag
* fn: introducing 503 responses for out of capacity case
*) Adding 503 with Retry-After header case if request failed
during waiting for slots.
*) TODO: return 503 without Retry-After if the request can
never be met by this fn server.
*) fn: runner test docker pull fixup
*) fn: MaxMemory for routes is now a variable to allow
testing and adjusting it according to fleet memory sizes.
* add minio-go dep, update deps
* add minio s3 client
minio has an s3 compatible api and is an open source project and, notably, is
not amazon, so it seems best to use their client (fwiw the aws-sdk-go is a
giant hair ball of things we don't need, too). it was pretty easy and seems
to work, so rolling with it. also, minio is a totally feasible option for fn
installs in prod / for demos / for local.
* adds 's3' package for s3 compatible log storage api, for use with storing
logs from calls and retrieving them.
* removes DELETE /v1/apps/:app/calls/:call/log endpoint
* removes internal log deletion api
* changes the GetLog API to use an io.Reader, which is a backwards step atm
due to the json api for logs, I have another branch lined up to make a plain
text log API and this will be much more efficient (also want to gzip)
* hooked up minio to the test suite and fixed up the test suite
* add how to run minio docs and point fn at it docs
some notes: notably we aren't cleaning up these logs. there is a ticket
already to make a Mr. Clean who wakes up periodically and nukes old stuff, so
am punting any api design around some kind of TTL deletion of logs. there are
a lot of options really for Mr. Clean, we can notably defer to him when apps
are deleted, too, so that app deletion is fast and then Mr. Clean will just
clean them up later (seems like a good option).
have not tested against BMC object store, which has an s3 compatible API. but
in theory it 'just works' (the reason for doing this). in any event, that's
part of the service land to figure out.
closes#481closes#473
* add log not found error to minio land
before returning the cookie in the driver, wait for health checks
https://docs.docker.com/engine/reference/builder/#healthcheck if provided.
for images that don't have health checks, this will have no affect (an added
call to inspect container, for hot it's small potatoes).
this will be useful for containers so that they can pull large files or do
setup that takes a while before accepting tasks. since this is before start,
it won't run into the idle timeout. we could likely use these for hot
containers in general and check between runs or something, but didn't do that
here.
one nascient concern is that for hot if the containers never become healthy
I don't think we will ever kill them and the slot will 'leak'. this is true
for this and for other cases (pulling image) I think, we should probably
recycle hot containers every hour or something which would also close this.
anyway, not a huge blocker I don't think, there will likely be 1 user of this
feature for a bit, it's not documented since we're not sure we want to support
it.
closes#336