* fn: introducing 503 responses for the out-of-capacity case
*) Return 503 with a Retry-After header if the request fails
while waiting for slots (see the sketch after this list).
*) TODO: return 503 without Retry-After if the request can
never be met by this fn server.
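A minimal sketch of the out-of-capacity behavior described above, using plain
net/http; the handler, the slot channel, and the wait/retry intervals are
illustrative assumptions, not fn's actual code:

```go
package main

import (
	"net/http"
	"strconv"
	"time"
)

// hypothetical handler illustrating the 503 + Retry-After behavior: if no
// slot frees up before the wait times out, tell the client to come back
// later instead of holding the connection open indefinitely.
func handleWithCapacity(w http.ResponseWriter, r *http.Request, slots chan struct{}) {
	select {
	case <-slots: // acquired a slot
		defer func() { slots <- struct{}{} }() // return slot when done
		w.WriteHeader(http.StatusOK)
	case <-time.After(30 * time.Second): // gave up waiting for a slot
		retryAfter := 15 // seconds; illustrative value
		w.Header().Set("Retry-After", strconv.Itoa(retryAfter))
		http.Error(w, "out of capacity, try again later", http.StatusServiceUnavailable)
	}
}
```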
*) fn: runner test docker pull fixup
*) fn: MaxMemory for routes is now a variable to allow
testing and adjusting it according to fleet memory sizes.
* add minio-go dep, update deps
* add minio s3 client
minio has an s3-compatible api, is an open source project, and, notably, is
not amazon, so it seems best to use their client (fwiw, the aws-sdk-go is a
giant hairball of things we don't need, too). it was pretty easy and seems
to work, so rolling with it. also, minio is a totally feasible option for fn
installs in prod / for demos / for local.
* adds 's3' package for an s3-compatible log storage api, for use with
storing logs from calls and retrieving them.
* removes DELETE /v1/apps/:app/calls/:call/log endpoint
* removes internal log deletion api
* changes the GetLog API to use an io.Reader, which is a backwards step atm
due to the json api for logs; I have another branch lined up to add a plain
text log API, which will be much more efficient (also want to gzip). see the
sketch after this list.
* hooked minio into the test suite and fixed up the tests
* add docs on how to run minio and how to point fn at it
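A sketch of what the s3-backed log store might look like, using the minio-go
client. The LogStore interface, bucket name, and key scheme are illustrative
assumptions, not the actual fn API, and the minio-go calls follow the older
v2-style signatures (newer releases take an options struct instead):

```go
package s3

import (
	"fmt"
	"io"

	minio "github.com/minio/minio-go"
)

// LogStore is an illustrative interface for call log storage;
// GetLog returns an io.Reader as described above.
type LogStore interface {
	InsertLog(appName, callID string, log io.Reader) error
	GetLog(appName, callID string) (io.Reader, error)
}

type store struct {
	client *minio.Client
	bucket string
}

// New connects to any s3-compatible endpoint (minio locally, or a hosted
// object store). Signatures follow the older minio-go API.
func New(endpoint, accessKey, secretKey string, useSSL bool) (LogStore, error) {
	client, err := minio.New(endpoint, accessKey, secretKey, useSSL)
	if err != nil {
		return nil, err
	}
	return &store{client: client, bucket: "fn-logs"}, nil // bucket name is an assumption
}

func key(appName, callID string) string {
	return fmt.Sprintf("%s/%s", appName, callID)
}

func (s *store) InsertLog(appName, callID string, log io.Reader) error {
	_, err := s.client.PutObject(s.bucket, key(appName, callID), log, "text/plain")
	return err
}

func (s *store) GetLog(appName, callID string) (io.Reader, error) {
	return s.client.GetObject(s.bucket, key(appName, callID))
}
```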
some notes: we aren't cleaning up these logs yet. there is already a ticket
to make a Mr. Clean who wakes up periodically and nukes old stuff, so I am
punting on any api design around some kind of TTL deletion of logs. there
are a lot of options for Mr. Clean; notably, we can defer to him when apps
are deleted, too, so that app deletion is fast and Mr. Clean just cleans
the logs up later (seems like a good option).
have not tested against BMC object store, which has an s3-compatible API,
but in theory it 'just works' (the reason for doing this). in any event,
that's part of the service land to figure out.
closes #481, closes #473
* add log not found error to minio land
before returning the cookie in the driver, wait for health checks
(https://docs.docker.com/engine/reference/builder/#healthcheck) if provided.
for images that don't have health checks, this has no effect beyond an added
call to inspect the container (for hot, it's small potatoes).
this will be useful for containers that need to pull large files or do setup
that takes a while before accepting tasks. since this happens before start,
it won't run into the idle timeout. we could likely use these checks for hot
containers in general and check between runs or something, but didn't do
that here.
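A sketch of the wait-for-healthy loop, using the Docker SDK's
ContainerInspect; the function name and poll interval are illustrative
assumptions, and fn's driver may use a different client:

```go
package driver

import (
	"context"
	"fmt"
	"time"

	"github.com/docker/docker/client"
)

// waitHealthy polls the container's health status until it reports
// "healthy" or the context is cancelled. Images without a HEALTHCHECK
// have a nil Health field, so the one extra inspect is the only cost.
func waitHealthy(ctx context.Context, cli *client.Client, containerID string) error {
	for {
		inspect, err := cli.ContainerInspect(ctx, containerID)
		if err != nil {
			return err
		}
		if inspect.State.Health == nil {
			return nil // no HEALTHCHECK defined: nothing to wait for
		}
		switch inspect.State.Health.Status {
		case "healthy":
			return nil
		case "unhealthy":
			return fmt.Errorf("container %s is unhealthy", containerID)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(500 * time.Millisecond): // poll interval: assumption
		}
	}
}
```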
one nascent concern: for hot containers that never become healthy, I don't
think we will ever kill them, so the slot will 'leak'. this is true here and
for other cases (pulling an image), I think; we should probably recycle hot
containers every hour or so, which would also close this.
anyway, not a huge blocker I don't think; there will likely be one user of
this feature for a bit, and it's not documented since we're not sure we want
to support it.
closes #336
* fn: prometheus collector concurrent map access
*) Added mutex to guard against concurrent access to maps
* fn: prometheus collector method receivers should be ptr
* fn: prometheus collector concurrent map access
*) Moved the mutex into getHistogramVec()
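A sketch of the mutex-guarded histogram lookup described above; the struct
and method names here are assumptions, not fn's actual collector:

```go
package stats

import (
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

// collector caches HistogramVecs by metric name; the map is shared across
// goroutines, so access must be serialized.
type collector struct {
	mu         sync.Mutex
	histograms map[string]*prometheus.HistogramVec
}

// getHistogramVec holds the mutex for the whole check-then-create sequence
// so two goroutines cannot both miss and register the same metric twice.
// Note the pointer receiver: a value receiver would copy the mutex.
func (c *collector) getHistogramVec(name string, labels []string) *prometheus.HistogramVec {
	c.mu.Lock()
	defer c.mu.Unlock()
	hv, ok := c.histograms[name]
	if !ok {
		hv = prometheus.NewHistogramVec(
			prometheus.HistogramOpts{Name: name, Help: "registered by collector"},
			labels,
		)
		prometheus.MustRegister(hv)
		c.histograms[name] = hv
	}
	return hv
}
```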
* fn: fnlb: default health state for new nodes
*) Any new node now by default is in unknown state.
*) One health check is required for a node in the unknown state to
move into the healthy/unhealthy states; after that, the actual
interval and thresholds apply.
*) add() no longer runs health check as this is now handled
with the new logic.
This means that during restarts fnlb runs one health check
immediately so nodes switch to the healthy/unhealthy state,
ensuring a speedy start while guarding against routing traffic
to unhealthy servers. After this initial check, nodes are
subject to the regular interval and thresholds.
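A sketch of that state transition logic; the constants, fields, and method
name are illustrative assumptions:

```go
package fnlb

type healthState int

const (
	StateUnknown healthState = iota // new nodes start here
	StateHealthy
	StateUnhealthy
)

type node struct {
	state   healthState
	success int // consecutive successful checks
	fail    int // consecutive failed checks
}

// recordCheck applies one health check result. A node in the unknown state
// transitions on its first check, regardless of thresholds, so restarts
// converge quickly; after that, the configured thresholds apply.
func (n *node) recordCheck(ok bool, healthyThreshold, unhealthyThreshold int) {
	if ok {
		n.success, n.fail = n.success+1, 0
	} else {
		n.fail, n.success = n.fail+1, 0
	}
	switch {
	case n.state == StateUnknown:
		if ok {
			n.state = StateHealthy
		} else {
			n.state = StateUnhealthy
		}
	case ok && n.success >= healthyThreshold:
		n.state = StateHealthy
	case !ok && n.fail >= unhealthyThreshold:
		n.state = StateUnhealthy
	}
}
```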
* style fixes
* fn: fnlb: enhancements and new grouper tests
*) added healthy threshold (default: 1)
*) grouper is now using configured hcEndpoint for version checks
*) grouper now logs when servers switch between healthy/unhealthy status
*) moved DB code out of grouper
*) run health check immediately at start (don't wait until hcInterval)
*) optional shutdown timeout (default: 0) & mgmt port (default: 8081)
*) hot path List() in grouper now uses an atomic ptr Load (see the
sketch after this list)
*) consistent router: moved closure to a new function
*) bugfix: version parsing from fn servers should not panic fnlb
*) bugfix: servers removed from the DB stayed in the healthy list
*) bugfix: if the DB was down, the health checker stopped monitoring
*) basic new tests for grouper (add/rm/unhealthy/healthy server)
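A sketch of the lock-free List() hot path mentioned above, using
atomic.Value; the names are assumptions, not the actual grouper code:

```go
package grouper

import "sync/atomic"

// grouper keeps the healthy node list behind an atomic.Value so the request
// hot path never takes a lock; writers copy-on-write a fresh slice.
type grouper struct {
	nodes atomic.Value // holds []string
}

// List is the hot path: a single atomic pointer load, no mutex.
func (g *grouper) List() []string {
	v, _ := g.nodes.Load().([]string)
	return v
}

// setNodes is the (rarer) write path: store a fresh copy; readers holding
// the old slice keep a consistent snapshot.
func (g *grouper) setNodes(nodes []string) {
	g.nodes.Store(append([]string(nil), nodes...))
}
```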
* Docker stats to Prometheus
* Fix compilation error in docker_test
* Refactor the docker driver Run function to wait for the container to stop before stopping the collection of statistics
* Fix go fmt errors
* Updates to sending docker stats to Prometheus
* remove new test TestWritResultImpl because the changes to support multiple waiters have been removed
* Update docker.Run to use channels, not contexts, to shut down the stats collector
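A sketch of the channel-based shutdown for the stats collector; the function
name, sampling interval, and the sample callback (a stand-in for the actual
docker stats call) are illustrative assumptions:

```go
package docker

import "time"

// collectStats samples container stats on a ticker until told to stop.
// Run can close(stop) after the container has exited, which ends the loop
// deterministically; no context plumbing required.
func collectStats(containerID string, stop <-chan struct{}, sample func(id string)) {
	ticker := time.NewTicker(time.Second) // sampling interval: assumption
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			sample(containerID) // e.g. read docker stats and push to Prometheus
		}
	}
}
```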