another bone head fail here. hopefully can leave this alone meow...
newID does not get inlined, but doesn't allocate to trigger any stack
expansion either. net perf hit on my laptop is 5ns, and we get a test out of
it. it will push a new stack, so it's not negligible overhead and we could
avoid it. we could keep the logic in both places just to have a test for it
separate instead of re-using the function in the hot path. up to us.
the error itself from up/down & dirty can be improved to show direction and
version information to help a user of the package determine where things went
wrong, which is useful when a series of migrations are run and the db error
itself is not clear about what went wrong exactly.
* Move delegated agent creation within NewLBAgent so we can hide the fact we disable docker
* Move delegated agent creation within NewPureRunner for better encapsulation
* Move out node-pool manager and replace it with RunnerPool extension
* adds extension points for runner pools in load-balanced mode
* adds error to return values in RunnerPool and Runner interfaces
* Implements runner pool contract with context-aware shutdown
* fixes issue with range
* fixes tests to use runner abstraction
* adds empty test file as a workaround for build requiring go source files in top-level package
* removes flappy timeout test
* update docs to reflect runner pool setup
* refactors system tests to use runner abstraction
* removes poolmanager
* moves runner interfaces from models to api/runnerpool package
* Adds a second runner to pool docs example
* explicitly check for request spillover to second runner in test
* moves runner pool package name for system tests
* renames runner pool pointer variable for consistency
* pass model json to runner
* automatically cast to http.ResponseWriter in load-balanced call case
* allow overriding of server RunnerPool via a programmatic ServerOption
* fixes return type of ResponseWriter in test
* move Placer interface to runnerpool package
* moves hash-based placer out of open source project
* removes siphash from Gopkg.lock
Added env FN_MAX_FS_SIZE_MB, which if defined and non-zero
is passed to docker as storage opt size. We do not validate
if this option is supported by docker currently. This is
because it's difficult to actually validate this since it
not only depends on storage driver and its backing filesystem,
but also the mount options used to mount that fs.
* fn: reorg agent config
*) Moving constants in agent to agent config, which helps
with testing, tuning.
*) Added max total cpu & memory for testing & clamping max
mem & cpu usage if needed.
* fn: adjust PipeIO time
* fn: for hot, cannot reliably test EndOfLogs in TestRouteRunnerExecution
* add jaeger support, link hot container & req span
* adds jaeger support now with FN_JAEGER_URL, there's a simple tutorial in the
operating/metrics.md file now and it's pretty easy to get up and running.
* links a hot request span to a hot container span. when we change this to
sample at a lower ratio we'll need to finagle the hot container span to always
sample or something, otherwise we'll hide that info. at least, since we're
sampling at 100% for now if this is flipped on, can see freeze/unfreeze etc.
if they hit. this is useful for debugging. note that zipkin's exporter does
not follow the link at all, hence jaeger... and they're backed by the Cloud
Empire now (CNCF) so we'll probably use it anyway.
* vendor: add thrift for jaeger
* fn, dockerd pid collector & go collector metrics
the prometheus client we're using has a nice collector for process metrics and
for go metrics. these are things we are very interested in operationally and
recently the benevolent team at opencensus made this possible again, so this
hooks it up for us with added dockerd sugar.
nannying the dockerd we're using should be super useful since that thing likes
to get carried away, it'll be nice to differentiate memory/cpu usage between
dockerd / the host / fn. this will basically only work in a 'dind'
environment, or on a linux host that is running fn outside of docker that is
configured with the permissions to be able to check this. otherwise, it will
simply fail. we also probably want disk i/o and net i/o information for that
as well, or at least it would be interesting to differentiate from the host,
but this isn't hooked up in the default collectors unfortunately.
dockerd:
```
dockerd_process_cpu_seconds_total 520.74
dockerd_process_max_fds 1.048576e+06
dockerd_process_resident_memory_bytes 9.033728e+07
dockerd_process_start_time_seconds 1.52029677322e+09
dockerd_process_virtual_memory_bytes 1.782509568e+09
```
fn:
```
fn_process_cpu_seconds_total 0.14
fn_process_max_fds 1024
fn_process_open_fds 12
fn_process_resident_memory_bytes 2.7348992e+07
fn_process_start_time_seconds 1.52056274238e+09
fn_process_virtual_memory_bytes 7.20068608e+08
```
go:
```
go_gc_duration_seconds{quantile="0"} 4.4194e-05
go_gc_duration_seconds{quantile="0.25"} 9.8118e-05
go_gc_duration_seconds{quantile="0.5"} 0.000105989
go_gc_duration_seconds{quantile="0.75"} 0.000106251
go_gc_duration_seconds{quantile="1"} 0.000157864
go_gc_duration_seconds_sum 0.000512416
go_gc_duration_seconds_count 5
go_goroutines 30
go_memstats_alloc_bytes 3.897696e+06
go_memstats_alloc_bytes_total 1.2916016e+07
go_memstats_buck_hash_sys_bytes 1.45034e+06
go_memstats_frees_total 75399
go_memstats_gc_sys_bytes 450560
go_memstats_heap_alloc_bytes 3.897696e+06
go_memstats_heap_idle_bytes 868352
go_memstats_heap_inuse_bytes 5.750784e+06
go_memstats_heap_objects 29925
go_memstats_heap_released_bytes_total 0
go_memstats_heap_sys_bytes 6.619136e+06
go_memstats_last_gc_time_seconds 1.520562751182639e+09
go_memstats_lookups_total 239
go_memstats_mallocs_total 105324
go_memstats_mcache_inuse_bytes 3472
go_memstats_mcache_sys_bytes 16384
go_memstats_mspan_inuse_bytes 90592
go_memstats_mspan_sys_bytes 98304
go_memstats_next_gc_bytes 6.31304e+06
go_memstats_other_sys_bytes 710548
go_memstats_stack_inuse_bytes 720896
go_memstats_stack_sys_bytes 720896
go_memstats_sys_bytes 1.0066168e+07
```
* cache pid until it stops working
* move mattes migrations to migratex
* changes format of migrations to migratex format
* updates test runner to use new interface (double checked this with printlns,
the tests go fully down and then up, and work on pg/mysql)
* remove mattes/migrate
* update tests from deps
* update readme
* fix other file extensions
* Refactor PureRunner as an Agent so that it encapsulates its grpc server
* Maintain a list of extra contexts for the server to select on to handle errors and cancellations
1) in theory it may be possible for an exited container to
requeue a slot, close this gap by always setting fatal error
for a slot if a container has exited.
2) when a client request times out or cancelled (client
disconnect, etc.) the slot should not be allowed to be
requeued and container should terminate to avoid accidental
mixing of previous response into next.
code is feature complete in the general sense, with minor TODO left.
this is just a patch with 'migratex' and does not use it for fn's migrations
yet, would like to get feedback prior to doing that.
presenting:
A migration library loosely based on pressly/goose and mattes/migrate design,
that does migrations across a smattering of sql databases by only accepting a
`*sqlx.DB`.
why?
* goose didn't support kindly allowing us to rebind transactions based on a
given db to various dialects or offer oracle support
* goose didn't support locking the db (maybe not needed with tx? it's late..
we may want to lock the whole db eventually?)
* goose requires us to do semi-complex migration to it from mattes/migrate
* mattes has stepped down as migrate maintainer and the project is in flux
* mattes/migrate did not allow us to define migrations in go and rebind to
different dialects, an issue since we need to insert ids in our own format and
can't define this in sql
* neither handled context plumbing and risked issues there for various
reasons (deadlock, etc).
* I think I'm forgetting 1 or 2
in the style of goose, this lets us define `*sqlx.Tx` up and down funcs in go
code, but uses mattes' migration table so we don't need to migrate that and
retains its lock behavior with added tx sugar and less errors. most
importantly, this code is terse, leveraging sqlx to support a lot of sql dbs
(unlike mattes) and we control this. there is one useful TODO to handle
migrations failing at startup more gracefully, in prod stuff like that will be
nice to have. open to discussion of putting in a separate library, the
landscape of go sql migrators is... really something.
TODO make test suite and test against sqlite3, pg, mysql [, oracledb] like we
have for our own unit tests. I'm thinking it's faster to wire up through
there and use our bevy of migrations?
This change add the option to set a timeout for the dialer used in
making gRPC connection, with that we remove the check on the state of
the connections and therefore remove any potential race conditions.
If a runner disconnect not gracefully it could happen that the
connection gets stuck in connecting mode, this change verifies the state
of the connection before starting to execute a call, if the client
connection is not ready we fail fast to give a change to the next runner
(if any) to execute the call.