we had the container inspect call here for 3 reasons:
1) get exit code
2) see if container is still running (debugging madness)
3) see if docker thinks it was an OOM
1) is something wait already returns, but because of 2) and 3) we delayed
reading it until the inspect call
2) was really just for debugging since we had 3)
3) seems unnecessary. to me, an OOM is an OOM is an OOM, so why make a whole
docker inspect call just to find out? (we could move this down into the sad
path and only make the call when necessary, but are we really getting any
value from this distinction anyway? i've never run into it myself)
inspect was actually causing tasks to time out: the inspect call could put us
over our task timeout even though the container itself ran to completion. we
could have fixed this by checking the context earlier, but we don't really
need inspect at all, and dropping it reduces the docker calls we make, which
will make more unicorn puppers. tasks should now have more 'true' timeouts.
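roughly what the wait path looks like without inspect -- a sketch assuming
the fsouza/go-dockerclient library; the function name and timeout value here
are made up:

```go
package agent

import (
	"context"
	"time"

	docker "github.com/fsouza/go-dockerclient"
)

// waitTask gets the exit code from wait alone, with no follow-up inspect
// call that could blow the task's deadline after the container already
// finished.
func waitTask(ctx context.Context, client *docker.Client, containerID string) (int, error) {
	ctx, cancel := context.WithTimeout(ctx, 30*time.Second) // task timeout
	defer cancel()

	// wait already returns the exit code; previously we ignored it here and
	// made a separate inspect call just to read it back off the container
	exitCode, err := client.WaitContainerWithContext(containerID, ctx)
	if err != nil {
		return 0, err // includes the ctx error if we hit the task timeout
	}
	return exitCode, nil
}
```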
tried to boy scout, but the tracing patch cleans this block up too.
we finally graduated high school and can make our own ramen
we no longer need this, since fn appears to have no concept of canceling
tasks through an api we'd need to watch, and the context is plumbed through
if the request is canceled. since tasks are short, we may never need to
cancel running tasks the way we did with iron worker. this was an added
docker call that's unnecessary since we force remove the container at the
end anyway.
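the force removal looks something like this -- again a sketch assuming
fsouza/go-dockerclient, with a made-up helper name:

```go
package agent

import (
	docker "github.com/fsouza/go-dockerclient"
	"github.com/sirupsen/logrus"
)

// removeContainer force removes the container at the end of a task. Force
// kills the container if it's somehow still running, so we don't need a
// separate docker call to stop/cancel a running task.
func removeContainer(client *docker.Client, id string) {
	err := client.RemoveContainer(docker.RemoveContainerOptions{
		ID:            id,
		Force:         true, // kill it if it's still running
		RemoveVolumes: true,
	})
	if err != nil {
		logrus.WithError(err).WithField("container", id).Error("error removing container")
	}
}
```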
we're routinely doing transactions that hold up connections for some time, so
it was pretty easy to run out of 30 conns from routine function invocations.
the 'right' thing is probably to add a config val to the url that we strip
before passing it into the db, but i'm not sure i want our own query params
in db urls, either.
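a sketch of what that 'config val in the url' idea could look like; the
`max_conns` param name and the default are made up:

```go
package datastore

import (
	"net/url"
	"strconv"

	"github.com/jmoiron/sqlx"
)

// openDB reads a hypothetical max_conns query param, strips it, then hands
// the clean url to the driver so the driver never sees our param.
func openDB(driver, dbURL string) (*sqlx.DB, error) {
	u, err := url.Parse(dbURL)
	if err != nil {
		return nil, err
	}
	q := u.Query()
	maxConns := 100 // illustrative default, enough headroom for routine transactions
	if v := q.Get("max_conns"); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			maxConns = n
		}
		q.Del("max_conns")
		u.RawQuery = q.Encode()
	}
	db, err := sqlx.Open(driver, u.String())
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(maxConns)
	return db, nil
}
```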
we had this _almost_ right, in that we were trying, but we weren't masking
errors we don't intend to show from the user response. this also adds a stack
trace to any internal server error, so that we might be able to track them
down in the future (looking at you, 'context deadline exceeded'). in
addition, this adds a new `models.APIError` interface, which all of the
errors in `models` now implement and which can be caught easily / added to
easily.
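roughly the shape of the interface; the concrete error type and the example
error below are illustrative, not the exact code:

```go
package models

import "net/http"

// APIError is an error that knows which http status to return to the client.
type APIError interface {
	Code() int // http status code for the user response
	error
}

type apiError struct {
	code int
	msg  string
}

func (e *apiError) Code() int     { return e.code }
func (e *apiError) Error() string { return e.msg }

// errors like this can be caught in one place and mapped to a status code
var ErrAppsNotFound = &apiError{code: http.StatusNotFound, msg: "app not found"}
```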
the front end no longer does any status rewriting based on api errors. now,
when we get a non-nil error we can call `handleResponse(c, err)` with it: if
it's a proper api error, it's returned to the user with the right status
code; otherwise we log a stack trace and return `internal server error`.
this cleans up a lot of the front end code.
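a sketch of that flow, assuming gin and the interface shape above; the
response body format is illustrative:

```go
package server

import (
	"net/http"
	"runtime/debug"

	"github.com/gin-gonic/gin"
	"github.com/sirupsen/logrus"
)

// apiError mirrors the models.APIError shape sketched above.
type apiError interface {
	Code() int
	error
}

func handleResponse(c *gin.Context, err error) {
	if apiErr, ok := err.(apiError); ok {
		// a proper api error: hand it back with the right status code
		c.JSON(apiErr.Code(), gin.H{"error": gin.H{"message": apiErr.Error()}})
		return
	}
	// anything else is masked from the user and logged with a stack trace
	logrus.WithError(err).WithField("stack", string(debug.Stack())).Error("internal server error")
	c.JSON(http.StatusInternalServerError, gin.H{"error": gin.H{"message": "internal server error"}})
}
```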
this also rewrites a ctx deadline exceeded from starting a task as a timeout.
with iw we had async tasks, so we could start the clock later and it didn't
matter; now, with sync tasks, we sometimes time out just making docker calls,
and we want the task status to show up as timed out. we may want to catch all
of this further up as well, but this seems like the right thing to do.
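something like this; the timeout error name is hypothetical:

```go
package runner

import (
	"context"
	"errors"
)

// errTimeout stands in for whatever task-timeout error the models package
// exposes; the name here is made up
var errTimeout = errors.New("task timed out")

// rewriteStartErr maps a deadline exceeded from starting the task (e.g. in
// a docker call) onto the timeout status, so the task reads 'timed out'
// instead of leaking 'context deadline exceeded'
func rewriteStartErr(err error) error {
	if err == context.DeadlineExceeded {
		return errTimeout
	}
	return err
}
```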
removed the squishing together of errors. that was weird; now we return the
first error, for the purposes of using the new err interface.
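i.e. something like:

```go
// first returns the first non-nil error instead of squishing them together,
// so the result still satisfies the err interface and maps to a status code
func first(errs ...error) error {
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}
```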
removed a lot of 5xx errors that really should have been 4xx errors. changed
some of the 400 errors to 409 errors, since they come from sending
conflicting info rather than a malformed request.
removed unused / useless errors (many were only used for logging and didn't
provide any context; now that we have stack traces, we don't need as much
context in the logs).
replace the default bolt option with a sqlite3 option. the story here is that
we just need a working out-of-the-box solution, and sqlite3 is just fine for
that (actually, likely better than bolt).
with sqlite3 supplanting bolt, we mostly have sql databases. so remove redis,
and then we just have one package with a `sql` implementation of
`models.Datastore` that leans on sqlx to do query rewriting. this does mean
queries have to be formed a certain way and likely have to be ANSI SQL (no
special features), but we weren't using any special features anyway, and our
base api is basically done; we can easily extend this api as needed to
implement certain methods only in certain backends if we need to get cute.
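a sketch of what that buys us -- write a query once with `?` placeholders and
let sqlx rebind it per driver (sqlite3/mysql keep `?`, postgres gets `$1`);
the routes table and columns here are illustrative:

```go
package datastore

import "github.com/jmoiron/sqlx"

// routePaths runs the same ANSI SQL against any of the three backends,
// relying on sqlx's Rebind for the placeholder differences.
func routePaths(db *sqlx.DB, appName string) ([]string, error) {
	query := db.Rebind(`SELECT path FROM routes WHERE app_name = ?`)
	var paths []string
	if err := db.Select(&paths, query, appName); err != nil {
		return nil, err
	}
	return paths, nil
}
```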
* remove bolt & redis datastores (can still use as mqs)
* make sql queries work on all 3 (maybe?)
* remove bolt log store and use sqlite3
* shove the FnLog shit into the datastore shit for now (free pg/mysql logs...
just for demos, etc, not prod)
* fix up the docs to remove bolt references
* add sqlite3, sqlx dep
* fix up tests & mock stuff, make validator less insane
* remove put & get in datastore layer as nobody is using.
this passes the tests, which at least seem to cover all the different
backends. if we trust our tests, then this seems to work great. (the `make
docker-test-run-with-*` tests work now too)