Mirror of https://github.com/fnproject/fn.git (synced 2022-10-28 21:29:17 +03:00)
a481191db235f2dc4bb8d11e2c1abdb7e551bfe1
7 Commits
a481191db2 | migratex api uses tx now instead of db (#939)

* migratex api uses tx now instead of db
  we want to be able to run queries outside of the migration itself but inside the same transaction, for version checking. if we don't do this, we risk setting the version to the latest without running the table creates at all, so we'd have a db that thinks it's up to date but doesn't even have any tables, and on subsequent boots if a migration slides in then the migrations will run when there are no tables. it was unlikely, but now it's dead.
* tx friendly table exists check
  the previous existence checker for dbs relied on getting back errors about the db not existing. if we use that in a tx, it makes the whole tx invalid for postgres. so now we count the tables with queries that return a 1 or a 0 instead of a 1 or an error, so that we can check existence inside of a transaction (a rough sketch follows this entry). voila.
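For illustration, a minimal sketch of a count-based, transaction-friendly existence check of the kind described above; the helper name, probed catalog, and placeholder style are assumptions, not fn's actual code:

```go
package datastore

import (
	"context"
	"database/sql"
)

// tableExists reports whether a table is present by counting matching rows in
// information_schema, so a missing table yields 0 instead of an error that
// would poison an open postgres transaction. The query uses the postgres
// placeholder style; mysql would use '?' and sqlite needs a different catalog
// query. The helper name is illustrative, not fn's actual function.
func tableExists(ctx context.Context, tx *sql.Tx, table string) (bool, error) {
	var n int
	err := tx.QueryRowContext(ctx,
		`SELECT COUNT(*) FROM information_schema.tables WHERE table_name = $1`,
		table).Scan(&n)
	if err != nil {
		return false, err
	}
	return n > 0, nil
}
```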
56a2861748 | move calls to logstore, implement s3 (#911)

* move calls to logstore, implement s3
  closes #482. the basic motivation is that logs and calls will be stored with a very high write rate, while apps and routes will be relatively infrequently updated; it follows that we should likely split up their storage locations, to back them with appropriate storage facilities. s3 is a good candidate for ingesting higher write rate data than a sql database, and will make it easier to manage that data set. read #482 for more detailed justification.

  summary:
  * calls api moved from datastore to logstore
  * logstore used in front-end to serve calls endpoints
  * agent now throws calls into logstore instead of datastore
  * s3 implementation of calls api for logstore
  * s3 logs key changed (nobody using / nbd?)
  * removed UpdateCall api (not in use)
  * moved call tests from datastore to logstore tests
  * mock logstore now tested (prev. sqlite3 only)
  * logstore tests run against every datastore (mysql, pg; prev. only sqlite3)
  * simplify NewMock in tests

  commentary:
  the brunt of the work is implementing the listing of calls in GetCalls for the s3 logstore implementation. the GetCalls API requires returning items newest to oldest, while the s3 api lists items in lexicographic key order. an easy thing to do here seemed to be to reverse the encoding of our id format so it comes back in lexicographically descending order (a rough sketch of the idea follows this entry), since ids are time based, encoded to be lexicographically sortable, and de-duped (unlike created_at). this seems to work pretty well. it's not perfect around the boundaries of to_time and from_time and a small number of results may be omitted, but getting 6999 results instead of 7000 when listing calls between 3:00pm and 4:00pm monday three weeks ago doesn't seem like a deal breaker. without to_time and from_time, there are no issues in listing results. we could encode created_at instead, but it would add an extra marker for point lookups (GetCall), since we would have to search for a created_at stamp and then scan ids around it until we find the matching one, just to do a point lookup. so the tradeoff here seems worth it. there is an additional optimization around to_time to seek over newer results (since we have descending order).

  the other complication in GetCalls is returning a list of calls for a given path. since the keys for point lookups are only app_id + call_id, and we need listing across an app as well, this leads us to the 'marker' collection, which is sorted by app_id + path + call_id to allow quick listing by path. all in all, it should be pretty straightforward to follow the implementation and I tried to be lavish with the comments; please let me know if anything needs further clarification in the code. the implementation itself has some glaring inefficiencies, but they're relatively minute: json encoding is kinda lazy, but workable; s3 doesn't offer batch retrieval, so we point look up each call one by one in GetCalls; buffers aren't re-used. the seeking around the keys should all be relatively fast, so I'm not too worried about performance, and this isn't a hot path for reads (need to make a cut point and turn this in!).

  interestingly, in testing, minio performs significantly worse than pg for storing both logs and calls (or just logs, I tested that too). minio seems to have really high cpu consumption, but in any event we won't be using minio, we'll be using a cloud object store that implements the s3 api. anyway, that's mostly a knock on using minio for high performance, not really anything to do with this change, just thought it was interesting.

  I think it's safe to remove UpdateCall; admittedly, this made implementing the s3 api a lot easier. it was unused at present and was only in the cards for a previous hybrid implementation, which we've now abandoned. if we need it, we can always resurrect it from git. also not worried about changing the log key: we need to put a prefix on it anyway, and I don't think anybody is using it yet. it simply means old logs won't show up through the API, which doesn't seem like a deal breaker; new logs will appear fine.

  future:
  * TODO make logstore implementation optional for datastore, check in front-end at runtime and offer a nil logstore that errors appropriately
  * TODO low hanging fruit optimizations: json encoding, re-using buffers for download, getting multiple calls at a time, id reverse encoding could be optimized like normal encoding to not be n^2
  * TODO api for range removal of logs and calls
* address review comments
* push id to_time magic into id package
* add note about s3 key sizes
* fix validation check
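For illustration, a minimal sketch of the id-reversal trick described in the commentary, assuming a fixed-length id drawn from a sortable alphabet; the alphabet and function name are illustrative, not the actual fn id package:

```go
package id

import "strings"

// Illustrative sortable alphabet; fn's id package uses its own
// lexicographically sortable encoding, which may differ from this.
const alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"

// Reverse maps every character of a fixed-length id to its mirror position in
// the alphabet. For ids of equal length, ascending lexicographic order of the
// reversed form is descending order of the originals, so listing s3 keys built
// from Reverse(id) yields newest calls first.
func Reverse(id string) string {
	var b strings.Builder
	for _, c := range id {
		i := strings.IndexRune(alphabet, c)
		if i < 0 {
			// assumes ids only contain alphabet characters; pass anything
			// unexpected through unchanged for safety
			b.WriteRune(c)
			continue
		}
		b.WriteByte(alphabet[len(alphabet)-1-i])
	}
	return b.String()
}
```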
4084b727c0 | phase 2: mattes/migrate -> migratex (#848)

* move mattes migrations to migratex
* changes format of migrations to migratex format
* updates test runner to use new interface (double checked this with printlns, the tests go fully down and then up, and work on pg/mysql)
* remove mattes/migrate
* update tests from deps
* update readme
* fix other file extensions
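As a rough guess at the shape this commit moves migrations to, here is a sketch of an up/down pair written as Go functions operating on a transaction (the tx-based direction is confirmed by commit #939 above); the function names, column type, and SQL are illustrative, not the real migratex interface:

```go
package migrations

import (
	"context"
	"database/sql"
)

// up2/down2 are a guessed example of a migratex-style migration pair; the
// created_at column mirrors the example migration from commit #461 below,
// but the exact wiring into migratex is not shown here.
func up2(ctx context.Context, tx *sql.Tx) error {
	_, err := tx.ExecContext(ctx,
		`ALTER TABLE routes ADD created_at VARCHAR(256) NOT NULL DEFAULT ''`)
	return err
}

func down2(ctx context.Context, tx *sql.Tx) error {
	_, err := tx.ExecContext(ctx, `ALTER TABLE routes DROP COLUMN created_at`)
	return err
}
```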
206aa3c203 | opentracing -> opencensus (#802)

* update vendor directory, add go.opencensus.io
* update imports
* oops
* s/opentracing/opencensus/ & remove prometheus / zipkin stuff & remove old stats
* the dep train rides again
* fix gin build
* deps from last guy
* start in on the agent metrics
* she builds
* remove tags for now, cardinality error is fussing. subscribe instead of register
* update to patched version of opencensus to proceed for now TODO switch to a release
* meh fix imports
* println debug the bad boys
* lace it with the tags
* update deps again
* fix all inconsistent cardinality errors
* add our own logger
* fix init
* fix oom measure
* remove bugged removal code
* fix s3 measures
* fix prom handler nil
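For context, a small sketch of the opencensus pattern this commit adopts: define a measure, register a view with its tag keys, and record measurements with tags attached. This uses the current view.Register API rather than the Subscribe call mentioned in the commit, and all names are illustrative rather than fn's actual metrics:

```go
package metrics

import (
	"context"
	"log"

	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

var (
	// illustrative measure and tag key, not fn's actual metric names
	callsMeasure = stats.Int64("calls", "number of calls handled", stats.UnitDimensionless)
	pathKey      tag.Key
)

func init() {
	var err error
	if pathKey, err = tag.NewKey("path"); err != nil {
		log.Fatal(err)
	}
	// Keeping the recorded tags consistent with the view's declared tag keys
	// helps avoid the kind of cardinality errors mentioned in the commit.
	err = view.Register(&view.View{
		Name:        "calls_count",
		Description: "count of calls",
		Measure:     callsMeasure,
		TagKeys:     []tag.Key{pathKey},
		Aggregation: view.Count(),
	})
	if err != nil {
		log.Fatal(err)
	}
}

// RecordCall bumps the calls measure with the path tag attached.
func RecordCall(ctx context.Context, path string) {
	ctx, _ = tag.New(ctx, tag.Upsert(pathKey, path))
	stats.Record(ctx, callsMeasure.M(1))
}
```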
faaf5846ce | Use retry func while trying to ping SQL datastore (#630)

* Use retry func while trying to ping SQL datastore
  - implements a retry func specifically for the SQL datastore ping (a rough sketch follows this entry)
  - fmt fixes
  - uses sqlx.DB.PingContext instead of sqlx.DB.Ping
  - propagates context to the SQL datastore
* Rely on context from ServerOpt
* Consolidate log instances
* Cleanup
* Fix server usage in API tests
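A minimal sketch of a context-aware ping retry of the kind described above; the backoff policy, attempt count, and helper name are assumptions rather than the actual fn code:

```go
package datastore

import (
	"context"
	"time"

	"github.com/jmoiron/sqlx"
)

// pingWithRetry retries db.PingContext until it succeeds, the retries are
// exhausted, or the caller's context is cancelled. A simple linear backoff is
// used here purely for illustration.
func pingWithRetry(ctx context.Context, db *sqlx.DB, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = db.PingContext(ctx); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Duration(i+1) * time.Second):
		}
	}
	return err
}
```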
61b416a9b5 | automagic sql db migrations (#461)

* adds migrations
  closes #57. migrations only run if the database is not brand new. brand new databases will contain all the right fields when CREATE TABLE is called; this is mostly for readability rather than efficiency (we don't want to have to go through all of the database migrations to figure out what columns a table has). upon startup of a new database, the migrations will be analyzed and the highest version set, so that only future migrations will be run. this also avoids running through all the migrations on a fresh db, which could bork dbs easily enough (if the user just exits from impatience, say).

  otherwise, all migrations that a db has not yet seen will be run against it upon startup. this should be seamless to the user whether their db has had 0 migrations run on it before or N, which means users will not have to explicitly run any migrations on their dbs nor see any errors when we upgrade the db (so long as things go well). if migrations do not go well, users will have to manually repair their dbs (this is the intention of the `migrate` library and it seems sane); this should be rare, and I'm unsure how best to resolve it without having gone through it myself, but I would assume it requires running the down migrations and then manually updating the migration version field; in any case, docs once one of us has to go through this.

  migrations are written to files and checked into version control, then go-bindata generates go code from those files, which is compiled in and consumed by the migrate library (so that we don't have to put migration files on any servers); the generated code is also in vcs. this seems to work ok. I don't like having to use the separate go-bindata tool, but it wasn't really hard to install and then go generate takes care of the args. adding migrations should be relatively rare anyway, but tried to make it pretty painless.

  1 migration to add created_at to the route is done here as an example of how to do migrations, as well as testing these things ;) -- `created_at` will be `0001-01-01T00:00:00.000Z` for any existing routes after a user runs this version. could spend the extra time adding today's date to any outstanding records, but that's not really accurate; the main thing is nobody will have to nuke their db with the migrations in place, and we don't have any prod clusters really to worry about. all future routes will correctly have `created_at` set, and I plan to add other timestamps, but wanted to keep this patch as small as possible, so only did routes.created_at.

  there are tests that a spankin new db will work as expected, as well as that a db still works after running all the down & up migrations. the latter tests only run on mysql and postgres, since sqlite3 does not like ALTER TABLE DROP COLUMN; up migrations will need to be tested manually for sqlite3 only, but in theory if they are simple and work on postgres and mysql, there is a good likelihood of success; the new migration from this patch works fine on sqlite3.

  for now, we need to use `github.com/rdallman/migrate` to move forward, as getting integrated into upstream is proving difficult due to `github.com/go-sql-driver/mysql` being broken on master (yay dependencies). fortunately for us, we vendor a version of the `mysql` bindings that actually works, so we are able to use the `mattes/migrate` library successfully. this also requires go1.9 for the new `database/sql.Conn` type, and CI has been updated accordingly. some doc fixes too from testing, and of course updated all deps. anyway, whew. this should let us add fields to the db without busting everybody's dbs (a rough sketch of the boot-time decision follows this entry). open to feedback on better ways, but this was overall pretty simple despite futzing with mysql.
* add migrate pkg to deps, update deps; use rdallman/migrate until we resolve in mattes land
* add README in migrations package
* add ref to mattes lib
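A rough, hypothetical sketch of the boot-time decision described above (brand new db: record the latest migration version; existing db: run only the migrations it hasn't seen); the schema_version table, the 'routes' probe, and the helper signature are illustrative, not fn's schema or code:

```go
package migrations

import (
	"context"
	"database/sql"
)

// migrateOnBoot sketches the startup flow described in the commit message; the
// up callback stands in for whatever actually runs the pending migrations.
func migrateOnBoot(ctx context.Context, db *sql.DB, latest int,
	up func(ctx context.Context, db *sql.DB, from, to int) error) error {

	// Probe for an expected table by counting rows in information_schema
	// (the counting style is the tx-friendly check introduced later in #939).
	var tables int
	if err := db.QueryRowContext(ctx,
		`SELECT COUNT(*) FROM information_schema.tables WHERE table_name = 'routes'`,
	).Scan(&tables); err != nil {
		return err
	}

	if _, err := db.ExecContext(ctx,
		`CREATE TABLE IF NOT EXISTS schema_version (version INT NOT NULL)`); err != nil {
		return err
	}

	if tables == 0 {
		// Brand new database: the normal CREATE TABLE path already includes
		// every column, so just stamp the highest known migration version.
		_, err := db.ExecContext(ctx,
			`INSERT INTO schema_version (version) VALUES (?)`, latest)
		return err
	}

	// Existing database: run only the migrations it hasn't seen yet.
	var current int
	if err := db.QueryRowContext(ctx,
		`SELECT version FROM schema_version`).Scan(&current); err != nil {
		return err
	}
	return up(ctx, db, current, latest)
}
```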
337e962416 | add pagination to all list endpoints

calls, apps, and routes listing were previously returning the entire data set, which just won't scale. this adds pagination with forward cursoring to each of these endpoints (see the [docs](docs/definitions.md)). the patch is really mostly tests, shouldn't be that bad to pick through. some notes about the implementation are in order:

calls are sorted by id but allow searching within certain `created_at` ranges (finally). this is because sorting by `created_at` isn't feasible when combined with paging, as `created_at` is not guaranteed to be unique -- ids are (eliding theoreticals). i.e. on a page boundary, if there are 200 calls with the same `created_at`, providing a `cursor` of that `created_at` will skip over the remaining N calls with that `created_at`. using the id will also be better on the index anyway (well, fewer of them). yay having sortable ids! I can't discern any issues doing this: even if 200 calls have the same created_at, they will have different ids, and the sort paginates them just fine. ids are also url safe, so the id works as the cursor value as-is.

apps and routes are sorted alphabetically. as their names aren't guaranteed to be url safe, we base64 them in the front end to a url safe format before returning them, and base64 decode them when we get them back. this does mean cursors can be relatively large if the path/app name is long, but if we don't want to add ids then they were going to be pretty big anyway. a bonus is that this kind of obscures them. if somebody has a better idea on formatting, by all means.

notably, we are not using the sql paging facilities; we are rolling our own based on cursors, which ends up being much more efficient for querying longer lists of resources (a rough sketch of the query shape follows this entry). this should also be easy to implement in other non-sql dbs, and we can change the cursoring formats on the fly since we are just exposing them as opaque strings. the front end deals with the base64 / formatting, etc., and the back end takes raw values (strfmt.DateTime or the id for calls). the cursor that is passed to/by the user is simply the last resource on the previous page, so in theory we don't even need to return it, but it does make it a little easier to use. also, whether the cursor is blank on the last page depends on page fullness, so sometimes users will get a cursor when there are no results on the next page (1/N chance, and it's not really the end of the world -- actually searching for the next thing would make things more complex). there are ample tests for this behavior.

I've turned off all query parameters allowing `LIKE` queries on certain listing endpoints, as we should not expose sql behavior through our API in the event that we end up not using a sql db down the road. I think we should only allow prefix matching, which sql can support as well as other types of databases relatively cheaply, but this is not hooked up here as it didn't 'just work' when I was fiddling with it (can add later; they're unnecessary and weren't wired into the front end before).

* remove route listing across apps (unused)
* fix panic when doing `/app//`. this is probably possible for other types of endpoints, out of scope here. added a guard in front of all endpoints for this
* adds `from_time` and `to_time` query parameters to calls, so you can e.g. list the last hour of tasks. these are not required and default to oldest/newest.
* hooked the datastore tests back up to the sql db; only run with sqlite atm, but these are useful, added a lot to them too.
* added a bunch of tests to the front end, so pretty sure this all works now.
* added to swagger, we'll need to re-gen. also wrote some words about how pagination works; I'm not sure how best to link to these, feedback welcome.
* not sure how we want to manage indexes, but we may need to add some (looking at created_at, mostly)
* `?route` changed to `?path` in routes listing, to keep consistency with everything else
* don't 404 when searching for calls where the route doesn't exist, just return an empty list (it's a query param ffs)

closes #141
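A minimal sketch of the cursor-style paging described above, assuming a sqlx handle and an id-sortable calls table; the table, column, and type names are illustrative, not fn's schema:

```go
package datastore

import (
	"context"
	"encoding/base64"

	"github.com/jmoiron/sqlx"
)

// Call is a pared-down, illustrative record; fn's real model has more fields.
type Call struct {
	ID        string `db:"id"`
	AppID     string `db:"app_id"`
	Path      string `db:"path"`
	CreatedAt string `db:"created_at"`
}

// GetCallsPage returns up to perPage calls newest-first. The cursor is simply
// the id of the last call on the previous page; an empty cursor starts from
// the newest call. Because ids are unique and sortable, paging on them avoids
// the duplicate created_at problem described above.
func GetCallsPage(ctx context.Context, db *sqlx.DB, appID, cursor string, perPage int) ([]Call, string, error) {
	q := `SELECT id, app_id, path, created_at FROM calls
	      WHERE app_id = ? AND (? = '' OR id < ?)
	      ORDER BY id DESC LIMIT ?`
	var calls []Call
	if err := db.SelectContext(ctx, &calls, q, appID, cursor, cursor, perPage); err != nil {
		return nil, "", err
	}
	next := ""
	if len(calls) == perPage {
		next = calls[len(calls)-1].ID // ids are url safe, so returned as-is
	}
	return calls, next, nil
}

// EncodeNameCursor shows the front-end treatment for app/route cursors, which
// are names or paths and therefore not guaranteed to be url safe.
func EncodeNameCursor(name string) string {
	return base64.RawURLEncoding.EncodeToString([]byte(name))
}
```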