mirror of
https://github.com/fnproject/fn.git
synced 2022-10-28 21:29:17 +03:00
ccde0d235749161c5d59370ccfea516195680f7c
6 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
56a2861748 |
move calls to logstore, implement s3 (#911)
* move calls to logstore, implement s3 closes #482 the basic motivation is that logs and calls will be stored with a very high write rate, while apps and routes will be relatively infrequently updated; it follows that we should likely split up their storage location, to back them with appropriate storage facilities. s3 is a good candidate for ingesting higher write rate data than a sql database, and will make it easier to manage that data set. can read #482 for more detailed justification. summary: * calls api moved from datastore to logstore * logstore used in front-end to serve calls endpoints * agent now throws calls into logstore instead of datastore * s3 implementation of calls api for logstore * s3 logs key changed (nobody using / nbd?) * removed UpdateCall api (not in use) * moved call tests from datastore to logstore tests * mock logstore now tested (prev. sqlite3 only) * logstore tests run against every datastore (mysql, pg; prev. only sqlite3) * simplify NewMock in tests commentary: brunt of the work is implementing the listing of calls in GetCalls for the s3 logstore implementation. the GetCalls API requires returning items in the newest to oldest order, and the s3 api lists items in lexicographic order based on created_at. An easy thing to do here seemed to be to reverse the encoding of our id format to return a lexicographically descending order, since ids are time based, reasonably encoded to be lexicographically sortable, and de-duped (unlike created_at). This seems to work pretty well, it's not perfect around the boundaries of to_time and from_time and a tiny amount of results may be omitted, but to me this doesn't seem like a deal breaker to get 6999 results instead of 7000 when trying to get calls between 3:00pm and 4:00pm Monday 3 weeks ago. Of course, without to_time and from_time, there are no issues in listing results. We could use created at and encode it, but it would be an additional marker for point lookup (GetCall) since we would have to search for a created_at stamp, search for ids around that until we find the matching one, just to do a point lookup. So, the tradeoff here seems worth it. There is additional optimization around to_time to seek over newer results (since we have descending order). The other complication in GetCalls is returning a list of calls for a given path. Since the keys to do point lookups are only app_id + call_id, and we need listing across an app as well, this leads us to the 'marker' collection which is sorted by app_id + path + call_id, to allow quick listing by path. All in all, it should be pretty straightforward to follow the implementation and I tried to be lavish with the comments, please let me know if anything needs further clarification in the code. The implementation itself has some glaring inefficiencies, but they're relatively minute: json encoding is kinda lazy, but workable; s3 doesn't offer batch retrieval, so we point look up each call one by one in get call; not re-using buffers -- but the seeking around the keys should all be relatively fast, not too worried about performance really and this isn't a hot path for reads (need to make a cut point and turn this in!). Interestingly, in testing, minio performs significantly worse than pg for storing both logs and calls (or just logs, I tested that too). minio seems to have really high cpu consumption, but in any event, we won't be using minio, we'll be using a cloud object store that implements the s3 api. Anyway, mostly a knock on using minio for high performance, not really anything to do with this, just thought it was interesting. I think it's safe to remove UpdateCall, admittedly this made implementing the s3 api a lot easier. This operation may also be something we never need, it was unused at present and was only in the cards for a previous hybrid implementation, which we've now abandoned. If we need, we can always resurrect from git. Also not worried about changing the log key, we need to put a prefix on this thing anyway, but I don't think anybody is using this anyway. in any event, it simply means old logs won't show up through the API, but aside from nobody using this yet, that doesn't seem a big deal breaker really -- new logs will appear fine. future: TODO make logstore implementation optional for datastore, check in front-end at runtime and offer a nil logstore that errors appropriately TODO low hanging fruit optimizations of json encoding, re-using buffers for download, get multiple calls at a time, id reverse encoding could be optimized like normal encoding to not be n^2 TODO api for range removal of logs and calls * address review comments * push id to_time magic into id package * add note about s3 key sizes * fix validation check |
||
|
|
3c15ca6ea6 |
App ID (#641)
* App ID * Clean-up * Use ID or name to reference apps * Can use app by name or ID * Get rid of AppName for routes API and model routes API is completely backwards-compatible routes API accepts both app ID and name * Get rid of AppName from calls API and model * Fixing tests * Get rid of AppName from logs API and model * Restrict API to work with app names only * Addressing review comments * Fix for hybrid mode * Fix rebase problems * Addressing review comments * Addressing review comments pt.2 * Fixing test issue * Addressing review comments pt.3 * Updated docstring * Adjust UpdateApp SQL implementation to work with app IDs instead of names * Fixing tests * fmt after rebase * Make tests green again! * Use GetAppByID wherever it is necessary - adding new v2 endpoints to keep hybrid api/runner mode working - extract CallBase from Call object to expose that to a user (it doesn't include any app reference, as we do for all other API objects) * Get rid of GetAppByName * Adjusting server router setup * Make hybrid work again * Fix datastore tests * Fixing tests * Do not ignore app_id * Resolve issues after rebase * Updating test to make it work as it was * Tabula rasa for migrations * Adding calls API test - we need to ensure we give "App not found" for the missing app and missing call in first place - making previous test work (request missing call for the existing app) * Make datastore tests work fine with correctly applied migrations * Make CallFunction middleware work again had to adjust its implementation to set app ID before proceeding * The biggest rebase ever made * Fix 8's migration * Fix tests * Fix hybrid client * Fix tests problem * Increment app ID migration version * Fixing TestAppUpdate * Fix rebase issues * Addressing review comments * Renew vendor * Updated swagger doc per recommendations |
||
|
|
20089c4e83 |
make headers quasi-consistent (#660)
possible breakages: * `FN_HEADER` on cold are no longer `s/-/_/` -- this is so that cold functions can rebuild the headers as they were when they came in on the request (fdks, specifically), there's no guarantee that a reversal `s/_/-/` is the original header on the request. * app and route config no longer `s/-/_/` -- it seemed really weird to rewrite the users config vars on these. should just pass them exactly as is to env. * headers no longer contain the environment vars (previously, base config; app config, route config, `FN_PATH`, etc.), these are still available in the environment. this gets rid of a lot of the code around headers, specifically the stuff that shoved everything into headers when constructing a call to begin with. now we just store the headers separately and add a few things, like FN_CALL_ID to them, and build a separate 'config' now to store on the call. I thought 'config' was more aptly named, 'env' was confusing, though now 'config' is exactly what 'base_vars' was, which is only the things being put into the env. we weren't storing this field in the db, this doesn't break unless there are messages in a queue from another version, anyway, don't think we're there and don't expect any breakage for anybody with field name changes. this makes the configuration stuff pretty straight forward, there's just two separate buckets of things, and cold just needs to mash them together into the env, and otherwise hot containers just need to put 'config' in the env, and then hot format can shove 'headers' in however they'd like. this seems better than my last idea about making this easier but worse (RIP). this means: * headers no longer contain all vars, the set of base vars can only be found in the environment. * headers is only the headers from request + call_id, deadline, method, url * for cold, we simply add the headers to the environment, prepending `FN_HEADER_` to them, BUT NOT upper casing or `s/-/_/` * fixes issue where async hot functions would end up with `Fn_header_` prefixed headers * removes idea of 'base' vars and 'env'. this was a strange concept. now we just have 'config' which was base vars, and headers, which was base_env+headers; i.e. they are disjoint now. * casing for all headers will lean to be `My-Header` style, which should help with consistency. notable exceptions for cold only are FN_CALL_ID, FN_METHOD, and FN_REQUEST_URL -- this is simply to avoid breakage, in either hot format they appear as `Fn_call_id` still. * removes FN_PARAM stuff * updated doc with behavior weird things left: `Fn_call_id` e.g. isn't a correctly formatted http header, it should likely be `Fn-Call-Id` but I wanted to live to fight another day on this one, it would add some breakage. examples to be posted of each format below closes #329 |
||
|
|
2ebc9c7480 |
hybrid mergy (#581)
* so it begins * add clarification to /dequeue, change response to list to future proof * Specify that runner endpoints are also under /v1 * Add a flag to choose operation mode (node type). This is specified using the `FN_NODE_TYPE` environment variable. The default is the existing behaviour, where the server supports all operations (full API plus asynchronous and synchronous runners). The additional modes are: * API - the full API is available, but no functions are executed by the node. Async calls are placed into a message queue, and synchronous calls are not supported (invoking them results in an API error). * Runner - only the invocation/route API is present. Asynchronous and synchronous invocation requests are supported, but asynchronous requests are placed onto the message queue, so might be handled by another runner. * Add agent type and checks on Submit * Sketch of a factored out data access abstraction for api/runner agents * Fix tests, adding node/agent types to constructors * Add tests for full, API, and runner server modes. * Added atomic UpdateCall to datastore * adds in server side endpoints * Made ServerNodeType public because tests use it * Made ServerNodeType public because tests use it * fix test build * add hybrid runner client pretty simple go api client that covers surface area needed for hybrid, returning structs from models that the agent can use directly. not exactly sure where to put this, so put it in `/clients/hybrid` but maybe we should make `/api/runner/client` or something and shove it in there. want to get integration tests set up and use the real endpoints next and then wrap this up in the DataAccessLayer stuff. * gracefully handles errors from fn * handles backoff & retry on 500s * will add to existing spans for debuggo action * minor fixes * meh |
||
|
|
caba9e0ec6 |
more strict configuration of routes
* idle_timeout max of 1h * timeout max of 120s for sync, 1h for async * max memory of 8GB * do full route validation before call invocation * ensure that idle_timeout >= timeout we are now doing validation of updating route inside of the database transaction, which is what we should have been doing all along really. we need this behavior to ensure that the idle timeout is longer than the timeout, among other benefits (like not updating the most recent version of the existing struct and overwriting previous updates, yay). since we have this, we can get rid of the weird skipZero behavior on validate too and validate the real deal holyfield. validating the route before making the call is handy so that we don't do weird things like run a func that wants to use 300GB of RAM and run for 3 weeks. closes #192 closes #344 closes #162 |
||
|
|
337e962416 |
add pagination to all list endpoints
calls, apps, and routes listing were previously returning the entire data set, which just won't scale. this adds pagination with cursoring forward to each of these endpoints (see the [docs](docs/definitions.md)). the patch is really mostly tests, shouldn't be that bad to pick through. some blarble about implementation is in order: calls are sorted by ids but allow searching within certain `created_at` ranges (finally). this is because sorting by `created_at` isn't feasible when combined with paging, as `created_at` is not guaranteed to be unique -- id's are (eliding theoreticals). i.e. on a page boundary, if there are 200 calls with the same `created_at`, providing a `cursor` of that `created_at` will skip over the remaining N calls with that `created_at`. also using id will be better on the index anyway (well, less of them). yay having sortable ids! I can't discern any issues doing this, as even if 200 calls have the same created_at, they will have different ids, and the sort should allow paginating them just fine. ids are also url safe, so the id works as the cursor value just fine. apps and routes are sorted by alphabetical order. as they aren't guaranteed to be url safe, we are base64'ing them in the front end to a url safe format and then returning them, and then base64 decoding them when we get them. this does mean that they can be relatively large if the path/app is long, but if we don't want to add ids then they were going to be pretty big anyway. a bonus that this kind of obscures them. if somebody has better idea on formatting, by all means. notably, we are not using the sql paging facilities, and we are baking our own based on cursors, which ends up being much more efficient for querying longer lists of resources. this also should be easy to implement in other non-sql dbs and the cursoring formats we can change on the fly since we are just exposing them as opaque strings. the front end deals with the base64 / formatting, etc and the back end is taking raw values (strfmt.DateTime or the id for calls). the cursor that is being passed to/by the user is simply the last resource on the previous page, so in theory we don't even need to return it, but it does make it a little easier to use, also, cursor being blank on the last page depends on page full-ness, so sometimes users will get a cursor when there are no results on next page (1/N chance, and it's not really end of world -- actually searching for the next thing would make things more complex). there are ample tests for this behavior. I've turned off all query parameters allowing `LIKE` queries on certain listing endpoints, as we should not expose sql behavior through our API in the event that we end up not using a sql db down the road. I think we should only allow prefix matching, which sql can support as well as other types of databases relatively cheaply, but this is not hooked up here as it didn't 'just work' when I was fiddling with it (can add later, they're unnecessary and weren't wired in before in front end). * remove route listing across apps (unused) * fix panic when doing `/app//`. this is prob possible for other types of endpoints, out of scope here. added a guard in front of all endpoints for this * adds `from_time` and `to_time` query parameters to calls, so you can e.g. list the last hour of tasks. these are not required and default to oldest/newest. * hooked back up the datastore tests to the sql db, only run with sqlite atm, but these are useful, added a lot to them too. * added a bunch of tests to the front end, so pretty sure this all works now. * added to swagger, we'll need to re-gen. also wrote some words about pagination workings, I'm not sure how best to link to these, feedback welcome. * not sure how we want to manage indexes, but we may need to add some (looking at created_at, mostly) * `?route` changed to `?path` in routes listing, to keep consistency with everything else * don't 404 when searching for calls where the route doesn't exist, just return an empty list (it's a query param ffs) closes #141 |