move calls to logstore, implement s3 (#911)

* move calls to logstore, implement s3

closes #482

the basic motivation is that logs and calls will be stored with a very high
write rate, while apps and routes will be relatively infrequently updated; it
follows that we should likely split up their storage location, to back them
with appropriate storage facilities. s3 is a good candidate for ingesting
higher write rate data than a sql database, and will make it easier to manage
that data set. can read #482 for more detailed justification.

summary:

* calls api moved from datastore to logstore
* logstore used in front-end to serve calls endpoints
* agent now throws calls into logstore instead of datastore
* s3 implementation of calls api for logstore
* s3 logs key changed (nobody using / nbd?)
* removed UpdateCall api (not in use)
* moved call tests from datastore to logstore tests
* mock logstore now tested (prev. sqlite3 only)
* logstore tests run against every datastore (mysql, pg; prev. only sqlite3)
* simplify NewMock in tests

commentary:

brunt of the work is implementing the listing of calls in GetCalls for the s3
logstore implementation. the GetCalls API requires returning items in the
newest to oldest order, and the s3 api lists items in lexicographic order
based on created_at. An easy thing to do here seemed to be to reverse the
encoding of our id format to return a lexicographically descending order,
since ids are time based, reasonably encoded to be lexicographically
sortable, and de-duped (unlike created_at). This seems to work pretty well,
it's not perfect around the boundaries of to_time and from_time and a tiny
amount of results may be omitted, but to me this doesn't seem like a deal
breaker to get 6999 results instead of 7000 when trying to get calls between
3:00pm and 4:00pm Monday 3 weeks ago. Of course, without to_time and
from_time, there are no issues in listing results. We could use created at and
encode it, but it would be an additional marker for point lookup (GetCall)
since we would have to search for a created_at stamp, search for ids around
that until we find the matching one, just to do a point lookup. So, the
tradeoff here seems worth it. There is additional optimization around to_time
to seek over newer results (since we have descending order).

The other complication in GetCalls is returning a list of calls for a given
path. Since the keys to do point lookups are only app_id + call_id, and we
need listing across an app as well, this leads us to the 'marker' collection
which is sorted by app_id + path + call_id, to allow quick listing by path.
All in all, it should be pretty straightforward to follow the implementation
and I tried to be lavish with the comments, please let me know if anything
needs further clarification in the code.

The implementation itself has some glaring inefficiencies, but they're
relatively minute: json encoding is kinda lazy, but workable; s3 doesn't offer
batch retrieval, so we point look up each call one by one in get call; not
re-using buffers -- but the seeking around the keys should all be relatively
fast, not too worried about performance really and this isn't a hot path for
reads (need to make a cut point and turn this in!).

Interestingly, in testing, minio performs significantly worse than pg for
storing both logs and calls (or just logs, I tested that too). minio seems to
have really high cpu consumption, but in any event, we won't be using minio,
we'll be using a cloud object store that implements the s3 api. Anyway, mostly
a knock on using minio for high performance, not really anything to do with
this, just thought it was interesting.

I think it's safe to remove UpdateCall, admittedly this made implementing the
s3 api a lot easier. This operation may also be something we never need, it
was unused at present and was only in the cards for a previous hybrid
implementation, which we've now abandoned. If we need, we can always resurrect
from git.

Also not worried about changing the log key, we need to put a prefix on this
thing anyway, but I don't think anybody is using this anyway. in any event, it
simply means old logs won't show up through the API, but aside from nobody
using this yet, that doesn't seem a big deal breaker really -- new logs will
appear fine.

future:

TODO make logstore implementation optional for datastore, check in front-end
at runtime and offer a nil logstore that errors appropriately

TODO low hanging fruit optimizations of json encoding, re-using buffers for
download, get multiple calls at a time, id reverse encoding could be optimized
like normal encoding to not be n^2

TODO api for range removal of logs and calls

* address review comments

* push id to_time magic into id package
* add note about s3 key sizes
* fix validation check
This commit is contained in:
Reed Allman
2018-04-05 10:49:25 -07:00
committed by GitHub
parent 6ca002973d
commit 56a2861748
27 changed files with 630 additions and 565 deletions

View File

@@ -8,12 +8,9 @@ import (
"testing"
"time"
"github.com/fnproject/fn/api/datastore/sql"
"github.com/fnproject/fn/api/id"
"github.com/fnproject/fn/api/models"
"github.com/go-openapi/strfmt"
"net/url"
"os"
)
var testApp = &models.App{
@@ -28,55 +25,131 @@ var testRoute = &models.Route{
Format: "http",
}
func SetupTestCall(t *testing.T, ctx context.Context, ds models.Datastore) *models.Call {
func SetupTestCall(t *testing.T, ctx context.Context, ls models.LogStore) *models.Call {
testApp.SetDefaults()
_, err := ds.InsertApp(ctx, testApp)
if err != nil {
t.Log(err.Error())
t.Fatalf("Test InsertLog(ctx, call.ID, logText): unable to insert app, err: `%v`", err)
}
testRoute.AppID = testApp.ID
_, err = ds.InsertRoute(ctx, testRoute)
if err != nil {
t.Log(err.Error())
t.Fatalf("Test InsertLog(ctx, call.ID, logText): unable to insert app route, err: `%v`", err)
}
var call models.Call
call.AppID = testApp.ID
call.CreatedAt = strfmt.DateTime(time.Now())
call.Status = "success"
call.StartedAt = strfmt.DateTime(time.Now())
call.CompletedAt = strfmt.DateTime(time.Now())
call.AppID = testApp.ID
call.Path = testRoute.Path
return &call
}
const tmpLogDb = "/tmp/func_test_log.db"
func SetupSQLiteDS(t *testing.T) models.Datastore {
os.Remove(tmpLogDb)
func Test(t *testing.T, fnl models.LogStore) {
ctx := context.Background()
uLog, err := url.Parse("sqlite3://" + tmpLogDb)
if err != nil {
t.Fatalf("failed to parse url: %v", err)
}
call := SetupTestCall(t, ctx, fnl)
ds, err := sql.New(ctx, uLog)
if err != nil {
t.Fatalf("failed to create sqlite3 datastore: %v", err)
}
return ds
}
// test list first, the rest are point lookup tests
t.Run("calls-get", func(t *testing.T) {
filter := &models.CallFilter{AppID: call.AppID, Path: call.Path, PerPage: 100}
now := time.Now()
call.CreatedAt = strfmt.DateTime(now)
call.ID = id.New().String()
err := fnl.InsertCall(ctx, call)
if err != nil {
t.Fatal(err)
}
calls, err := fnl.GetCalls(ctx, filter)
if err != nil {
t.Fatalf("Test GetCalls(ctx, filter): unexpected error `%v`", err)
}
if len(calls) != 1 {
t.Fatalf("Test GetCalls(ctx, filter): unexpected length `%v`", len(calls))
}
func Test(t *testing.T, ds models.Datastore, fnl models.LogStore) {
ctx := context.Background()
if ds == nil {
ds = SetupSQLiteDS(t)
}
call := SetupTestCall(t, ctx, ds)
c2 := *call
c3 := *call
now = time.Now().Add(100 * time.Millisecond)
c2.CreatedAt = strfmt.DateTime(now) // add ms cuz db uses it for sort
c2.ID = id.New().String()
now = time.Now().Add(200 * time.Millisecond)
c3.CreatedAt = strfmt.DateTime(now)
c3.ID = id.New().String()
err = fnl.InsertCall(ctx, &c2)
if err != nil {
t.Fatal(err)
}
err = fnl.InsertCall(ctx, &c3)
if err != nil {
t.Fatal(err)
}
// test that no filter works too
calls, err = fnl.GetCalls(ctx, &models.CallFilter{AppID: call.AppID, PerPage: 100})
if err != nil {
t.Fatalf("Test GetCalls(ctx, filter): unexpected error `%v`", err)
}
if len(calls) != 3 {
t.Fatalf("Test GetCalls(ctx, filter): unexpected length `%v`", len(calls))
}
// test that pagination stuff works. id, descending
filter.PerPage = 1
calls, err = fnl.GetCalls(ctx, filter)
if err != nil {
t.Fatalf("Test GetCalls(ctx, filter): unexpected error `%v`", err)
}
if len(calls) != 1 {
t.Fatalf("Test GetCalls(ctx, filter): unexpected length `%v`", len(calls))
} else if calls[0].ID != c3.ID {
t.Fatalf("Test GetCalls: call ids not in expected order: %v %v", calls[0].ID, c3.ID)
}
filter.PerPage = 100
filter.Cursor = calls[0].ID
calls, err = fnl.GetCalls(ctx, filter)
if err != nil {
t.Fatalf("Test GetCalls(ctx, filter): unexpected error `%v`", err)
}
if len(calls) != 2 {
t.Fatalf("Test GetCalls(ctx, filter): unexpected length `%v`", len(calls))
} else if calls[0].ID != c2.ID {
t.Fatalf("Test GetCalls: call ids not in expected order: %v %v", calls[0].ID, c2.ID)
} else if calls[1].ID != call.ID {
t.Fatalf("Test GetCalls: call ids not in expected order: %v %v", calls[1].ID, call.ID)
}
// test that filters actually applied
calls, err = fnl.GetCalls(ctx, &models.CallFilter{AppID: "wrongappname", PerPage: 100})
if err != nil {
t.Fatalf("Test GetCalls(ctx, filter): unexpected error `%v`", err)
}
if len(calls) != 0 {
t.Fatalf("Test GetCalls(ctx, filter): unexpected length `%v`", len(calls))
}
calls, err = fnl.GetCalls(ctx, &models.CallFilter{AppID: call.AppID, Path: "wrongpath", PerPage: 100})
if err != nil {
t.Fatalf("Test GetCalls(ctx, filter): unexpected error `%v`", err)
}
if len(calls) != 0 {
t.Fatalf("Test GetCalls(ctx, filter): unexpected length `%v`", len(calls))
}
// make sure from_time and to_time work
filter = &models.CallFilter{
PerPage: 100,
FromTime: call.CreatedAt,
ToTime: c3.CreatedAt,
AppID: call.AppID,
}
calls, err = fnl.GetCalls(ctx, filter)
if err != nil {
t.Fatalf("Test GetCalls(ctx, filter): unexpected error `%v`", err)
}
if len(calls) != 1 {
t.Fatalf("Test GetCalls(ctx, filter): unexpected length `%v`", len(calls))
} else if calls[0].ID != c2.ID {
t.Fatalf("Test GetCalls: call id not expected %s vs %s", calls[0].ID, c2.ID)
}
})
t.Run("call-log-insert-get", func(t *testing.T) {
call.ID = id.New().String()
@@ -86,10 +159,7 @@ func Test(t *testing.T, ds models.Datastore, fnl models.LogStore) {
if err != nil {
t.Fatalf("Test InsertLog(ctx, call.ID, logText): unexpected error during inserting log `%v`", err)
}
logEntry, err := fnl.GetLog(ctx, testApp.ID, call.ID)
if err != nil {
t.Fatalf("Test InsertLog(ctx, call.ID, logText): unexpected error during log get `%v`", err)
}
logEntry, err := fnl.GetLog(ctx, call.AppID, call.ID)
var b bytes.Buffer
io.Copy(&b, logEntry)
if !strings.Contains(b.String(), logText) {
@@ -105,4 +175,57 @@ func Test(t *testing.T, ds models.Datastore, fnl models.LogStore) {
t.Fatal("GetLog should return not found, but got:", err)
}
})
call = new(models.Call)
call.CreatedAt = strfmt.DateTime(time.Now())
call.Status = "error"
call.Error = "ya dun goofed"
call.StartedAt = strfmt.DateTime(time.Now())
call.CompletedAt = strfmt.DateTime(time.Now())
call.AppID = testApp.Name
call.Path = testRoute.Path
t.Run("call-insert", func(t *testing.T) {
call.ID = id.New().String()
err := fnl.InsertCall(ctx, call)
if err != nil {
t.Fatalf("Test InsertCall(ctx, &call): unexpected error `%v`", err)
}
})
t.Run("call-get", func(t *testing.T) {
call.ID = id.New().String()
err := fnl.InsertCall(ctx, call)
if err != nil {
t.Fatalf("Test GetCall: unexpected error `%v`", err)
}
newCall, err := fnl.GetCall(ctx, call.AppID, call.ID)
if err != nil {
t.Fatalf("Test GetCall: unexpected error `%v`", err)
}
if call.ID != newCall.ID {
t.Fatalf("Test GetCall: id mismatch `%v` `%v`", call.ID, newCall.ID)
}
if call.Status != newCall.Status {
t.Fatalf("Test GetCall: status mismatch `%v` `%v`", call.Status, newCall.Status)
}
if call.Error != newCall.Error {
t.Fatalf("Test GetCall: error mismatch `%v` `%v`", call.Error, newCall.Error)
}
if time.Time(call.CreatedAt).Unix() != time.Time(newCall.CreatedAt).Unix() {
t.Fatalf("Test GetCall: created_at mismatch `%v` `%v`", call.CreatedAt, newCall.CreatedAt)
}
if time.Time(call.StartedAt).Unix() != time.Time(newCall.StartedAt).Unix() {
t.Fatalf("Test GetCall: started_at mismatch `%v` `%v`", call.StartedAt, newCall.StartedAt)
}
if time.Time(call.CompletedAt).Unix() != time.Time(newCall.CompletedAt).Unix() {
t.Fatalf("Test GetCall: completed_at mismatch `%v` `%v`", call.CompletedAt, newCall.CompletedAt)
}
if call.AppID != newCall.AppID {
t.Fatalf("Test GetCall: app_name mismatch `%v` `%v`", call.AppID, newCall.AppID)
}
if call.Path != newCall.Path {
t.Fatalf("Test GetCall: path mismatch `%v` `%v`", call.Path, newCall.Path)
}
})
}