Commit Graph

14 Commits

Author SHA1 Message Date
Reed Allman
e13a6fd029 death to format (#1281)
* get rid of old format stuff, utils usage, fix up for fdk2.0 interface

* pure agent format removal, TODO remove format field, fix up all tests

* shitter's clogged

* fix agent tests

* start rolling through server tests

* tests compile, some failures

* remove json / content type detection on invoke/httptrigger, fix up tests

* remove hello, fixup system tests

the fucking status checker test just hangs and it's testing that it doesn't
work so the test passes but the test doesn't pass fuck life it's not worth it

* fix migration

* meh

* make dbhelper shut up about dbhelpers not being used

* move fail status at least into main thread, jfc

* fix status call to have FN_LISTENER

also turns off the stdout/stderr blocking between calls, because it's
impossible to debug without that (without syslog), now that stdout and stderr
go to the same place (either to host stderr or nowhere) and isn't used for
function output this shouldn't be a big fuss really

* remove stdin

* cleanup/remind: fixed bug where watcher would leak if container dies first

* silence system-test logs until fail, fix datastore tests

postgres does weird things with constraints when renaming tables, took the
easy way out

system-tests were loud as fuck and made you download a circleci text file of
the logs, made them only yell when they goof

* fix fdk-go dep for test image. fun

* fix swagger and remove test about format

* update all the gopkg files

* add back FN_FORMAT for fdks that assert things. pfft

* add useful error for functions that exit

this error is really confounding because containers can exit for all manner of
reason, we're just guessing that this is the most likely cause for now, and
this error message should very likely change or be removed from the client
path anyway (context.Canceled wasn't all that useful either, but anyway, I'd
been hunting for this... so found it). added a test to avoid being publicly
shamed for 1 line commits (beware...).
2018-10-26 10:43:04 -07:00
Tolga Ceylan
b0c93dbd82 fn: new agent resource tracker metrics (#1215)
New metrics for agent resource tracker: CpuUsed, CpuAvail,
MemUsed, MemAvail.
2018-09-17 10:31:17 -07:00
Tom Coupland
d56a49b321 Remove V1 endpoints and Routes (#1210)
Largely a removal job, however many tests, particularly system level
ones relied on Routes. These have been migrated to use Fns.

* Add 410 response to swagger
* No app names in log tags
* Adding constraint in GetCall for FnID
* Adding test to check FnID is required on call
* Add fn_id to call selector
* Fix text in docker mem warning
* Correct buildConfig func name
* Test fix up
* Removing CPU setting from Agent test

CPU setting has been deprecated, but the code base is still riddled
with it. This just removes it from this layer. Really we need to
remove it from Call.

* Remove fn id check on calls
* Reintroduce fn id required on call
* Adding fnID to calls for execute test
* Correct setting of app id in middleware
* Removes root middlewares ability to redirect fun invocations
* Add over sized test check
* Removing call fn id check
2018-09-17 16:44:51 +01:00
Denis Makogon
3c15ca6ea6 App ID (#641)
* App ID

* Clean-up

* Use ID or name to reference apps

* Can use app by name or ID

* Get rid of AppName for routes API and model

 routes API is completely backwards-compatible
 routes API accepts both app ID and name

* Get rid of AppName from calls API and model

* Fixing tests

* Get rid of AppName from logs API and model

* Restrict API to work with app names only

* Addressing review comments

* Fix for hybrid mode

* Fix rebase problems

* Addressing review comments

* Addressing review comments pt.2

* Fixing test issue

* Addressing review comments pt.3

* Updated docstring

* Adjust UpdateApp SQL implementation to work with app IDs instead of names

* Fixing tests

* fmt after rebase

* Make tests green again!

* Use GetAppByID wherever it is necessary

 - adding new v2 endpoints to keep hybrid api/runner mode working
 - extract CallBase from Call object to expose that to a user
   (it doesn't include any app reference, as we do for all other API objects)

* Get rid of GetAppByName

* Adjusting server router setup

* Make hybrid work again

* Fix datastore tests

* Fixing tests

* Do not ignore app_id

* Resolve issues after rebase

* Updating test to make it work as it was

* Tabula rasa for migrations

* Adding calls API test

 - we need to ensure we give "App not found" for the missing app and missing call in first place
 - making previous test work (request missing call for the existing app)

* Make datastore tests work fine with correctly applied migrations

* Make CallFunction middleware work again

 had to adjust its implementation to set app ID before proceeding

* The biggest rebase ever made

* Fix 8's migration

* Fix tests

* Fix hybrid client

* Fix tests problem

* Increment app ID migration version

* Fixing TestAppUpdate

* Fix rebase issues

* Addressing review comments

* Renew vendor

* Updated swagger doc per recommendations
2018-03-26 11:19:36 -07:00
Reed Allman
c0df9496a7 reduce allocs in getSlotQueueKey (#778)
this somewhat minimally comes up in profiling, but it was an itch i needed to
scratch. this does 10x less allocations and is 3x faster (with 3x less bytes),
and they're the small painful kind of allocation. we're only reading these
strings so the uses of unsafe are fine (I think audit me).  the byte array
we're casting to a string at the end is also heap allocated and does
escape. I only count 2 allocations, but there's 3 (`hash.Sum` and
`make([]string)`), using a pool of sha1 hash.Hash shaves 120 byte and an alloc
off so seems worth it (it's minimal). if we set a max size of config vals with
a constant we could avoid that allocation and we could probably find a
checksum package that doesn't use the `hash.Hash` that would speed things up a
little (no dynamic dispatch, doesn't allocate in Sum) but there's not one I
know of in stdlib.

master:
```
✗: go test -run=yodawg -bench . -benchmem -benchtime 1s -cpuprofile cpu.out
goos: linux
goarch: amd64
pkg: github.com/fnproject/fn/api/agent
BenchmarkSlotKey          200000              6068 ns/op             696 B/op         31 allocs/op
PASS
ok      github.com/fnproject/fn/api/agent       1.454s
```

now:
```
✗: go test -run=yodawg -bench . -benchmem -benchtime 1s -cpuprofile cpu.out
goos: linux
goarch: amd64
pkg: github.com/fnproject/fn/api/agent
BenchmarkSlotKey         1000000              1901 ns/op             168 B/op          3 allocs/op
PASS
ok      github.com/fnproject/fn/api/agent       2.092s
```

once we have versioned apps/routes we don't need to build a sha or sort
configs so this will get a lot faster.

anyway, mostly funsies here... my life is that sad now.
2018-02-16 11:39:10 -08:00
Tolga Ceylan
c848fc6181 fn: hot container timer improvements (#751)
* fn: hot container timer improvements

With this change, now we are allocating the timers
when the container starts and managing them via
stop/clear as needed, which should not only be more
efficient, but also easier to follow.

For example, previously, if eject time out was
set to 10 secs, this could have delayed idle timeout
up to 10 secs as well. It is also not necessary to do
any math for elapsed time.

Now consumers avoid any requeuing when startDequeuer() is cancelled.
This was triggering additional dequeue/requeue causing
containers to wake up spuriously. Also in startDequeuer(),
we no longer remove the item from the actual queue and
leave this to acquire/eject, which side steps issues related
with item landing in the channel, not consumed, etc.
2018-02-12 14:12:03 -08:00
Reed Allman
27179ddf54 plumb ctx for container removal spanno (#750)
these were just dangling off on the side, took some plumbing work but not so
bad
2018-02-08 22:48:23 -08:00
Tolga Ceylan
97d78c584b fn: better slot/container/request state tracking (#719)
* fn: better slot/container/request state tracking
2018-01-26 12:21:11 -08:00
Tolga Ceylan
8c31e47c01 fn: agent slot improvements (#704)
*) Stopped using latency previous/current stats, this
was not working as expected. Fresh starts usually have
these stats zero for a long time, and initial samples
are high due to downloads, caches, etc.

*) New state to track: containers that are idle. In other
words, containers that have an unused token in the slot
queue.

*) Removed latency counts since these are not used in
container start decision anymore. Simplifies logs.

*) isNewContainerNeeded() simplified to use idle count
to estimate effective waiters. Removed speculative
latency based logic and progress check comparison.
In agent, waitHot() delayed signalling compansates
for these changes. If the estimation may fail, but
this should correct itself in the next 200 msec
signal.
2018-01-19 12:35:52 -08:00
Tolga Ceylan
2f0de2b574 fn: resource and slot cancel and broadcast improvements (#696)
* fn: resource and slot cancel and broadcast improvements

*) Context argument does not wake up the waiters correctly upon
cancellation/timeout.
*) Avoid unnecessary broadcasts in slot and resource.

* fn: limit scope of context in resource/slot calls in agent
2018-01-18 13:43:56 -08:00
Tolga Ceylan
5a7778a656 fn: cancellations in WaitAsyncResource (#694)
* fn: cancellations in WaitAsyncResource

Added go context with cancel to wait async resource. Although
today, the only case for cancellation is shutdown, this cleans
up agent shutdown a little bit.

* fn: locked broadcast to avoid missed wake-ups

* fn: removed ctx arg to WaitAsyncResource and startDequeuer

This is confusing and unnecessary.
2018-01-17 16:08:54 -08:00
Tolga Ceylan
1c8029e4f1 fn: more tests for hot container launch logic (#678) 2018-01-11 16:00:37 -08:00
Tolga Ceylan
db159e595f fn: new container lauch adjustments (#677)
*) revert executor wait queue size comparison. This is too
   aggresive and with stall check below, now unnecessary.
*) new container logic now checks if stats are constant, if
   this is the case, then we assume the system is stalled (eg
   running functions that take long time), this means we need
   to make progress and spin up a new container.
2018-01-11 14:09:21 -08:00
Tolga Ceylan
14789aba41 Slot mgr fixes (#613)
*) during shutdown, errors should be 503
*) new inactivity time out for hot queue, we previously kept hot queues in memory forever.
*) each hot queue now has a hot launcher to monitor and launch hot containers
*) consumers now create a consumer channel with startDequeuer() that can be cancelled via context
*) consumers now ping (signal) hot launcher every 200 msecs until they get a slot
*) tests for slot queue & mgr
2018-01-04 11:34:43 -08:00