fn: sync.WaitGroup replacement common.WaitGroup (#937)

* fn: sync.WaitGroup replacement common.WaitGroup

agent/lb_agent/pure_runner have been using sync.WaitGroup with
incorrect semantics. Switch these components to the new
common.WaitGroup, which provides some handy functionality for
common graceful shutdown cases.

From https://golang.org/pkg/sync/#WaitGroup,
    "Note that calls with a positive delta that occur when the counter
     is zero must happen before a Wait. Calls with a negative delta,
     or calls with a positive delta that start when the counter is
     greater than zero, may happen at any time. Typically this means
     the calls to Add should execute before the statement creating
     the goroutine or other event to be waited for. If a WaitGroup
     is reused to wait for several independent sets of events,
     new Add calls must happen after all previous Wait calls have
     returned."
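
The restriction quoted above is what breaks shutdown code that calls
Add(1) per request while another goroutine may already be in Wait().
A minimal sketch of a wait group that can be closed follows. It only
assumes the AddSession/CloseGroup call shapes visible in this commit's
diff; the type closableGroup and its internals are hypothetical and
are not the actual common.WaitGroup implementation.

```go
package main

import "sync"

// closableGroup is a hypothetical stand-in used for illustration only.
// It counts in-flight sessions and can be closed so that no new
// sessions are admitted while shutdown waits for the rest to finish.
type closableGroup struct {
	mu     sync.Mutex
	cond   *sync.Cond
	count  int
	closed bool
}

func newClosableGroup() *closableGroup {
	g := &closableGroup{}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// AddSession adjusts the session count by delta. A positive delta is
// refused (returns false) once the group has been closed. A negative
// delta is always applied so that in-flight sessions can finish.
func (g *closableGroup) AddSession(delta int) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if delta > 0 && g.closed {
		return false
	}
	g.count += delta
	if g.count == 0 {
		// Wake any goroutine blocked in CloseGroup.
		g.cond.Broadcast()
	}
	return true
}

// CloseGroup marks the group closed and blocks until every admitted
// session has been released.
func (g *closableGroup) CloseGroup() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.closed = true
	for g.count > 0 {
		g.cond.Wait()
	}
}
```

Because the closed flag and the counter are updated under the same
lock, a late AddSession(1) either is admitted before the close or is
refused; it cannot race with the wait the way Add(1) can race with
sync.WaitGroup.Wait().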

HandleCallEnd introduces some complexity to the shutdown, but this
is currently handled by calling AddSession(2) initially and letting
HandleCallEnd() decide when to decrement by 1, in addition to the
decrement by 1 in Submit().
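
The two-reference scheme described above can be sketched as follows.
This is an illustration under assumptions, not the agent code: the
sessions type and the submit/handleCallEnd helpers are hypothetical,
and a plain mutex-guarded counter stands in for common.WaitGroup.

```go
package main

import "sync"

// sessions is a hypothetical, simplified session counter.
type sessions struct {
	mu    sync.Mutex
	count int
}

// AddSession adjusts the counter by delta.
func (s *sessions) AddSession(delta int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.count += delta
}

// Count reports the number of sessions currently held.
func (s *sessions) Count() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.count
}

// submit takes two references up front: one for submit itself and one
// for the call-end handler, which may finish on another goroutine
// after submit has already returned.
func submit(s *sessions, callEnded chan<- struct{}) {
	s.AddSession(2)
	defer s.AddSession(-1) // released when submit returns

	go func() {
		handleCallEnd(s)
		close(callEnded)
	}()
}

// handleCallEnd releases the second reference once its cleanup is done.
func handleCallEnd(s *sessions) {
	defer s.AddSession(-1)
	// call-end cleanup would run here
}
```

The counter only reaches zero after both submit and the call-end
handler have finished, so a shutdown waiting on it cannot complete
while cleanup is still running.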

The lb_agent shutdown sequence, and particularly timeouts with the
runner pool, need another look/revision, but this is outside the
scope of this commit.

* fn: lb-agent wg share

* fn: no need to +2 in Submit with defer.

Removed defer since handleCallEnd already has
this responsibility.
Tolga Ceylan
2018-04-12 11:33:01 -07:00
committed by GitHub
parent f350b2ca48
commit e53d23afc9
7 changed files with 298 additions and 99 deletions


@@ -3,22 +3,26 @@ package agent
 import (
 	"context"
 	"encoding/json"
+	"errors"
 	"io"
-	"sync"
 	"time"
 	"google.golang.org/grpc"
 	"google.golang.org/grpc/credentials"
 	pb "github.com/fnproject/fn/api/agent/grpc"
+	"github.com/fnproject/fn/api/common"
 	pool "github.com/fnproject/fn/api/runnerpool"
 	"github.com/fnproject/fn/grpcutil"
 	"github.com/sirupsen/logrus"
 )
+var (
+	ErrorRunnerClosed = errors.New("Runner is closed")
+)
 type gRPCRunner struct {
-	// Need a WaitGroup of TryExec in flight
-	wg sync.WaitGroup
+	shutWg *common.WaitGroup
 	address string
 	conn *grpc.ClientConn
 	client pb.RunnerProtocolClient
@@ -31,6 +35,7 @@ func SecureGRPCRunnerFactory(addr, runnerCertCN string, pki *pool.PKIData) (pool
 	}
 	return &gRPCRunner{
+		shutWg: common.NewWaitGroup(),
 		address: addr,
 		conn: conn,
 		client: client,
@@ -43,7 +48,7 @@ func (r *gRPCRunner) Close(ctx context.Context) error {
 	err := make(chan error, 1)
 	go func() {
 		defer close(err)
-		r.wg.Wait()
+		r.shutWg.CloseGroup()
 		err <- r.conn.Close()
 	}()
@@ -86,8 +91,10 @@ func (r *gRPCRunner) Address() string {
 func (r *gRPCRunner) TryExec(ctx context.Context, call pool.RunnerCall) (bool, error) {
 	logrus.WithField("runner_addr", r.address).Debug("Attempting to place call")
-	r.wg.Add(1)
-	defer r.wg.Done()
+	if !r.shutWg.AddSession(1) {
+		return true, ErrorRunnerClosed
+	}
+	defer r.shutWg.AddSession(-1)
 	// extract the call's model data to pass on to the pure runner
 	modelJSON, err := json.Marshal(call.Model())