Files
ai-telephony-demo-agent/README.md
2025-06-18 12:04:43 +05:30

10 KiB

AI Telephony Agent

Make INBOUND and OUTBOUND calls with AI agents using VideoSDK. Supports multiple SIP providers and AI agents with a clean, extensible architecture for VoIP telephony solutions.

Architecture : Connecting Voice Agent to Telephony Agent

Documentation Video Tutorials Get Started Discord Community PyPI Package

Installation

Prerequisites

  • Python 3.11+
  • VideoSDK account
  • Twilio account (SIP trunking provider)
  • Google API key (for Gemini AI)

Setup

  1. Clone the repository
git clone https://github.com/yourusername/ai-agent-telephony.git
cd ai-agent-telephony
  1. Install dependencies
pip install -r requirements.txt
  1. Configure environment variables Create a .env file:
# VideoSDK Configuration
VIDEOSDK_AUTH_TOKEN=your_videosdk_token
VIDEOSDK_SIP_USERNAME=your_sip_username
VIDEOSDK_SIP_PASSWORD=your_sip_password

# AI Configuration
GOOGLE_API_KEY=your_google_api_key

# Twilio SIP Trunking Configuration
TWILIO_SID=your_twilio_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
TWILIO_NUMBER=your_twilio_number
  1. Run the server
python server.py

The server will start on http://localhost:8000

API Endpoints

Handle Inbound Calls (SIP User Agent Server)

POST /inbound-call

Handles incoming calls from your SIP provider. Expects Twilio webhook parameters, either host this server or use ngrok:

POST <server-url>/inbound-call
  • CallSid: Unique call identifier
  • From: Caller's phone number (CLI - Calling Line Identification)
  • To: Recipient's phone number (DID - Direct Inward Dialing)

Initiate Outbound Calls (SIP User Agent Client)

POST /outbound-call
Content-Type: application/json

{
  "to_number": "+1234567890",
  "initial_greeting": "Hello from AI Agent!"
}

Configure SIP Provider

POST /configure-provider?provider_name=twilio

Switch SIP providers at runtime (currently supports: twilio).

Adding New SIP Providers

The modular architecture makes it easy to add new SIP providers and SIP trunking services. Here's how to add a new provider:

1. Create Provider Implementation

Create providers/your_provider.py:

from typing import Dict, Any
from .base import SIPProvider
from config import Config

class YourProvider(SIPProvider):
    def __init__(self):
        self.client = self.create_client()

    def create_client(self) -> Any:
        return YourProviderClient(Config.YOUR_API_KEY)

    def generate_twiml(self, sip_endpoint: str, **kwargs) -> str:
        return f"<Response><Dial><Sip>{sip_endpoint}</Sip></Dial></Response>"

    def initiate_outbound_call(self, to_number: str, twiml: str) -> Dict[str, Any]:
        call = self.client.calls.create(
            to=to_number,
            from_=Config.YOUR_NUMBER,
            twiml=twiml
        )
        return {
            "call_sid": call.id,
            "status": call.status,
            "provider": "your_provider"
        }

    def get_provider_name(self) -> str:
        return "your_provider"

2. Update Provider Factory

Add to providers/__init__.py:

from .your_provider import YourProvider

def get_provider(provider_name: str = "twilio") -> SIPProvider:
    providers = {
        "twilio": TwilioProvider,
        "your_provider": YourProvider,
    }
    # ... rest of function

3. Add Configuration

Update config.py:

class Config:
    YOUR_API_KEY = os.getenv("YOUR_API_KEY")
    YOUR_NUMBER = os.getenv("YOUR_NUMBER")

    @classmethod
    def validate(cls) -> None:
        required_vars = {
            # ... existing vars
            "YOUR_API_KEY": cls.YOUR_API_KEY,
            "YOUR_NUMBER": cls.YOUR_NUMBER,
        }
        # ... rest of validation

Adding New AI Agents

Similarly, you can add new AI agents for intelligent call handling:

1. Create AI Agent Implementation

Create ai/your_ai_agent.py:

from typing import Dict, Any
from videosdk.agents import AgentSession, RealTimePipeline
from .base_agent import AIAgent
from voice_agent import VoiceAgent
from config import Config

class YourAIAgent(AIAgent):
    def create_pipeline(self) -> RealTimePipeline:
        model = YourAIModel(
            api_key=Config.YOUR_AI_API_KEY,
            model="your-model-name"
        )
        return RealTimePipeline(model=model)

    def create_session(self, room_id: str, context: Dict[str, Any]) -> AgentSession:
        pipeline = self.create_pipeline()
        agent_context = {
            "name": "Your AI Agent",
            "meetingId": room_id,
            "videosdk_auth": Config.VIDEOSDK_AUTH_TOKEN,
            **context
        }

        session = AgentSession(
            agent=VoiceAgent(context=agent_context),
            pipeline=pipeline,
            context=agent_context
        )
        return session

    def get_agent_name(self) -> str:
        return "your_ai_agent"

2. Update AI Agent Factory

Add to ai/__init__.py:

from .your_ai_agent import YourAIAgent

def get_ai_agent(agent_name: str = "gemini") -> AIAgent:
    agents = {
        "gemini": GeminiAgent,
        "your_ai_agent": YourAIAgent,
    }
    # ... rest of function

Testing

Health Check

curl "http://localhost:8000/health"

Outbound Call Test (SIP UAC)

curl -X POST "http://localhost:8000/outbound-call" \
  -H "Content-Type: application/json" \
  -d '{"to_number": "+1234567890", "initial_greeting": "Hello from AI Agent!"}'

Switch SIP Provider

curl -X POST "http://localhost:8000/configure-provider?provider_name=twilio"

🔧 Configuration

Environment Variables

Variable Description Required
VIDEOSDK_AUTH_TOKEN VideoSDK authentication token
VIDEOSDK_SIP_USERNAME VideoSDK SIP username
VIDEOSDK_SIP_PASSWORD VideoSDK SIP password
GOOGLE_API_KEY Google API key for Gemini
TWILIO_SID Twilio account SID
TWILIO_AUTH_TOKEN Twilio auth token
TWILIO_NUMBER Twilio phone number

Provider-Specific Variables

For additional SIP providers, add their specific environment variables to config.py.

Features

  • SIP/VoIP Integration: Pluggable SIP providers (Twilio, and more) with session initiation protocol support
  • AI-Powered Voice Agents: Pluggable AI agents (Gemini, and more) for intelligent call handling
  • Real-time Voice Communication: AI agents with real-time transport protocol (RTP) capabilities
  • Modular Architecture: Clean separation of concerns for scalable telephony solutions
  • Runtime Configuration: Switch SIP providers and AI agents without restart
  • VideoSDK Integration: Seamless room creation and session management
  • Call Control: Advanced call routing, forwarding, and transfer capabilities
  • Codec Support: Multiple audio codecs for optimal voice quality

Use Cases

Customer Service (SIP-based)

  • AI agents handle customer inquiries via VoIP
  • 24/7 availability with SIP trunking
  • Consistent service quality across PSTN and IP networks

Appointment Scheduling

  • Automated appointment booking via SIP calls
  • Reminder calls using SIP user agent client
  • Rescheduling assistance with DTMF support

Surveys and Feedback

  • Automated survey calls over SIP
  • Customer feedback collection via VoIP
  • Data collection with real-time transport protocol

Emergency Notifications

  • Automated emergency alerts via SIP trunking
  • Mass notification systems using PSTN integration
  • Status updates through IP multimedia subsystem (IMS)

Architecture Benefits

  1. Separation of Concerns: Each component has a single responsibility
  2. Extensibility: Easy to add new SIP providers and AI agents
  3. Testability: Components can be tested in isolation
  4. Maintainability: Clear structure makes code easier to understand
  5. Reusability: Components can be reused across different projects
  6. Configuration Management: Centralized configuration with validation
  7. SIP Compliance: Full session initiation protocol support
  8. VoIP Integration: Seamless integration with voice over internet protocol

Roadmap

  • Add support for multiple AI agents per session
  • Implement SIP-specific features (SBC, registrar, proxy server)
  • Add monitoring and metrics for SIP sessions
  • Create provider-specific webhook handlers
  • Add support for different voice codecs per AI agent
  • Implement call recording and transcription
  • Add sentiment analysis for call quality
  • Create web dashboard for call management
  • Support for H.323 protocol integration
  • Advanced call control features (forwarding, transfer, queue)

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Guidelines

  • Follow the existing code patterns
  • Add proper error handling
  • Include logging
  • Update documentation
  • Add tests if possible

License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❤️ for the developer community