AI Telephony Agent
Make inbound and outbound calls with AI agents using VideoSDK. The project supports multiple SIP providers and AI agents through a modular, extensible architecture for VoIP telephony.
Installation
Prerequisites
- Python 3.11+
- VideoSDK account
- Twilio account (SIP trunking provider)
- Google API key (for Gemini AI)
Setup
- Clone the repository
git clone https://github.com/yourusername/ai-agent-telephony.git
cd ai-agent-telephony
- Install dependencies
pip install -r requirements.txt
- Configure environment variables
Create a .env file:
# VideoSDK Configuration
VIDEOSDK_AUTH_TOKEN=your_videosdk_token
VIDEOSDK_SIP_USERNAME=your_sip_username
VIDEOSDK_SIP_PASSWORD=your_sip_password
# AI Configuration
GOOGLE_API_KEY=your_google_api_key
# Twilio SIP Trunking Configuration
TWILIO_SID=your_twilio_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
TWILIO_NUMBER=your_twilio_number
- Run the server
python server.py
The server will start on http://localhost:8000
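Before starting the server, it can help to confirm that the .env file defines every variable listed above. The following is a minimal, standard-library-only sketch; `parse_env_text` and `missing_vars` are hypothetical helper names, not part of this project, and the parser handles only plain KEY=VALUE lines and # comments.

```python
# Names taken from the .env example above.
REQUIRED_VARS = (
    "VIDEOSDK_AUTH_TOKEN",
    "VIDEOSDK_SIP_USERNAME",
    "VIDEOSDK_SIP_PASSWORD",
    "GOOGLE_API_KEY",
    "TWILIO_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_NUMBER",
)


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values: dict) -> list:
    """Return the required names that are absent or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


# Illustrative input; in practice, read the text from your .env file.
sample = """
# VideoSDK Configuration
VIDEOSDK_AUTH_TOKEN=your_videosdk_token
TWILIO_NUMBER=+15550001111
"""
print("Missing:", missing_vars(parse_env_text(sample)))
```

To check a real file, pass `open(".env").read()` to `parse_env_text` instead of the sample string.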
API Endpoints
Handle Inbound Calls (SIP User Agent Server)
POST /inbound-call
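Handles incoming calls from your SIP provider. The endpoint expects Twilio webhook parameters and must be reachable from the public internet, so either host this server or tunnel to it with ngrok, and point the webhook at: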
POST <server-url>/inbound-call
- CallSid: Unique call identifier
- From: Caller's phone number (CLI - Calling Line Identification)
- To: Recipient's phone number (DID - Direct Inward Dialing)
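Twilio sends webhook parameters as a form-encoded POST body. As a rough illustration of how the three parameters above could be pulled out of such a body, here is a standard-library sketch; `parse_inbound_webhook` is a hypothetical helper, and the actual server may rely on its web framework's form parsing instead.

```python
from urllib.parse import parse_qs

REQUIRED_FIELDS = ("CallSid", "From", "To")


def parse_inbound_webhook(body: str) -> dict:
    """Extract CallSid, From and To from a form-encoded webhook body."""
    fields = parse_qs(body, keep_blank_values=True)
    result = {}
    for key in REQUIRED_FIELDS:
        values = fields.get(key)
        if not values or not values[0]:
            raise ValueError(f"missing webhook parameter: {key}")
        result[key] = values[0]
    return result


# Illustrative body; "%2B" is the URL-encoded "+" of an E.164 number.
sample_body = "CallSid=CA123&From=%2B15550001111&To=%2B15550002222"
call = parse_inbound_webhook(sample_body)
print(call["CallSid"], call["From"], call["To"])
```

Rejecting a request with a missing field early keeps malformed webhooks from reaching the call-handling logic.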
Initiate Outbound Calls (SIP User Agent Client)
POST /outbound-call
Content-Type: application/json
{
"to_number": "+1234567890",
"initial_greeting": "Hello from AI Agent!"
}
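The same request can be built from Python. This is a sketch using only the standard library; `build_outbound_request` is a hypothetical helper name, and the base URL assumes the server is running locally on port 8000 as described in Setup.

```python
import json
import urllib.request


def build_outbound_request(base_url: str, to_number: str, initial_greeting: str):
    """Build (but do not send) the POST request for /outbound-call."""
    payload = json.dumps(
        {"to_number": to_number, "initial_greeting": initial_greeting}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/outbound-call",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_outbound_request(
    "http://localhost:8000", "+1234567890", "Hello from AI Agent!"
)
print(request.get_method(), request.full_url)

# To place the call, send the request while the server is running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect before a real call is placed.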
Configure SIP Provider
POST /configure-provider?provider_name=twilio
Switch SIP providers at runtime (currently supports: twilio).
Adding New SIP Providers
The modular architecture makes it easy to add new SIP providers and SIP trunking services. Here's how to add a new provider:
1. Create Provider Implementation
Create providers/your_provider.py:
from typing import Dict, Any

from .base import SIPProvider
from config import Config

# YourProviderClient is a placeholder for your provider's SDK client;
# import it from that SDK here.


class YourProvider(SIPProvider):
    def __init__(self):
        self.client = self.create_client()

    def create_client(self) -> Any:
        return YourProviderClient(Config.YOUR_API_KEY)

    def generate_twiml(self, sip_endpoint: str, **kwargs) -> str:
        return f"<Response><Dial><Sip>{sip_endpoint}</Sip></Dial></Response>"

    def initiate_outbound_call(self, to_number: str, twiml: str) -> Dict[str, Any]:
        call = self.client.calls.create(
            to=to_number,
            from_=Config.YOUR_NUMBER,
            twiml=twiml,
        )
        return {
            "call_sid": call.id,
            "status": call.status,
            "provider": "your_provider",
        }

    def get_provider_name(self) -> str:
        return "your_provider"
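One detail worth considering in generate_twiml: the SIP endpoint is interpolated directly into XML, so an endpoint containing characters such as & would produce malformed markup. A possible variant, sketched here with the standard library's `xml.sax.saxutils.escape`, escapes the value first; `build_dial_sip_twiml` and the example endpoint are illustrative names, not part of this project.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_dial_sip_twiml(sip_endpoint: str) -> str:
    """Return Dial/Sip markup with the endpoint XML-escaped."""
    return f"<Response><Dial><Sip>{escape(sip_endpoint)}</Sip></Dial></Response>"


# Hypothetical endpoint with a header parameter that contains "&".
endpoint = "sip:agent@example.com;transport=tls?X-Ref=a&b"
twiml = build_dial_sip_twiml(endpoint)

# Round-trip check: parsing the markup recovers the original endpoint.
parsed_endpoint = ET.fromstring(twiml).find("./Dial/Sip").text
print(twiml)
print(parsed_endpoint == endpoint)
```

If your endpoints never contain XML special characters, the plain f-string behaves the same; escaping only matters for the edge cases.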
2. Update Provider Factory
Add to providers/__init__.py:
from .your_provider import YourProvider


def get_provider(provider_name: str = "twilio") -> SIPProvider:
    providers = {
        "twilio": TwilioProvider,
        "your_provider": YourProvider,
    }
    # ... rest of function
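The remainder of get_provider is not shown here. One way such a registry lookup could be completed is sketched below, under the assumption that an unknown name should raise an error; the stand-in classes exist only to make the example self-contained and are not the project's real implementations.

```python
class SIPProvider:
    """Stand-in for providers.base.SIPProvider (hypothetical minimal version)."""

    def get_provider_name(self) -> str:
        raise NotImplementedError


class TwilioProvider(SIPProvider):
    def get_provider_name(self) -> str:
        return "twilio"


class YourProvider(SIPProvider):
    def get_provider_name(self) -> str:
        return "your_provider"


PROVIDERS = {
    "twilio": TwilioProvider,
    "your_provider": YourProvider,
}


def get_provider(provider_name: str = "twilio") -> SIPProvider:
    """Instantiate the provider registered under provider_name."""
    try:
        provider_class = PROVIDERS[provider_name]
    except KeyError:
        supported = ", ".join(sorted(PROVIDERS))
        raise ValueError(
            f"Unknown SIP provider {provider_name!r}; supported: {supported}"
        ) from None
    return provider_class()


print(get_provider("your_provider").get_provider_name())
```

Listing the supported names in the error message makes a mistyped provider_name on /configure-provider easier to diagnose.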
3. Add Configuration
Update config.py:
class Config:
    YOUR_API_KEY = os.getenv("YOUR_API_KEY")
    YOUR_NUMBER = os.getenv("YOUR_NUMBER")

    @classmethod
    def validate(cls) -> None:
        required_vars = {
            # ... existing vars
            "YOUR_API_KEY": cls.YOUR_API_KEY,
            "YOUR_NUMBER": cls.YOUR_NUMBER,
        }
        # ... rest of validation
Adding New AI Agents
Similarly, you can add new AI agents for intelligent call handling:
1. Create AI Agent Implementation
Create ai/your_ai_agent.py:
from typing import Dict, Any

from videosdk.agents import AgentSession, RealTimePipeline
from .base_agent import AIAgent
from voice_agent import VoiceAgent
from config import Config

# YourAIModel and Config.YOUR_AI_API_KEY are placeholders for your
# model class and its API key setting.


class YourAIAgent(AIAgent):
    def create_pipeline(self) -> RealTimePipeline:
        model = YourAIModel(
            api_key=Config.YOUR_AI_API_KEY,
            model="your-model-name",
        )
        return RealTimePipeline(model=model)

    def create_session(self, room_id: str, context: Dict[str, Any]) -> AgentSession:
        pipeline = self.create_pipeline()
        # Keys in the caller's context override the defaults listed first.
        agent_context = {
            "name": "Your AI Agent",
            "meetingId": room_id,
            "videosdk_auth": Config.VIDEOSDK_AUTH_TOKEN,
            **context,
        }
        session = AgentSession(
            agent=VoiceAgent(context=agent_context),
            pipeline=pipeline,
            context=agent_context,
        )
        return session

    def get_agent_name(self) -> str:
        return "your_ai_agent"
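The base class in ai/base_agent.py is not shown in this README. Judging only from the three methods implemented above, it might resemble the abstract interface sketched below; the return types are loosened to Any so the example runs without the VideoSDK package, and EchoAgent is a hypothetical agent used purely to exercise the interface.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class AIAgent(ABC):
    """Assumed shape of the agent interface; the real base class may differ."""

    @abstractmethod
    def create_pipeline(self) -> Any:
        """Build the model pipeline the agent will use."""

    @abstractmethod
    def create_session(self, room_id: str, context: Dict[str, Any]) -> Any:
        """Create a session bound to the given room."""

    @abstractmethod
    def get_agent_name(self) -> str:
        """Return the name used to look the agent up in the factory."""


class EchoAgent(AIAgent):
    """Hypothetical agent that returns plain dicts instead of SDK objects."""

    def create_pipeline(self) -> Any:
        return {"model": "echo"}

    def create_session(self, room_id: str, context: Dict[str, Any]) -> Any:
        return {"meetingId": room_id, "pipeline": self.create_pipeline(), **context}

    def get_agent_name(self) -> str:
        return "echo"


agent = EchoAgent()
session = agent.create_session("room-1", {"caller": "+15550001111"})
print(agent.get_agent_name(), session["meetingId"])
```

Declaring the methods abstract means a subclass that forgets one of them fails at instantiation rather than in the middle of a call.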
2. Update AI Agent Factory
Add to ai/__init__.py:
from .your_ai_agent import YourAIAgent


def get_ai_agent(agent_name: str = "gemini") -> AIAgent:
    agents = {
        "gemini": GeminiAgent,
        "your_ai_agent": YourAIAgent,
    }
    # ... rest of function
Testing
Health Check
curl "http://localhost:8000/health"
Outbound Call Test (SIP UAC)
curl -X POST "http://localhost:8000/outbound-call" \
-H "Content-Type: application/json" \
-d '{"to_number": "+1234567890", "initial_greeting": "Hello from AI Agent!"}'
Switch SIP Provider
curl -X POST "http://localhost:8000/configure-provider?provider_name=twilio"
🔧 Configuration
Environment Variables
| Variable | Description | Required |
|---|---|---|
| VIDEOSDK_AUTH_TOKEN | VideoSDK authentication token | ✅ |
| VIDEOSDK_SIP_USERNAME | VideoSDK SIP username | ✅ |
| VIDEOSDK_SIP_PASSWORD | VideoSDK SIP password | ✅ |
| GOOGLE_API_KEY | Google API key for Gemini | ✅ |
| TWILIO_SID | Twilio account SID | ✅ |
| TWILIO_AUTH_TOKEN | Twilio auth token | ✅ |
| TWILIO_NUMBER | Twilio phone number | ✅ |
Provider-Specific Variables
For additional SIP providers, add their specific environment variables to config.py.
Features
- SIP/VoIP Integration: Pluggable SIP providers (Twilio, and more) with session initiation protocol support
- AI-Powered Voice Agents: Pluggable AI agents (Gemini, and more) for intelligent call handling
- Real-time Voice Communication: AI agents with real-time transport protocol (RTP) capabilities
- Modular Architecture: Clean separation of concerns for scalable telephony solutions
- Runtime Configuration: Switch SIP providers and AI agents without restart
- VideoSDK Integration: Seamless room creation and session management
- Call Control: Advanced call routing, forwarding, and transfer capabilities
- Codec Support: Multiple audio codecs for optimal voice quality
Use Cases
Customer Service (SIP-based)
- AI agents handle customer inquiries via VoIP
- 24/7 availability with SIP trunking
- Consistent service quality across PSTN and IP networks
Appointment Scheduling
- Automated appointment booking via SIP calls
- Reminder calls using SIP user agent client
- Rescheduling assistance with DTMF support
Surveys and Feedback
- Automated survey calls over SIP
- Customer feedback collection via VoIP
- Data collection with real-time transport protocol
Emergency Notifications
- Automated emergency alerts via SIP trunking
- Mass notification systems using PSTN integration
- Status updates through IP multimedia subsystem (IMS)
Architecture Benefits
- Separation of Concerns: Each component has a single responsibility
- Extensibility: Easy to add new SIP providers and AI agents
- Testability: Components can be tested in isolation
- Maintainability: Clear structure makes code easier to understand
- Reusability: Components can be reused across different projects
- Configuration Management: Centralized configuration with validation
- SIP Compliance: Full session initiation protocol support
- VoIP Integration: Seamless integration with voice over internet protocol
Roadmap
- Add support for multiple AI agents per session
- Implement SIP-specific features (SBC, registrar, proxy server)
- Add monitoring and metrics for SIP sessions
- Create provider-specific webhook handlers
- Add support for different voice codecs per AI agent
- Implement call recording and transcription
- Add sentiment analysis for call quality
- Create web dashboard for call management
- Support for H.323 protocol integration
- Advanced call control features (forwarding, transfer, queue)
🤝 Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Guidelines
- Follow the existing code patterns
- Add proper error handling
- Include logging
- Update documentation
- Add tests if possible
License
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ for the developer community
