# AI Telephony Agent
Make INBOUND and OUTBOUND calls with AI agents using VideoSDK. Supports multiple SIP providers and AI agents with a clean, extensible architecture for VoIP telephony solutions.
![Architecture : Connecting Voice Agent to Telephony Agent](https://assets.videosdk.live/images/sip-telephony-agent.png)
## Installation

### Prerequisites

- Python 3.11+
- VideoSDK account
- Twilio account (SIP trunking provider)
- Google API key (for Gemini AI)

### Setup

1. **Clone the repository**

   ```bash
   git clone https://github.com/yourusername/ai-agent-telephony.git
   cd ai-agent-telephony
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Configure environment variables**

   Create a `.env` file:

   ```env
   # VideoSDK Configuration
   VIDEOSDK_AUTH_TOKEN=your_videosdk_token
   VIDEOSDK_SIP_USERNAME=your_sip_username
   VIDEOSDK_SIP_PASSWORD=your_sip_password

   # AI Configuration
   GOOGLE_API_KEY=your_google_api_key

   # Twilio SIP Trunking Configuration
   TWILIO_SID=your_twilio_sid
   TWILIO_AUTH_TOKEN=your_twilio_auth_token
   TWILIO_NUMBER=your_twilio_number
   ```

4. **Run the server**

   ```bash
   python server.py
   ```

   The server will start on `http://localhost:8000`.

## API Endpoints

### Handle Inbound Calls (SIP User Agent Server)

```bash
POST /inbound-call
```

Handles incoming calls from your SIP provider. The endpoint must be reachable by the provider, so either host this server publicly or expose it with `ngrok`. It expects the following Twilio webhook parameters:

- `CallSid`: Unique call identifier
- `From`: Caller's phone number (CLI - Calling Line Identification)
- `To`: Recipient's phone number (DID - Direct Inward Dialing)

### Initiate Outbound Calls (SIP User Agent Client)

```bash
POST /outbound-call
Content-Type: application/json

{
  "to_number": "+1234567890",
  "initial_greeting": "Hello from AI Agent!"
}
```

### Configure SIP Provider

```bash
POST /configure-provider?provider_name=twilio
```

Switch SIP providers at runtime (currently supports: `twilio`).

## Adding New SIP Providers

The modular architecture makes it easy to add new SIP providers and SIP trunking services. Here's how to add a new provider:

### 1. Create Provider Implementation

Create `providers/your_provider.py`:

```python
from typing import Dict, Any
from .base import SIPProvider
from config import Config

class YourProvider(SIPProvider):
    def __init__(self):
        self.client = self.create_client()

    def create_client(self) -> Any:
        # YourProviderClient is a placeholder for your provider's SDK client
        return YourProviderClient(Config.YOUR_API_KEY)

    def generate_twiml(self, sip_endpoint: str, **kwargs) -> str:
        # Return the call instructions your provider expects for this SIP endpoint
        return f"{sip_endpoint}"

    def initiate_outbound_call(self, to_number: str, twiml: str) -> Dict[str, Any]:
        call = self.client.calls.create(
            to=to_number,
            from_=Config.YOUR_NUMBER,
            twiml=twiml
        )
        return {
            "call_sid": call.id,
            "status": call.status,
            "provider": "your_provider"
        }

    def get_provider_name(self) -> str:
        return "your_provider"
```

### 2. Update Provider Factory

Add to `providers/__init__.py`:

```python
from .your_provider import YourProvider

def get_provider(provider_name: str = "twilio") -> SIPProvider:
    providers = {
        "twilio": TwilioProvider,
        "your_provider": YourProvider,
    }
    # ... rest of function
```

### 3. Add Configuration

Update `config.py`:

```python
import os

class Config:
    YOUR_API_KEY = os.getenv("YOUR_API_KEY")
    YOUR_NUMBER = os.getenv("YOUR_NUMBER")

    @classmethod
    def validate(cls) -> None:
        required_vars = {
            # ... existing vars
            "YOUR_API_KEY": cls.YOUR_API_KEY,
            "YOUR_NUMBER": cls.YOUR_NUMBER,
        }
        # ... rest of validation
```

## Adding New AI Agents

Similarly, you can add new AI agents for intelligent call handling:

### 1. Create AI Agent Implementation

Create `ai/your_ai_agent.py`:

```python
from typing import Dict, Any
from videosdk.agents import AgentSession, RealTimePipeline
from .base_agent import AIAgent
from voice_agent import VoiceAgent
from config import Config

class YourAIAgent(AIAgent):
    def create_pipeline(self) -> RealTimePipeline:
        # YourAIModel is a placeholder for your real-time model class
        model = YourAIModel(
            api_key=Config.YOUR_AI_API_KEY,
            model="your-model-name"
        )
        return RealTimePipeline(model=model)

    def create_session(self, room_id: str, context: Dict[str, Any]) -> AgentSession:
        pipeline = self.create_pipeline()
        agent_context = {
            "name": "Your AI Agent",
            "meetingId": room_id,
            "videosdk_auth": Config.VIDEOSDK_AUTH_TOKEN,
            **context
        }
        session = AgentSession(
            agent=VoiceAgent(context=agent_context),
            pipeline=pipeline,
            context=agent_context
        )
        return session

    def get_agent_name(self) -> str:
        return "your_ai_agent"
```

### 2. Update AI Agent Factory

Add to `ai/__init__.py`:

```python
from .your_ai_agent import YourAIAgent

def get_ai_agent(agent_name: str = "gemini") -> AIAgent:
    agents = {
        "gemini": GeminiAgent,
        "your_ai_agent": YourAIAgent,
    }
    # ... rest of function
```

## Testing

### Health Check

```bash
curl "http://localhost:8000/health"
```

### Outbound Call Test (SIP UAC)

```bash
curl -X POST "http://localhost:8000/outbound-call" \
  -H "Content-Type: application/json" \
  -d '{"to_number": "+1234567890", "initial_greeting": "Hello from AI Agent!"}'
```

### Switch SIP Provider

```bash
curl -X POST "http://localhost:8000/configure-provider?provider_name=twilio"
```

## 🔧 Configuration

### Environment Variables

| Variable                | Description                   | Required |
| ----------------------- | ----------------------------- | -------- |
| `VIDEOSDK_AUTH_TOKEN`   | VideoSDK authentication token | ✅       |
| `VIDEOSDK_SIP_USERNAME` | VideoSDK SIP username         | ✅       |
| `VIDEOSDK_SIP_PASSWORD` | VideoSDK SIP password         | ✅       |
| `GOOGLE_API_KEY`        | Google API key for Gemini     | ✅       |
| `TWILIO_SID`            | Twilio account SID            | ✅       |
| `TWILIO_AUTH_TOKEN`     | Twilio auth token             | ✅       |
| `TWILIO_NUMBER`         | Twilio phone number           | ✅       |

### Provider-Specific Variables

For additional SIP providers, add their specific environment variables to `config.py`.
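As a rough illustration of what checking these variables at startup could look like, here is a minimal, standalone sketch. It is not the project's actual `Config.validate` implementation. The helper names `missing_vars` and `validate_env` and the `extra_required` parameter are assumptions made for this example; only the variable names come from the list above.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Required variable names, copied from the environment variable list above.
REQUIRED_VARS = (
    "VIDEOSDK_AUTH_TOKEN",
    "VIDEOSDK_SIP_USERNAME",
    "VIDEOSDK_SIP_PASSWORD",
    "GOOGLE_API_KEY",
    "TWILIO_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_NUMBER",
)


def missing_vars(
    env: Optional[Mapping[str, str]] = None,
    extra_required: Iterable[str] = (),
) -> List[str]:
    """Return the required variable names that are unset or blank.

    Hypothetical helper: `extra_required` lets a new SIP provider
    register its own variables (for example "YOUR_API_KEY").
    """
    env = os.environ if env is None else env
    names = (*REQUIRED_VARS, *extra_required)
    return [name for name in names if not env.get(name, "").strip()]


def validate_env(
    env: Optional[Mapping[str, str]] = None,
    extra_required: Iterable[str] = (),
) -> None:
    """Raise ValueError naming every missing variable at once."""
    missing = missing_vars(env, extra_required)
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
```

Calling `validate_env(extra_required=("YOUR_API_KEY", "YOUR_NUMBER"))` before starting the server would report all missing values in one error, rather than failing on the first call that needs them.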
## Features

- **SIP/VoIP Integration**: Pluggable SIP providers (Twilio, and more) with session initiation protocol support
- **AI-Powered Voice Agents**: Pluggable AI agents (Gemini, and more) for intelligent call handling
- **Real-time Voice Communication**: AI agents with real-time transport protocol (RTP) capabilities
- **Modular Architecture**: Clean separation of concerns for scalable telephony solutions
- **Runtime Configuration**: Switch SIP providers and AI agents without restart
- **VideoSDK Integration**: Seamless room creation and session management
- **Call Control**: Advanced call routing, forwarding, and transfer capabilities
- **Codec Support**: Multiple audio codecs for optimal voice quality

## Use Cases

### Customer Service (SIP-based)

- AI agents handle customer inquiries via VoIP
- 24/7 availability with SIP trunking
- Consistent service quality across PSTN and IP networks

### Appointment Scheduling

- Automated appointment booking via SIP calls
- Reminder calls using SIP user agent client
- Rescheduling assistance with DTMF support

### Surveys and Feedback

- Automated survey calls over SIP
- Customer feedback collection via VoIP
- Data collection with real-time transport protocol

### Emergency Notifications

- Automated emergency alerts via SIP trunking
- Mass notification systems using PSTN integration
- Status updates through IP multimedia subsystem (IMS)

## Architecture Benefits

1. **Separation of Concerns**: Each component has a single responsibility
2. **Extensibility**: Easy to add new SIP providers and AI agents
3. **Testability**: Components can be tested in isolation
4. **Maintainability**: Clear structure makes code easier to understand
5. **Reusability**: Components can be reused across different projects
6. **Configuration Management**: Centralized configuration with validation
7. **SIP Compliance**: Full session initiation protocol support
8. **VoIP Integration**: Seamless integration with voice over internet protocol

## Roadmap

- [ ] Add support for multiple AI agents per session
- [ ] Implement SIP-specific features (SBC, registrar, proxy server)
- [ ] Add monitoring and metrics for SIP sessions
- [ ] Create provider-specific webhook handlers
- [ ] Add support for different voice codecs per AI agent
- [ ] Implement call recording and transcription
- [ ] Add sentiment analysis for call quality
- [ ] Create web dashboard for call management
- [ ] Support for H.323 protocol integration
- [ ] Advanced call control features (forwarding, transfer, queue)

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

### Guidelines

- Follow the existing code patterns
- Add proper error handling
- Include logging
- Update documentation
- Add tests if possible

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

**Made with ❤️ for the developer community**