
<div align="left">

# AI Telephony Agent

<div align="left" style="margin:0px 12px;">

Make inbound and outbound phone calls with AI agents using VideoSDK. The project supports multiple SIP providers and AI agents through a clean, extensible architecture for VoIP telephony.

</div>

<div align="center">

![Architecture: Connecting Voice Agent to Telephony Agent](https://assets.videosdk.live/images/sip-telephony-agent.png)

<a href="https://docs.videosdk.live/ai_agents/introduction" target="_blank"><img src="https://img.shields.io/badge/_Documentation-4285F4?style=for-the-badge" alt="Documentation"></a>
<a href="https://www.youtube.com/playlist?list=PLrujdOR6BS_1fMqsHd9tynAg0foSRX5ti" target="_blank"><img src="https://img.shields.io/badge/_Tutorials-FF0000?style=for-the-badge&logo=youtube&logoColor=white" alt="Video Tutorials"></a>
<a href="https://dub.sh/o59dJJB" target="_blank"><img src="https://img.shields.io/badge/_Get_Started-4285F4?style=for-the-badge" alt="Get Started"></a>
<a href="https://discord.gg/f2WsNDN9S5" target="_blank"><img src="https://img.shields.io/badge/_Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Discord Community"></a>
<a href="https://pypi.org/project/videosdk-agents/" target="_blank"><img src="https://img.shields.io/badge/_pip_install-3776AB?style=for-the-badge&logo=python&logoColor=white" alt="PyPI Package"></a>

</div>

</div>

## Installation
### Prerequisites
- Python 3.11+
- VideoSDK account
- Twilio account (SIP trunking provider)
- Google API key (for Gemini AI)
### Setup
1. **Clone the repository**
```bash
git clone https://github.com/yourusername/ai-agent-telephony.git
cd ai-agent-telephony
```
2. **Install dependencies**
```bash
pip install -r requirements.txt
```
3. **Configure environment variables**
Create a `.env` file:
```env
# VideoSDK Configuration
VIDEOSDK_AUTH_TOKEN=your_videosdk_token
VIDEOSDK_SIP_USERNAME=your_sip_username
VIDEOSDK_SIP_PASSWORD=your_sip_password
# AI Configuration
GOOGLE_API_KEY=your_google_api_key
# Twilio SIP Trunking Configuration
TWILIO_SID=your_twilio_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
TWILIO_NUMBER=your_twilio_number
```
4. **Run the server**
```bash
python server.py
```
The server starts on `http://localhost:8000`.
## API Endpoints
### Handle Inbound Calls (SIP User Agent Server)
```http
POST /inbound-call
```
Handles incoming calls from your SIP provider and expects Twilio webhook parameters. Twilio must be able to reach this endpoint, so either host the server publicly or expose it locally with `ngrok`, then point your Twilio number's incoming-call webhook to:
```http
POST <server-url>/inbound-call
```
The handler reads these webhook parameters:
- `CallSid`: Unique call identifier
- `From`: Caller's phone number (CLI - Calling Line Identification)
- `To`: Recipient's phone number (DID - Direct Inward Dialing)
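A minimal sketch of the two jobs an inbound handler has, using only the Python standard library: read the form-encoded Twilio parameters listed above, then answer with TwiML that dials a SIP URI. `parse_inbound_call`, `dial_sip_twiml`, and the example SIP URI are hypothetical (the real VideoSDK SIP endpoint format is not documented here); the TwiML shape follows the `generate_twiml` template under "Adding New SIP Providers", with XML escaping added.

```python
from typing import Dict
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


def parse_inbound_call(body: str) -> Dict[str, str]:
    """Pull CallSid, From and To out of a form-encoded Twilio webhook body."""
    fields = parse_qs(body)
    # parse_qs maps each key to a list of values; missing keys become ""
    return {key: fields.get(key, [""])[0] for key in ("CallSid", "From", "To")}


def dial_sip_twiml(sip_endpoint: str) -> str:
    """TwiML that bridges the incoming call to a SIP URI (hypothetical helper)."""
    return f"<Response><Dial><Sip>{escape(sip_endpoint)}</Sip></Dial></Response>"


# In a form-encoded body "+" arrives as %2B, which parse_qs decodes back to "+"
body = "CallSid=CA123&From=%2B15551230000&To=%2B15559870000"
call = parse_inbound_call(body)
print(call["From"])  # +15551230000
print(dial_sip_twiml("sip:example-room@sip.example.com"))
```

Escaping matters if any part of the SIP URI comes from request data, since an unescaped `&` or `<` would produce invalid TwiML.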
### Initiate Outbound Calls (SIP User Agent Client)
Places a call to `to_number` through the currently configured SIP provider:
```http
POST /outbound-call
Content-Type: application/json

{
  "to_number": "+1234567890",
  "initial_greeting": "Hello from AI Agent!"
}
```
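A client-side sketch of the same request in Python, using only the standard library. It assumes `to_number` should be in E.164 format, as in the `+1234567890` example; `is_e164` and `build_outbound_request` are hypothetical helpers, not part of this project. The function only builds the request, because sending it places a real call.

```python
import json
import re
import urllib.request

# E.164: leading "+", country code starting 1-9, at most 15 digits in total
E164 = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """True for numbers like +1234567890."""
    return E164.fullmatch(number) is not None


def build_outbound_request(base_url: str, to_number: str, greeting: str) -> urllib.request.Request:
    """Build, but do not send, a POST /outbound-call request."""
    if not is_e164(to_number):
        raise ValueError(f"to_number must be in E.164 format, got {to_number!r}")
    payload = {"to_number": to_number, "initial_greeting": greeting}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/outbound-call",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_outbound_request("http://localhost:8000", "+1234567890", "Hello from AI Agent!")
# To actually place the call: urllib.request.urlopen(req)
```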
### Configure SIP Provider
```http
POST /configure-provider?provider_name=twilio
```
Switch SIP providers at runtime (currently supports: `twilio`).
## Adding New SIP Providers
The modular architecture makes it easy to add new SIP providers and SIP trunking services. Here's how to add a new provider:
### 1. Create Provider Implementation
Create `providers/your_provider.py`:
```python
from typing import Any, Dict

from config import Config
from your_provider_sdk import YourProviderClient  # placeholder: your provider's SDK client

from .base import SIPProvider


class YourProvider(SIPProvider):
    def __init__(self):
        self.client = self.create_client()

    def create_client(self) -> Any:
        # Authenticate with the provider using credentials from config.py
        return YourProviderClient(Config.YOUR_API_KEY)

    def generate_twiml(self, sip_endpoint: str, **kwargs) -> str:
        # Call-control markup that dials the given SIP endpoint
        return f"<Response><Dial><Sip>{sip_endpoint}</Sip></Dial></Response>"

    def initiate_outbound_call(self, to_number: str, twiml: str) -> Dict[str, Any]:
        call = self.client.calls.create(
            to=to_number,
            from_=Config.YOUR_NUMBER,
            twiml=twiml,
        )
        # Return a provider-agnostic summary of the call
        return {
            "call_sid": call.id,
            "status": call.status,
            "provider": "your_provider",
        }

    def get_provider_name(self) -> str:
        return "your_provider"
```
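`providers/base.py` is not shown in this README. Judging from the methods `YourProvider` overrides, the base class plausibly looks like the abstract class below; treat it as an assumed shape, not the project's actual file. Declaring the methods abstract means an incomplete provider fails when it is instantiated rather than in the middle of a call.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class SIPProvider(ABC):
    """Assumed interface for providers/base.py."""

    @abstractmethod
    def create_client(self) -> Any:
        """Return the provider SDK client used to place calls."""

    @abstractmethod
    def generate_twiml(self, sip_endpoint: str, **kwargs) -> str:
        """Return call-control markup that dials the given SIP endpoint."""

    @abstractmethod
    def initiate_outbound_call(self, to_number: str, twiml: str) -> Dict[str, Any]:
        """Place a call and return a dict with call_sid, status and provider."""

    @abstractmethod
    def get_provider_name(self) -> str:
        """Return the key used to select this provider."""


class IncompleteProvider(SIPProvider):
    # Only one of the four required methods is implemented
    def get_provider_name(self) -> str:
        return "incomplete"


try:
    IncompleteProvider()
except TypeError as exc:
    print(f"rejected: {exc}")
```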
### 2. Update Provider Factory
Add to `providers/__init__.py`:
```python
from .your_provider import YourProvider


def get_provider(provider_name: str = "twilio") -> SIPProvider:
    providers = {
        "twilio": TwilioProvider,
        "your_provider": YourProvider,
    }
    # ... rest of function
```
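The rest of `get_provider` is elided above. One plausible completion, assuming an unknown name should raise a `ValueError` that lists the supported providers (the real function may behave differently). Empty stand-in classes replace the real provider types so the snippet runs on its own.

```python
from typing import Dict, Type


class SIPProvider:  # stand-in for providers.base.SIPProvider
    pass


class TwilioProvider(SIPProvider):  # stand-in for the real Twilio provider
    pass


class YourProvider(SIPProvider):  # stand-in for providers/your_provider.py
    pass


PROVIDERS: Dict[str, Type[SIPProvider]] = {
    "twilio": TwilioProvider,
    "your_provider": YourProvider,
}


def get_provider(provider_name: str = "twilio") -> SIPProvider:
    """Instantiate the provider registered under provider_name."""
    provider_cls = PROVIDERS.get(provider_name)
    if provider_cls is None:
        supported = ", ".join(sorted(PROVIDERS))
        raise ValueError(f"Unsupported provider '{provider_name}'. Supported: {supported}")
    return provider_cls()
```

Raising a descriptive error gives an endpoint such as `/configure-provider` something concrete to report when it receives a name that is not registered.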
### 3. Add Configuration
Update `config.py`:
```python
import os


class Config:
    # ... existing settings
    YOUR_API_KEY = os.getenv("YOUR_API_KEY")
    YOUR_NUMBER = os.getenv("YOUR_NUMBER")

    @classmethod
    def validate(cls) -> None:
        required_vars = {
            # ... existing vars
            "YOUR_API_KEY": cls.YOUR_API_KEY,
            "YOUR_NUMBER": cls.YOUR_NUMBER,
        }
        # ... rest of validation
```
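The `# ... rest of validation` step is elided. A minimal sketch of one way to finish it, assuming validation should fail fast with a `ValueError` naming every unset variable and treating empty strings as unset; `missing_vars` is a hypothetical helper.

```python
import os
from typing import Dict, List, Optional


def missing_vars(required_vars: Dict[str, Optional[str]]) -> List[str]:
    """Names whose values are None or empty."""
    return [name for name, value in required_vars.items() if not value]


class Config:
    YOUR_API_KEY = os.getenv("YOUR_API_KEY")
    YOUR_NUMBER = os.getenv("YOUR_NUMBER")

    @classmethod
    def validate(cls) -> None:
        required_vars = {
            "YOUR_API_KEY": cls.YOUR_API_KEY,
            "YOUR_NUMBER": cls.YOUR_NUMBER,
        }
        missing = missing_vars(required_vars)
        if missing:
            # Report every missing variable at once instead of one per run
            raise ValueError("Missing required environment variables: " + ", ".join(missing))
```

Calling `Config.validate()` once at server startup surfaces a misconfigured `.env` before the first call arrives.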
## Adding New AI Agents
Similarly, you can add new AI agents for intelligent call handling:
### 1. Create AI Agent Implementation
Create `ai/your_ai_agent.py`:
```python
from typing import Any, Dict

from videosdk.agents import AgentSession, RealTimePipeline

from config import Config
from voice_agent import VoiceAgent
from your_model_plugin import YourAIModel  # placeholder: your model's realtime plugin

from .base_agent import AIAgent


class YourAIAgent(AIAgent):
    def create_pipeline(self) -> RealTimePipeline:
        # YOUR_AI_API_KEY must also be added to config.py and .env
        model = YourAIModel(
            api_key=Config.YOUR_AI_API_KEY,
            model="your-model-name",
        )
        return RealTimePipeline(model=model)

    def create_session(self, room_id: str, context: Dict[str, Any]) -> AgentSession:
        pipeline = self.create_pipeline()
        agent_context = {
            "name": "Your AI Agent",
            "meetingId": room_id,
            "videosdk_auth": Config.VIDEOSDK_AUTH_TOKEN,
            **context,  # caller-supplied keys override the defaults above
        }
        session = AgentSession(
            agent=VoiceAgent(context=agent_context),
            pipeline=pipeline,
            context=agent_context,
        )
        return session

    def get_agent_name(self) -> str:
        return "your_ai_agent"
```
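Because `**context` is unpacked last in `agent_context`, any key the caller passes, including `name` or even `meetingId`, overrides the defaults. The hypothetical helper below mirrors that dict literal without the VideoSDK dependency to show the precedence:

```python
from typing import Any, Dict


def build_agent_context(room_id: str, auth_token: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Merge defaults with per-call context; later keys in a dict literal win."""
    return {
        "name": "Your AI Agent",
        "meetingId": room_id,
        "videosdk_auth": auth_token,
        **context,
    }


ctx = build_agent_context("room-123", "token", {"name": "Support Bot", "caller": "+15551230000"})
print(ctx["name"], ctx["meetingId"])  # Support Bot room-123
```

If the room ID and auth token must never be overridden by call context, unpack `**context` first and list the fixed keys after it.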
### 2. Update AI Agent Factory
Add to `ai/__init__.py`:
```python
from .your_ai_agent import YourAIAgent


def get_ai_agent(agent_name: str = "gemini") -> AIAgent:
    agents = {
        "gemini": GeminiAgent,
        "your_ai_agent": YourAIAgent,
    }
    # ... rest of function
```
## Testing
### Health Check
```bash
curl "http://localhost:8000/health"
```
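For scripted checks, such as waiting for the server in CI, a standard-library equivalent of the curl call above. It only inspects the status code, since the response body of `/health` is not documented here; `check_health` is a hypothetical helper name.

```python
import urllib.request


def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True if GET /health answers HTTP 200, otherwise False."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx responses) and socket timeouts all subclass OSError
        return False


# e.g. in a CI step: assert check_health(), "server did not start"
```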
### Outbound Call Test (SIP UAC)
```bash
curl -X POST "http://localhost:8000/outbound-call" \
-H "Content-Type: application/json" \
-d '{"to_number": "+1234567890", "initial_greeting": "Hello from AI Agent!"}'
```
### Switch SIP Provider
```bash
curl -X POST "http://localhost:8000/configure-provider?provider_name=twilio"
```
## 🔧 Configuration
### Environment Variables
| Variable | Description | Required |
| ----------------------- | ----------------------------- | -------- |
| `VIDEOSDK_AUTH_TOKEN` | VideoSDK authentication token | ✅ |
| `VIDEOSDK_SIP_USERNAME` | VideoSDK SIP username | ✅ |
| `VIDEOSDK_SIP_PASSWORD` | VideoSDK SIP password | ✅ |
| `GOOGLE_API_KEY` | Google API key for Gemini | ✅ |
| `TWILIO_SID` | Twilio account SID | ✅ |
| `TWILIO_AUTH_TOKEN` | Twilio auth token | ✅ |
| `TWILIO_NUMBER` | Twilio phone number | ✅ |
### Provider-Specific Variables
For additional SIP providers, add their credentials to `.env` and register them in `config.py`, including the `validate()` check.
## Features
- **SIP/VoIP Integration**: Pluggable SIP providers (Twilio, and more) with session initiation protocol support
- **AI-Powered Voice Agents**: Pluggable AI agents (Gemini, and more) for intelligent call handling
- **Real-time Voice Communication**: AI agents with real-time transport protocol (RTP) capabilities
- **Modular Architecture**: Clean separation of concerns for scalable telephony solutions
- **Runtime Configuration**: Switch SIP providers and AI agents without restart
- **VideoSDK Integration**: Seamless room creation and session management
- **Call Control**: Call routing through the configured SIP provider (forwarding, transfer, and queueing are on the roadmap)
- **Codec Support**: Multiple audio codecs for optimal voice quality
## Use Cases
### Customer Service (SIP-based)
- AI agents handle customer inquiries via VoIP
- 24/7 availability with SIP trunking
- Consistent service quality across PSTN and IP networks
### Appointment Scheduling
- Automated appointment booking via SIP calls
- Reminder calls using SIP user agent client
- Rescheduling assistance with DTMF support
### Surveys and Feedback
- Automated survey calls over SIP
- Customer feedback collection via VoIP
- Data collection with real-time transport protocol
### Emergency Notifications
- Automated emergency alerts via SIP trunking
- Mass notification systems using PSTN integration
- Status updates through IP multimedia subsystem (IMS)
## Architecture Benefits
1. **Separation of Concerns**: Each component has a single responsibility
2. **Extensibility**: Easy to add new SIP providers and AI agents
3. **Testability**: Components can be tested in isolation
4. **Maintainability**: Clear structure makes code easier to understand
5. **Reusability**: Components can be reused across different projects
6. **Configuration Management**: Centralized configuration with validation
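7. **SIP Integration**: Calls are carried over SIP through pluggable trunking providers (registrar, proxy, and SBC features are on the roadmap)
8. **VoIP Integration**: Phone calls are bridged into VideoSDK rooms over voice over internet protocol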
## Roadmap
- [ ] Add support for multiple AI agents per session
- [ ] Implement SIP-specific features (SBC, registrar, proxy server)
- [ ] Add monitoring and metrics for SIP sessions
- [ ] Create provider-specific webhook handlers
- [ ] Add support for different voice codecs per AI agent
- [ ] Implement call recording and transcription
- [ ] Add sentiment analysis for call quality
- [ ] Create web dashboard for call management
- [ ] Support for H.323 protocol integration
- [ ] Advanced call control features (forwarding, transfer, queue)
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
### Guidelines
- Follow the existing code patterns
- Add proper error handling
- Include logging
- Update documentation
- Add tests if possible
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
**Made with ❤️ for the developer community**