WebRTC & TURN
Vidivo uses browser-native WebRTC for peer-to-peer video calls. No SFU (Selective Forwarding Unit) is needed for 1-on-1 calls. TURN relay is available as a fallback for approximately 15–20% of calls where direct P2P connectivity fails.
P2P Architecture
Vidivo’s WebRTC architecture is deliberately simple for the 1-on-1 use case:
```
Guest Browser ◄══════════════════════► Host Browser
               Direct P2P (80-85%)
               DTLS-SRTP encrypted
               Browser-to-browser

OR (if direct fails):

Guest Browser ◄═══► TURN Server ◄═══► Host Browser
               Relay (~15-20%)
               Still DTLS-SRTP encrypted
               Media relayed via VPS
```

Why no SFU?
- SFUs are needed for group calls (3+ participants). Vidivo is strictly 1-on-1.
- P2P eliminates server media processing costs.
- P2P reduces latency — there is no server hop.
- DTLS-SRTP encryption is mandatory in browser WebRTC; media is encrypted even on TURN relay.
Extensibility: The architecture is designed to support future SFU integration (screen share, group calls, whiteboard) without major refactoring. The signaling protocol uses generic message types that a future SFU could consume.
Signaling Flow
Signaling is the process of exchanging connection metadata between peers before the media stream begins. Vidivo’s signaling service is a WebSocket relay: it passes messages between the guest and host without modifying them.
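
The pass-through behaviour can be sketched without committing to a particular WebSocket library. This is a minimal illustration, not Vidivo's service code: the `SignalRoom` class, its `join` and `relay` methods, and the empty `peer_joined` payload are all hypothetical. Each peer is assumed to be any object with a `send(string)` method, which is what a server-side WebSocket connection typically exposes.

```javascript
// Hypothetical in-memory room for one call: at most a guest and a host.
class SignalRoom {
  constructor() {
    this.peers = new Map(); // role ("guest" | "host") -> socket-like object
  }

  // Register a peer. Once both roles are present, notify each side.
  join(role, socket) {
    this.peers.set(role, socket);
    if (this.peers.size === 2) {
      // Payload shape is an assumption; the page only names the message type.
      const notice = JSON.stringify({ type: 'peer_joined', payload: {} });
      for (const peer of this.peers.values()) peer.send(notice);
    }
  }

  // Forward the raw frame to the other peer without parsing or changing it.
  relay(fromRole, rawMessage) {
    const toRole = fromRole === 'guest' ? 'host' : 'guest';
    const target = this.peers.get(toRole);
    if (!target) return false; // other side not connected yet
    target.send(rawMessage);
    return true;
  }
}
```

Because `relay` never parses the frame, the server stays agnostic to message types, which is what would let new types pass through later without server changes.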
Full Signaling Sequence
```
Guest               Signal Server (WS)                 Host
│                           │                           │
│── connect(token) ────────►│                           │
│                           │                           │
│                           │◄── connect(token) ────────│
│                           │                           │
│◄─── peer_joined ──────────│──── peer_joined ─────────►│
│                           │                           │
│     (create RTCPeerConnection, add media tracks)      │
│                           │                           │
│──── offer (SDP) ─────────►│──── offer (SDP) ─────────►│
│                           │                           │
│                           │     (setRemoteDescription)│
│                           │     (create answer)       │
│                           │◄─── answer (SDP) ─────────│
│◄─── answer (SDP) ─────────│                           │
│                           │                           │
│  (setRemoteDescription)   │                           │
│                           │                           │
│── ice_candidate ─────────►│──── ice_candidate ───────►│
│◄─ ice_candidate ──────────│◄─── ice_candidate ────────│
│                           │                           │
│         (ICE negotiation runs in background)          │
│                           │                           │
│◄═══════════════ P2P or TURN connection ══════════════►│
│                           │                           │
│        (media flows directly between browsers)        │
```

SDP Exchange
Session Description Protocol (SDP) describes the media capabilities of each peer:
- Supported codecs (VP8, VP9, H.264 for video; Opus for audio)
- ICE credentials (username fragment + password)
- DTLS fingerprint for key verification
The offer is created by the guest (caller). The answer is created by the host. Both are relayed verbatim through the signaling server.
Signaling Message Format
```json
{
  "type": "offer",
  "payload": {
    "sdp": "v=0\r\no=- 46117317 2 IN IP4 127.0.0.1\r\n..."
  }
}
```

```json
{
  "type": "ice_candidate",
  "payload": {
    "candidate": "candidate:1 1 UDP 2130706431 192.168.1.100 54400 typ host",
    "sdpMid": "0",
    "sdpMLineIndex": 0
  }
}
```

ICE Candidate Exchange
Interactive Connectivity Establishment (ICE) is the protocol that finds the best network path between the two peers. ICE tests multiple candidate pairs in parallel and uses the best working path.
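
Once ICE has settled, the path it picked can be read back from the connection statistics. The sketch below is one possible way to classify a call as direct or relayed; `selectedPathKind` is a hypothetical helper name, and it takes the Map-like `RTCStatsReport` that `pc.getStats()` resolves to, so it can also be exercised with a plain `Map`.

```javascript
// Returns "relay", "direct", or null if no candidate pair has been selected yet.
// `report` is the Map-like RTCStatsReport resolved by pc.getStats().
function selectedPathKind(report) {
  let pair = null;
  for (const stat of report.values()) {
    if (stat.type === 'candidate-pair' && stat.nominated && stat.state === 'succeeded') {
      pair = stat;
      break;
    }
  }
  if (!pair) return null;

  // A call counts as relayed if either end of the pair is a TURN candidate.
  const local = report.get(pair.localCandidateId);
  const remote = report.get(pair.remoteCandidateId);
  const viaRelay =
    (local && local.candidateType === 'relay') ||
    (remote && remote.candidateType === 'relay');
  return viaRelay ? 'relay' : 'direct';
}
```

In a browser this might be called as `pc.getStats().then((report) => console.log(selectedPathKind(report)))`, for example to measure how many calls actually fall back to TURN.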
Candidate Types
| Type | Description | Priority |
|---|---|---|
| `host` | Direct LAN IP address | Highest |
| `srflx` | Server-reflexive: public IP via STUN | Medium |
| `relay` | TURN relay address | Lowest (fallback) |
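
The ordering in this table follows from the candidate priority formula in the ICE specification (RFC 8445, section 5.1.2). The sketch below works through that arithmetic using the type preferences the RFC recommends; `candidatePriority` is an illustrative name, and real browsers may choose different preference values, so the numbers they emit can differ.

```javascript
// Type preferences recommended by RFC 8445 (higher is preferred).
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
function candidatePriority(type, localPreference = 65535, componentId = 1) {
  return TYPE_PREFERENCE[type] * 2 ** 24 + localPreference * 2 ** 8 + (256 - componentId);
}
```

With the defaults, a `host` candidate gets 126 × 2^24 + 65535 × 2^8 + 255 = 2130706431, which is the priority field in the sample `ice_candidate` message on this page.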
ICE Gathering Process
```
Browser
  │
  │  1. host candidates: local network interfaces
  │     (e.g. 192.168.1.100:54400)
  │
  │  2. srflx candidates: public IP via STUN
  │     Browser contacts STUN server (Google's or Vidivo's)
  │     Response: "your public IP is 203.0.113.45:54400"
  │
  │  3. relay candidates: TURN relay
  │     Browser contacts TURN server with credentials
  │     TURN allocates a relay port
  │     Response: "relay at 198.51.100.10:49152"
  │
  ▼
ICE candidates sent to remote peer via signaling
Both peers test all candidate pairs
Best working pair is selected (host > srflx > relay)
```

TURN Server
The TURN server at turn.vidivo.app is deployed on a dedicated VPS, outside Docker Swarm, running as a pion/turn binary managed by systemd.
Why Dedicated VPS?
- TURN requires raw UDP access that cannot be routed through Traefik
- TURN traffic is high-bandwidth media relay — isolated from API services
- Systemd provides reliable restarts without Swarm overhead
| Protocol | Port | Usage |
|---|---|---|
| UDP | 3478 | Standard TURN/STUN |
| TCP | 3478 | TURN over TCP (firewall fallback) |
| TLS/TCP | 5349 | TURN over TLS (TURNS) |
| UDP | 49152–65535 | Media relay port range |
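
On the client, the first three rows of this table map onto ICE server URLs. The helper below is a hypothetical sketch (the name `buildIceServers` and the explicit `?transport=` parameters are assumptions, not Vidivo's configuration) showing one way to offer the browser every transport so it can fall back from UDP to TCP to TLS.

```javascript
// Hypothetical helper: turn the port table into a single RTCIceServer entry.
function buildIceServers(host, username, credential) {
  return [
    {
      urls: [
        `turn:${host}:3478?transport=udp`,  // standard TURN over UDP
        `turn:${host}:3478?transport=tcp`,  // TCP fallback for UDP-blocking firewalls
        `turns:${host}:5349?transport=tcp`, // TURN over TLS
      ],
      username,
      credential,
    },
  ];
}
```

The 49152–65535 range never appears in a URL: the TURN server allocates relay ports from it per session, so it only needs to be open in the VPS firewall.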
HMAC-SHA1 Credential Generation
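Vidivo uses time-limited HMAC credentials to authenticate clients to the TURN server. This is the shared-secret scheme described in the "A REST API For Access To TURN Services" draft, layered on the long-term credential mechanism of the STUN specification (RFC 5389). Credentials are generated server-side and issued to clients in the `POST /calls/initiate` response.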
Credential format:

```
username = "<expiry_unix_timestamp>:<user_id>"
password = base64(HMAC-SHA1(secret_key, username))
```

Example:

```
username = "1710090000:01JN2X6M0P5H9SRBTXY8ZDKEFG"
password = base64(HMAC-SHA1("your-turn-secret", username))
```

Credentials expire 24 hours after the call was initiated. The TURN server validates credentials by recomputing the HMAC using the shared secret.
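
Because the expiry is embedded in the username, a client can tell when its credentials are about to lapse without knowing the secret. The sketch below covers only that expiry check, not the HMAC recomputation the server performs; `parseTurnUsername` and `turnCredentialExpired` are hypothetical helper names.

```javascript
// Split "<expiry_unix_timestamp>:<user_id>" into its two parts.
function parseTurnUsername(username) {
  const sep = username.indexOf(':');
  if (sep < 1) throw new Error('malformed TURN username');
  const expiry = Number(username.slice(0, sep));
  if (!Number.isInteger(expiry)) throw new Error('malformed TURN expiry');
  return { expiry, userId: username.slice(sep + 1) };
}

// True when the credential has expired, or will within `marginSeconds`.
function turnCredentialExpired(username, nowSeconds, marginSeconds = 0) {
  return parseTurnUsername(username).expiry <= nowSeconds + marginSeconds;
}
```

A client could call `turnCredentialExpired(username, Math.floor(Date.now() / 1000), 60)` before an ICE restart and request fresh credentials if it returns `true`.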
Go implementation:
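```go
package turn

import (
	"crypto/hmac"
	"crypto/sha1"
	"encoding/base64"
	"fmt"
	"time"
)

// GenerateCredentials returns a time-limited TURN username and password
// for userID, valid for 24 hours from now.
func GenerateCredentials(secret, userID string) (username, password string) {
	// The username embeds the expiry as a Unix timestamp.
	expiry := time.Now().Add(24 * time.Hour).Unix()
	username = fmt.Sprintf("%d:%s", expiry, userID)

	// The password is the base64-encoded HMAC-SHA1 of the username.
	mac := hmac.New(sha1.New, []byte(secret))
	mac.Write([]byte(username)) // hash.Hash.Write never returns an error
	password = base64.StdEncoding.EncodeToString(mac.Sum(nil))

	return username, password
}
```

JavaScript (browser) usage: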
```javascript
const pc = new RTCPeerConnection({
  iceServers: [
    {
      urls: ['turn:turn.vidivo.app:3478', 'turns:turn.vidivo.app:5349'],
      username: '1710090000:01JN2X6M0P5H9SRBTXY8ZDKEFG',
      credential: 'base64-hmac-sha1-here',
    },
    {
      urls: ['stun:stun.l.google.com:19302'], // Fallback STUN
    },
  ],
});
```

DTLS-SRTP Encryption
All WebRTC media, whether P2P direct or relayed through TURN, is encrypted with DTLS-SRTP:
- DTLS (Datagram TLS) performs the key exchange over UDP
- SRTP (Secure RTP) carries the encrypted media
- Encryption is mandatory in all modern browsers — there is no plaintext WebRTC
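
The certificate fingerprint that the DTLS handshake is checked against travels in the SDP as an `a=fingerprint` attribute. As a rough illustration (the browser performs this comparison itself, so application code never needs to), the hypothetical helper below pulls those attributes out of an SDP string, which can be handy when debugging a failing handshake.

```javascript
// Collect every "a=fingerprint:<hash-function> <value>" attribute in an SDP blob.
function extractFingerprints(sdp) {
  const found = [];
  for (const line of sdp.split(/\r?\n/)) {
    const match = /^a=fingerprint:(\S+) (\S+)$/.exec(line.trim());
    if (match) {
      // Normalise case so two fingerprints can be compared as strings.
      found.push({ algorithm: match[1].toLowerCase(), value: match[2].toUpperCase() });
    }
  }
  return found;
}
```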
The encryption flow:
```
Guest                                  Host
│                                        │
│── DTLS ClientHello ───────────────────►│
│◄── DTLS ServerHello ───────────────────│
│                                        │
│ (certificate fingerprints compared     │
│  against those in SDP; prevents MITM)  │
│                                        │
│ SRTP keys derived from DTLS handshake  │
│                                        │
│══ Encrypted audio/video (SRTP) ════════│
```

Media Configuration
Vidivo’s WebRTC configuration targets high-quality audio and acceptable video quality for 1-on-1 professional calls:
Audio:

| Setting | Value |
|---|---|
| Codec | Opus |
| Sample rate | 48 kHz |
| Echo cancellation | Enabled |
| Noise suppression | Enabled |
| Auto gain control | Enabled |

Video:

| Setting | Value |
|---|---|
| Codec preference | VP8 (widest support), VP9 (if both peers support) |
| Target resolution | 720p (1280×720) |
| Target frame rate | 30 fps |
| Bitrate (video) | 500 kbps – 2.5 Mbps (adaptive) |
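
The codec preference row can be expressed as a reordering of the browser's codec list. This is a sketch under assumptions: `orderCodecs` is a hypothetical helper, and the page does not say how Vidivo actually applies its preference.

```javascript
// Return a copy of `codecs` with the preferred MIME types moved to the front,
// in the order given; all other entries keep their relative order.
function orderCodecs(codecs, preferred = ['video/VP8', 'video/VP9']) {
  const rank = (codec) => {
    const i = preferred.findIndex((p) => p.toLowerCase() === codec.mimeType.toLowerCase());
    return i === -1 ? preferred.length : i;
  };
  return codecs
    .map((codec, index) => ({ codec, index }))
    .sort((a, b) => rank(a.codec) - rank(b.codec) || a.index - b.index)
    .map((entry) => entry.codec);
}
```

In browsers that support it, the result could be passed to `transceiver.setCodecPreferences(orderCodecs(RTCRtpReceiver.getCapabilities('video').codecs))` before the offer is created. The codec actually used still depends on what both peers negotiate.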
```javascript
// Media constraints used in Vidivo web app
const constraints = {
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  },
  video: {
    width: { ideal: 1280 },
    height: { ideal: 720 },
    frameRate: { ideal: 30 },
  },
};

const stream = await navigator.mediaDevices.getUserMedia(constraints);
```

Connection Failure Handling
| Failure | Detection | Recovery |
|---|---|---|
| ICE failure (no path found) | `iceConnectionState === 'failed'` | Restart ICE (`restartIce()`) |
| ICE disconnected (brief outage) | `iceConnectionState === 'disconnected'` | Wait 10s, then restart ICE |
| Signaling WebSocket dropped | `ws.onclose` | Reconnect with exponential backoff |
| TURN server unreachable | ICE timeout on relay candidates | Attempt ICE restart; if still failing, surface error to user |
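
The first three rows of the table might be wired up roughly as follows. This is a sketch, not Vidivo's client code: `attachIceRecovery` and `backoffDelayMs` are hypothetical names, the 1 s base and 30 s cap for the backoff are assumed values, and the timer functions are injectable only so the logic can be exercised outside a browser.

```javascript
// Delay before reconnect attempt `attempt` (0-based): 1s, 2s, 4s, ... capped at 30s.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// "failed" restarts ICE at once; "disconnected" waits graceMs first and
// restarts only if the connection has not recovered in the meantime.
function attachIceRecovery(
  pc,
  graceMs = 10000,
  timers = { set: (fn, ms) => setTimeout(fn, ms), clear: (id) => clearTimeout(id) },
) {
  let pending = null;
  pc.oniceconnectionstatechange = () => {
    // Any state change cancels a grace timer that is still running.
    if (pending !== null) {
      timers.clear(pending);
      pending = null;
    }
    if (pc.iceConnectionState === 'failed') {
      pc.restartIce();
    } else if (pc.iceConnectionState === 'disconnected') {
      pending = timers.set(() => {
        pending = null;
        if (pc.iceConnectionState === 'disconnected') pc.restartIce();
      }, graceMs);
    }
  };
}
```

Note that `restartIce()` only flags the connection for a restart. The application still has to handle the resulting `negotiationneeded` event and send a fresh offer through signaling.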
Redis Signaling State
During the signaling phase, SDP offers and answers are briefly stored in Redis as a relay mechanism:
| Key | TTL | Content |
|---|---|---|
| `signal:{session_id}:offer` | 30s | SDP offer from guest |
| `signal:{session_id}:answer` | 30s | SDP answer from host |
| `signal:{session_id}:ice:{n}` | 30s | ICE candidates |
These keys are ephemeral — once both peers have connected and the WebSocket relay is active, Redis is not used for ongoing signaling.
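
For illustration, the key patterns in the table could be built by a small helper like the one below. `signalKeys` and `SIGNAL_TTL_SECONDS` are hypothetical names, and how `n` is assigned to each ICE candidate is not stated on this page, so the sketch simply takes it as an argument.

```javascript
const SIGNAL_TTL_SECONDS = 30; // TTL shared by all three key patterns in the table

// Build the Redis key names for one session, following the table's patterns.
function signalKeys(sessionId) {
  const prefix = `signal:${sessionId}`;
  return {
    offer: `${prefix}:offer`,
    answer: `${prefix}:answer`,
    ice: (n) => `${prefix}:ice:${n}`,
  };
}
```

A write with expiry then corresponds to the Redis command `SET <key> <value> EX 30`, which lets abandoned sessions clean themselves up without a separate sweeper.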
Extensibility Notes
The WebRTC implementation is built with extensibility in mind for planned features:
- Screen sharing: Use `getDisplayMedia()` and add a second video track to the peer connection
- Data channels: `RTCDataChannel` is available for future features (whiteboard, file transfer, chat)
- Recording: Can be added by routing a media stream through a server-side MediaRecorder (requires SFU for server-side recording)
- Group calls: Requires migrating to an SFU (e.g. Livekit or Mediasoup). The signaling protocol is designed to be SFU-compatible.