WebRTC & TURN

Vidivo uses browser-native WebRTC for peer-to-peer video calls. No SFU (Selective Forwarding Unit) is needed for 1-on-1 calls. TURN relay is available as a fallback for approximately 15–20% of calls where direct P2P connectivity fails.

Vidivo’s WebRTC architecture is deliberately simple for the 1-on-1 use case:

Guest Browser ◄══════════════════════► Host Browser
Direct P2P (80-85%)
DTLS-SRTP encrypted
Browser-to-browser
OR (if direct fails):
Guest Browser ◄═══► TURN Server ◄═══► Host Browser
Relay (~15-20%)
Still DTLS-SRTP encrypted
Media relayed via VPS
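To check which of the two paths a given call ended up on, the browser's connection statistics can be inspected. The following is a minimal sketch rather than Vidivo's own code: the function name selectedPathType is hypothetical, and it assumes the browser reports a transport entry with selectedCandidatePairId, which not every browser version does.

```javascript
// Hypothetical helper: classifies the selected ICE path from an RTCStatsReport
// (or any Map-like object whose values are stats entries with an `id` field).
function selectedPathType(stats) {
  const byId = new Map();
  stats.forEach((report) => byId.set(report.id, report));

  for (const report of byId.values()) {
    if (report.type !== 'transport' || !report.selectedCandidatePairId) continue;
    const pair = byId.get(report.selectedCandidatePairId);
    if (!pair) continue;
    const local = byId.get(pair.localCandidateId);
    const remote = byId.get(pair.remoteCandidateId);
    const types = [local && local.candidateType, remote && remote.candidateType];
    // If either end uses a relay candidate, media passes through TURN.
    return types.includes('relay') ? 'relay' : 'direct';
  }
  return 'unknown'; // no selected pair reported (yet)
}
```

In a browser this could be called as selectedPathType(await pc.getStats()) once the connection is established.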

Why no SFU?

  • SFUs are needed for group calls (3+ participants). Vidivo is strictly 1-on-1.
  • P2P eliminates server media processing costs.
  • P2P reduces latency — there is no server hop.
  • DTLS-SRTP encryption is mandatory in browser WebRTC; media is encrypted even on TURN relay.

Extensibility: The architecture is designed to support future SFU integration (screen share, group calls, whiteboard) without major refactoring. The signaling protocol uses generic message types that a future SFU could consume.

Signaling is the process of exchanging connection metadata between peers before the media stream begins. Vidivo’s signaling service is a WebSocket relay — it passes messages between the guest and host without modifying them.

Guest              Signal Server (WS)                 Host
│                           │                           │
│── connect(token) ────────►│                           │
│                           │                           │
│                           │◄── connect(token) ────────│
│                           │                           │
│◄── peer_joined ───────────│── peer_joined ───────────►│
│                           │                           │
│ (create RTCPeerConnection, add media tracks)          │
│                           │                           │
│── offer (SDP) ───────────►│── offer (SDP) ───────────►│
│                           │                           │
│                           │    (setRemoteDescription) │
│                           │           (create answer) │
│                           │◄── answer (SDP) ──────────│
│◄── answer (SDP) ──────────│                           │
│                           │                           │
│ (setRemoteDescription)    │                           │
│                           │                           │
│── ice_candidate ─────────►│── ice_candidate ─────────►│
│◄── ice_candidate ─────────│◄── ice_candidate ─────────│
│                           │                           │
│ (ICE negotiation runs in background)                  │
│                           │                           │
│◄══════════════ P2P or TURN connection ═══════════════►│
│                           │                           │
│               (media flows directly between browsers) │
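The relay behaviour in this flow can be sketched in a few lines. This is an illustrative in-memory model, not the actual signaling service: createRelay, join, and forward are hypothetical names, the 'guest' and 'host' role labels are assumptions, and the peer_joined payload shape is a guess because the source does not specify it.

```javascript
// Hypothetical in-memory model of a two-party signaling relay.
function createRelay() {
  const sessions = new Map(); // sessionId -> Map(role -> send function)

  // Register a peer; `send` delivers a raw string to that peer's socket.
  function join(sessionId, role, send) {
    if (!sessions.has(sessionId)) sessions.set(sessionId, new Map());
    const peers = sessions.get(sessionId);
    peers.set(role, send);
    // Once both sides are connected, notify each of them.
    if (peers.has('guest') && peers.has('host')) {
      const notice = JSON.stringify({ type: 'peer_joined' }); // assumed shape
      for (const deliver of peers.values()) deliver(notice);
    }
  }

  // Pass a message to the other peer exactly as received.
  function forward(sessionId, fromRole, rawMessage) {
    const peers = sessions.get(sessionId);
    if (!peers) return false;
    const toRole = fromRole === 'guest' ? 'host' : 'guest';
    const deliver = peers.get(toRole);
    if (!deliver) return false; // other side not connected yet
    deliver(rawMessage); // no parsing, no re-serialization
    return true;
  }

  return { join, forward };
}
```

Because forward never parses the message, the relay stays agnostic to the payload, which is what lets offers, answers, and ICE candidates share one code path.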

Session Description Protocol (SDP) describes the media capabilities of each peer:

  • Supported codecs (VP8, VP9, H.264 for video; Opus for audio)
  • ICE credentials (username fragment + password)
  • DTLS fingerprint for key verification

The offer is created by the guest (caller). The answer is created by the host. Both are relayed verbatim through the signaling server.

Offer message:

{
  "type": "offer",
  "payload": {
    "sdp": "v=0\r\no=- 46117317 2 IN IP4 127.0.0.1\r\n..."
  }
}

ICE candidate message:

{
  "type": "ice_candidate",
  "payload": {
    "candidate": "candidate:1 1 UDP 2130706431 192.168.1.100 54400 typ host",
    "sdpMid": "0",
    "sdpMLineIndex": 0
  }
}
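On the client, these envelopes can be produced from the objects the WebRTC API already hands out. The sketch below is an assumption-laden illustration: sdpMessage, candidateMessage, and parseSignal are hypothetical helper names, and the answer message is assumed to mirror the offer format shown above.

```javascript
// Message types that carry a payload in this sketch (assumed set).
const KNOWN_TYPES = new Set(['offer', 'answer', 'ice_candidate']);

// Wrap a session description ({ type: 'offer' | 'answer', sdp }) for the wire.
function sdpMessage(description) {
  return JSON.stringify({ type: description.type, payload: { sdp: description.sdp } });
}

// Wrap an ICE candidate (e.g. event.candidate from onicecandidate).
function candidateMessage(candidate) {
  return JSON.stringify({
    type: 'ice_candidate',
    payload: {
      candidate: candidate.candidate,
      sdpMid: candidate.sdpMid,
      sdpMLineIndex: candidate.sdpMLineIndex,
    },
  });
}

// Parse an incoming message and reject anything outside the known envelope.
function parseSignal(raw) {
  const msg = JSON.parse(raw);
  const ok = msg && KNOWN_TYPES.has(msg.type) &&
    typeof msg.payload === 'object' && msg.payload !== null;
  if (!ok) throw new Error('unrecognized signaling message');
  return msg;
}
```

A parsed offer could then be applied with pc.setRemoteDescription({ type: 'offer', sdp: msg.payload.sdp }), and a parsed candidate with pc.addIceCandidate(msg.payload).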

Interactive Connectivity Establishment (ICE) is the protocol that finds the best network path between the two peers. ICE tests multiple candidate pairs in parallel and uses the best working path.

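| Type  | Description                              | Priority          |
|-------|------------------------------------------|-------------------|
| host  | Direct LAN IP address                    | Highest           |
| srflx | Server-reflexive: public IP via STUN     | Medium            |
| relay | TURN relay address                       | Lowest (fallback) |
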
Browser
│ 1. host candidates — local network interfaces
│ (e.g. 192.168.1.100:54400)
│ 2. srflx candidates — public IP via STUN
│ Browser contacts STUN server (Google's or Vidivo's)
│ Response: "your public IP is 203.0.113.45:54400"
│ 3. relay candidates — TURN relay
│ Browser contacts TURN server with credentials
│ TURN allocates a relay port
│ Response: "relay at 198.51.100.10:49152"
ICE candidates sent to remote peer via signaling
Both peers test all candidate pairs
Best working pair is selected (host > srflx > relay)
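The host > srflx > relay ordering falls out of the candidate priority formula in RFC 8445. As a worked example, the sketch below parses a candidate line and recomputes its priority. The helper names are hypothetical, and the type preferences are the RFC's recommended values; real browsers may choose different local preferences, so their numbers can differ.

```javascript
// Type preferences recommended by RFC 8445 (higher is preferred).
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
function candidatePriority(type, localPreference = 65535, componentId = 1) {
  return 2 ** 24 * TYPE_PREFERENCE[type] + 2 ** 8 * localPreference + (256 - componentId);
}

// Parse "candidate:<foundation> <component> <transport> <priority> <ip> <port> typ <type> ..."
function parseCandidate(line) {
  const parts = line.replace(/^candidate:/, '').trim().split(/\s+/);
  const typIndex = parts.indexOf('typ');
  return {
    foundation: parts[0],
    componentId: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: typIndex >= 0 ? parts[typIndex + 1] : undefined,
  };
}

const parsed = parseCandidate('candidate:1 1 UDP 2130706431 192.168.1.100 54400 typ host');
console.log(parsed.type, candidatePriority(parsed.type) === parsed.priority); // host true
```

With the default local preference, a host candidate works out to 126 · 2^24 + 65535 · 2^8 + 255 = 2130706431, which matches the priority field in the sample candidate shown earlier.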

The TURN server at turn.vidivo.app is deployed on a dedicated VPS, outside Docker Swarm, running as a pion/turn binary managed by systemd.

  • TURN requires raw UDP access that cannot be routed through Traefik
  • TURN traffic is high-bandwidth media relay — isolated from API services
  • Systemd provides reliable restarts without Swarm overhead
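
| Protocol | Port        | Usage                             |
|----------|-------------|-----------------------------------|
| UDP      | 3478        | Standard TURN/STUN                |
| TCP      | 3478        | TURN over TCP (firewall fallback) |
| TLS/TCP  | 5349        | TURN over TLS (TURNS)             |
| UDP      | 49152–65535 | Media relay port range            |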

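Vidivo uses time-limited HMAC credentials to authenticate clients to the TURN server. This is the scheme described in the "REST API for Access to TURN Services" draft (draft-uberti-behave-turn-rest), which builds on the long-term credential mechanism of RFC 5389; RFC 5389 itself does not define time-limited credentials. Credentials are generated server-side and issued to clients in the POST /calls/initiate response.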

Credential format:

username = "<expiry_unix_timestamp>:<user_id>"
password = base64(HMAC-SHA1(secret_key, username))

Example:

username = "1710090000:01JN2X6M0P5H9SRBTXY8ZDKEFG"
password = base64(HMAC-SHA1("your-turn-secret", username))

Credentials expire 24 hours after the call was initiated. The TURN server validates credentials by recomputing the HMAC using the shared secret.

Go implementation:

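package turn

import (
	"crypto/hmac"
	"crypto/sha1"
	"encoding/base64"
	"fmt"
	"time"
)

// GenerateCredentials returns time-limited TURN credentials for userID.
// The username embeds the expiry as a Unix timestamp; the password is the
// base64-encoded HMAC-SHA1 of the username, keyed with the shared secret.
func GenerateCredentials(secret, userID string) (username, password string) {
	expiry := time.Now().Add(24 * time.Hour).Unix() // valid for 24 hours
	username = fmt.Sprintf("%d:%s", expiry, userID)

	mac := hmac.New(sha1.New, []byte(secret))
	mac.Write([]byte(username)) // hash.Hash.Write never returns an error
	password = base64.StdEncoding.EncodeToString(mac.Sum(nil))
	return username, password
}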
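The validation side can be sketched in the same spirit. This is an illustrative Node.js version, written in JavaScript to match the browser examples, and not the TURN server's actual code: turnPassword and validateTurnCredentials are hypothetical names. In real TURN traffic the password is never sent over the wire; the server derives it from the username and uses it to verify the STUN MESSAGE-INTEGRITY attribute, so the direct comparison here only stands in for that step.

```javascript
const crypto = require('crypto');

// Recompute the password exactly as the issuing side does.
function turnPassword(secret, username) {
  return crypto.createHmac('sha1', secret).update(username).digest('base64');
}

// Hypothetical check: reject expired usernames, then compare in constant time.
function validateTurnCredentials(secret, username, password,
                                 nowSeconds = Math.floor(Date.now() / 1000)) {
  const expiry = Number(String(username).split(':')[0]);
  if (!Number.isInteger(expiry) || expiry <= nowSeconds) return false;

  const expected = Buffer.from(turnPassword(secret, username));
  const actual = Buffer.from(String(password));
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}
```

Embedding the expiry in the username means the server needs no per-credential state: anything it must check is either in the username or derivable from the shared secret.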

JavaScript (browser) usage:

const pc = new RTCPeerConnection({
  iceServers: [
    {
      urls: ['turn:turn.vidivo.app:3478', 'turns:turn.vidivo.app:5349'],
      username: '1710090000:01JN2X6M0P5H9SRBTXY8ZDKEFG',
      credential: 'base64-hmac-sha1-here',
    },
    {
      urls: ['stun:stun.l.google.com:19302'], // Fallback STUN
    },
  ],
});
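In practice the username and credential would come from the POST /calls/initiate response rather than being hard-coded. The sketch below shows one way to build the iceServers list and refuse credentials that have already expired. The helper names and the response field names (username, password) are assumptions, not the documented API.

```javascript
// Split "<expiry_unix_timestamp>:<user_id>" into its parts.
function parseTurnUsername(username) {
  const sep = username.indexOf(':');
  if (sep < 0) throw new Error('malformed TURN username');
  const expiry = Number(username.slice(0, sep));
  if (!Number.isInteger(expiry)) throw new Error('malformed TURN expiry');
  return { expiry, userId: username.slice(sep + 1) };
}

// Seconds of validity left; zero or negative means expired.
function secondsUntilExpiry(username, nowMs = Date.now()) {
  return parseTurnUsername(username).expiry - Math.floor(nowMs / 1000);
}

// Build the RTCPeerConnection iceServers array from issued credentials.
function buildIceServers(creds, nowMs = Date.now()) {
  if (secondsUntilExpiry(creds.username, nowMs) <= 0) {
    throw new Error('TURN credentials expired; request new ones');
  }
  return [
    {
      urls: ['turn:turn.vidivo.app:3478', 'turns:turn.vidivo.app:5349'],
      username: creds.username,
      credential: creds.password,
    },
    { urls: ['stun:stun.l.google.com:19302'] },
  ];
}
```

The result can be passed straight to the constructor: new RTCPeerConnection({ iceServers: buildIceServers(creds) }).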

All WebRTC media — whether P2P direct or relayed through TURN — is encrypted with DTLS-SRTP:

  • DTLS (Datagram TLS) performs the key exchange over UDP
  • SRTP (Secure RTP) carries the encrypted media
  • Encryption is mandatory in all modern browsers — there is no plaintext WebRTC

The encryption flow:

Guest Host
│ │
│── DTLS ClientHello ───────────────►│
│◄── DTLS ServerHello ───────────────│
│ │
│ (certificate fingerprints compared
│ against those in SDP — prevents MITM)
│ │
│ SRTP keys derived from DTLS handshake
│ │
│══ Encrypted audio/video (SRTP) ════│
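The fingerprints compared during the handshake are carried in the SDP as a=fingerprint attributes. As a small illustration, the hypothetical helper below extracts them from an SDP string; the browser performs the actual comparison internally, so application code never needs to do this for security purposes.

```javascript
// Hypothetical helper: list the DTLS certificate fingerprints declared in an SDP.
function extractFingerprints(sdp) {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.startsWith('a=fingerprint:'))
    .map((line) => {
      // Format: a=fingerprint:<hash-function> <colon-separated hex bytes>
      const [algorithm, value] = line.slice('a=fingerprint:'.length).trim().split(/\s+/);
      return { algorithm: algorithm.toLowerCase(), value: value.toUpperCase() };
    });
}

// Shortened placeholder fingerprint, for illustration only.
const sampleSdp = 'v=0\r\na=setup:actpass\r\na=fingerprint:sha-256 AB:CD:EF:01\r\n';
console.log(extractFingerprints(sampleSdp));
```

Because the SDP travels over the signaling channel, the fingerprint check is only as trustworthy as that channel, which is one reason the signaling WebSocket should itself be authenticated and encrypted.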

Vidivo’s WebRTC configuration targets high-quality audio and acceptable video quality for 1-on-1 professional calls:

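Audio:

| Setting           | Value   |
|-------------------|---------|
| Codec             | Opus    |
| Sample rate       | 48 kHz  |
| Echo cancellation | Enabled |
| Noise suppression | Enabled |
| Auto gain control | Enabled |

Video:

| Setting           | Value                                             |
|-------------------|---------------------------------------------------|
| Codec preference  | VP8 (widest support), VP9 (if both peers support) |
| Target resolution | 720p (1280×720)                                   |
| Target frame rate | 30 fps                                            |
| Bitrate (video)   | 500 kbps – 2.5 Mbps (adaptive)                    |
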
// Media constraints used in Vidivo web app
const constraints = {
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  },
  video: {
    width: { ideal: 1280 },
    height: { ideal: 720 },
    frameRate: { ideal: 30 },
  },
};
const stream = await navigator.mediaDevices.getUserMedia(constraints);
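The constraints above cover capture, not the 2.5 Mbps upper bound on video bitrate. One way such a cap could be applied is through RTCRtpSender.setParameters. This is a sketch under assumptions: the source does not say how Vidivo enforces the limit, and applyMaxBitrate and capVideoBitrate are hypothetical names.

```javascript
const MAX_VIDEO_BITRATE_BPS = 2500000; // 2.5 Mbps, the documented upper bound

// Set maxBitrate on every encoding of a parameters object obtained from
// sender.getParameters(). Leaves the object unchanged if it has no encodings.
function applyMaxBitrate(parameters, maxBitrateBps = MAX_VIDEO_BITRATE_BPS) {
  for (const encoding of parameters.encodings || []) {
    encoding.maxBitrate = maxBitrateBps;
  }
  return parameters;
}

// Apply the cap to every video sender on a peer connection.
async function capVideoBitrate(pc, maxBitrateBps = MAX_VIDEO_BITRATE_BPS) {
  for (const sender of pc.getSenders()) {
    if (sender.track && sender.track.kind === 'video') {
      // setParameters expects the object returned by getParameters, modified in place.
      await sender.setParameters(applyMaxBitrate(sender.getParameters(), maxBitrateBps));
    }
  }
}
```

The lower end of the range needs no code: the browser's congestion control reduces the bitrate on its own when the network degrades.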

| Failure                         | Detection                               | Recovery                                                     |
|---------------------------------|-----------------------------------------|--------------------------------------------------------------|
| ICE failure (no path found)     | iceConnectionState === 'failed'         | Restart ICE (restartIce())                                   |
| ICE disconnected (brief outage) | iceConnectionState === 'disconnected'   | Wait 10s, then restart ICE                                   |
| Signaling WebSocket dropped     | ws.onclose                              | Reconnect with exponential backoff                           |
| TURN server unreachable         | ICE timeout on relay candidates         | Attempt ICE restart; if still failing, surface error to user |
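The first three recovery rules can be sketched as follows. The names watchIce and backoffDelay are hypothetical, and the backoff base (1 s) and ceiling (30 s) are assumptions, since the source only says "exponential backoff". Timer functions are injectable so the logic can be exercised without waiting.

```javascript
// Delay before reconnect attempt N (0-based): base * 2^N, capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Restart ICE immediately on 'failed', or after a grace period on 'disconnected'.
function watchIce(pc, { graceMs = 10000, setTimer = setTimeout, clearTimer = clearTimeout } = {}) {
  let pending = null;
  pc.oniceconnectionstatechange = () => {
    const state = pc.iceConnectionState;
    // Any state other than 'disconnected' cancels a pending grace timer.
    if (state !== 'disconnected' && pending !== null) {
      clearTimer(pending);
      pending = null;
    }
    if (state === 'failed') {
      pc.restartIce();
    } else if (state === 'disconnected' && pending === null) {
      pending = setTimer(() => {
        pending = null;
        // Only restart if the connection did not recover in the meantime.
        if (pc.iceConnectionState === 'disconnected') pc.restartIce();
      }, graceMs);
    }
  };
}
```

Note that restartIce() only flags the connection for a restart; the resulting negotiationneeded event still has to produce a fresh offer that goes through signaling, so the WebSocket must be connected for the restart to complete.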

During the signaling phase, SDP offers and answers are briefly stored in Redis as a relay mechanism:

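| Key                         | TTL | Content               |
|-----------------------------|-----|-----------------------|
| signal:{session_id}:offer   | 30s | SDP offer from guest  |
| signal:{session_id}:answer  | 30s | SDP answer from host  |
| signal:{session_id}:ice:{n} | 30s | ICE candidates        |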

These keys are ephemeral — once both peers have connected and the WebSocket relay is active, Redis is not used for ongoing signaling.
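The key layout can be captured in a small helper. This is a sketch only: signalKeys and setWithTtl are hypothetical names, and the use of SET with the EX option is an assumption, since the source does not say which Redis commands or client library the service uses.

```javascript
const SIGNAL_TTL_SECONDS = 30;

// Build the ephemeral key names for one signaling session.
function signalKeys(sessionId) {
  return {
    offer: `signal:${sessionId}:offer`,
    answer: `signal:${sessionId}:answer`,
    iceCandidate: (n) => `signal:${sessionId}:ice:${n}`,
  };
}

// Redis command arguments that store a value with the 30-second TTL.
function setWithTtl(key, value) {
  return ['SET', key, value, 'EX', String(SIGNAL_TTL_SECONDS)];
}

const keys = signalKeys('abc123');
console.log(setWithTtl(keys.offer, '{"sdp":"v=0..."}').join(' '));
```

Setting the TTL in the same command as the write avoids a window in which a key exists without an expiry.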

The WebRTC implementation is built with extensibility in mind for planned features:

  • Screen sharing: Use getDisplayMedia() and add a second video track to the peer connection
  • Data channels: RTCDataChannel is available for future features (whiteboard, file transfer, chat)
  • Recording: Can be added by routing the media streams through a server-side recorder, which requires an SFU or a similar media server
  • Group calls: Requires migrating to an SFU (e.g. LiveKit or mediasoup). The signaling protocol is designed to be SFU-compatible.