WebRTC Demystified (Part 2)

Signaling

Published on 2024-08-29


Hello y'all!

Welcome back to our WebRTC Demystified series! In our last post, we looked into the architecture of WebRTC, exploring the key components that make real-time communication possible right in your browser. We touched on how WebRTC connects two devices directly and mentioned a crucial step in this process - signaling.

Today, we're going to peel back another layer of WebRTC and focus on the signaling process itself. While signaling might sound a bit technical, it's the foundation that makes all those seamless video calls, live streams, and peer-to-peer connections possible. Without it, your devices wouldn't even know how to start talking to each other.

Whether you're a developer looking to integrate WebRTC into your projects or just curious about how your favorite apps connect you to others, this post will break down everything you need to know about signaling, in a way that's easy to digest. So, let's dive in and see how this crucial piece of the WebRTC puzzle fits into the bigger picture!

Warning

We're going to get deep into technical terms and may be using jargon to explain things. Read at your own risk, and let us know if you'd like us to break it down further. Your feedback is what keeps us from turning into robots! 🤖

What is signaling in WebRTC?

Before two devices can communicate through WebRTC, they first need to find each other and agree on how to connect. This process, known as signaling, involves the exchange of control messages that help establish, maintain, and eventually terminate the connection between peers.

Think of signaling as the preliminary discussion that leads up to the actual conversation. Just like when you call someone on the phone, there's a bit of back-and-forth before you start talking - ringing, picking up, and saying "hello." In WebRTC, signaling serves the same purpose, except that the devices are the ones doing the talking. They discuss things like "What's your IP address?", "What media formats can you support?", and "How can we make sure this connection stays secure?"

While signaling is essential for setting up a WebRTC connection, it's important to note that the signaling mechanism is not part of the WebRTC standard itself. WebRTC defines what the peers need to exchange (session descriptions and network candidates, both covered below) and handles the actual transmission of media and data between devices, but how those messages travel from one device to the other is left up to the application developer. This lets developers choose the protocol and transport that best fit their needs. The path these messages take between the peers is called the signaling channel.

This channel can be implemented in a variety of ways, for example with WebSockets, short polling, or long polling. The choice depends on the specific use case, but the goal remains the same: making sure both peers are on the same page before they start exchanging data.
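Because WebRTC leaves the message format to the application, the first design decision is what a signaling message looks like. Below is a minimal sketch, assuming a JSON message with a `type` field. `SignalMessage`, `SignalingChannel`, and `createChannelPair` are hypothetical names, not part of WebRTC or any library. The in-memory pair stands in for a real transport such as a WebSocket.

```typescript
// Hypothetical message format: WebRTC does not define one, so each app picks its own.
type SignalMessage =
  | { type: "offer"; sdp: string }
  | { type: "answer"; sdp: string }
  | { type: "candidate"; candidate: string; sdpMid: string | null; sdpMLineIndex: number | null }
  | { type: "bye" };

// Whatever the transport (WebSocket, short polling, long polling), it only has to do two things.
interface SignalingChannel {
  send(msg: SignalMessage): void;
  onMessage(handler: (msg: SignalMessage) => void): void;
}

// In-memory pair of channels for local experiments: what one side sends, the other receives.
function createChannelPair(): [SignalingChannel, SignalingChannel] {
  const handlers: Array<((msg: SignalMessage) => void) | undefined> = [undefined, undefined];
  const makeSide = (side: 0 | 1): SignalingChannel => ({
    send(msg) {
      // Round-trip through JSON, as a real network transport would.
      const wire = JSON.stringify(msg);
      handlers[1 - side]?.(JSON.parse(wire) as SignalMessage);
    },
    onMessage(handler) {
      handlers[side] = handler;
    },
  });
  return [makeSide(0), makeSide(1)];
}

const [alice, bob] = createChannelPair();
bob.onMessage((msg) => console.log(`bob received: ${msg.type}`));
alice.send({ type: "bye" }); // prints "bob received: bye"
```

Keeping the transport behind a small interface like this means the rest of the app doesn't care whether messages travel over WebSockets or polling.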

The signaling process: Steps to establish a WebRTC connection

Establishing a WebRTC connection between two devices involves several key steps, each facilitated by the signaling process. Let's break down these steps to see how two peers go from being strangers on the internet to successfully exchanging data in real time.

Step 1: Creating and exchanging SDP (Session Description Protocol) offers and answers

The process begins with the initiating peer creating a Session Description Protocol (SDP) offer. The offer describes the media that peer wants to exchange (e.g., audio, video, or data channels), the codecs it supports, and the parameters needed to set up the connection, such as ICE credentials and the fingerprint of the certificate used to encrypt it. The initiating peer applies the offer as its local description and sends it to the other peer via the signaling channel.

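The receiving peer applies the offer as its remote description and responds with an SDP answer, which states which of the offered media and codecs it accepts. The answer travels back over the signaling channel, and the initiating peer applies it in turn. This exchange sets the stage for the connection, but it doesn't necessarily contain all the network details needed to connect yet; those follow in the next step.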
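In code, Step 1 looks roughly like the sketch below. It assumes the app's signaling transport is hidden behind a hypothetical `send` callback. To keep the sketch runnable outside a browser, it is typed against a small hypothetical interface, `OfferAnswerPeer`, that covers only the four RTCPeerConnection methods it calls. In a browser, you would pass a real `RTCPeerConnection`.

```typescript
// Hypothetical minimal interface: the four RTCPeerConnection methods this sketch calls.
interface SessionDescription {
  type: "offer" | "answer" | "pranswer" | "rollback";
  sdp?: string;
}
interface OfferAnswerPeer {
  createOffer(): Promise<SessionDescription>;
  createAnswer(): Promise<SessionDescription>;
  setLocalDescription(desc: SessionDescription): Promise<void>;
  setRemoteDescription(desc: SessionDescription): Promise<void>;
}
// Stand-in for the app's signaling transport (WebSocket, polling, ...).
type SendDescription = (msg: { type: "offer" | "answer"; sdp: string }) => void;

// Initiating peer: create the offer, apply it locally, send it to the other side.
async function startCall(pc: OfferAnswerPeer, send: SendDescription): Promise<void> {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  send({ type: "offer", sdp: offer.sdp ?? "" });
}

// Receiving peer: apply the remote offer, create an answer, apply it, send it back.
async function handleOffer(pc: OfferAnswerPeer, sdp: string, send: SendDescription): Promise<void> {
  await pc.setRemoteDescription({ type: "offer", sdp });
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  send({ type: "answer", sdp: answer.sdp ?? "" });
}

// Initiating peer again: apply the answer when it arrives over signaling.
async function handleAnswer(pc: OfferAnswerPeer, sdp: string): Promise<void> {
  await pc.setRemoteDescription({ type: "answer", sdp });
}
```

The order matters. Each side sets its own description with `setLocalDescription` and the other side's with `setRemoteDescription`, so the same SDP ends up as "local" on one peer and "remote" on the other.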

Step 2: Gathering and exchanging ICE candidates

As soon as a peer has set its local description (the offer on one side, the answer on the other), it starts gathering ICE (Interactive Connectivity Establishment) candidates, so in practice this step overlaps with Step 1. ICE candidates represent potential network paths (IP addresses, ports, and transport protocols) that each peer can use to connect to the other.

These candidates are collected incrementally. Each time a peer finds a new candidate, it sends it to the other peer via the signaling channel right away, instead of waiting for the full list (a technique known as trickle ICE). This continues until the peer has gathered all of its candidates.

The ICE agent, however, remains active throughout the connection's lifecycle. It keeps checking that the selected path still works and can adapt the connection when network conditions change. If a significant change occurs, for example a phone switching from Wi-Fi to mobile data, WebRTC may recover the connection automatically, or the application may need to trigger an ICE restart. An ICE restart gathers fresh candidates and runs another offer/answer exchange over the signaling channel, but it keeps the existing session (media tracks, data channels, encryption) in place, so it is much less disruptive than setting up the connection from scratch.

The exchange of ICE candidates is crucial for determining the best possible network route for establishing a direct connection between the two peers, especially when dealing with NATs (Network Address Translators) and firewalls.
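Trickling candidates in both directions might be wired up as in the sketch below. `TricklePeer` and `CandidateInfo` are hypothetical interfaces for the two RTCPeerConnection members used (`onicecandidate` and `addIceCandidate`), and `send` stands in for your signaling transport. The queue is there because remote candidates can arrive before the remote description has been set, and `addIceCandidate()` rejects in that situation.

```typescript
// Hypothetical interfaces covering just the RTCPeerConnection members used below.
interface CandidateInfo {
  candidate: string;
  sdpMid: string | null;
  sdpMLineIndex: number | null;
}
interface TricklePeer {
  onicecandidate: ((ev: { candidate: CandidateInfo | null }) => void) | null;
  addIceCandidate(candidate: CandidateInfo): Promise<void>;
}
type CandidateMessage = { type: "candidate" } & CandidateInfo;

// Forward each local candidate to the other peer as soon as it is gathered (trickle ICE).
function trickleLocalCandidates(pc: TricklePeer, send: (msg: CandidateMessage) => void): void {
  pc.onicecandidate = (ev) => {
    if (!ev.candidate) return; // null: gathering has finished
    const { candidate, sdpMid, sdpMLineIndex } = ev.candidate;
    send({ type: "candidate", candidate, sdpMid, sdpMLineIndex });
  };
}

// Buffer remote candidates until the remote description is set, then apply them in order.
function createRemoteCandidateQueue(pc: TricklePeer) {
  let ready = false;
  const pending: CandidateInfo[] = [];
  return {
    async add(c: CandidateInfo): Promise<void> {
      if (ready) await pc.addIceCandidate(c);
      else pending.push(c);
    },
    // Call once setRemoteDescription() has resolved.
    async flush(): Promise<void> {
      ready = true;
      for (const c of pending.splice(0)) await pc.addIceCandidate(c);
    },
  };
}
```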

Step 3: Connecting peers via STUN/TURN servers

In many cases, direct peer-to-peer communication is complicated by the presence of NATs and firewalls. To navigate these obstacles, WebRTC uses STUN (Session Traversal Utilities for NAT) and TURN (Traversal Using Relays around NAT) servers.

STUN servers help each peer discover its public IP address and port, that is, the address its traffic appears to come from after passing through the NAT. Think of a STUN server as a helpful receptionist - when your device asks, it tells the device which public-facing address it's reachable at. The device can then share that address with the other peer as a candidate, so the other peer knows where to reach it.

However, when a direct connection fails or isn't possible because of strict network configurations (for example, symmetric NATs or firewalls that block UDP traffic), TURN servers step in. Acting like relay operators, they pass data between the peers so communication can still happen, even if it's indirect. The trade-off is extra latency and server bandwidth, since all traffic between the peers flows through the relay.
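On the application side, STUN and TURN servers are simply listed in the configuration passed to `new RTCPeerConnection()`. The sketch below uses hypothetical hostnames and credentials (`stun.example.org`, `turn.example.org`). The `IceServer` interface mirrors the shape of the standard `RTCIceServer` dictionary so the snippet runs outside a browser. In practice, TURN credentials are often short-lived and fetched from your own backend rather than hard-coded.

```typescript
// Mirrors the standard RTCIceServer dictionary (urls, username, credential).
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

// Hypothetical servers: replace the hostnames with your own STUN/TURN deployment.
function buildIceServers(turn?: { username: string; credential: string }): IceServer[] {
  // STUN: lets each peer learn its public address (server-reflexive candidates).
  const servers: IceServer[] = [{ urls: "stun:stun.example.org:3478" }];
  if (turn) {
    // TURN: relay fallback. TLS on port 443 gets through many restrictive firewalls.
    servers.push({
      urls: ["turn:turn.example.org:3478?transport=udp", "turns:turn.example.org:443?transport=tcp"],
      username: turn.username,
      credential: turn.credential,
    });
  }
  return servers;
}

// In a browser:
// const pc = new RTCPeerConnection({ iceServers: buildIceServers(creds) });
// Adding iceTransportPolicy: "relay" forces TURN-only, which is handy for testing the relay.
```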

Step 4: Finalizing the connection

As ICE candidates are exchanged, the RTCPeerConnection object in each peer's browser pairs its local candidates with the remote ones and runs connectivity checks on these candidate pairs. The ICE agent then selects the best pair that works, preferring direct paths over relayed ones.

Once a working path has been found, the peers perform a DTLS handshake over it to set up encryption, and media streams (such as audio and video) or data channels can then flow between them.
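To find out when Step 4 has completed, an application can watch the peer connection's `connectionState`. It reaches `"connected"` once ICE and the DTLS handshake have succeeded. In this sketch, `StatefulPeer` is a hypothetical interface for the RTCPeerConnection members used, and the 15-second default timeout is an arbitrary assumption.

```typescript
type ConnState = "new" | "connecting" | "connected" | "disconnected" | "failed" | "closed";

// Hypothetical interface for the RTCPeerConnection members used below.
interface StatefulPeer {
  readonly connectionState: ConnState;
  addEventListener(type: "connectionstatechange", listener: () => void): void;
  removeEventListener(type: "connectionstatechange", listener: () => void): void;
}

// Resolves once the peers are connected; rejects on "failed", "closed", or timeout.
function waitForConnected(pc: StatefulPeer, timeoutMs = 15_000): Promise<void> {
  return new Promise((resolve, reject) => {
    const finish = (err?: Error) => {
      clearTimeout(timer);
      pc.removeEventListener("connectionstatechange", check);
      if (err) reject(err);
      else resolve();
    };
    const check = () => {
      const state = pc.connectionState;
      if (state === "connected") finish();
      else if (state === "failed" || state === "closed") finish(new Error(`connection ${state}`));
    };
    const timer = setTimeout(() => finish(new Error("timed out waiting for connection")), timeoutMs);
    pc.addEventListener("connectionstatechange", check);
    check(); // the state may already have changed before we subscribed
  });
}
```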

Step 5: Maintaining and tearing down the connection

Once a WebRTC connection is established, the RTCPeerConnection object manages it: it maintains the media streams and data channels and handles network changes dynamically. WebRTC also makes encryption mandatory, so all data transmitted between the peers is encrypted end to end (DTLS for data channels, SRTP keyed via DTLS for media). This protects the communication channel against eavesdropping and tampering.
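Recovering from a network change is one place where signaling is needed again after setup, because an ICE restart is a new offer/answer round with fresh ICE credentials. This sketch assumes only one side (say, the original caller) initiates restarts, so both peers don't send offers at the same time. `RestartablePeer` is a hypothetical interface for the RTCPeerConnection members used, and `sendOffer` stands in for your signaling transport.

```typescript
interface SessionDescription {
  type: "offer" | "answer" | "pranswer" | "rollback";
  sdp?: string;
}
// Hypothetical minimal interface: the RTCPeerConnection members used below.
interface RestartablePeer {
  readonly connectionState: string;
  addEventListener(type: "connectionstatechange", listener: () => void): void;
  createOffer(options?: { iceRestart?: boolean }): Promise<SessionDescription>;
  setLocalDescription(desc: SessionDescription): Promise<void>;
}

// On "failed", create an offer with fresh ICE credentials and send it via signaling.
// The other peer handles it like any offer: setRemoteDescription, createAnswer, send the answer.
function restartIceOnFailure(pc: RestartablePeer, sendOffer: (sdp: string) => void): void {
  pc.addEventListener("connectionstatechange", () => {
    // "disconnected" often recovers by itself, so only act once the state is "failed".
    if (pc.connectionState !== "failed") return;
    pc.createOffer({ iceRestart: true })
      .then(async (offer) => {
        await pc.setLocalDescription(offer);
        sendOffer(offer.sdp ?? "");
      })
      .catch((err) => console.error("ICE restart failed:", err));
  });
}
```

Waiting for `"failed"` instead of reacting to `"disconnected"` gives the browser a chance to recover from brief interruptions on its own before the app renegotiates.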

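When the communication session is over, the application can call the RTCPeerConnection.close() method to terminate the connection. This stops the connection's media transmission and closes its data channels, effectively tearing down the connection. Local media sources are not released automatically, though: to turn off the camera and microphone, the application also has to stop its local tracks.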

Signaling can be used to tell the other peer that the session is ending (for example, through a custom message over the signaling channel), which lets it clean up right away rather than having to infer the end from connection state changes. The actual termination of the WebRTC connection, however, is handled by the WebRTC API, not by signaling.
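A hang-up could combine both halves: a courtesy message over signaling, then the actual teardown through the WebRTC API. In this sketch, the `bye` message type and the helper names are hypothetical. `localTracks` would be the tracks obtained from `getUserMedia()` (e.g. `stream.getTracks()`). Stopping them is a separate step because closing the peer connection does not release the camera or microphone.

```typescript
// Hypothetical minimal interfaces: RTCPeerConnection.close() and MediaStreamTrack.stop().
interface ClosablePeer {
  close(): void;
}
interface StoppableTrack {
  stop(): void;
}
type ByeMessage = { type: "bye" };

// Local side: tell the other peer we're leaving, then tear down.
function hangUp(pc: ClosablePeer, localTracks: StoppableTrack[], send: (msg: ByeMessage) => void): void {
  send({ type: "bye" }); // courtesy notice over the signaling channel
  releaseCall(pc, localTracks);
}

// Shared cleanup, also used when a "bye" arrives from the other peer.
function releaseCall(pc: ClosablePeer, localTracks: StoppableTrack[]): void {
  pc.close(); // ends the connection and closes its data channels
  for (const track of localTracks) track.stop(); // close() leaves camera/mic running
}

function onSignalingMessage(msg: { type: string }, pc: ClosablePeer, localTracks: StoppableTrack[]): void {
  if (msg.type === "bye") releaseCall(pc, localTracks);
}
```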

Why is signaling left to the developer?

WebRTC is designed to handle the complex tasks of real-time communication, but when it comes to signaling - the process of setting up and managing connections - WebRTC leaves this to the developer. Here's why:

Flexibility and Customization: Different applications have unique requirements for signaling. By not enforcing a specific signaling protocol, WebRTC allows developers to choose the method that best suits their needs, whether it's WebSockets for real-time communication, HTTP-based polling, or even a custom-built solution. This flexibility enables developers to tailor signaling to fit into their existing infrastructure or specific use case.

Security and Integration: Signaling often involves exchanging sensitive information, such as IP addresses and connection parameters. It also has to be trustworthy: the SDP carries the certificate fingerprints that WebRTC's encryption is verified against, so anyone who can tamper with signaling messages could insert themselves into the connection. By leaving signaling to the developer, WebRTC lets them apply the appropriate security measures, like encryption and authentication, to protect this data. Additionally, developers can integrate WebRTC into their existing systems without having to overhaul their entire architecture, making the adoption process smoother and more secure.
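On the server side, this might mean checking each message before relaying it. The sketch below assumes the sender's identity (`userId`) has already been authenticated, e.g. from a session cookie or token. The room model, the helper names, and the size limit are all hypothetical.

```typescript
type Signal =
  | { type: "offer" | "answer"; sdp: string }
  | { type: "candidate"; candidate: string; sdpMid: string | null; sdpMLineIndex: number | null }
  | { type: "bye" };

// Hypothetical room model: room id -> ids of the authenticated users allowed in it.
const roomMembers = new Map<string, Set<string>>();
const MAX_MESSAGE_LENGTH = 64 * 1024; // arbitrary limit for this sketch

// Parse untrusted input; anything that doesn't match the expected shape is rejected.
function parseSignal(raw: string): Signal | null {
  if (raw.length > MAX_MESSAGE_LENGTH) return null;
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const m = data as Record<string, unknown>;
  const kind = m.type;
  if (kind === "offer" || kind === "answer") {
    const sdp = m.sdp;
    return typeof sdp === "string" ? { type: kind, sdp } : null;
  }
  if (kind === "candidate") {
    const { candidate, sdpMid, sdpMLineIndex } = m;
    if (typeof candidate !== "string") return null;
    if (sdpMid !== null && typeof sdpMid !== "string") return null;
    if (sdpMLineIndex !== null && typeof sdpMLineIndex !== "number") return null;
    return { type: "candidate", candidate, sdpMid, sdpMLineIndex };
  }
  if (kind === "bye") return { type: "bye" };
  return null;
}

// Relay decision: the (already authenticated) sender must belong to the room.
function handleIncoming(roomId: string, userId: string, raw: string): Signal | null {
  if (!roomMembers.get(roomId)?.has(userId)) return null;
  return parseSignal(raw);
}
```

Re-serializing only the validated fields means nothing unexpected in a client's message is passed on to the other peer.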

Technologies used for signaling

There are plenty of ways to exchange signaling information. In theory, two people could copy the session descriptions by hand, pass them to each other, and paste them into their RTCPeerConnection. Since WebRTC is mostly used in web applications, here are two practical approaches:
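The copy-and-paste scenario has no channel to trickle candidates over, so a peer would wait until ICE gathering has finished. It would then hand over a description that already contains every candidate. In this sketch, `GatheringPeer`, `waitForGatheringComplete`, `encodeDescription`, and `decodeDescription` are hypothetical names. The Base64 encoding assumes the SDP is plain ASCII, which browser-generated SDP normally is.

```typescript
// Hypothetical interface for the RTCPeerConnection members used below.
interface GatheringPeer {
  readonly iceGatheringState: "new" | "gathering" | "complete";
  readonly localDescription: { type: string; sdp: string } | null;
  addEventListener(type: "icegatheringstatechange", listener: () => void): void;
  removeEventListener(type: "icegatheringstatechange", listener: () => void): void;
}

// With no live channel to trickle candidates over, wait until gathering completes:
// the local description then already lists every candidate.
function waitForGatheringComplete(pc: GatheringPeer): Promise<void> {
  return new Promise((resolve) => {
    const check = () => {
      if (pc.iceGatheringState === "complete") {
        pc.removeEventListener("icegatheringstatechange", check);
        resolve();
      }
    };
    pc.addEventListener("icegatheringstatechange", check);
    check(); // gathering may already be complete
  });
}

// Pack a description into one string that can be copied and pasted, and unpack it again.
function encodeDescription(desc: { type: string; sdp: string }): string {
  return btoa(JSON.stringify({ type: desc.type, sdp: desc.sdp }));
}
function decodeDescription(text: string): { type: string; sdp: string } {
  return JSON.parse(atob(text.trim()));
}
```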

WebSockets

WebSockets are well suited to real-time, bi-directional communication. A WebSocket keeps a connection open between the client and the server, so messages can flow in both directions without the overhead of repeatedly establishing new connections. The signaling server simply relays each message it receives from one peer to the other, which makes WebSockets a fast and straightforward way to deliver signaling information.
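A client-side wrapper might look like the sketch below. It assumes a hypothetical server that relays each JSON message to the other peer in the same room. `SocketLike` is a hypothetical interface for the part of the browser WebSocket API used, so a browser `WebSocket` should fit it. The outbox handles offers or candidates produced before the socket has finished connecting.

```typescript
// Hypothetical interface for the subset of the browser WebSocket API used here.
interface SocketLike {
  readonly readyState: number;
  send(data: string): void;
  addEventListener(type: "open", listener: () => void): void;
  addEventListener(type: "message", listener: (ev: { data: unknown }) => void): void;
}
const SOCKET_OPEN = 1; // same value as WebSocket.OPEN

// Sends JSON signaling messages over one socket; a (hypothetical) server relays them
// to the other peer in the same room.
class WebSocketSignaling {
  private readonly ws: SocketLike;
  private readonly outbox: string[] = [];

  constructor(ws: SocketLike, onMessage: (msg: { type: string }) => void) {
    this.ws = ws;
    ws.addEventListener("open", () => {
      // Flush messages that were sent before the connection was ready.
      for (const data of this.outbox.splice(0)) ws.send(data);
    });
    ws.addEventListener("message", (ev) => {
      if (typeof ev.data === "string") onMessage(JSON.parse(ev.data));
    });
  }

  send(msg: { type: string; [key: string]: unknown }): void {
    const data = JSON.stringify(msg);
    if (this.ws.readyState === SOCKET_OPEN) this.ws.send(data);
    else this.outbox.push(data);
  }
}

// Browser usage (hypothetical URL and handler):
// const sig = new WebSocketSignaling(new WebSocket("wss://signaling.example.org/rooms/42"), handleSignal);
// sig.send({ type: "offer", sdp: offer.sdp });
```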

Long Polling

Long polling is an alternative that uses standard HTTP. In long polling, the client sends a request to the server and waits until the server has new data before responding. Once the server replies, the client immediately sends another request. While not as fast as WebSockets, long polling enables near real-time communication and can be a reliable fallback when WebSockets aren't supported.
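A long-polling client is essentially a loop. This sketch assumes a hypothetical endpoint, `GET {baseUrl}/messages?after=N`. The server holds each request open until it has messages beyond the first N (200 with a JSON array) or its own timeout passes (204). `HttpGet` is a stand-in for `fetch`, so in a browser you could pass `(url) => fetch(url)`.

```typescript
// Stand-in for fetch; in a browser you could pass (url) => fetch(url).
type HttpGet = (url: string) => Promise<{ status: number; json(): Promise<unknown> }>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical protocol: GET {baseUrl}/messages?after=N waits until the server has messages
// beyond the first N (200 + JSON array) or its own timeout expires (204, nothing new).
async function longPoll(
  get: HttpGet,
  baseUrl: string,
  onMessage: (msg: unknown) => void,
  shouldStop: () => boolean,
): Promise<void> {
  let after = 0; // number of messages received so far
  while (!shouldStop()) {
    try {
      const res = await get(`${baseUrl}/messages?after=${after}`);
      if (res.status === 200) {
        const batch = (await res.json()) as unknown[];
        for (const msg of batch) onMessage(msg);
        after += batch.length;
      } else if (res.status !== 204) {
        await sleep(1000); // unexpected status: back off before retrying
      }
      // 204: the server timed out with nothing new, so ask again right away.
    } catch {
      await sleep(1000); // network error: back off instead of hammering the server
    }
  }
}
```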

Long polling isn't limited to browsers talking to a web server, either. When backend clients need to exchange signaling data without a website in between, the same wait-for-new-data pattern can be built on infrastructure they already have. For example, one service could write updates to a shared database while another waits for changes to specific entries, reacting to new data in near real time. Similarly, a client could wait on a message queue for new messages, letting backend services exchange data asynchronously without persistent WebSocket connections. Either way, backend clients can reuse their existing architecture instead of setting up and maintaining a separate connection protocol.
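On the server, the "hold the request open" part can be as small as a per-peer mailbox. This sketch keeps messages in memory. In the setups described above, the queue would live in a shared database or message broker instead. `Mailbox` is a hypothetical name, and an HTTP handler built on it could answer 200 with a non-empty batch, or 204 when the batch comes back empty.

```typescript
// Hypothetical in-memory mailbox for one peer's incoming signaling messages.
class Mailbox<T> {
  private queue: T[] = [];
  private waiter: ((batch: T[]) => void) | null = null;

  // Called when the other peer sends a message: deliver it to a waiting poll, or queue it.
  push(msg: T): void {
    this.queue.push(msg);
    const waiter = this.waiter;
    if (waiter) {
      this.waiter = null;
      waiter(this.queue.splice(0));
    }
  }

  // Handles one long-poll request: answer immediately if messages are queued, otherwise
  // hold the request open until a message arrives or timeoutMs passes (empty batch).
  poll(timeoutMs: number): Promise<T[]> {
    if (this.queue.length > 0) return Promise.resolve(this.queue.splice(0));
    // One outstanding poll per mailbox: release an older one with an empty batch.
    this.waiter?.([]);
    return new Promise((resolve) => {
      const waiter = (batch: T[]) => {
        clearTimeout(timer);
        resolve(batch);
      };
      const timer = setTimeout(() => {
        if (this.waiter === waiter) this.waiter = null;
        resolve([]);
      }, timeoutMs);
      this.waiter = waiter;
    });
  }
}
```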

How Flottform's API Simplifies the Signaling Process

While WebRTC is a powerful technology, managing the signaling process can be complex and time-consuming, especially when building applications that need to scale or handle various network conditions. That's where Flottform's API can help.

Flottform makes it easy to handle WebRTC signaling by taking care of all the details, like setting up WebSockets or implementing long polling. With Flottform, you don't need to worry about the underlying technologies or how they interact with different network environments. Flottform manages the signaling process for you, automatically chooses the best method for the job, and ensures that your peer connections are established quickly and reliably.

The pro version of Flottform will include a TURN server and make it available to the connecting peers if necessary.

As always, we'd love to hear your thoughts on this! Your feedback is really valuable to us. Connect with us on LinkedIn and Twitter / X. Every comment, like, and share fuels our progress!

Cheers,

Tamara, Jörn, and Nidhal

Newsletter Signup

Do you want to be notified when you can use Flottform yourself? Do you want to receive an e-mail whenever we post updates? Send an e-mail to newsletter@flottform.io to subscribe!
