How WebRTC works in theory: order of events

There are a lot of great WebRTC demos floating around, as well as tutorials on how to build full-blown WebRTC applications.

What this text aims to do is explain the message passing for creating a peer connection. That's it. No frameworks or other bits getting in the way of understanding. All I'm doing here is bringing together the relevant RFCs and pointing out which are relevant at which stage.

If you like understanding the theory then this might be for you. There are no frameworks and as few black boxes as possible.

[Image: paper cups telephone] Ever made one of these as a kid?

Parts

  • Part 1: What to know about WebRTC (this page)
  • Part 2: More on SIP & SDP
  • Part 3: ICE
  • Part 4: TURN & STUN

What to know about WebRTC

Many protocols are at play to create what the web refers to as 'WebRTC'. It is a beautiful collection of technologies both old and new which make browser-to-browser communication almost effortless. Just to name a few protocols at play to make WebRTC happen:

  • Session Initiation Protocol (SIP)
  • Session Description Protocol (SDP)
  • Interactive Connectivity Establishment (ICE)
  • Signaling - To be explained.
  • STUN
  • TURN

WebRTC is less a single technology than an advancement of the open web: technologies converge, and previously encumbered real-time technologies are made open.

Many WebRTC tutorials jump into getUserMedia for webcam demos of chat applications. Whilst I was learning WebRTC concepts I found this a distraction from understanding the actual WebRTC component. The end goal here is truly understanding how peer-to-peer connections between web browsers are created using the WebRTC APIs. What we choose to attach to these channels of communication isn't necessarily important in understanding the basic sequence of events for forming a connection.

The big picture

Understanding WebRTC

With WebRTC we can:

  • Create direct connections between browsers
    • Create rich video & audio streams in real time
    • Send arbitrary data between browsers (it's not just video & audio!)
  • Be less dependent on cloud-based models (perhaps)
  • Encrypt connections between peers
  • Gain performance benefits where a client <-> client model may be more efficient than client -> server
  • Open up debate on new ways to tackle industry challenges, ranging from health to education, A-Z

For the lowdown on what WebRTC isn't, see Arin Sime's "3 Things WebRTC Cannot Do".

SIP & WebRTC

What is the Session Initiation Protocol and what, if anything, has it got to do with WebRTC?

Do you need to know SIP to do WebRTC? No, not if you're wanting to make a web-only, non-interoperable WebRTC web app.

If you're interested in making a WebRTC video chat application, the linked tutorial by Sam Dutton provides a walk-through with less theory.

However, did you know WebRTC can be connected to public telephone networks & private IP phone systems? Getting to grips with SIP and other concepts helps explain how this is possible with WebRTC.

If that excites you then read on! WebRTC blurs the line between the web and traditional telecommunications. For the impatient, see RFC 3666, which gives examples of interfacing SIP with Public Switched Telephone Networks (PSTN).

WebRTC is evidence of the continuing network convergence of (often proprietary) standalone VoIP systems. Rather than maintaining and developing separate infrastructures for VoIP and web traffic, WebRTC can provide voice and video capabilities, replacing traditional VoIP deployments.

WebRTC can feel a bit like this when considering all the various pieces:

[Image: Stanford AI Lab, 1970s] Kludge is OK when you're working on the future!

Where does SIP fit into the picture?

Peers need to make themselves known to each other somehow, for example to 'call' one another and invite them to communicate. The Session Initiation Protocol can be used for this purpose, with the ICE process running alongside the exchange it carries. Think of SIP as just a tap on the shoulder saying: "Hey, I want to talk to person x, are they there?".

Depending on your architecture, a response might come directly from the recipient or via an intermediary. SIP doesn't care how peers are going to communicate; it simply helps peers establish an intention to connect, learn each other's availability, and be informed of the success of initiating and closing down calls between peers.
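To make the 'tap on the shoulder' concrete, here is a minimal sketch of what a SIP INVITE request looks like on the wire. The addresses, Call-ID, branch and tag values are made up for illustration, the helper names `build_invite` and `parse_start_line` are my own, and a real SIP stack would add more headers and usually a body.

```python
def build_invite(caller: str, callee: str, call_id: str, branch: str, tag: str) -> str:
    """Return a minimal, body-less SIP INVITE request as text.

    Only the headers RFC 3261 requires in every request are included,
    plus Contact and Content-Length. All values are illustrative.
    """
    lines = [
        f"INVITE sip:{callee} SIP/2.0",                      # request line: method, target, version
        f"Via: SIP/2.0/UDP client.example.com;branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={tag}",                   # who is calling
        f"To: <sip:{callee}>",                               # who is being called
        f"Call-ID: {call_id}",                               # identifies this call
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",                                 # no session description attached yet
    ]
    # SIP is a text protocol: CRLF line endings, blank line after the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_start_line(message: str) -> dict:
    """Split the first line of a SIP request into its three parts."""
    method, uri, version = message.split("\r\n", 1)[0].split(" ")
    return {"method": method, "uri": uri, "version": version}


invite = build_invite(
    "alice@example.com",
    "bob@example.com",
    call_id="a84b4c76e66710",
    branch="z9hG4bK776asdhds",  # branch values start with the "z9hG4bK" cookie
    tag="1928301774",
)
print(invite)
```

Notice that nothing in the request says how the media will flow. It only announces who wants to talk to whom.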

Establishing peer to peer connections reliably between browsers is harder than it sounds. For WebRTC the protocol "Interactive Connectivity Establishment (ICE)" defines this process. See part three of this text which explains ICE in more detail.

Note: SIP is just one commonly used Offer/Answer protocol that can be paired with WebRTC. It is used during session establishment between two peers. Jingle is an example of another Offer/Answer protocol.

ICE can be used by any protocol utilizing the offer/answer model, such as the Session Initiation Protocol (SIP). - (RFC 5245)

The Session Initiation Protocol defines a process for Offering/Answering sessions. In crude terms, both peers need to somehow tell each other their own IP address and port numbers so they can form a peer connection. But before they can do this, they need to establish that they are trying to call each other. This is the function of SIP in terms of its association with WebRTC. Note that a messenger is needed to allow peers to find each other. 'Signalling' provides this service, allowing peers to essentially 'find out' that they're being called (see part 4).
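The offer/answer exchange through a messenger can be sketched without any networking at all. The `SignallingRelay` class below is a hypothetical in-memory stand-in for a signalling service (in practice this might be a WebSocket server or a SIP proxy), and the peer names, addresses and ports are invented for the example.

```python
from collections import defaultdict, deque


class SignallingRelay:
    """Hypothetical in-memory messenger that passes notes between peers."""

    def __init__(self):
        # One first-in, first-out inbox per peer name.
        self._inboxes = defaultdict(deque)

    def send(self, sender, recipient, kind, payload):
        self._inboxes[recipient].append(
            {"from": sender, "kind": kind, "payload": payload}
        )

    def receive(self, peer):
        """Return the oldest waiting message for a peer, or None."""
        inbox = self._inboxes[peer]
        return inbox.popleft() if inbox else None


relay = SignallingRelay()

# 1. Alice offers: she tells Bob where she thinks she can be reached.
relay.send("alice", "bob", "offer", {"ip": "192.0.2.10", "port": 50000})

# 2. Bob finds out he is being called and answers with his own address.
offer = relay.receive("bob")
relay.send("bob", offer["from"], "answer", {"ip": "198.51.100.7", "port": 50002})

# 3. Alice reads the answer. Both sides now know each other's IP and port.
answer = relay.receive("alice")
print(offer["payload"], answer["payload"])
```

The relay never touches the media itself. Its only job is to carry the offer and the answer so each peer learns that a call is being attempted and where the other side might be reachable.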

WebRTC can (and often does) use the SIP protocol as a means of kicking off this exchange. In addition to inviting a peer to connect, peers may also send the initial IP addresses & ports that the peer 'thinks' it may be reachable upon. In WebRTC these are exchanged in the format of a Session Description object, (you guessed it!) using the Session Description Protocol.
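As a rough illustration of how an address and port travel inside a session description, the sketch below builds a bare-bones SDP body and reads the address back out. The helper names `build_sdp` and `parse_address` are hypothetical, the values are invented, and the descriptions a browser generates are far longer (codecs, encryption fingerprints, ICE attributes and so on).

```python
from typing import Tuple


def build_sdp(ip: str, port: int, session_id: int = 1) -> str:
    """Return a minimal SDP description offering one audio stream."""
    lines = [
        "v=0",                                # protocol version
        f"o=- {session_id} 1 IN IP4 {ip}",    # origin: user, session id, version, address
        "s=-",                                # session name (unused here)
        f"c=IN IP4 {ip}",                     # connection address: where to send media
        "t=0 0",                              # timing: unbounded session
        f"m=audio {port} RTP/AVP 0",          # media: audio on this port, payload type 0
    ]
    return "\r\n".join(lines) + "\r\n"


def parse_address(sdp: str) -> Tuple[str, int]:
    """Pull the connection IP (c= line) and media port (m= line) out of an SDP body."""
    ip, port = None, None
    for line in sdp.splitlines():
        if line.startswith("c="):
            ip = line.split()[-1]
        elif line.startswith("m="):
            port = int(line.split()[1])
    if ip is None or port is None:
        raise ValueError("SDP is missing a c= or m= line")
    return ip, port


sdp = build_sdp("192.0.2.10", 50000)
print(sdp)
print(parse_address(sdp))
```

A body like this is what would ride along inside the invite, which is why the peer on the other end can learn an address and port from the very first message.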

From the first point of contact, the peer can attach 'candidate addresses' to its initial 'hello' to expedite the process of generating a peer connection. This process is defined in the Interactive Connectivity Establishment (ICE, RFC 5245) protocol.
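A candidate address is carried as an `a=candidate` line inside the session description. The sketch below formats and parses the basic fields of such a line and computes its priority using the formula and recommended type preferences from RFC 5245. The function names, foundation values and addresses are assumptions made for the example, and optional fields such as `raddr`/`rport` are left out.

```python
# Type preferences recommended by RFC 5245 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(type_pref: int, local_pref: int, component_id: int) -> int:
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)"""
    return (2 ** 24) * type_pref + (2 ** 8) * local_pref + (256 - component_id)


def format_candidate(foundation: str, component_id: int, ip: str, port: int,
                     cand_type: str, local_pref: int = 65535) -> str:
    """Build an a=candidate line for a UDP candidate."""
    priority = candidate_priority(TYPE_PREFERENCE[cand_type], local_pref, component_id)
    return (f"a=candidate:{foundation} {component_id} UDP {priority} "
            f"{ip} {port} typ {cand_type}")


def parse_candidate(line: str) -> dict:
    """Read the basic fields back out of an a=candidate line."""
    fields = line[len("a=candidate:"):].split()
    return {
        "foundation": fields[0],
        "component": int(fields[1]),
        "transport": fields[2],
        "priority": int(fields[3]),
        "ip": fields[4],
        "port": int(fields[5]),
        "type": fields[7],  # fields[6] is the literal "typ"
    }


host = format_candidate("1", 1, "192.0.2.10", 50000, "host")
srflx = format_candidate("2", 1, "203.0.113.5", 61000, "srflx")
print(host)
print(srflx)
```

Because the host candidate has the higher type preference, it ends up with the higher priority, so a direct local address is tried before one discovered through a STUN server.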

The ICE RFC RECOMMENDS that peers generate their local session description and send this along with their invite request to prevent perceived delays.

..to be continued