Voice over IP

This module outlines how two users in a room can set up a Voice over IP (VoIP) call to each other. Voice and video calls are built upon the WebRTC 1.0 standard. Call signalling is achieved by sending message events to the room. In this version of the spec, only two-party communication is supported (e.g. between two peers, or between a peer and a multi-point conferencing unit). This means that clients MUST only send call events to rooms with exactly two participants.

Events

m.call.answer


This event is sent by the callee when they wish to answer the call.

Event type: Message event

Content

Name Type Description
answer Answer Required: The session description object.
call_id string Required: The ID of the call this event relates to.
version integer Required: The version of the VoIP specification this message adheres to. This specification is version 0.
Answer
Name Type Description
sdp string Required: The SDP text of the session description.
type enum Required: The type of session description.

One of: [answer].

Examples

{
  "content": {
    "answer": {
      "sdp": "v=0\r\no=- 6584580628695956864 2 IN IP4 127.0.0.1[...]",
      "type": "answer"
    },
    "call_id": "12345",
    "lifetime": 60000,
    "version": 0
  },
  "event_id": "$143273582443PhrSn:example.org",
  "origin_server_ts": 1432735824653,
  "room_id": "!jEsUZKDJdhlrceRyVU:example.org",
  "sender": "@example:example.org",
  "type": "m.call.answer",
  "unsigned": {
    "age": 1234
  }
}
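
As an illustration only, a receiving client might check the content of an incoming m.call.answer against the table above before handing the SDP to its WebRTC stack. The following is a minimal Python sketch; the function name answer_content_errors is hypothetical and not part of this specification.

```python
def answer_content_errors(content: dict) -> list[str]:
    """Return a list of problems found in an m.call.answer content dict."""
    errors = []
    if not isinstance(content.get("call_id"), str):
        errors.append("call_id must be a string")
    version = content.get("version")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        errors.append("version must be an integer")
    answer = content.get("answer")
    if not isinstance(answer, dict):
        errors.append("answer must be an object")
    else:
        if not isinstance(answer.get("sdp"), str):
            errors.append("answer.sdp must be a string")
        if answer.get("type") != "answer":
            errors.append("answer.type must be 'answer'")
    return errors
```

An empty list means the content carries every required field with the expected type.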

m.call.candidates


This event is sent by callers after sending an invite and by the callee after answering. Its purpose is to give the other party additional ICE candidates to try using to communicate.

Event type: Message event

Content

Name Type Description
call_id string Required: The ID of the call this event relates to.
candidates [Candidate] Required: Array of objects describing the candidates.
version integer Required: The version of the VoIP specification this message adheres to. This specification is version 0.
Candidate
Name Type Description
candidate string Required: The SDP ‘a’ line of the candidate.
sdpMLineIndex number Required: The index of the SDP ‘m’ line this candidate is intended for.
sdpMid string Required: The SDP media type this candidate is intended for.

Examples

{
  "content": {
    "call_id": "12345",
    "candidates": [
      {
        "candidate": "candidate:863018703 1 udp 2122260223 10.9.64.156 43670 typ host generation 0",
        "sdpMLineIndex": 0,
        "sdpMid": "audio"
      }
    ],
    "version": 0
  },
  "event_id": "$143273582443PhrSn:example.org",
  "origin_server_ts": 1432735824653,
  "room_id": "!jEsUZKDJdhlrceRyVU:example.org",
  "sender": "@example:example.org",
  "type": "m.call.candidates",
  "unsigned": {
    "age": 1234
  }
}
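
A sending client usually gathers several ICE candidates before emitting one m.call.candidates event. The sketch below shows one way to assemble the content, assuming each locally gathered candidate is available as a dict holding at least the three keys listed above. The function name make_candidates_content is hypothetical. Any additional keys a WebRTC implementation may attach are dropped so that only the documented fields are sent.

```python
CANDIDATE_KEYS = ("candidate", "sdpMLineIndex", "sdpMid")

def make_candidates_content(call_id: str, candidates: list[dict]) -> dict:
    """Build the content of an m.call.candidates event (version 0)."""
    cleaned = []
    for cand in candidates:
        missing = [key for key in CANDIDATE_KEYS if key not in cand]
        if missing:
            raise ValueError(f"candidate is missing required keys: {missing}")
        # Keep only the fields documented in the Candidate table.
        cleaned.append({key: cand[key] for key in CANDIDATE_KEYS})
    return {"call_id": call_id, "candidates": cleaned, "version": 0}
```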

m.call.hangup


Sent by either party to signal their termination of the call. This can be sent either once the call has been established or before to abort the call.

Event type: Message event

Content

Name Type Description
call_id string Required: The ID of the call this event relates to.
reason enum Optional error reason for the hangup. This should not be provided when the user naturally ends or rejects the call. When there was an error in the call negotiation, this should be ice_failed for when ICE negotiation fails or invite_timeout for when the other party did not answer in time.

One of: [ice_failed, invite_timeout].

version integer Required: The version of the VoIP specification this message adheres to. This specification is version 0.

Examples

{
  "content": {
    "call_id": "12345",
    "version": 0
  },
  "event_id": "$143273582443PhrSn:example.org",
  "origin_server_ts": 1432735824653,
  "room_id": "!jEsUZKDJdhlrceRyVU:example.org",
  "sender": "@example:example.org",
  "type": "m.call.hangup",
  "unsigned": {
    "age": 1234
  }
}
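
Because reason is optional and restricted to two values, a client may want to guard against sending anything else. The following Python sketch is one possible helper; the name make_hangup_content is hypothetical. Passing no reason corresponds to the user ending or rejecting the call normally.

```python
from typing import Optional

HANGUP_REASONS = {"ice_failed", "invite_timeout"}

def make_hangup_content(call_id: str, reason: Optional[str] = None) -> dict:
    """Build the content of an m.call.hangup event (version 0)."""
    content = {"call_id": call_id, "version": 0}
    if reason is not None:
        if reason not in HANGUP_REASONS:
            raise ValueError(f"unsupported hangup reason: {reason!r}")
        # Only include the key when the hangup was caused by an error.
        content["reason"] = reason
    return content
```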

m.call.invite


This event is sent by the caller when they wish to establish a call.

Event type: Message event

Content

Name Type Description
call_id string Required: A unique identifier for the call.
lifetime integer Required: The time in milliseconds that the invite is valid for. Once the invite age exceeds this value, clients should discard it. They should also no longer show the call as awaiting an answer in the UI.
offer Offer Required: The session description object.
version integer Required: The version of the VoIP specification this message adheres to. This specification is version 0.
Offer
Name Type Description
sdp string Required: The SDP text of the session description.
type enum Required: The type of session description.

One of: [offer].

Examples

{
  "content": {
    "call_id": "12345",
    "lifetime": 60000,
    "offer": {
      "sdp": "v=0\r\no=- 6584580628695956864 2 IN IP4 127.0.0.1[...]",
      "type": "offer"
    },
    "version": 0
  },
  "event_id": "$143273582443PhrSn:example.org",
  "origin_server_ts": 1432735824653,
  "room_id": "!jEsUZKDJdhlrceRyVU:example.org",
  "sender": "@example:example.org",
  "type": "m.call.invite",
  "unsigned": {
    "age": 1234
  }
}
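
The lifetime rule can be sketched as a small check. This example assumes that unsigned.age, as shown in the example event, approximates the number of milliseconds since the event was sent, and that the client tracks how long ago it received the event. The function name and the ms_since_received parameter are hypothetical.

```python
def invite_has_expired(event: dict, ms_since_received: int = 0) -> bool:
    """Return True once the invite's age exceeds its lifetime."""
    lifetime = event["content"]["lifetime"]
    # Age reported with the event plus the time it has spent on this client.
    age = event.get("unsigned", {}).get("age", 0) + ms_since_received
    return age > lifetime
```

A client could run this check before ringing and again periodically while the call is shown as awaiting an answer.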

Client behaviour

A call is set up with message events exchanged as follows:

Caller                    Callee
[Place Call]
m.call.invite ----------->
m.call.candidates ------->
[..candidates..] -------->
                        [Answers call]
       <--------------- m.call.answer
 [Call is active and ongoing]
       <--------------- m.call.hangup

Or a rejected call:

Caller                      Callee
m.call.invite ------------>
m.call.candidates -------->
[..candidates..] --------->
                         [Rejects call]
         <-------------- m.call.hangup

Calls are negotiated according to the WebRTC specification.
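
The two flows above can be modelled as a small state machine driven by event types. The following Python sketch is illustrative only: the state names idle, ringing, active and ended are assumptions introduced for this example and do not appear in the specification.

```python
# (current state, event type) -> next state
TRANSITIONS = {
    ("idle", "m.call.invite"): "ringing",
    ("ringing", "m.call.candidates"): "ringing",
    ("ringing", "m.call.answer"): "active",
    ("ringing", "m.call.hangup"): "ended",   # rejected or aborted call
    ("active", "m.call.candidates"): "active",
    ("active", "m.call.hangup"): "ended",
}

def replay(event_types: list[str], state: str = "idle") -> str:
    """Apply a sequence of call event types and return the final state."""
    for event_type in event_types:
        key = (state, event_type)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {event_type} in state {state}")
        state = TRANSITIONS[key]
    return state
```

Replaying the accepted flow (invite, candidates, answer, hangup) and the rejected flow (invite, candidates, hangup) both end in the ended state.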

Glare

“Glare” is a problem which occurs when two users call each other at roughly the same time. This results in the call failing to set up because each side already has an incoming or outgoing call. A glare resolution algorithm can be used to determine which call to hang up and which call to answer. If both clients implement the same algorithm, they will both select the same call and the call will be successfully connected.

As calls are “placed” to rooms rather than users, the glare resolution algorithm outlined below is only considered for calls which are to the same room. The algorithm is as follows:

The call setup should appear seamless to the user, as if they had simply placed a call and the other party had accepted. This means any media stream that had been set up for use on a call should be transferred and used for the call that replaces it.
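
The individual steps of the algorithm are not listed in the text above, so the following Python sketch rests on an assumption: it follows the rule given in the published Matrix specification, where an invite received before the local invite has been sent replaces the outgoing call, and otherwise the call with the lexicographically lesser call_id is kept. The function and parameter names are hypothetical.

```python
def resolve_glare(outgoing_call_id: str, incoming_call_id: str,
                  outgoing_invite_sent: bool) -> str:
    """Return which call to keep: 'incoming' or 'outgoing'."""
    if not outgoing_invite_sent:
        # Assumed rule: cancel the unsent outgoing call and accept the incoming one.
        return "incoming"
    # Assumed rule: both invites are in flight, so the lesser call ID wins.
    return "incoming" if incoming_call_id < outgoing_call_id else "outgoing"
```

Both clients evaluate the same comparison on the same pair of call IDs, so they reach the same decision without further signalling.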

Server behaviour

The homeserver MAY provide a TURN server which clients can use to contact the remote party. The following HTTP API endpoints will be used by clients in order to get information about the TURN server.

GET /_matrix/client/r0/voip/turnServer


This API provides credentials for the client to use when initiating calls.

Rate-limited: Yes
Requires authentication: Yes

Request

No request parameters or request body.


Responses

Status Description
200 The TURN server credentials.
429 This request was rate-limited.

200 response

Name Type Description
password string Required: The password to use.
ttl integer Required: The time-to-live in seconds.
uris [string] Required: A list of TURN URIs.
username string Required: The username to use.
{
  "password": "JlKfBy1QwLrO20385QyAtEyIv0=",
  "ttl": 86400,
  "uris": [
    "turn:turn.example.com:3478?transport=udp",
    "turn:10.20.30.40:3478?transport=tcp",
    "turns:10.20.30.40:443?transport=tcp"
  ],
  "username": "1443779631:@user:example.com"
}
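
A client typically passes these credentials to its WebRTC stack as an ICE server entry. The sketch below maps the 200 response onto the urls, username and credential fields used by the WebRTC RTCIceServer dictionary. The function names are hypothetical, and refreshing after 90% of the ttl is an arbitrary choice made for this example, not a requirement of this specification.

```python
def to_ice_servers(turn_response: dict) -> list[dict]:
    """Convert a turnServer response into a list of ICE server entries."""
    return [{
        "urls": list(turn_response["uris"]),
        "username": turn_response["username"],
        "credential": turn_response["password"],
    }]

def refresh_delay_seconds(turn_response: dict) -> int:
    """Seconds to wait before requesting fresh credentials (assumed 90% of ttl)."""
    return turn_response["ttl"] * 9 // 10
```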

429 response

Name Type Description
errcode string Required: The M_LIMIT_EXCEEDED error code.
error string A human-readable error message.
retry_after_ms integer The amount of time in milliseconds the client should wait before trying the request again.
{
  "errcode": "M_LIMIT_EXCEEDED",
  "error": "Too many requests",
  "retry_after_ms": 2000
}
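
One way a client might honour retry_after_ms is sketched below. To keep the example independent of any HTTP library, the request itself is represented by a hypothetical do_request callable that performs the authenticated GET and returns a (status, body) pair. The attempt limit and the 1000 ms fallback wait are assumptions made for this example.

```python
import time

def fetch_turn_credentials(do_request, max_attempts: int = 3, sleep=time.sleep) -> dict:
    """Call do_request until it returns 200, waiting as instructed on 429."""
    for _ in range(max_attempts):
        status, body = do_request()
        if status == 200:
            return body
        if status == 429:
            # retry_after_ms is optional, so fall back to an assumed 1000 ms.
            sleep(body.get("retry_after_ms", 1000) / 1000)
            continue
        raise RuntimeError(f"unexpected status {status}")
    raise RuntimeError("still rate-limited after retries")
```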

Security considerations

Calls should only be placed to rooms with one other user in them. If they are placed to group chat rooms it is possible that another user will intercept and answer the call.
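
A client can enforce this before sending any call event. The following Python sketch assumes the client already knows the set of joined members of the room; the function name can_place_call is hypothetical.

```python
def can_place_call(joined_member_ids, own_user_id: str) -> bool:
    """Return True only if the room holds exactly the caller and one other user."""
    members = set(joined_member_ids)
    return own_user_id in members and len(members) == 2
```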