Proprietary RTCP Messages and Key Extensions

In the previous blogs in this series, we looked at what RTCP is and its key standard messages as defined in RFC 3550. In this final series entry we will look at how applications can send their own proprietary RTCP messages, and some widely supported RTCP extensions.

Application-Defined (APP)

Application-Defined packets are designed for proprietary and experimental extensions that have not been standardized.

The regular header has a Payload Type of 204 and an Item Count whose meaning is extension-dependent. It is followed by a 32-bit SSRC or CSRC; how the packet relates to that source is also extension-dependent.

Next comes a 32-bit field containing a four-character ASCII string. This is the Name of the extension, which identifies the format of the extension in question, and so should be chosen to be unique among the APP packets the application supports receiving.

The format of the remainder of the RTCP packet is then defined by the extension in question, as identified by the Name, which also defines the purpose of the item count and SSRC/CSRC.

A receiver should ignore Application-Defined packets with a Name it does not recognize.
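As a minimal sketch of the layout described above, the following Python builds and parses an APP packet. The function names and the example Name "XMPL" are hypothetical, chosen only for illustration; the parser returns None for a Name it does not recognize, so the caller can ignore the packet.

```python
import struct

PT_APP = 204  # Payload Type for Application-Defined packets


def build_app(item_count: int, ssrc: int, name: str, data: bytes = b"") -> bytes:
    """Build an APP packet: header, SSRC/CSRC, 4-character Name, extension data."""
    raw_name = name.encode("ascii")
    if len(raw_name) != 4:
        raise ValueError("Name must be exactly 4 ASCII characters")
    if not 0 <= item_count < 32:
        raise ValueError("Item Count is a 5-bit field")
    if len(data) % 4:
        raise ValueError("extension data must be a multiple of 32 bits")
    # The length field counts 32-bit words in the whole packet, minus one.
    length_words = (12 + len(data)) // 4 - 1
    first = (2 << 6) | item_count  # version 2, no padding
    return struct.pack("!BBHI4s", first, PT_APP, length_words, ssrc, raw_name) + data


def parse_app(packet: bytes, known_names: set):
    """Return (item_count, ssrc, name, data), or None if the Name is unknown."""
    if len(packet) < 12:
        raise ValueError("too short for an APP packet")
    first, pt, length_words, ssrc, raw_name = struct.unpack_from("!BBHI4s", packet)
    if first >> 6 != 2 or pt != PT_APP:
        raise ValueError("not an RTCP APP packet")
    name = raw_name.decode("ascii", errors="replace")
    if name not in known_names:
        return None  # unrecognized Name: ignore the packet
    end = (length_words + 1) * 4
    return first & 0x1F, ssrc, name, packet[12:end]
```

For example, `parse_app(build_app(3, 0x11223344, "XMPL"), {"XMPL"})` round-trips the fields, while passing a set that lacks "XMPL" yields None.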

Feedback (FB)

Feedback messages are defined in RFC 4585, which adds a mechanism for negotiating and sending RTCP feedback that can be used to respond to media issues. These feedback messages are sent by the recipient of the RTP media stream. Two messages in particular, which are used to request new video keyframes, are extremely important if video is being used.

The regular header can have a Payload Type of 205 or 206 depending on the feedback message in question – 205 corresponds to a Transport Layer message while 206 corresponds to a Payload-Specific message (the specification for each feedback message defines which one it uses). For feedback messages, the Item Count is named the Feedback Message Type, or FMT, and also plays a role in differentiating between different types of feedback messages.

Following this is a 32-bit field for the SSRC of the packet sender, that is, the sender of the feedback message (e.g., the same value as the Reporter SSRC in the Sender Report or Receiver Report). This is followed by the SSRC of the media source: the RTP media stream being received by the sender of the feedback message, on which feedback is being provided.

Finally, there is a Feedback Control Information (FCI) portion, the contents and length of which are dependent on the type of feedback message.
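The common layout of a feedback message can be sketched as a small parser. This is an illustrative sketch rather than a complete implementation: the `FeedbackMessage` type and `parse_feedback` function are hypothetical names, and the padding bit is ignored for brevity.

```python
import struct
from typing import NamedTuple


class FeedbackMessage(NamedTuple):
    payload_type: int  # 205 (Transport Layer) or 206 (Payload-Specific)
    fmt: int           # Feedback Message Type, carried in the Item Count bits
    sender_ssrc: int   # SSRC of the packet sender
    media_ssrc: int    # SSRC of the media source
    fci: bytes         # Feedback Control Information, format depends on type


def parse_feedback(packet: bytes) -> FeedbackMessage:
    if len(packet) < 12:
        raise ValueError("too short for a feedback message")
    first, pt, length_words, sender_ssrc, media_ssrc = struct.unpack_from(
        "!BBHII", packet
    )
    if first >> 6 != 2:
        raise ValueError("unsupported RTCP version")
    if pt not in (205, 206):
        raise ValueError("not a feedback message")
    end = (length_words + 1) * 4  # length field is in 32-bit words, minus one
    if end > len(packet):
        raise ValueError("length field exceeds available data")
    return FeedbackMessage(pt, first & 0x1F, sender_ssrc, media_ssrc, packet[12:end])
```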

A number of feedback messages are defined in RFC 4585, but there are further specifications that define other messages, of which RFC 5104 contains several key types. This document will list some of the most important and when they are used but will not go through the FCI format for each – these are generally straightforward and can be found in their relevant specifications.

Note that Feedback messages of any given type should not be sent unless they have been negotiated by both sides in SDP – see the ‘rtcp-fb’ portion of the SDP attributes blog for more details.

Name	Defined by	PT	FMT
Picture Loss Indication (PLI)	RFC 4585	206	1
Full Intra Request (FIR)	RFC 5104	206	4
Generic NACK (NACK)	RFC 4585	205	1
Application Layer	RFC 4585	206	15
Temporary Maximum Media Stream Bit Rate Request (TMMBR)	RFC 5104	205	3
Temporary Maximum Media Stream Bit Rate Notification (TMMBN)	RFC 5104	205	4

Some key RTCP feedback messages for video conferencing
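In code, the table above amounts to a lookup keyed on the Payload Type and FMT pair. The sketch below covers only the six messages listed; the dictionary and function names are illustrative.

```python
# (Payload Type, FMT) -> feedback message name, taken from the table above.
FEEDBACK_TYPES = {
    (206, 1): "PLI",
    (206, 4): "FIR",
    (205, 1): "Generic NACK",
    (206, 15): "Application Layer",
    (205, 3): "TMMBR",
    (205, 4): "TMMBN",
}


def classify_feedback(first_byte: int, payload_type: int) -> str:
    """Name a feedback message from the first two bytes of its RTCP header."""
    fmt = first_byte & 0x1F  # FMT occupies the low 5 bits of the first byte
    return FEEDBACK_TYPES.get((payload_type, fmt), "unknown")
```

Note that the same FMT value means different things under the two Payload Types: FMT 1 is a Generic NACK under 205 but a PLI under 206.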

PLI and FIR messages both generally request a new video keyframe from the far end, but have different semantic meanings. A Picture Loss Indication feedback message signals that one or more packets required to decode a frame of video have been lost, while a Full Intra Request feedback message explicitly requests a keyframe. Technically speaking, PLI is meant to be sent when a keyframe is required due to loss, and FIR when a keyframe is required because none has been received (e.g., at the start of a stream where the keyframe was missed). However, since almost all media senders respond to a PLI by sending a keyframe, many implementations treat the two interchangeably, and indeed use the same code to handle receiving either message.

Implementing a method for requesting keyframes is a fundamental requirement of sending video over a lossy medium such as RTP over UDP, and PLI/FIR is the most common method for doing so in the field. An implementation concerned with wide interoperability should advertise and negotiate support for both; otherwise, of the two, PLI is most commonly supported and used.
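To make the two keyframe requests concrete, here is a sketch that serializes each one. A PLI carries no FCI at all. For FIR, RFC 5104 sets the header's media source SSRC to zero and instead places the target SSRC, plus a command sequence number, in an FCI entry. The function names are hypothetical, and either message would still need to be placed inside a compound packet before sending.

```python
import struct


def build_pli(sender_ssrc: int, media_ssrc: int) -> bytes:
    """PLI: PT 206, FMT 1, no FCI. Three 32-bit words, so length = 2."""
    return struct.pack("!BBHII", 0x80 | 1, 206, 2, sender_ssrc, media_ssrc)


def build_fir(sender_ssrc: int, media_ssrc: int, seq_nr: int) -> bytes:
    """FIR: PT 206, FMT 4, one FCI entry. Five 32-bit words, so length = 4."""
    # The header's media source SSRC is unused for FIR and set to 0.
    header = struct.pack("!BBHII", 0x80 | 4, 206, 4, sender_ssrc, 0)
    # FCI entry: target SSRC, 8-bit sequence number, 24 reserved zero bits.
    fci = struct.pack("!IB3x", media_ssrc, seq_nr & 0xFF)
    return header + fci
```

The FIR sequence number is intended to let the media sender tell a repeated request apart from a new one, so a caller would increment it only when issuing a fresh request.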

Another mechanism for dealing with packet loss is the Generic NACK message, which is similar to PLI but allows the sender of the feedback to specify exactly which RTP packets were not successfully received. In the simplest case this can serve as yet another mechanism for prompting a keyframe, but a more sophisticated media sender can instead choose to retransmit the missing packets. NACK is most commonly used in WebRTC devices, while PLI/FIR tends to be used in SIP devices.
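As an illustration of how a Generic NACK identifies lost packets, the FCI defined in RFC 4585 is a list of 32-bit entries, each holding a 16-bit Packet ID (PID) and a 16-bit bitmask (BLP) whose bit i flags the loss of packet PID + i + 1. The sketch below encodes and decodes that list. The function names are illustrative, and the encoder is deliberately simple: it is correct across sequence number wraparound but may emit more entries than strictly necessary there.

```python
import struct


def encode_nack_fci(lost_seqs) -> bytes:
    """Pack lost RTP sequence numbers into (PID, BLP) entries."""
    remaining = sorted({seq & 0xFFFF for seq in lost_seqs})
    entries = []
    while remaining:
        pid, blp, rest = remaining[0], 0, []
        for seq in remaining[1:]:
            offset = (seq - pid) & 0xFFFF
            if 1 <= offset <= 16:
                blp |= 1 << (offset - 1)  # covered by this entry's bitmask
            else:
                rest.append(seq)          # needs an entry of its own
        entries.append(struct.pack("!HH", pid, blp))
        remaining = rest
    return b"".join(entries)


def decode_nack_fci(fci: bytes) -> list:
    """Expand (PID, BLP) entries back into a list of lost sequence numbers."""
    lost = []
    for pid, blp in struct.iter_unpack("!HH", fci):
        lost.append(pid)
        for bit in range(16):
            if blp & (1 << bit):
                lost.append((pid + bit + 1) & 0xFFFF)
    return lost
```

A media sender that supports retransmission could feed the decoded list into its packet history; one that does not could simply treat any NACK as a prompt for a keyframe.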

Application Layer feedback messages are a way to include proprietary or non-standard feedback messages. The Feedback Control Information portion of these messages is application-dependent, but it is recommended that it include some mechanism for a receiver to identify what type of non-standard feedback message is being received. This is absolutely necessary for any implementation that supports two or more Application Layer messages and can negotiate both at the same time, so that they can be differentiated. It is highly recommended even if your implementation currently has only one Application Layer message, as changing the format to add another later can pose significant backwards-compatibility problems, and often leads to ugly workarounds such as defining (and advertising) two versions of the same message with different formats. For instance, REMB, which will be discussed later in this section, achieves this by defining that the first 32 bits of the FCI contain the US-ASCII string “REMB”.
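One way to apply this advice is to reserve the first four bytes of the FCI as an identifier and dispatch on it, as REMB does. The sketch below assumes that convention for every Application Layer message the implementation supports; the handler registry and the "XMPL" identifier are hypothetical, and the handlers only report what they received rather than decoding the body.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[bytes], str]

# Hypothetical registry: identifier in the first 32 bits of the FCI -> handler.
HANDLERS: Dict[bytes, Handler] = {
    b"REMB": lambda body: f"REMB, {len(body)} body bytes",
    b"XMPL": lambda body: f"XMPL, {len(body)} body bytes",
}


def dispatch_app_layer_fci(fci: bytes, handlers: Dict[bytes, Handler]) -> Optional[str]:
    """Route an Application Layer FCI by its leading 4-byte identifier."""
    if len(fci) < 4:
        return None  # too short to carry an identifier
    handler = handlers.get(fci[:4])
    if handler is None:
        return None  # unknown message type: ignore it
    return handler(fci[4:])
```

With an identifier in place from the start, adding a second message later is just another registry entry rather than a format change.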

TMMBR and TMMBN are feedback messages related to bitrate control, used to throttle the media to avoid loss. Note that TMMBN messages are unusual in that, while most feedback messages are sent by the recipient of the RTP media stream, TMMBN messages are sent as feedback in response to receiving a TMMBR message, and hence are sent by the media sender.

TMMBR and TMMBN messages are most commonly seen in SIP devices. WebRTC devices instead use an application-level message named Receiver Estimated Maximum Bitrate (REMB). This is not an IETF standard, but is instead documented as an IETF draft, draft-alvestrand-rmcat-remb. It is very similar to TMMBR, but is designed to allow more fine-grained control for use cases where multiple media streams are being received on a single RTP session. Note that while still supported this is an older method of bandwidth control; in most cases WebRTC implementations now use Transport-Wide Congestion Control (TWCC or TransportCC) to do sender-side rate control.

Sending RTCP Messages

RTCP messages are generally sent using the same transport as the RTP messages they accompany, by convention with RTP being received on an even-numbered port and RTCP being received on a port number one higher. The “a=rtcp” attribute defined in RFC 3605 allows a receiver to advertise to receive RTCP on a different IP and/or port to the RTP, but support for this should not be assumed.

Meanwhile, RFC 5761 defines advertising support for multiplexing RTP and RTCP onto the same port via the “a=rtcp-mux” attribute, which does have a good level of support among various implementations (and is used in WebRTC), and reduces the number of ports a receiver must open. The specification goes into detail about the complexities of demultiplexing RTCP from RTP, STUN and other types that might be received on the same port.
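A commonly used demultiplexing check can be sketched in a few lines. It assumes that RTP payload types 64 to 95 are avoided, as RFC 5761 requires when multiplexing, so that a second byte in the range 192 to 223 can only be an RTCP packet type. The function name is illustrative, and only STUN is distinguished among the other protocols that might share the port.

```python
def classify_datagram(data: bytes) -> str:
    """Roughly classify a datagram received on a multiplexed RTP/RTCP port."""
    if len(data) < 2:
        return "other"
    first, second = data[0], data[1]
    if first <= 3:
        return "stun"  # STUN messages start with two zero bits
    if 128 <= first <= 191:  # version bits are 2: RTP or RTCP
        # For RTCP the second byte is the packet type (e.g. 200-206).
        # For RTP it is the marker bit plus a 7-bit payload type.
        return "rtcp" if 192 <= second <= 223 else "rtp"
    return "other"
```

An RTP packet with the marker bit set and dynamic payload type 96 has a second byte of 224, which falls outside the RTCP range and so is still classified as RTP.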

Compounding

Thanks to each RTCP packet header containing a length parameter, a single RTCP packet can contain multiple RTCP messages, referred to as a compound RTCP packet. Each message can be processed individually, and beyond the requirements on the first two messages described below there is no significance to the ordering of the messages within the packet. Messages of a given type can appear more than once if desired.
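Because the length field counts 32-bit words (minus one) for each message, a receiver can walk a compound packet with a simple loop. The following is a minimal sketch with a hypothetical function name; it does not handle padding on the final message.

```python
import struct


def split_compound(packet: bytes) -> list:
    """Split a compound RTCP packet into (payload_type, message_bytes) pairs."""
    messages = []
    offset = 0
    while offset < len(packet):
        if len(packet) - offset < 4:
            raise ValueError("truncated RTCP header")
        first, pt, length_words = struct.unpack_from("!BBH", packet, offset)
        if first >> 6 != 2:
            raise ValueError("unsupported RTCP version")
        size = (length_words + 1) * 4  # length is in 32-bit words, minus one
        if offset + size > len(packet):
            raise ValueError("message overruns the packet")
        messages.append((pt, packet[offset:offset + size]))
        offset += size
    return messages
```

Each returned message can then be handed to a type-specific parser based on its Payload Type.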

Less intuitively, the RTCP specification (RFC 3550) mandates that all RTCP messages are sent as compound packets of two or more messages, with the first message always being an SR or RR message and the second always being an SDES message containing a CNAME. This is the case even if the device sending the RTCP packet has not received or sent any media, in which case the initial message must be a Receiver Report with zero report blocks.

Thus, to send a BYE or PLI feedback message, the RTCP sender must construct an RTCP packet containing an SR or RR message, an SDES message, and then the BYE or PLI message. For these non-scheduled messages RFC 4585 suggests using a Minimal compound RTCP packet, which contains no additional RRs and limits the SDES message to just the CNAME, though in practice most implementations do not need an additional RR or use SDES items beyond the CNAME, making this optimization moot.
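The following sketch assembles such a packet for a PLI: a Receiver Report with zero report blocks, an SDES message carrying only a CNAME, and then the PLI itself. The helper names are hypothetical, and the same SSRC is assumed for all three messages.

```python
import struct


def _message(count: int, payload_type: int, body: bytes) -> bytes:
    """Prefix a body with an RTCP header; body must be a multiple of 4 bytes."""
    if len(body) % 4:
        raise ValueError("body must be a multiple of 32 bits")
    # Total words minus one equals the number of words in the body.
    return struct.pack("!BBH", 0x80 | count, payload_type, len(body) // 4) + body


def empty_rr(ssrc: int) -> bytes:
    """Receiver Report (PT 201) with zero report blocks."""
    return _message(0, 201, struct.pack("!I", ssrc))


def sdes_cname(ssrc: int, cname: str) -> bytes:
    """SDES (PT 202) with one chunk containing only a CNAME item (type 1)."""
    text = cname.encode("utf-8")
    if len(text) > 255:
        raise ValueError("CNAME too long")
    chunk = struct.pack("!IBB", ssrc, 1, len(text)) + text + b"\x00"
    chunk += b"\x00" * (-len(chunk) % 4)  # null-terminate, pad to 32 bits
    return _message(1, 202, chunk)


def pli(sender_ssrc: int, media_ssrc: int) -> bytes:
    """Picture Loss Indication (PT 206, FMT 1), which has no FCI."""
    return _message(1, 206, struct.pack("!II", sender_ssrc, media_ssrc))


def compound_with_pli(ssrc: int, cname: str, media_ssrc: int) -> bytes:
    return empty_rr(ssrc) + sdes_cname(ssrc, cname) + pli(ssrc, media_ssrc)
```

For a short CNAME the RR and SDES add only a few dozen bytes, which is why the requirement is usually tolerable for occasional feedback.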

This requirement is not too onerous for normal RTCP transmission, which is generally relatively infrequent, even when using feedback messages. However, if an implementation is choosing to use proprietary RTCP messaging with a much higher transmission rate, the extra bandwidth the compounding requirements demand can impose a very high cost. In such circumstances, an implementation might choose to send RTCP packets which do not comply with RFC 3550’s requirements on compounding, though care should be taken when doing so with regard to demultiplexing, and this should only be done when using these application-specific messages; standard messages should be sent following the compounding requirements.

Transmission Intervals

RFC 3550 also defines a complex set of rules for calculating the transmission interval (the periodic rate at which regular RTCP updates should be sent) based on a database of participants in the meeting as determined via the SSRCs/CSRCs received.

The reason for this is to cope with distributed meetings where a server propagates RTCP across all participants, and there is a desire to prevent the bandwidth requirements for RTCP from escalating too much in conferences with very large numbers of participants.
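To give a flavor of those rules, the sketch below follows the outline of the interval computation in RFC 3550 (Section 6.3 and the sample code in its appendix). It is a simplified illustration, not a substitute for the specification: it omits the surrounding bookkeeping such as timer reconsideration and member timeouts, and the function names are illustrative. Bandwidth is in bytes per second and packet size in bytes.

```python
import math
import random

RTCP_MIN_TIME = 5.0           # minimum interval in seconds
SENDER_BW_FRACTION = 0.25     # share of RTCP bandwidth reserved for senders
COMPENSATION = math.e - 1.5   # offsets the bias introduced by reconsideration


def deterministic_interval(members: int, senders: int, rtcp_bw: float,
                           we_sent: bool, avg_rtcp_size: float,
                           initial: bool = False) -> float:
    """Interval before randomization, scaled by the number of participants."""
    min_time = RTCP_MIN_TIME / 2 if initial else RTCP_MIN_TIME
    n = members
    if senders <= members * SENDER_BW_FRACTION:
        # Few senders: senders and receivers get separate bandwidth shares.
        if we_sent:
            rtcp_bw *= SENDER_BW_FRACTION
            n = senders
        else:
            rtcp_bw *= 1 - SENDER_BW_FRACTION
            n = members - senders
    return max(avg_rtcp_size * n / rtcp_bw, min_time)


def randomized_interval(interval: float, rng: random.Random) -> float:
    """Spread transmissions over [0.5, 1.5] times the interval, then compensate."""
    return interval * rng.uniform(0.5, 1.5) / COMPENSATION
```

With two participants the result sits at the 5 second minimum, whereas a receiver in a 1000-member session with 8000 bytes per second of RTCP bandwidth and 100-byte average packets would wait roughly 16.65 seconds before randomization.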

In practice though, in modern conferencing, media servers generally do not forward all RTCP in this fashion, and participant information is shared via some other mechanism such as a roster list distributed over the signaling channel. As such, careful management of the transmission interval is much less relevant, and many devices do not implement the complex system defined in RFC 3550. Instead, they use a static transmission interval, often of 5 seconds, at which they send the SR/RR and SDES, and then send feedback messages and BYEs as necessary.

If there is the possibility of an implementation being used in large meetings in which all RTCP information is aggregated and forwarded to each receiver, then implementors should read RFC 3550 carefully and follow the detailed guidance and algorithms provided therein to calculate an appropriate transmission time based on the RTCP received.

About The Author

Rob Hanton, Principal Engineer and Architect, Cisco

Rob Hanton is a Principal Engineer and Architect for Webex.
