TelcomIQ

SIP

Session Initiation Protocol: the signalling backbone of IMS voice and messaging

Type

protocol

Generations

4G, 5G, cross-gen

Threat level

medium

Overview

SIP, the Session Initiation Protocol, is the IETF application-layer signalling protocol defined in RFC 3261. It is the protocol that sets up, modifies, and tears down real-time multimedia sessions: voice calls, video calls, presence subscriptions, and messaging sessions. In telecom, SIP replaced ISUP as the call control protocol when operators moved voice services from circuit-switched to packet-based IP Multimedia Subsystem (IMS) architectures.

SIP is a text-based protocol modelled on HTTP. A SIP message is either a request (INVITE, REGISTER, BYE, CANCEL, OPTIONS, SUBSCRIBE, NOTIFY, MESSAGE, REFER) or a response (identified by a three-digit status code). Both requests and responses carry headers that identify the participants, route the message through the network, describe session parameters, and carry authentication credentials. This human-readable design made SIP relatively easy to implement and extend, which is part of why it became universal in VoIP and IMS, and also why it carries a well-documented attack surface.

In 3GPP IMS networks, SIP is carried on the Gm interface between the UE and P-CSCF, on the Mw interface between CSCFs, on the ISC interface between the S-CSCF and application servers, and on the Mg, Mi, and Mj interfaces that link the CSCFs, the BGCF, and the MGCF toward the PSTN/CS domain. The Mn interface between the MGCF and the media gateway carries H.248 gateway control rather than SIP. 3GPP TS 24.229 defines the extensive set of extensions and mandatory procedures that operators must implement on top of base RFC 3261.


How it works

SIP follows a request-response model. A user agent client (UAC) sends a request; one or more proxies may forward it; the user agent server (UAS) processes it and returns a response. For session establishment, the key messages are INVITE (propose a session), provisional responses (100 Trying, 180 Ringing), 200 OK (accept the session), ACK (confirm the 200), and BYE (terminate the session).

The SIP message structure:

  • Request line / Status line: Identifies the method and target URI (requests) or the response code and reason phrase (responses).
  • Headers: Via (routing trace and response addressing), From, To, Call-ID (globally unique session identifier), CSeq (command sequence, request ordering), Contact (direct address for subsequent requests), Max-Forwards, Content-Type, and many extension headers.
  • Message body: Typically SDP (Session Description Protocol) in INVITE messages, describing the media streams to be established: codecs, IP addresses, ports, and media parameters.
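
The three-part structure above can be illustrated with a minimal parsing sketch. This is not a conformant RFC 3261 parser: it ignores header line folding and compact header forms, and the function name `parse_sip_message` and the example addresses are made up for illustration.

```python
def parse_sip_message(raw: str) -> dict:
    """Split a SIP message into start line, headers, and body (simplified)."""
    # A blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return {
        "start_line": start_line,
        # Responses begin with the protocol version, requests with the method.
        "is_response": start_line.startswith("SIP/2.0"),
        "headers": headers,
        "body": body,
    }


# Illustrative INVITE with a truncated SDP body.
EXAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP host.example.com;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
)

message = parse_sip_message(EXAMPLE_INVITE)
print(message["start_line"])
print(message["headers"]["call-id"])
```

Headers are stored as lists because some headers, such as Via, can legitimately appear more than once in a message.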

Registration

Before a UE can receive calls, it must register with the IMS network via a REGISTER request to the P-CSCF:

  1. UE sends an initial REGISTER to the P-CSCF with no credentials. The P-CSCF forwards it to the I-CSCF, which queries the HSS to identify the correct S-CSCF for this subscriber.
  2. The S-CSCF returns a 401 Unauthorized response carrying an IMS AKA challenge (WWW-Authenticate header).
  3. The UE computes the authentication response using its USIM credentials and sends a second REGISTER with an Authorization header containing the computed response.
  4. The S-CSCF validates the response against the authentication vector retrieved from the HSS and returns 200 OK, completing registration.

The S-CSCF now stores the subscriber's registered contact address and is the node through which all subsequent inbound and outbound SIP requests for this subscriber will pass.
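
The challenge-response step in the registration flow can be sketched as a basic HTTP Digest computation (MD5, no `qop`), which is the framework that IMS AKA per RFC 3310 builds on. In real IMS AKA the shared secret is the RES value derived by the USIM from the challenge; that derivation is not shown here, and the `password` argument below is only a stand-in for it. All names and values are hypothetical.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the lowercase hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """Compute a Digest response without qop (simplified sketch)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identity and secret
    ha2 = md5_hex(f"{method}:{uri}")                 # request being authorised
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to the server nonce


# Hypothetical values: the nonce would come from the 401 challenge, and the
# password placeholder stands in for the USIM-derived RES.
response = digest_response(
    username="user1@ims.example.com",
    realm="ims.example.com",
    password="stand-in-for-RES",
    method="REGISTER",
    uri="sip:ims.example.com",
    nonce="hypothetical-nonce-value",
)
print(response)
```

The server performs the same computation with its own copy of the expected secret and compares the results, so the secret itself never crosses the network.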

Session establishment (VoLTE call)

A VoLTE call from subscriber A to subscriber B:

  1. A's UE sends an INVITE to the P-CSCF. The SDP body describes A's media capabilities: supported codecs (AMR, AMR-WB, EVS), IP address, and RTP port. The P-CSCF adds the P-Access-Network-Info header and forwards to the S-CSCF via Mw.
  2. The S-CSCF applies service logic (via filter criteria on the ISC interface to application servers) and forwards the INVITE toward B's S-CSCF.
  3. B's S-CSCF delivers the INVITE to B's P-CSCF and then to B's UE. B's UE returns 180 Ringing (propagated back to A) and then 200 OK with SDP answer describing B's media parameters.
  4. A's UE sends ACK to confirm the session. Both UEs now have each other's media addresses and begin RTP streams directly (or via a Media Resource Function in some operator configurations).
  5. Either party ends the call by sending BYE, to which the other responds with 200 OK.
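
The message sequence above can be modelled as a small state machine seen from the caller's side. This is a teaching sketch under simplifying assumptions: it ignores retransmissions, CANCEL, error responses, and re-INVITEs, and the state names are invented for illustration rather than taken from RFC 3261.

```python
from enum import Enum


class CallState(Enum):
    IDLE = "idle"
    INVITED = "invited"          # INVITE sent, no ringing yet
    RINGING = "ringing"          # 180 Ringing received
    ANSWERED = "answered"        # 200 OK received, ACK not yet sent
    ESTABLISHED = "established"  # ACK sent, media flowing
    TERMINATING = "terminating"  # BYE sent, waiting for 200 OK
    TERMINATED = "terminated"


# (current state, event) -> next state
TRANSITIONS = {
    (CallState.IDLE, "INVITE"): CallState.INVITED,
    (CallState.INVITED, "100"): CallState.INVITED,
    (CallState.INVITED, "180"): CallState.RINGING,
    (CallState.INVITED, "200"): CallState.ANSWERED,
    (CallState.RINGING, "200"): CallState.ANSWERED,
    (CallState.ANSWERED, "ACK"): CallState.ESTABLISHED,
    (CallState.ESTABLISHED, "BYE"): CallState.TERMINATING,
    (CallState.TERMINATING, "200"): CallState.TERMINATED,
}


def run_call(events):
    """Walk the event list and return the final state, rejecting bad orderings."""
    state = CallState.IDLE
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {event!r} in state {state.name}")
        state = TRANSITIONS[key]
    return state


final_state = run_call(["INVITE", "100", "180", "200", "ACK", "BYE", "200"])
print(final_state.name)
```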

Architecture role

SIP is the signalling layer of IMS: every IMS service runs on top of it. The relationship between SIP and IMS is analogous to that between MAP and the SS7 core: SIP carries IMS session signalling in the same way that MAP carried 2G/3G subscriber management.

The IMS core (comprising the P-CSCF, I-CSCF, and S-CSCF) is the SIP proxy infrastructure. The P-CSCF is the subscriber's first SIP contact point: it performs security negotiation and compression, and forwards SIP signalling toward the rest of the IMS core. The I-CSCF is the entry point for inbound sessions from outside the operator and for routing SIP toward the correct S-CSCF. The S-CSCF is the registrar and service platform: it maintains registration state, evaluates filter criteria, and invokes application servers.

In 4G EPC with VoLTE: Every call setup, supplementary service activation, and SMS-over-IMS exchange is a SIP transaction. The S-CSCF is the central state machine. Diameter supplies the subscriber data behind it: the S-CSCF queries the HSS over Cx for authentication vectors and service profiles, and application servers use Sh for subscriber data. S6a, by contrast, is the MME-to-HSS interface used for EPC attach rather than an IMS interface.

In 5G SA, SIP remains the IMS signalling protocol. The underlying core is HTTP/2-based SBI, but IMS is not redesigned for 5G: the IMS architecture is overlaid on 5G in the same way it was on 4G. VoNR uses the same SIP/IMS stack as VoLTE; the difference is the radio access network and the 5G-specific QoS and policy framework.

SIP also replaced ISUP at the PSTN interconnect boundary. Where legacy operators used ISUP to signal call setup between exchanges, modern PSTN gateways use SIP trunking. The Mg and Mi interfaces in IMS connect the S-CSCF, via the BGCF and MGCF, to ISUP-speaking PSTN networks, with media gateways handling the bearer conversion.


Key interfaces

Interface | Between | Direction | Purpose
Gm | UE ↔ P-CSCF | Bidirectional | UE-to-IMS SIP signalling; registration and sessions
Mw | P-CSCF ↔ I-CSCF ↔ S-CSCF | Bidirectional | Inter-CSCF SIP routing within operator IMS
ISC | S-CSCF ↔ AS | Bidirectional | Application server invocation (telephony features)
Mg | MGCF ↔ IMS | Bidirectional | SIP-to-ISUP interworking at PSTN boundary
Mi | S-CSCF ↔ BGCF | Bidirectional | Routing selection for PSTN breakout
Mn | MGCF ↔ MGW | Unidirectional | Control of media gateway for PSTN interworking (H.248, not SIP)

Security posture

SIP's security model is layered. At the UE-to-network interface (Gm), 3GPP mandates IMS AKA authentication (RFC 3310), which provides mutual authentication using the USIM as the credential anchor. This is a genuine security control: it ensures that a device cannot register as a subscriber without the correct USIM key.

Beyond the Gm interface, the trust model degrades. Between IMS proxies (Mw), SIP messages are trusted based on their origin IP address and identity headers. There is no message-level authentication between CSCFs in the same operator network, and inter-operator SIP interconnects are even less controlled. The P-Asserted-Identity header carries the authenticated caller identity, but there is no end-to-end integrity protection, so a proxy can assert any identity.

The greatest practical risk is at enterprise SIP trunks and unprotected SIP interfaces exposed to the public internet or to shared IMS platforms. Enterprise customers are frequently given SIP trunks with limited authentication (IP-based trust or weak digest authentication), and these are a major source of toll fraud incidents.


Attack surface

Registration hijacking via spoofed REGISTER

If a SIP infrastructure element accepts REGISTER requests without proper IMS AKA authentication, or if an attacker can replay credentials obtained from a legitimate device, the attacker can register a new contact address for the victim subscriber. Subsequent inbound calls and messages are then delivered to the attacker's endpoint rather than the legitimate device.

Impact: All inbound calls and SMS/messaging sessions delivered to the attacker; subscriber unreachable; two-factor authentication via SMS intercepted.
Difficulty: Medium to High. Effective against enterprise SIP trunks with IP-only authentication; much harder against properly implemented IMS AKA.

Toll fraud via unauthorised INVITE

An attacker with access to a SIP trunk or an unauthenticated SIP endpoint sends INVITE requests to premium rate numbers. The call is established, the attacker's accomplice at the premium number answers, and the operator or enterprise is charged for the call duration. Fraud of this kind is regularly cited as one of the largest sources of financial loss in telecommunications.

Impact: Direct financial loss to the operator or enterprise hosting the SIP trunk; losses of millions of dollars are reported annually across the industry.
Difficulty: Low against unprotected SIP trunks; Medium against authenticated trunks where credentials have been compromised.
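
As a rough illustration of how the target of an INVITE could be screened before it is routed, the sketch below checks the user part of a Request-URI against a list of premium-rate prefixes. The prefix list, the function name, and the example numbers are illustrative assumptions; a real deployment would use the operator's own numbering-plan data and combine this check with authentication.

```python
# Illustrative prefixes only; real lists come from numbering-plan data.
PREMIUM_PREFIXES = ("+1900", "+44909")


def is_premium_target(request_uri: str, prefixes=PREMIUM_PREFIXES) -> bool:
    """Return True if the dialled number starts with a premium-rate prefix."""
    # Drop the scheme ("sip:" or "tel:"), then the host part and any parameters.
    user = request_uri.split(":", 1)[-1]
    user = user.split("@", 1)[0].split(";", 1)[0]
    return user.startswith(prefixes)


print(is_premium_target("sip:+19005550100@example.com;user=phone"))
print(is_premium_target("tel:+442079460000"))
```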

Caller ID spoofing via forged From: header

The SIP From: header carries the caller's identity as a free-form string. Any SIP client can set the From: header to any value. Unless the receiving network enforces P-Asserted-Identity and validates it against authenticated credentials, the displayed caller ID is unverified.

Impact: Social engineering enablement; impersonation of banks, emergency services, and government institutions.
Difficulty: Low. Trivial with any SIP softphone or SIP trunk without identity verification.
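
A receiving element can treat the displayed caller ID as verified only when it agrees with a network-asserted identity. The sketch below compares the URI in a From header with the URI in a P-Asserted-Identity header. It is deliberately simplistic: it does exact string comparison, does not normalise `tel:` versus `sip:` forms, and assumes the asserted header came from a trusted upstream node. The function names are hypothetical.

```python
import re
from typing import Optional


def extract_uri(header_value: str) -> str:
    """Pull the URI out of a name-addr ("Name" <uri>;params) or bare addr-spec."""
    match = re.search(r"<([^>]+)>", header_value)
    if match:
        return match.group(1)
    # No angle brackets: everything before the first ';' is taken as the URI.
    return header_value.split(";", 1)[0].strip()


def caller_id_is_verified(from_header: str,
                          asserted_identity: Optional[str]) -> bool:
    """True only if an asserted identity exists and matches the From URI."""
    if asserted_identity is None:
        return False
    return extract_uri(from_header) == extract_uri(asserted_identity)


spoofed = caller_id_is_verified(
    '"Your Bank" <sip:+15550100@bank.example.com>;tag=abc',
    "<sip:+15550199@carrier.example.net>",
)
print(spoofed)
```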

INVITE flood denial of service

Sending a large volume of INVITE or REGISTER requests to a SIP proxy exhausts its transaction processing capacity, causing legitimate calls to fail. As with a TCP SYN flood, the cost to the victim is state, but here it is application-layer transaction state: each INVITE that reaches a stateful proxy consumes memory for the duration of the transaction timeout.

Impact: SIP proxy degraded or unavailable; VoLTE calls fail network-wide.
Difficulty: Low. Requires only UDP packet generation at sufficient volume.


Mitigations

  • IMS AKA enforcement: All SIP REGISTER requests on the Gm interface must use IMS AKA (RFC 3310). IP-address-only authentication for subscriber registration should not be accepted. This eliminates registration hijacking in the subscriber access path.

  • P-Asserted-Identity validation at the S-CSCF: The S-CSCF must verify that the P-Asserted-Identity in outbound INVITE requests matches the authenticated identity from the subscriber's REGISTER. A UE claiming to be a different subscriber in its From: header should have the From: overwritten by the S-CSCF with the authenticated identity.

  • Rate limiting at the P-CSCF: Apply per-subscriber rate limits on REGISTER and INVITE requests. A subscriber generating hundreds of INVITEs per minute is either compromised or malfunctioning. The P-CSCF is the correct enforcement point because it is the first network element in the SIP path.

  • SIP trunk authentication: Enterprise SIP trunks must use digest authentication with strong credentials, not IP-address-based trust alone. IP address controls should be layered on top of credential authentication, not used as a replacement.

  • Anomaly detection on calling patterns: Toll fraud has recognisable signatures: calls to premium numbers, calls to unusual geographic destinations, sustained call duration, and calls at atypical hours. Real-time CDR analysis against per-subscriber baselines is effective at detecting compromise before significant financial exposure occurs.
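
The per-subscriber rate limiting described in the list above could be sketched as a sliding-window counter. The class name, the thresholds, and the subscriber identifier are illustrative assumptions; a production P-CSCF would also need state expiry and method-specific limits.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per subscriber within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # subscriber -> request timestamps

    def allow(self, subscriber: str, now: float) -> bool:
        events = self._events[subscriber]
        # Discard timestamps that have fallen out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_requests:
            return False  # over the limit: reject or challenge the request
        events.append(now)
        return True


# Hypothetical policy: at most 3 INVITEs per subscriber per 60 seconds.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60.0)
decisions = [limiter.allow("sip:alice@example.com", t) for t in (0.0, 1.0, 2.0, 3.0)]
print(decisions)
```

Timestamps are passed in explicitly so the behaviour is easy to test; a live system would supply a monotonic clock value instead.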


Spec references

  • RFC 3261: The foundational SIP specification. Section 8 covers general user agent behaviour; Section 10 covers registration and the REGISTER method in detail; Section 13 covers initiating a session with INVITE; Section 17 covers the transaction state machine. Required reading before any 3GPP SIP work.

  • 3GPP TS 24.229: The normative 3GPP IMS SIP specification. Defines all the 3GPP-specific SIP extensions, mandatory procedures for VoLTE, and the behaviour of each IMS functional entity. The primary reference for operator IMS implementations.

  • RFC 3711: The Secure RTP (SRTP) specification. Section 3 defines the SRTP framework, including the packet format; Section 4.3 defines the key derivation. Mandatory for understanding IMS media security.

  • 3GPP TS 33.203: Access security for IP-based services. Defines IMS AKA authentication, the Security Mode procedure on Gm, and the IPsec requirements between UE and P-CSCF.


SIP is inseparable from IMS: it is the signalling protocol through which all IMS services operate. The IMS core (P-CSCF, I-CSCF, S-CSCF) is the SIP infrastructure. RTP carries the actual media streams that SIP sessions establish.

The services built on SIP include VoLTE, VoNR, and SMS over IMS. Diameter is the complementary signalling protocol: SIP handles session control while Diameter handles authentication (Cx), subscriber profile (Sh), and policy (Gx) in parallel with SIP transactions.

For the attack dimension, see SIP/VoIP attacks. SIP superseded ISUP as the call control protocol; understanding ISUP's role clarifies why the interconnect boundary between SIP and ISUP-speaking PSTN networks introduces additional attack surface.