This whitepaper surveys the 3GPP standardisation and research landscape for AI/ML integration in 6G systems. We cover 14 technical domains — from channel estimation and CSI feedback compression to federated learning and semantic communications — each grounded in active 3GPP study items and published research. System-level KPI targets from ITU-R IMT-2030 provide the performance context throughout.
This whitepaper draws on two distinct source categories. Readers and implementors should be aware of the difference:
Standards and specifications (3GPP, ITU-R, O-RAN):
- Use-case definitions: TR 38.843 UC1/UC2/UC3
- KPI targets & evaluation methodology
- Signalling & feedback formats (CSI reporting, model transfer TR 22.874)
- AI/ML architecture integration (TR 23.700-80, O-RAN WG2)
- IMT-2030 performance requirements (ITU-R M.2160)
- Data collection for AI/ML (TR 37.817)
Academic research (peer-reviewed examples):
- Specific NN architectures: ChannelNet, CsiNet, TransNet [A1–A3]
- Semantic comms: DeepJSCC [A4]
- E2E autoencoder transceiver [A5]
- FL algorithm: FedAvg, FedProx [A6, A7]
- RL schedulers: DQN, MADDPG, MAPPO [A8–A10]
- Privacy: DP, DP-SGD, FGSM [A11–A13]
Key principle: 3GPP standardises what an AI model must achieve (KPIs, conformance envelopes, signalling) — it does not mandate specific neural-network architectures. The academic models listed above are peer-reviewed, publicly available examples illustrating how 3GPP requirements can be met. They are cited in the References section under entries [11]–[14] and [A1]–[A13] and are not proprietary.
§1.1 — The 6G Imperative
3GPP Releases 15–17 delivered the three pillars of 5G NR: eMBB (enhanced Mobile Broadband), URLLC (Ultra-Reliable Low-Latency Communications), and mMTC (massive Machine-Type Communications). These addressed the 2020 decade's connectivity needs. Yet the horizon of 2030 demands something qualitatively different: a network where artificial intelligence is not a post-hoc optimisation layer bolted on top of engineered signal-processing blocks, but a first-class participant in every link budget, every scheduling decision, and every interference mitigation event.
The International Telecommunication Union Radiocommunication Sector (ITU-R) codified this vision in Recommendation ITU-R M.2160 (11/2023) — Framework and overall objectives of the future development of IMT for 2030 and beyond, which established the technical and conceptual baseline for what is broadly termed 6G. The document's four overarching design principles — sustainability, security and resilience, connecting the unconnected, and ubiquitous intelligence — already signal that AI is not optional but structural.
Against this backdrop, IMT-2030 targets represent step-changes rather than incremental improvements over IMT-2020:
- Spectral efficiency: 1.5–3× over IMT-2020 average (targeting ~100 bps/Hz aggregate system SE with massive antenna arrays)
- Energy efficiency: ~100× improvement per bit delivered
- User-plane latency: 0.1 ms air-interface target (vs. 1 ms URLLC floor in 5G)
- Peak DL rate: up to 1 Tbps in specific scenarios (M.2160 itself lists 50, 100 and 200 Gbps as example peak-rate targets)
- Positioning accuracy: 1–10 cm indoor, <1 m outdoor
- Connection density: 10⁶–10⁸ devices/km²
- AI-native network: AI/ML inference as a defined network function, not an add-on optimisation
The engineering implication is profound: classical signal-processing designs rest on tractable mathematical models (Gaussian channels, i.i.d. noise, linear precoding capacity bounds). AI-native 6G must operate reliably in the regime where these models break down — near-field propagation at sub-THz frequencies, extreme multipath in dense urban environments, and heterogeneous ISAC scenarios where the channel is simultaneously a communications medium and a sensing target.
§1.2 — Key 6G Usage Scenarios (ITU-R M.2160)
ITU-R M.2160 defines six usage scenarios for IMT-2030. Unlike the IMT-2020 triangle (eMBB / URLLC / mMTC), the IMT-2030 usage map is a hexagon, explicitly adding ISAC and AI/ML Communication as peer scenarios alongside classical broadband and IoT paradigms.
| Scenario | Full Name | Key Applications | AI/ML Requirement |
|---|---|---|---|
| IMMB | Immersive Mobile Broadband | Holographic telepresence, extended reality (XR), 8K 360° video, tactile internet | Predictive pre-fetching, view-dependent compression, AI-driven beamforming for Tbps links |
| HRLLC | Hyper-Reliable Low-Latency Comms | Tele-surgery, autonomous vehicles (V2X), industrial automation, smart grid control | AI-HARQ failure prediction, proactive resource reservation, learned reliability models |
| MC | Massive Communication | Industrial IoT, smart city sensors, AMI, environmental monitoring, livestock tracking | Traffic prediction, anomaly detection, ML-driven sleep scheduling for 10-year battery |
| UC | Ubiquitous Connectivity | Rural broadband via HAPS/satellite, maritime, aviation, underserved regions | AI-driven HAPS beam control, satellite-terrestrial handover prediction, coverage ML models |
| ISAC | Integrated Sensing & Communications | Vehicular radar, weather sensing, gesture recognition, simultaneous localization and mapping | Joint waveform optimization (AI-designed ISAC waveform), clutter suppression NN, target classification |
| AIAC | AI/ML Communication | AI model distribution OTA, federated learning over-the-air, semantic communications | AI is the payload: model transfer protocol, gradient compression, semantic encoding/decoding |
Of these, AIAC is novel to the IMT-2030 framework. It elevates AI from a network management tool to an explicit communication scenario: the network must support efficient transfer of trained models, gradients for federated learning, and semantically compressed data representations. This creates a feedback loop — AI improves the network, and the network carries AI.
§1.3 — 6G KPI Targets: IMT-2030 vs. IMT-2020
The table below compares the minimum technical performance requirements for IMT-2020 (as defined in ITU-R M.2410) against the IMT-2030 targets (ITU-R M.2160), with the AI enabler mechanism that bridges the gap identified for each KPI.
| KPI | IMT-2020 (5G NR) | IMT-2030 (6G) | Primary AI Enabler |
|---|---|---|---|
| Peak DL Rate | 20 Gbps | 1 Tbps (scenario-dep.) | AI near-field beamforming, sub-THz beam prediction |
| Spectral Efficiency | 30 bps/Hz (system) | ~100 bps/Hz (system) | AI-driven precoding, learned interference coordination |
| Energy Efficiency | Baseline (IMT-2020) | 100× per-bit improvement | ML sleep-mode prediction, traffic forecasting |
| User-plane Latency | 1 ms (URLLC) | 0.1 ms air-interface | Predictive pre-scheduling, proactive resource allocation |
| Reliability | 99.999% (5 nines) | 99.99999% (7 nines) | AI-HARQ, proactive link failure prediction |
| Positioning Accuracy | 10 cm (indoor, FR1) | 1–10 cm (indoor/outdoor) | AI fingerprinting, multi-modal sensor fusion |
| Connection Density | 10⁶ devices/km² | 10⁶–10⁸ devices/km² | ML-based access control, grant-free scheduling |
| Mobility | 500 km/h | 500–1000 km/h | AI channel prediction for high-Doppler environments |
| Area Traffic Capacity | 10 Mbps/m² | 30–50 Mbps/m² | AI spatial reuse, 3D cell shaping |
| Coverage | ~99% (terrestrial) | 99.999% (incl. HAPS/LEO) | AI-driven HAPS beam steering, LEO handover ML |
[1] ITU-R M.2160 (11/2023) — Framework and overall objectives of the future development of IMT for 2030 and beyond. Geneva: ITU-R, 2023.
[2] ITU-R M.2410-0 (11/2017) — Minimum requirements related to technical performance for IMT-2020 radio interface(s). Geneva: ITU-R, 2017.
§1.4 — The AI-Native Principle
"AI-native" is a loaded term that risks meaning everything and nothing. In the 6G context, ITU-R M.2160 and the emergent 3GPP 6G study items give it a precise technical meaning: AI/ML inference is a specified network function with defined interfaces, lifecycle management, and fallback behaviour — not an implementation detail hidden inside a vendor's baseband DSP.
Levels of AI Integration
We can characterise AI integration into wireless systems at three levels of increasing architectural depth:
Level 1 — Algorithm Replacement
A classical DSP block (e.g., MMSE channel estimator, Viterbi detector) is replaced by a trained neural network with the same I/O interface. The surrounding protocol stack is unchanged. This is the dominant paradigm in 3GPP Rel-18 AI work (TR 38.843 UC3: NN-based DMRS channel estimation).
Advantage: backward-compatible, low standardisation cost.
Limitation: gains bounded by the original algorithm's envelope.
Level 2 — Parameter Optimisation
AI optimises configuration parameters of existing algorithms in real time: beamforming codebook selection, HARQ round-trip prediction, handover thresholds, sleep mode timers. The algorithm structure is fixed; AI chooses the operating point. Prevalent in Rel-17 RAN Intelligence (TR 37.817) and Rel-18/19 network automation.
Advantage: moderate standardisation cost, deployable incrementally.
Limitation: cannot escape sub-optimality of the underlying algorithm.
Level 3 — Protocol Redesign
AI drives the protocol structure itself: semantic communication replaces bit-pipe abstraction; joint source-channel coding replaces the separation theorem's idealization; inference-driven scheduling replaces HARQ-RTT-bounded link adaptation. This is the 6G Phase-1 (2028+) ambition.
Advantage: fundamental capacity gains, new application scenarios.
Limitation: requires complete re-standardisation; interoperability risk.
Level 0 (reference) — Monitoring / SON
AI operates purely in the OAM plane: fault detection, KPI anomaly, capacity planning. No real-time RAN interaction. This covers Rel-15/16 SON/MDT and remains commercially widespread in 5G SA deployments today.
Advantage: no air-interface standardisation needed.
Limitation: cannot affect per-slot or per-beam decisions.
The Fundamental Learning Objective
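Regardless of the level, every AI/ML component in the network can be framed as solving a regularised empirical risk minimisation problem. For a model parameterised by \(\boldsymbol{\theta} \in \mathbb{R}^p\):
\[ \boldsymbol{\theta}^{\star} = \arg\min_{\boldsymbol{\theta}} \; \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim \mathcal{D}} \left[ \mathcal{L}\big(f_{\boldsymbol{\theta}}(\mathbf{x}), \mathbf{y}\big) \right] + \lambda\, \Omega(\boldsymbol{\theta}) \]
where: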
- \(f_{\boldsymbol{\theta}}: \mathbb{R}^n \to \mathbb{R}^m\) is the learned function (e.g., a deep neural network mapping received pilot observations to channel estimates)
- \(\mathcal{L}\) is a task-specific loss (e.g., NMSE for channel estimation, cross-entropy for beam classification, negative log-likelihood for link state prediction)
- \(\mathcal{D}\) is the joint distribution of inputs and targets, which in wireless systems is non-stationary — it shifts with mobility, frequency, and environment
- \(\Omega(\boldsymbol{\theta})\) is a regulariser (L2 weight decay, spectral norm constraint) that controls overfitting
- \(\lambda\) is the regularisation coefficient, often tuned via validation on a held-out channel scenario
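To make the role of \(\lambda\) concrete, the following is a minimal sketch of regularised risk minimisation for a scalar linear model, trained by plain gradient descent. The model \(f_\theta(x) = \theta x\), the squared loss, the regulariser \(\Omega(\theta) = \theta^2\), the synthetic data, and the function name `fit_ridge` are all illustrative assumptions rather than anything taken from a 3GPP document.

```python
import random

def fit_ridge(samples, lam=0.1, lr=0.05, epochs=200):
    """Gradient descent on mean squared loss + lam * theta^2 for f(x) = theta * x."""
    theta = 0.0
    n = len(samples)
    for _ in range(epochs):
        # Gradient of the empirical risk (mean squared error over the samples)
        grad_risk = sum(2.0 * (theta * x - y) * x for x, y in samples) / n
        # Gradient of the L2 regulariser Omega(theta) = theta^2
        grad_reg = 2.0 * lam * theta
        theta -= lr * (grad_risk + grad_reg)
    return theta

# Synthetic data (assumption): y = 2x plus small Gaussian noise
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
data = [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in xs]

theta_plain = fit_ridge(data, lam=0.0)  # no regularisation
theta_reg = fit_ridge(data, lam=0.5)    # shrinks theta towards zero
```

With `lam=0.0` the fitted coefficient approaches the true slope, while a larger `lam` shrinks it towards zero. That is the bias that regularisation trades for robustness when the training distribution is narrow.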
Key Technical Challenges for AI-Native 6G
| Challenge | Technical Description | 3GPP Work Item |
|---|---|---|
| Distribution shift | Model trained in scenario A performs poorly in scenario B; environment non-stationarity (mobility, frequency, time) | TR 38.858 (Rel-19): online model adaptation; TR 23.700-80: model lifecycle |
| Inference latency | NN inference must complete within slot duration (~125 µs for 120 kHz SCS); constrained by UE compute budget | TR 38.843: inference endpoint definition (network-side vs. UE-side) |
| Model size / OTA transfer | Transferring a 10 MB model over the air consumes significant DL overhead; compression and delta-update mechanisms needed | TR 22.874: model transfer requirements; SA1 requirements for model metadata |
| Explainability | Regulators and operators require interpretable decisions; black-box NN scheduling is unacceptable for safety-critical HRLLC | Open study; ETSI ENI GS ENI 010 explainability framework |
| Privacy / data governance | Training data (channel measurements, UE locations) is sensitive; federated learning needed to keep data on-device | TR 23.700-80: federated learning support; 3GPP SA3 security requirements |
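The inference-latency row can be checked with simple numerology arithmetic: an NR slot lasts 1 ms divided by \(2^\mu\), where the subcarrier spacing is \(15 \cdot 2^\mu\) kHz. The sketch below computes the slot duration and tests a measured inference time against a share of it. The `budget_fraction` parameter and the helper names are hypothetical; no specification fixes such a share.

```python
def slot_duration_us(scs_khz: int) -> float:
    """NR slot duration in microseconds for a given subcarrier spacing."""
    # SCS = 15 kHz * 2^mu, and a slot lasts 1 ms / 2^mu
    mu = {15: 0, 30: 1, 60: 2, 120: 3, 240: 4}[scs_khz]
    return 1000.0 / (2 ** mu)

def fits_in_slot(inference_us: float, scs_khz: int, budget_fraction: float = 0.5) -> bool:
    """True if inference fits within an assumed share of the slot.

    budget_fraction is an illustrative assumption: the rest of the slot is
    left for demodulation, decoding, and other baseband processing.
    """
    return inference_us <= budget_fraction * slot_duration_us(scs_khz)

slot_120 = slot_duration_us(120)   # 125.0 microseconds
ok = fits_in_slot(50.0, 120)       # 50 us against a 62.5 us budget
too_slow = fits_in_slot(100.0, 120)
```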
[3] ITU-R M.2516 (07/2022) — Future technology trends of terrestrial IMT systems towards 2030 and beyond. Geneva: ITU-R, 2022.
[4] Industry Consortium, "A Research Outlook Towards 6G," White Paper, 2024 (updated). Intelligence Everywhere, Distributed Data Infrastructure, Autonomous Operations, AIaaS concepts.
The 6G vision establishes the why; §2 traces the how: the 3GPP standardization roadmap from Rel-15 MDT through the Rel-18/19/20 normative AI work items.
§2.1 — Release Timeline: From MDT to AI-Native
3GPP's approach to AI/ML has evolved through three distinct phases: (i) measurement and data collection infrastructure (Rel-15/16/17), (ii) AI/ML inference for specific RAN use cases (Rel-18/19, the "5G-Advanced" era), and (iii) AI-native air interface redesign (Rel-20 / 6G Phase-1, 2026–2030).
| Release | Freeze Year | Designation | Key AI/ML Items | Primary TRs / TSs |
|---|---|---|---|---|
| Rel-15 | 2018 | 5G NR Phase 1 | SON (Self-Organising Networks) baseline; MDT (Minimisation of Drive Tests) measurement collection; no dedicated AI/ML inference specs | TS 37.320 (MDT); TS 32.422 (SON) |
| Rel-16 | 2020 | 5G NR Phase 2 | NR V2X with ML-assisted handover candidate; IAB (Integrated Access & Backhaul); MDT enhancements; NWDAF (Network Data Analytics Function) introduced in 5GC | TS 37.320; TS 23.288 (NWDAF); TR 38.867 (V2X) |
| Rel-17 | 2022 | 5G NR Rel-17 | RAN Intelligence framework (TR 37.817): data collection architecture, KPM definition; NWDAF enhancements; Timing Resilience; RAN-CN coordination for AI inference split | TR 37.817 (RAN intelligence); TS 23.288 Rel-17; TR 28.908 (ANS) |
| Rel-18 | 2024 | 5G-Advanced Phase 1 | AI/ML for NR Air Interface (TR 38.843): beam management (UC1), CSI feedback compression (UC2), channel estimation (UC3); Network automation Level 3 (TS 28.xxx series); ISAC Phase 1 (TR 22.837); Ambient IoT; XR & NR-Light enhancements | TR 38.843; TR 22.874; TR 37.817 v18; TS 23.288 v18 |
| Rel-19 | 2025–2026 | 5G-Advanced Phase 2 | AI/ML enhancements for NR (TR 38.858): extended UC coverage, overhead reduction, model compression; Channel modeling for AI (TR 38.901 ext.); SLA-aware AI; Federated learning framework; ISAC Phase 2 | TR 38.858; TR 22.874 v19; TR 23.700-80 enhancements |
| Rel-20 | 2026–2027 | 6G Foundation | 6G system concept study; AI-native air interface feasibility; Semantic communications study item; Sub-THz channel modelling; AI-driven waveform adaptation | TR 38.8xx (6G RAN study); TR 22.8xx (6G SA1 requirements) |
| 6G Ph-1 | 2027–2028 | First 6G Standard | IMT-2030-compliant air interface; AI/ML as mandatory PHY/MAC function; ISAC Phase 3; Semantic comm. pilot specs; Native FL support | TS 38.xxx (6G NR); IMT-2030 compliance evaluation per M.2160 |
§2.2 — TR 38.843: AI/ML for NR Air Interface (Rel-18)
3GPP TR 38.843 ("Study on artificial intelligence (AI) / machine learning (ML) for NR air interface") is the pivotal Rel-18 document that translates IMT-2030 AI ambitions into concrete, evaluable use cases for the 5G NR air interface. Completed in 2024, it is the first 3GPP RAN study to describe formally how AI/ML inference is integrated into the NR physical layer, covering inference endpoints, model transfer mechanisms, and evaluation criteria. Three use cases (UCs) are studied in depth.
UC1: Beam Management
Problem: In FR2 (mmWave) deployments, the beam sweeping overhead from SSB and CSI-RS measurements can consume 10–15% of slot resources. Beam failure is a leading cause of call drops at cell edge.
AI/ML approach: A neural network observes historical RSRP/RSRQ reports from N preceding slots and predicts the best beam index for the next K slots (beam prediction), reducing the beam sweeping interval. A separate classifier monitors BLER and SINR trends to trigger early beam failure detection before the physical layer declares BFR.
Inference endpoint: Either (a) gNB-side inference using UE measurement feedback, or (b) UE-side inference using locally received RSRPs — TR 38.843 evaluates both and concludes that gNB-side inference is preferred for Rel-18 to minimise UE compute requirements.
Evaluation criteria: Beam prediction accuracy (top-1 and top-K), overhead reduction ratio, L1-RSRP prediction NMSE, false alarm and miss detection rates for beam failure.
Key finding: NN-based beam prediction reduces beam sweeping overhead by 30–60% while maintaining >95% top-1 prediction accuracy in clustered urban macro scenarios — under the 3GPP CDL-D channel model. Generalisation to CDL-C (NLOS) requires either separate model training or domain adaptation.
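The top-1 and top-K accuracy criteria can be sketched as follows. The function takes per-beam predicted scores (for example predicted L1-RSRP in dBm) and the index of the beam that was actually best. The sample values and the name `top_k_accuracy` are illustrative assumptions, not evaluation data from TR 38.843.

```python
def top_k_accuracy(predicted_scores, best_beams, k=1):
    """Fraction of samples whose true best beam is among the k highest-scored beams."""
    hits = 0
    for scores, true_beam in zip(predicted_scores, best_beams):
        # Rank beam indices from highest to lowest predicted score
        ranked = sorted(range(len(scores)), key=lambda b: scores[b], reverse=True)
        if true_beam in ranked[:k]:
            hits += 1
    return hits / len(best_beams)

# Hypothetical predicted L1-RSRP (dBm) for 4 beams over 3 samples
scores = [
    [-80.0, -72.5, -90.1, -75.0],
    [-70.0, -88.0, -71.0, -95.0],
    [-85.0, -84.0, -60.0, -61.0],
]
true_best = [1, 2, 3]

top1 = top_k_accuracy(scores, true_best, k=1)
top2 = top_k_accuracy(scores, true_best, k=2)
```

In this toy set the predictor ranks the true best beam first only once, but always within its top two. This is why top-K accuracy is reported alongside top-1: a short sweep over the K predicted candidates can recover the remaining cases.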
UC2: CSI Feedback Compression
Problem: In FDD massive MIMO, the UE must measure the downlink channel and feed it back to the gNB. For a 32-port antenna panel, the raw CSI occupies hundreds to thousands of bits per subframe — a substantial uplink overhead burden that scales with antenna count.
AI/ML approach: A neural-network autoencoder (encoder at UE, decoder at gNB) compresses the full CSI matrix to a low-dimensional codeword. The encoder and decoder are trained jointly end-to-end to minimise reconstruction NMSE subject to a target bit budget (compression ratio η). CsiNet (Wen et al., 2018) established the canonical framework; subsequent models (CsiNet+, TransNet) add multi-rate training and transformer-based attention for long-range spatial-frequency correlation capture.
Inference endpoint: Encoder at UE (compress); decoder at gNB (reconstruct). Both sides must use a compatible model pair — introducing cross-vendor interoperability requirements not present in UC1 or UC3.
Evaluation criteria: NMSE of reconstructed channel vs. perfect CSI; beamforming gain loss relative to perfect CSIT; uplink feedback overhead in bits; UE encoder inference latency.
Key finding: TransNet achieves −15 dB NMSE at η=1/4 using ~80 bits, matching 3GPP Type II Enhanced codebook performance at approximately 5× lower feedback overhead (CDL-C, 32Tx, 13 subbands). Standardisation of the I/O interface (model ID, codeword format) is the primary Rel-18/19 outcome; the internal encoder/decoder architecture remains implementation-defined.
UC3: Channel Estimation
Problem: Classical MMSE channel estimation performance degrades in high-mobility (Doppler) and multi-path-dense scenarios. Pilot density cannot be increased arbitrarily due to spectral efficiency loss.
AI/ML approach: A convolutional or attention-based neural network takes DMRS pilot observations (received pilot tones across frequency and multiple OFDM symbols) and produces an enhanced channel estimate \(\hat{\mathbf{H}}\) across all data subcarriers and time symbols in the slot — effectively performing interpolation, denoising, and Doppler extrapolation simultaneously.
Inference endpoint: UE-side (DL channel estimation) or gNB-side (UL channel estimation from SRS). TR 38.843 focuses primarily on DL, at the UE.
Evaluation criteria: Normalised Mean Square Error (NMSE) of channel estimate vs. perfect-CSI baseline; BLER vs. SNR curves; computational complexity (FLOP/slot) relative to MMSE baseline.
Key finding: NN-based estimators achieve 2–4 dB NMSE gain over LS estimation and 0.5–1.5 dB gain over MMSE at high Doppler (300 km/h, CDL-B), with 10–30 GFLOP/slot at the UE — feasible for mid-range UEs in 2024 silicon.
TR 38.843 Framework Architecture
| Framework Element | UC1 (Beam Mgmt) | UC2 (CSI Feedback) | UC3 (Ch. Estimation) |
|---|---|---|---|
| Inference endpoint | gNB (preferred Rel-18) | UE (encoder); gNB (decoder) | UE (DL); gNB (UL) |
| Training data source | Historical RSRP reports (MDT) | Simulated CDL channels; paired encoder-decoder training | Simulated CDL channels; OTA fine-tuning |
| Model transfer | gNB → UE optional (UE-side variant) | Network → UE (encoder model via Uu); gNB decoder vendor-internal | Network → UE (Xn/Uu signalling) |
| Fallback mode | Classical codebook beam sweeping (TS 38.214) | Type II codebook feedback (TS 38.214) | MMSE/LS estimation (TS 38.211) |
| Open issues (Rel-19) | Generalisation to CDL-C/D; online adaptation; overhead of RSRP history signalling | Cross-vendor encoder/decoder mismatch; quantization loss; online adaptation | Model compression for low-end UEs; SRS-based UL variant |
[5] 3GPP TR 38.843 v18.0.0 — Study on artificial intelligence (AI) / machine learning (ML) for NR air interface. 3rd Generation Partnership Project, 2024.
§2.3 — TR 22.874: AI/ML Model Transfer Requirements (SA1)
3GPP TR 22.874 ("Study on traffic characteristics and performance requirements for AI/ML model transfer in 5GS") is the SA1 requirements document that defines what the system must support for AI model distribution — the "logistics layer" without which inference at the UE is infeasible.
Key requirements established by TR 22.874 include:
Model Metadata
- Model ID and version number (semantic versioning)
- Input/output tensor format description (shape, dtype, normalisation)
- Training dataset descriptor: channel model family, SNR range, UE speed, frequency band — enables the UE to assess model applicability
- Computational complexity declaration (FLOP count, memory footprint)
- Validity conditions: the environmental envelope within which the model is certified to meet performance targets
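One way to picture the metadata items above is as a small record with an applicability check. The sketch below is purely illustrative: the class name, field names, and example values are hypothetical and do not reproduce any 3GPP information element.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelMetadata:
    """Hypothetical container for the metadata items listed above."""
    model_id: str
    version: str               # semantic version, e.g. "1.2.0"
    input_shape: tuple         # tensor shape expected by the model
    output_shape: tuple
    dtype: str
    channel_models: tuple      # channel model families used in training
    snr_range_db: tuple        # (min, max) SNR covered by the training set
    max_ue_speed_kmh: float
    flops_per_inference: int
    memory_bytes: int

    def is_applicable(self, channel_model, snr_db, ue_speed_kmh):
        """Check current conditions against the declared validity envelope."""
        snr_min, snr_max = self.snr_range_db
        return (channel_model in self.channel_models
                and snr_min <= snr_db <= snr_max
                and ue_speed_kmh <= self.max_ue_speed_kmh)

# Example values are made up for illustration
meta = ModelMetadata(
    model_id="chest-cnn", version="1.2.0",
    input_shape=(2, 14, 792), output_shape=(2, 14, 1584), dtype="float16",
    channel_models=("CDL-B", "CDL-C"), snr_range_db=(-5.0, 25.0),
    max_ue_speed_kmh=120.0, flops_per_inference=40_000_000, memory_bytes=350_000,
)
in_envelope = meta.is_applicable("CDL-C", snr_db=10.0, ue_speed_kmh=60.0)
out_of_envelope = meta.is_applicable("CDL-C", snr_db=10.0, ue_speed_kmh=300.0)
```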
Model Lifecycle Management
- Model versioning and rollback: network must maintain N–1 model version for rollback when performance degrades below threshold
- Model activation/deactivation signalling: RRC/MAC-CE mechanisms to switch active model at the UE without service interruption
- Delta update support: transmit only changed weights (sparse diff) to reduce OTA overhead for incremental re-training
- Model compression: quantisation (INT8/INT4), pruning, knowledge distillation are supported to reduce transfer size
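The delta-update idea can be sketched as a sparse diff over a flat weight vector: only weights that changed are sent, keyed by index. The representation (a dict from index to new value), the tolerance parameter, and the function names are assumptions chosen for illustration; TR 22.874 does not prescribe a wire format.

```python
def make_delta(old, new, tol=0.0):
    """Sparse diff: map index -> new value for weights that changed by more than tol."""
    return {i: n for i, (o, n) in enumerate(zip(old, new)) if abs(n - o) > tol}

def apply_delta(old, delta):
    """Rebuild the updated weight vector from the previous version and a sparse diff."""
    updated = list(old)
    for i, value in delta.items():
        updated[i] = value
    return updated

old_w = [0.10, -0.25, 0.40, 0.00, 0.75]
new_w = [0.10, -0.20, 0.40, 0.00, 0.70]

delta = make_delta(old_w, new_w)       # only indices 1 and 4 changed
restored = apply_delta(old_w, delta)
```

Here two of five weights are transmitted instead of the whole vector. The saving grows with model size when incremental re-training touches only a small part of the network.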
Fallback Behaviour
- Mandatory fallback to rule-based algorithm when AI model performance monitor detects degradation (NMSE > threshold, beam prediction accuracy < threshold)
- Fallback trigger: can be autonomous (UE-monitored) or network-commanded (gNB sends model deactivation)
- Fallback must complete within one slot (125 µs at 120 kHz SCS) to avoid service interruption in URLLC scenarios
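A minimal sketch of the fallback trigger logic follows. It assumes, as a design choice not stated in the source, that the monitor declares degradation only after several consecutive NMSE measurements exceed the threshold, and that fallback is sticky until the model is explicitly re-activated. The class name, threshold, and window length are hypothetical.

```python
from collections import deque

class FallbackMonitor:
    """Switches from AI inference to the rule-based algorithm on sustained degradation."""

    def __init__(self, nmse_threshold_db=-10.0, window=3):
        self.threshold = nmse_threshold_db
        self.recent = deque(maxlen=window)  # most recent NMSE measurements (dB)
        self.mode = "ai"

    def update(self, nmse_db):
        """Autonomous (UE-monitored) trigger: feed one measurement, get the active mode."""
        self.recent.append(nmse_db)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and all(v > self.threshold for v in self.recent):
            self.mode = "fallback"
        return self.mode

    def network_deactivate(self):
        """Network-commanded trigger: the gNB deactivates the model."""
        self.mode = "fallback"
        return self.mode

monitor = FallbackMonitor()
modes = [monitor.update(v) for v in (-18.0, -9.0, -8.5, -7.0, -20.0)]
```

The window avoids switching on a single bad measurement. The last sample shows the sticky behaviour: a good NMSE value alone does not re-enable the model.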
Transport Requirements
- QoS class: AI model transfer uses a dedicated QFI (QoS Flow ID) with high-reliability, large-MTU transport
- Typical model size range: 100 KB – 50 MB depending on architecture; background transfer preferred to avoid latency spikes
- Unicast model delivery to UE; multicast delivery to UE groups sharing the same model (cell-specific beam management models)
[6] 3GPP TR 22.874 v18.2.0 — Study on traffic characteristics and performance requirements for AI/ML model transfer in 5GS. 3GPP SA WG1, 2024.
§2.4 — TR 23.700-80: AI/ML Architecture (SA2)
3GPP TR 23.700-80 ("Study on enablers for network automation for the 5G system — Phase 3") is the SA2 document that defines the system-level architecture for deploying AI/ML in the 5G core and RAN. It builds on the NWDAF (Network Data Analytics Function) introduced in Rel-16 and elevates it to a full AI/ML orchestration plane.
Architecture Elements
| Element | Location | Function | Rel. introduced |
|---|---|---|---|
| NWDAF | 5GC | Network data analytics; model training pipeline; analytics service exposure via Nnwdaf API | Rel-16 |
| MTLF | 5GC (NWDAF sub-function) | Model Training Logical Function: manages training jobs, dataset curation, model versioning | Rel-17 |
| AnLF | 5GC (NWDAF sub-function) | Analytics Logical Function: serves inference requests from consumers (AMF, SMF, PCF, gNB) | Rel-17 |
| AI/MLNF | 5GC / RAN boundary | AI/ML Network Function: generic inference server, model repository, lifecycle manager; introduced as architectural evolution in TR 23.700-80 | Rel-18/19 |
| gNB AI | gNB (gNB-CU / gNB-DU) | RAN-side inference: beam management (UC1), CSI feedback decoding (UC2), channel estimation (UC3); receives models via O1/E2 or Xn | Rel-18 (TR 38.843) |
| UE AI | UE | Device-side inference: DL channel estimation (UC3), beam prediction (UC1 UE variant), CSI feedback encoding (UC2); model stored in UE non-volatile memory | Rel-18 (TR 38.843) |
Federated Learning Support
A key architectural feature of TR 23.700-80 (Rel-19 target) is federated learning (FL) support — enabling model training without transmitting raw measurement data to the network:
- FL server (hosted in NWDAF/MTLF) initialises a global model \(\boldsymbol{\theta}^{(0)}\) and distributes it to participating UEs via the model transfer mechanism (TR 22.874)
- Each UE \(k\) computes local gradients \(\mathbf{g}_k = \nabla_{\boldsymbol{\theta}} \mathcal{L}_k(\boldsymbol{\theta})\) on its local channel measurements
- UEs transmit compressed gradients (or model diffs) \(\Delta\boldsymbol{\theta}_k\) to the FL server over the NR uplink
- The FL server performs federated averaging: \(\boldsymbol{\theta}^{(t+1)} = \boldsymbol{\theta}^{(t)} + \eta \sum_k w_k \Delta\boldsymbol{\theta}_k\) where \(w_k = |D_k| / \sum_j |D_j|\) is the weight proportional to dataset size
- Updated global model is pushed back to all UEs; iterate until convergence
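The aggregation step above can be sketched in a few lines. The function applies \(\boldsymbol{\theta}^{(t+1)} = \boldsymbol{\theta}^{(t)} + \eta \sum_k w_k \Delta\boldsymbol{\theta}_k\) with \(w_k\) proportional to local dataset size. The flat-list model representation, the function name, and the example numbers are illustrative assumptions; gradient compression and transport are out of scope here.

```python
def fedavg_round(theta, client_deltas, client_sizes, eta=1.0):
    """One federated-averaging update of the global model.

    theta         : current global weights (flat list)
    client_deltas : one model diff per client, same length as theta
    client_sizes  : local dataset sizes |D_k|, used as aggregation weights
    """
    total = sum(client_sizes)
    weights = [n / total for n in client_sizes]   # w_k = |D_k| / sum_j |D_j|
    new_theta = list(theta)
    for w_k, delta_k in zip(weights, client_deltas):
        for i, d in enumerate(delta_k):
            new_theta[i] += eta * w_k * d
    return new_theta

theta0 = [0.0, 1.0]
deltas = [[1.0, -1.0], [3.0, 1.0]]   # diffs reported by two UEs
sizes = [100, 300]                   # the second UE holds three times more data

theta1 = fedavg_round(theta0, deltas, sizes)
```

The second UE contributes with weight 0.75, so the global update is pulled towards its diff.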
Data Collection Architecture
Training data for network-side AI models flows through the existing MDT/RAN measurement infrastructure (TS 37.320), augmented in Rel-17/18:
- Immediate MDT: UE in RRC_CONNECTED reports RSRP/RSRQ with location and time stamps as it measures; low reporting latency and rich channel information
- Logged MDT: UE buffers measurements during RRC_IDLE and uploads when connected; used for CSI fingerprint dataset construction (UC2/UC3 training)
- CQI/RI traces: gNB-logged MAC-layer KPMs; primary source for beam management training data (UC1)
- RAN PM (Performance Management): 3GPP TS 28.552 KPIs streamed to NWDAF via O1 interface for network-level AI model training
[7] 3GPP TR 23.700-80 v19.0.0 — Study on enablers for network automation for the 5G system — Phase 3 (Rel-19). 3GPP SA WG2, 2025.
[8] 3GPP TR 37.817 v17.0.0 — Study on enhancement for data collection for NR and ENDC. 3GPP RAN3, 2022.
§2.5 — Standardization Ecosystem
3GPP does not operate in isolation. The AI/ML for 6G standardisation effort is a multi-body endeavour, with different organisations owning different layers of the stack:
| Body | Scope | Key AI/ML Deliverables | Interface to 3GPP |
|---|---|---|---|
| 3GPP RAN WGs | NR air interface PHY/MAC/RRC | TR 38.843 (UC1/2/3); TR 38.858 (Rel-19 AI enhancements); Rel-20/6G study | — (primary body) |
| 3GPP SA WGs | System architecture, services, security | TR 22.874 (model transfer); TR 23.700-80 (AI/ML arch.); SA3 AI security | SA1→RAN (requirements); SA2→RAN3 (architecture) |
| ITU-R WP5D | IMT spectrum, framework, evaluation | M.2160 (IMT-2030 framework); M.2412 (channel models); M.2516 (tech trends) | 3GPP submits RIT to ITU-R for IMT-2030 evaluation (2027–2028) |
| ETSI ENI | Experiential Networked Intelligence (management plane) | GS ENI 001 (terminology); ENI 005 (architecture); ENI 010 (explainability); ENI 019 (closed-loop automation) | Provides AI management framework complementary to 3GPP OAM |
| O-RAN Alliance | Open RAN architecture, xApp/rApp ecosystem | Near-RT RIC (xApp AI: <1 s loop); Non-RT RIC (rApp AI: >1 s loop); O1/A1/E2 interfaces for ML model delivery | O-RAN specs reference 3GPP NR; O-RAN xApp model delivery uses TR 22.874 mechanisms |
| IEEE 802.11bf | Wi-Fi sensing (ISAC for WLAN) | WLAN sensing amendment; AI-based gesture/presence detection | Orthogonal to 3GPP NR; convergence expected in 6G heterogeneous ISAC |
O-RAN xApp/rApp AI Framework
The O-RAN Alliance's RIC (RAN Intelligent Controller) provides a parallel standardisation track for AI at the network management and resource orchestration layer — complementary to, and operationally integrated with, 3GPP's air-interface AI work:
- Non-RT RIC (rApp): AI inference loop >1 s; use cases include traffic prediction, interference coordination policy, mobility load balancing. Communicates with gNB via A1 interface (policy delivery).
- Near-RT RIC (xApp): AI inference loop 10 ms – 1 s; use cases include beam management optimisation, handover control, slice SLA assurance. Communicates with gNB via E2 interface (direct RAN control).
- Real-Time RIC (future): inference loop <10 ms; targets Rel-18 UC1 beam prediction and UC3 channel estimation; currently studied in O-RAN WG2 and 3GPP RAN3 AI split architecture.
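The three control-loop tiers can be expressed as a simple placement rule keyed on the required loop latency. The function name and return labels are hypothetical, and treating exactly 1 s as Near-RT is an assumption made here to resolve the boundary.

```python
def ric_tier(loop_latency_s: float) -> str:
    """Map a required control-loop latency (seconds) to the RIC tier described above."""
    if loop_latency_s > 1.0:
        return "non-rt-ric"    # rApp, policy delivery over A1
    if loop_latency_s >= 0.010:
        return "near-rt-ric"   # xApp, RAN control over E2
    return "real-time"         # below 10 ms: future real-time tier

placement = {
    "traffic prediction": ric_tier(60.0),
    "handover control": ric_tier(0.1),
    "beam prediction": ric_tier(0.001),
}
```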
[9] O-RAN Alliance, "O-RAN Architecture Description v07.00," O-RAN.WG1.AD-v07.00, 2023. Near-RT RIC, Non-RT RIC, A1/E2/O1 interfaces.
[10] ETSI GS ENI 010 v1.1.1 — Experiential Networked Intelligence (ENI): Explainability of AI-based network management. ETSI ENI ISG, 2024.
[11] 5G Americas, "5G-Advanced Overview," White Paper, 2024. AI/ML automation, energy efficiency, 6G evolution path.
The standardization roadmap sets the regulatory and specification context. §3 begins the technical deep-dive, examining AI-based channel estimation, the first and most mature AI use case in the 3GPP TR 38.843 framework.
§3.1 — The Channel Estimation Problem
In 5G NR, the receiver uses Demodulation Reference Signals (DMRS) to estimate the radio channel before data detection. The gNB or UE observes a pilot-bearing received matrix
\[ \mathbf{Y}_p = \mathbf{H}\mathbf{X}_p + \mathbf{N} \]
where \(\mathbf{H} \in \mathbb{C}^{N_r \times N_t}\) is the channel matrix, \(\mathbf{X}_p\) are known pilot symbols, and \(\mathbf{N}\) is additive white Gaussian noise with variance \(\sigma_n^2\). The goal is to recover \(\hat{\mathbf{H}}\) for all resource elements, including those carrying data.
Classical Estimators
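Least-Squares (LS)
\[ \hat{\mathbf{H}}_{\text{LS}} = \mathbf{Y}_p \mathbf{X}_p^{H} \left( \mathbf{X}_p \mathbf{X}_p^{H} \right)^{-1} \]
Simple, closed-form, and needs no channel statistics. It amplifies noise: the NMSE floor is set by the pilot SNR alone.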
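Linear MMSE (LMMSE)
\[ \hat{\mathbf{h}}_{\text{LMMSE}} = \mathbf{R}_H \left( \mathbf{R}_H + \sigma_n^2 \mathbf{I} \right)^{-1} \hat{\mathbf{h}}_{\text{LS}} \]
Optimal in the mean-square sense among linear estimators. Here \(\mathbf{h} = \text{vec}(\mathbf{H})\) and unit-power orthogonal pilots are assumed. The estimator requires the channel covariance matrix \(\mathbf{R}_H = \mathbb{E}[\mathbf{h}\mathbf{h}^H]\), which varies with the propagation environment.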
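Performance Metric
\[ \text{NMSE} = \frac{\mathbb{E}\left[ \| \mathbf{H} - \hat{\mathbf{H}} \|_F^2 \right]}{\mathbb{E}\left[ \| \mathbf{H} \|_F^2 \right]} \]
Reported in dB: \(\text{NMSE}_{\text{dB}} = 10\log_{10}(\text{NMSE})\). Lower is better. A gain of 3 dB in NMSE corresponds to halving the mean squared estimation error, directly improving PDSCH throughput.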
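As a sanity check on the LS estimator and the NMSE metric, the sketch below simulates a deliberately simplified case: one scalar Rayleigh channel coefficient per pilot resource element, unit-modulus QPSK pilots, and complex Gaussian noise. NMSE is computed as total squared error divided by total channel energy. The single-antenna model, the sample count, and all function names are assumptions for illustration. Under these assumptions the LS error equals the noise, so the NMSE should sit near the negative of the pilot SNR in dB.

```python
import cmath
import math
import random

def crandn(rng, variance):
    """Draw one circularly symmetric complex Gaussian sample with the given variance."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def ls_estimate(y, x):
    """Per-RE least-squares estimate: divide each observation by its known pilot."""
    return [yi / xi for yi, xi in zip(y, x)]

def nmse_db(h_true, h_est):
    """Normalised mean square error in dB: error energy over channel energy."""
    err = sum(abs(h - g) ** 2 for h, g in zip(h_true, h_est))
    ref = sum(abs(h) ** 2 for h in h_true)
    return 10.0 * math.log10(err / ref)

rng = random.Random(1)
snr_db = 10.0
noise_var = 10.0 ** (-snr_db / 10.0)   # unit channel power and unit pilot power assumed
n = 4000

h = [crandn(rng, 1.0) for _ in range(n)]                          # Rayleigh channel taps
x = [cmath.exp(1j * math.pi * rng.choice((0.25, 0.75, 1.25, 1.75)))
     for _ in range(n)]                                           # QPSK pilots
y = [hi * xi + crandn(rng, noise_var) for hi, xi in zip(h, x)]    # received pilots

result_db = nmse_db(h, ls_estimate(y, x))   # expected to be close to -snr_db
```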
§3.2 — Neural Network Channel Estimator
The AI channel estimator replaces the classical interpolation + MMSE smoothing block. Three broad architecture families are in active study:
1. Interpolation CNN (ChannelNet style)
Takes the sparse pilot observations \(\mathbf{Y}_p \in \mathbb{C}^{N_p \times N_f}\) placed on the DMRS grid and outputs a dense channel estimate across all resource elements:
\[ \hat{\mathbf{H}} = f_{\boldsymbol{\theta}}(\mathbf{Y}_p) \]
Convolutional layers exploit local time-frequency correlation. Upsampling layers (bilinear or transposed convolution) fill the data REs from pilot REs. Works well in ETU and EPA channels.
2. Denoising DNN
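Uses the LS estimate as input and applies a deep residual network \(g_{\boldsymbol{\theta}}\), which predicts the noise component, to suppress noise:
\[ \hat{\mathbf{H}} = \hat{\mathbf{H}}_{\text{LS}} - g_{\boldsymbol{\theta}}\big(\hat{\mathbf{H}}_{\text{LS}}\big) \]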
Residual learning stabilizes training. The network learns the noise pattern, not the channel directly. Lower computational cost vs interpolation CNN.
3. Transformer-based Estimator
Self-attention over pilot positions captures long-range channel correlations that are difficult to encode in local convolutional kernels. Particularly effective for sparse pilot configurations and high-order MIMO (8+ layers).
5G NR DMRS Pilot Density
For DMRS Type 1 (Release 15 baseline): 6 pilot REs per PRB per DMRS symbol, interleaved with data. For a 132-PRB allocation with 1 DMRS symbol per slot:
- Pilot REs: \(132 \times 6 = 792\)
- Total REs per slot: \(132 \times 12 \times 14 = 22{,}176\)
- Interpolation factor: \(\approx 28\times\)
The CNN must interpolate across this 28× gap, exploiting both frequency-domain correlation (coherence bandwidth) and time-domain correlation (coherence time).
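The pilot-density arithmetic generalises to other allocations. The helper below reproduces the 132-PRB figures and can be re-run for different PRB counts or additional DMRS symbols. The function and parameter names are illustrative.

```python
def dmrs_interpolation_factor(n_prb, pilots_per_prb_per_symbol=6, dmrs_symbols=1,
                              subcarriers_per_prb=12, symbols_per_slot=14):
    """Return (pilot REs, total REs per slot, interpolation factor)."""
    pilot_res = n_prb * pilots_per_prb_per_symbol * dmrs_symbols
    total_res = n_prb * subcarriers_per_prb * symbols_per_slot
    return pilot_res, total_res, total_res / pilot_res

pilots, total, factor = dmrs_interpolation_factor(132)

# A second DMRS symbol per slot halves the interpolation gap
_, _, factor_two_dmrs = dmrs_interpolation_factor(132, dmrs_symbols=2)
```

The factor depends only on the pilot pattern, not on the PRB count, so narrower allocations face the same 28× gap with less frequency-domain context to exploit.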
Training Loss and Procedure
Training data is generated from a 3GPP channel model (CDL-C, ETU, EPA) using a system-level simulator. Complex-valued inputs are represented as two real channels (real and imaginary parts stacked along the feature axis). Training uses the Adam optimizer with an initial learning rate of \(10^{-3}\) and cosine decay.
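Two of the preprocessing and scheduling details above are easy to get wrong, so a minimal framework-free sketch may help. It shows real/imaginary stacking of a complex pilot grid and a cosine learning-rate schedule starting at \(10^{-3}\). The final learning rate of zero, the absence of warm-up, and the function names are assumptions; the source specifies only the initial rate and the decay shape.

```python
import math

def stack_real_imag(grid):
    """Convert a 2-D grid of complex values into two real planes: [real, imag]."""
    real = [[v.real for v in row] for row in grid]
    imag = [[v.imag for v in row] for row in grid]
    return [real, imag]   # shape [2, rows, cols]

def cosine_decay_lr(step, total_steps, base_lr=1e-3, min_lr=0.0):
    """Cosine-decayed learning rate; min_lr=0.0 is an illustrative assumption."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

pilot_grid = [[1 + 2j, 0.5 - 1j],
              [-1j, 3 + 0j]]
stacked = stack_real_imag(pilot_grid)

lr_start = cosine_decay_lr(0, 1000)
lr_mid = cosine_decay_lr(500, 1000)
lr_end = cosine_decay_lr(1000, 1000)
```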
Y_p [real/imag stacked]
→ Conv2D(32, 3×3) → BatchNorm → ReLU
→ Conv2D(32, 3×3) → BatchNorm → ReLU
→ Conv2D(32, 3×3) → BatchNorm → ReLU
→ Bilinear Upsample (×pilot_spacing)
→ Conv2D(16, 3×3) → BatchNorm → ReLU
→ Conv2D(16, 3×3) → BatchNorm → ReLU
→ Conv2D(2, 1×1) [linear]
→ Ĥ (complex channel per RE)
Input: real/imag stacked, shape [2, N_p, N_f]. Output: complex channel per RE, shape [2, N_sym, N_f]. Parameters: ~85K (compare with a classical MMSE covariance matrix of \(\sim N_f^2 \approx 1{,}584^2\) entries).
NMSE vs SNR Performance (ETU-70)
Under the 3GPP ETU-70 channel model (Extended Typical Urban, 70 Hz Doppler), carrier frequency 3.5 GHz, subcarrier spacing 30 kHz (NR μ=1):
- At SNR = −5 dB: NN → −16.5 dB NMSE; MMSE → −14.5 dB; LS → −0.5 dB
- At SNR = 0 dB: NN → −21.5 dB; MMSE → −19.0 dB; LS → −5.5 dB
- At SNR = +10 dB: NN → −28.5 dB; MMSE → −26.5 dB; LS → −15.5 dB
- At SNR = +20 dB: NN → −35.2 dB; MMSE → −34.5 dB; LS plateaus ≈ −21 dB (interpolation error floor limits further improvement)
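For readers reproducing figures of this kind, the sketch below shows one common way to compute NMSE in dB. The source does not state which normalisation it uses; the definition here (error energy divided by true-channel energy) and the function name `nmse_db` are assumptions.

```python
import math

def nmse_db(h_true, h_est):
    """Normalised MSE in dB: 10*log10(sum|h_est - h|^2 / sum|h|^2)."""
    err = sum(abs(e - t) ** 2 for t, e in zip(h_true, h_est))
    ref = sum(abs(t) ** 2 for t in h_true)
    return 10.0 * math.log10(err / ref)

# Toy example: two unit-power taps, each estimated with a 0.1 amplitude error.
h_true = [1 + 0j, 0 + 1j]
h_est = [1.1 + 0j, 0 + 0.9j]
value = nmse_db(h_true, h_est)
print(round(value, 2))
```

In this toy case the error energy is 1% of the channel energy, so the result is close to −20 dB.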
[3] 3GPP TR 38.843 v18.0.0 — Study on Artificial Intelligence (AI)/Machine Learning (ML) for NR Air Interface, §6.3 (Channel Estimation Use Case), 2024.
§3.3 — TR 38.843 Use Case 3 (Channel Estimation)
3GPP's TR 38.843 formally studies AI/ML for the NR air interface. Use Case 3 directly addresses AI-based channel estimation. Key findings from §6.3:
| Parameter | TR 38.843 Specification | Notes |
|---|---|---|
| Inference endpoint | UE-side or network-side | UL: gNB estimates; DL: UE estimates |
| Model input | DMRS measurements (pilot REs) | No waveform changes required |
| Model output | Channel estimate on data REs | Replaces legacy interpolation block |
| Standardization scope | Input/output interface only | Internal AI model not standardized |
| Signaling needed | Model ID, capability flag | RRC/MAC-CE based negotiation |
| Performance target | NMSE gain 1–3 dB vs MMSE | High-Doppler: v > 120 km/h |
| Evaluation channels | CDL-A/B/C/D/E, ETU, AWGN | Per TR 38.901 channel models |
| Fallback mechanism | Revert to MMSE if NMSE > threshold | Robustness requirement |
Evaluation Scenarios
TR 38.843 defines three evaluation scenarios for Use Case 3 (channel estimation):
- Scenario A — Indoor Hotspot (InH-Office): Low Doppler (pedestrian 3 km/h), CDL-A channel. Baseline methods already perform well; AI gain is modest (~1 dB).
- Scenario B — Urban Macro (UMa): Medium Doppler (30 km/h), CDL-C. Moderate AI benefit (~1.5–2 dB) due to intra-slot channel variation.
- Scenario C — High Mobility (V2X): High Doppler (120–500 km/h), ETU-70. Maximum AI benefit (~2–4 dB) because MMSE static-within-slot assumption breaks down.
§3.4 — Doppler and Multi-path Extension
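The channel estimation challenge intensifies for high-mobility UEs (vehicular, high-speed rail, V2X). At 120 km/h and 3.5 GHz carrier frequency, the Doppler spread is:
\[f_D = \frac{v\,f_c}{c} = \frac{33.3\,\text{m/s} \times 3.5 \times 10^{9}\,\text{Hz}}{3 \times 10^{8}\,\text{m/s}} \approx 389\,\text{Hz}\]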
For NR μ=1 (slot duration 0.5 ms), the channel coherence time \(T_c \approx 1/(4f_D) \approx 0.64\,\text{ms}\) is comparable to the slot duration. The classical quasi-static assumption — channel constant within one slot — fails.
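The Doppler and coherence-time figures can be reproduced with a few lines of Python. This is a minimal sketch; the function names are illustrative, and the coherence-time rule of thumb \(T_c \approx 1/(4 f_D)\) is the one quoted in the text.

```python
C = 299_792_458.0   # speed of light in m/s

def doppler_hz(speed_kmh, carrier_hz):
    """Maximum Doppler shift f_D = v * f_c / c."""
    return (speed_kmh / 3.6) * carrier_hz / C

def coherence_time_s(f_d):
    """Rule of thumb from the text: T_c ~ 1 / (4 f_D)."""
    return 1.0 / (4.0 * f_d)

f_d = doppler_hz(120.0, 3.5e9)
t_c = coherence_time_s(f_d)
slot_s = 0.5e-3     # NR numerology mu = 1

print(f"f_D = {f_d:.0f} Hz, T_c = {t_c * 1e3:.2f} ms, T_c / slot = {t_c / slot_s:.2f}")
```

A ratio of coherence time to slot duration close to 1 is what invalidates the quasi-static assumption.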
Temporal Extrapolation via Recurrent Networks
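Neural networks address this by learning the temporal evolution of the channel. A recurrent estimator (LSTM or temporal CNN) takes a history of past channel estimates and predicts the current slot:
\[\hat{\mathbf{h}}(t) = f_\theta\big(\mathbf{h}(t-1),\, \mathbf{h}(t-2),\, \ldots,\, \mathbf{h}(t-k)\big)\]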
where \(\mathbf{h}(t) \in \mathbb{C}^{N_r N_t}\) is the vectorised channel (stacked columns of H), and k is the prediction history length (look-back window).
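This mirrors the classical Wiener-Hopf predictor:
\[\hat{\mathbf{h}}(t) = \sum_{i=1}^{k} w_i\,\mathbf{h}(t-i), \qquad \mathbf{w} = \mathbf{R}_{hh}^{-1}\,\mathbf{r}_{hh}(\delta)\]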
where \(\mathbf{r}_{hh}(\delta)\) is the temporal autocorrelation vector (Jakes model for isotropic scattering) and \(\mathbf{R}_{hh}\) is the \(k \times k\) channel correlation matrix.
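As a concrete reference point, the sketch below evaluates the simplest instance of the Wiener predictor: a single past sample (k = 1), a unit-power channel, and noiseless observations. Under those assumptions the optimal weight equals the Jakes correlation \(J_0(2\pi f_D T)\) and the prediction MSE is \(1 - J_0^2\). The Bessel function is computed from its integral representation to keep the example dependency-free; the function names are illustrative.

```python
import math

def bessel_j0(x, steps=2000):
    """J0(x) = (1/pi) * integral over [0, pi] of cos(x sin(theta)), midpoint rule."""
    d = math.pi / steps
    total = sum(math.cos(x * math.sin((i + 0.5) * d)) for i in range(steps))
    return total * d / math.pi

def jakes_correlation(f_d, lag_s):
    """Temporal autocorrelation of a unit-power channel under isotropic scattering."""
    return bessel_j0(2.0 * math.pi * f_d * lag_s)

f_d = 389.0        # Hz, 120 km/h at 3.5 GHz
slot_s = 0.5e-3    # one NR slot at mu = 1

rho = jakes_correlation(f_d, slot_s)   # one-tap Wiener weight
mse = 1.0 - rho ** 2                   # normalised prediction error

print(f"rho = {rho:.3f}, one-tap prediction MSE = {mse:.3f}")
```

The large residual error of the one-tap predictor at this Doppler illustrates why longer look-back windows, or learned predictors, are of interest.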
Multi-path Structure Learning
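In frequency-selective channels, the impulse response consists of L discrete paths:
\[h(\tau) = \sum_{l=1}^{L} \alpha_l\,\delta(\tau - \tau_l)\]
where \(\alpha_l\) and \(\tau_l\) are the complex gain and delay of path l.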
Exploiting this sparse delay-domain structure:
- Delay-domain NN: Transform pilot observations to delay domain via IDFT, apply sparse recovery (ISTA-Net), transform back. Effective when \(L \ll N_f\).
- Angle-delay domain (massive MIMO): For large antenna arrays, channel is sparse in the angle-delay domain. 2D-CNN on angle-delay representation achieves near-oracle NMSE.
- Super-resolution: Off-grid path delay estimation via atomic norm minimization — AI version uses learned dictionaries.
§3.5 — Overhead and Deployment Considerations
Model Size and UE Feasibility
The ChannelNet architecture (~85K parameters) requires approximately:
- Storage: 340 KB (FP32) or 85 KB (INT8)
- Multiply-accumulate ops: ~15 M MACs per slot
- Inference latency: 0.1–0.5 ms on ARM Cortex-A75
- Comparable to legacy MMSE covariance update cost
Larger transformer-based estimators (300K–2M parameters) target gNB-side uplink estimation where compute constraints are relaxed.
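The storage figures for the ~85K-parameter model follow directly from the parameter count. The sketch below reproduces them and also derives the sustained compute rate implied by ~15 M MACs per slot, assuming inference runs once per 0.5 ms slot (an assumption; the text does not state the inference rate) and 1 KB = 1000 bytes.

```python
N_PARAMS = 85_000         # parameter count quoted in the text
MACS_PER_SLOT = 15e6      # multiply-accumulates per inference (from the text)
SLOT_S = 0.5e-3           # assumption: one inference per NR slot at mu = 1

# Storage in KB for each weight format (bytes per parameter: FP32 = 4, INT8 = 1).
storage_kb = {"fp32": N_PARAMS * 4 / 1000, "int8": N_PARAMS * 1 / 1000}

# Sustained compute if the estimator runs every slot.
sustained_gmacs = MACS_PER_SLOT / SLOT_S / 1e9

print(storage_kb, f"{sustained_gmacs:.0f} GMAC/s")
```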
Model Delivery Mechanism
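TR 22.874 (Study on traffic characteristics and performance requirements for AI/ML model transfer in 5GS) sets out the requirements for over-the-air model transfer. The delivery mechanism assumed here: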
- Model transmitted via PDSCH (gNB → UE) or PUSCH (UE → gNB)
- Model identified by a Model ID signaled in RRC
- UE capability flag: "AI_CE_supported = 1"
- Model update triggered by network OAM on environmental change
- Delta updates possible (fine-tuning weights only)
Fallback and Robustness
A critical requirement for any deployed AI component is a graceful degradation path:
| Condition | Action | Trigger |
|---|---|---|
| NMSE > −5 dB (runtime) | Switch to MMSE fallback | Online NMSE monitoring |
| Model ID mismatch | Re-request model from gNB | RRC model negotiation failure |
| UE compute overload | Use LS estimator + gNB-side equalization | UE thermal throttling signal |
| No AI model loaded | Full MMSE (Rel-15 behavior) | Default state on power-up |
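The fallback table can be read as a small decision function. The sketch below is one possible encoding; the function name, the argument names, and the order in which conditions are checked are assumptions, since the table itself does not define a priority between rows.

```python
def select_estimator(model_loaded, model_id_ok, ue_overloaded, nmse_db,
                     threshold_db=-5.0):
    """Pick a channel estimator according to the fallback table.

    The check order below is an assumed priority, not one taken from a
    3GPP specification.
    """
    if not model_loaded:
        return "MMSE"             # default state on power-up (Rel-15 behaviour)
    if not model_id_ok:
        return "REQUEST_MODEL"    # re-request the model from the gNB
    if ue_overloaded:
        return "LS"               # LS at the UE plus gNB-side equalization
    if nmse_db > threshold_db:
        return "MMSE"             # runtime NMSE monitor tripped
    return "AI"

print(select_estimator(True, True, False, nmse_db=-18.0))
```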
Summary: §3 Key Takeaways
- AI channel estimation replaces the MMSE interpolation block with a learned function that implicitly captures channel statistics.
- NMSE gains of 2–4 dB vs MMSE are achievable in high-Doppler scenarios (ETU-70, v > 120 km/h).
- Model size (~85K parameters, ~340 KB) is compatible with UE storage and inference latency budgets.
- 3GPP TR 38.843 Use Case 3 standardizes the I/O interface, leaving internal architecture to implementation choice.
- Fallback to MMSE/LS is mandatory for robustness; model ID negotiation via RRC enables multi-environment deployment.
Channel estimation forms the first stage of the AI radio pipeline. → §4 extends these ideas to CSI feedback compression, where the estimated channel must be encoded and reported back to the gNB.
§4.1 — The CSI Feedback Bottleneck
Massive MIMO beamforming requires the gNB to know the downlink channel matrix. In FDD, the UE must estimate the channel and report it back. The feedback overhead scales with antenna count — and in Release 15/16 massive MIMO configurations this becomes a significant resource burden.
Raw CSI Dimensionality
For a 32TRX panel (typical commercial deployment at sub-6 GHz):
In practice, 5G NR codebook feedback compresses this substantially, but still requires significant uplink resources:
| Feedback Type | Bits per Subband | Total Bits (13 SB) | NMSE | BF Gain |
|---|---|---|---|---|
| Type I Single Panel | 4–11 bits | 52–143 bits | −8 dB | 8–10 dB |
| Type II Basic | 10–16 bits | 130–208 bits | −12 dB | 12–14 dB |
| Type II Enhanced (Rel-16) | 16–32 bits | 208–416 bits | −14 dB | 14–16 dB |
| CsiNet (η=1/4) | n/a | ~100 bits | −10 dB | 12 dB |
| TransNet (2023) | n/a | ~80 bits | −15 dB | 15 dB |
§4.2 — CsiNet: The Autoencoder Framework
CsiNet (Wen et al., 2018) established the canonical deep-learning framework for CSI feedback compression. It frames the problem as a learned vector quantization via an autoencoder:
Encoder (UE side)
The encoder compresses the full channel matrix \(\mathbf{H} \in \mathbb{C}^{N_t \times N_r \times N_f}\) into a low-dimensional codeword \(\mathbf{c} \in \mathbb{R}^k\). The compression ratio is:
\[\eta = \frac{k}{2\,N_t N_r N_f}\]
For \(N_t=32\), \(N_r=1\), \(N_f=13\) and \(\eta = 1/4\): \(k = 2 \times 32 \times 1 \times 13 / 4 = 208\) real values, quantized to ~100 bits.
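The worked example above generalises to the other compression ratios used later in this section. A minimal sketch, with an illustrative helper name, where the factor 2 accounts for real and imaginary parts:

```python
def bottleneck_dim(n_t, n_r, n_f, eta):
    """Number of real values k in the codeword for compression ratio eta."""
    return round(2 * n_t * n_r * n_f * eta)

# Configuration from the text: 32 ports, 1 UE antenna, 13 subbands.
for eta in (1 / 4, 1 / 8, 1 / 16, 1 / 32):
    print(f"eta = 1/{round(1 / eta)}: k = {bottleneck_dim(32, 1, 13, eta)}")

k = bottleneck_dim(32, 1, 13, 1 / 4)
```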
Decoder (gNB side)
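The decoder reconstructs the full channel estimate from the compressed codeword. Trained end-to-end:
\[\min_{\theta_e,\,\theta_d}\; \mathbb{E}\Big[\big\lVert \mathbf{H} - f_{\text{dec}}\big(f_{\text{enc}}(\mathbf{H};\theta_e);\theta_d\big) \big\rVert_2^2\Big] \quad \text{s.t.} \quad \text{bits}(\mathbf{c}) \le B_{\text{target}}\]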
The rate constraint is handled by training with a fixed k (bottleneck dimension) and applying post-training scalar quantization on \(\mathbf{c}\).
CsiNet Architecture Detail
H [2, N_t, N_f] (real/imag)
→ Conv2D(2, 3×3) → BN → LeakyReLU
→ Flatten [2 × N_t × N_f]
→ FC(k) [bottleneck]
→ c ∈ ℝ^k
CsiNet Decoder (gNB)
c ∈ ℝ^k
→ FC(2 × N_t × N_f) → Reshape [2, N_t, N_f]
→ [RefineNet Block] × 2
(Conv2D(8,3×3) → BN → ReLU → Conv2D(16,3×3) → BN → ReLU
→ Conv2D(2,3×3) → BN + skip connection)
→ Sigmoid → Ĥ
Total parameters: ~2.1 M (encoder 0.3 M + decoder 1.8 M)
Performance Progression: CsiNet → CsiNet+ → TransNet
| Model | Year | η = 1/32 NMSE | η = 1/16 NMSE | η = 1/8 NMSE | η = 1/4 NMSE | Key Innovation |
|---|---|---|---|---|---|---|
| CsiNet | 2018 | −6.0 dB | −8.0 dB | −9.5 dB | −10.0 dB | Baseline autoencoder + RefineNet |
| CsiNet+ | 2020 | −8.5 dB | −11.0 dB | −12.5 dB | −14.0 dB | Multi-rate training + dense connections |
| TransNet | 2023 | −10.0 dB | −12.5 dB | −14.0 dB | −15.0 dB | Transformer encoder + cross-attention decoder |
| Type II Enhanced | Rel-16 | n/a | n/a | n/a | −14.0 dB | 3GPP codebook baseline (fixed overhead, not variable η) |
§4.3 — 3GPP Type II vs AI CSI: Full Comparison
| Method | Bits/Slot (13 SB) | NMSE | BF Gain | UL Overhead | UE Complexity |
|---|---|---|---|---|---|
| Type I Single Panel | 52–143 bits | −8 dB | 8–10 dB | Low | Very low |
| Type II Basic | 130–208 bits | −12 dB | 12–14 dB | Moderate | Low |
| Type II Enhanced | 208–416 bits | −14 dB | 14–16 dB | High | Moderate |
| CsiNet (η=1/4) | ~100 bits total | −10 dB | 12 dB | Very low | Moderate (encoder) |
| CsiNet+ (η=1/4) | ~100 bits total | −14 dB | 14 dB | Very low | Moderate |
| TransNet (2023) | ~80 bits total | −15 dB | 15 dB | Very low | High (transformer) |
Practical Throughput Impact
The UL feedback overhead directly reduces the uplink capacity left for data. For a 20 MHz uplink (30 kHz SCS, 51 PRBs):
- Type II Enhanced (416 bits) consumes ~2.3 PRBs per slot purely for CSI feedback — approximately 4.5% of UL capacity.
- AI CSI (80–100 bits) consumes <0.6 PRBs — <1.2% of UL capacity.
- The freed UL resources can carry data, SRS, or additional reference signals, yielding a 3–4× reduction in feedback overhead at equivalent reconstruction quality.
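The percentages above follow from dividing the PRBs consumed by the 51 PRBs available. The sketch below redoes that arithmetic using the PRB counts quoted in the text; the bits-per-PRB mapping behind those counts is not stated in the source and is not modelled here.

```python
UL_PRBS = 51   # 20 MHz at 30 kHz SCS (from the text)

def ul_share(prbs_used):
    """Fraction of uplink PRBs consumed by CSI feedback in a slot."""
    return prbs_used / UL_PRBS

type2_share = ul_share(2.3)    # Type II Enhanced, 416 bits
ai_share = ul_share(0.6)       # AI CSI, 80 to 100 bits (upper bound)
reduction = 2.3 / 0.6

print(f"Type II Enhanced: {type2_share:.1%}, AI CSI: {ai_share:.1%}, "
      f"reduction: {reduction:.1f}x")
```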
§4.4 — Transformer-based CSI Compression (TransNet)
CsiNet's convolutional encoder treats the channel matrix as a 2D image. This misses long-range correlations across the subband dimension (frequency coherence) and the port dimension (spatial coherence in large aperture arrays). TransNet introduces self-attention to capture these global dependencies.
Attention Mechanism Review
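Scaled dot-product attention computes
\[\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V\]
For CSI feedback, the queries, keys, and values are derived from the channel matrix as follows:
- \(Q = f_Q(\mathbf{H}_{\text{subband},\,i})\): the query, a representation of subband i
- \(K = V = f_{KV}(\mathbf{H}_{\text{all subbands}})\): the keys and values, the full channel across all subbands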
This allows each subband's encoding to attend to all other subbands, learning the frequency correlation structure that classical codebooks approximate with a fixed DFT basis.
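To make the mechanism concrete, here is a dependency-free sketch of scaled dot-product attention over a toy set of subband feature vectors. The learned projections \(f_Q\) and \(f_{KV}\) are replaced by the identity for brevity, which is a simplifying assumption, and the feature values are arbitrary.

```python
import math

def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def attention(Q, K, V):
    """Scaled dot-product attention; Q, K, V are lists of row vectors."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, V))
                        for j in range(len(V[0]))])
    return outputs, weights

# Three subbands with 2-dimensional features; subbands 0 and 1 are similar.
H = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
out, weights = attention(H, H, H)
print([round(w, 3) for w in weights[0]])
```

Subband 0 places more weight on the similar subband 1 than on the dissimilar subband 2, which is the frequency-correlation behaviour described above.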
TransNet Architecture
H [2, N_t, N_f] (real/imag stacked)
→ Linear Embedding → token sequence [N_f tokens × d_model]
→ Positional Encoding (subband index)
→ Transformer Encoder Layer × L_e
(Multi-Head Self-Attention [8 heads, d_k=64]
→ LayerNorm + skip → FFN(d_ff=256) → LayerNorm + skip)
→ [CLS] token extraction → FC(k) → c ∈ ℝ^k
TransNet Decoder (gNB)
c ∈ ℝ^k
→ FC(N_f × d_model) → Reshape [N_f × d_model]
→ Transformer Decoder Layer × L_d
(Cross-Attention [c as query, latent as key/value]
→ FFN → LayerNorm)
→ Linear → Ĥ [2, N_t, N_f]
Parameters: ~4.2 M (encoder 1.8 M + decoder 2.4 M). Inference: ~2 ms on Snapdragon 888 (UE encoder only).
Multi-Head Attention over Port Dimension
For massive MIMO (N_t ≥ 32), an additional attention head is applied across the port (antenna) dimension:
This captures spatial correlation across the antenna array — the beam domain structure — without requiring explicit DFT pre-coding to an angle domain representation.
NMSE vs Compression Ratio
[Figure: NMSE vs compression ratio \(\eta \in \{1/32,\,1/16,\,1/8,\,1/4\}\) for the compared models. The underlying values are tabulated in §4.2.]
§4.5 — 3GPP Standardization (TR 38.843 §6.2)
TR 38.843 §6.2 evaluates AI/ML for CSI feedback enhancement as Use Case 2. The standardization scope is broader than Use Case 3 (channel estimation) because the feedback traverses the air interface: the encoder (UE) and decoder (gNB) are implemented by different vendors, creating a need for interoperability specification.
3GPP Architecture Options
| Architecture | Encoder Side | Decoder Side | Standardization | Status |
|---|---|---|---|---|
| Option 1 | UE (AI) | gNB (AI) | I/O interface + model ID | Primary candidate |
| Option 2 | UE (AI) | gNB (legacy codebook) | Encoder output = legacy codeword | Backward-compatible |
| Option 3 | UE (legacy) | gNB (AI decoder only) | No UE changes needed | Incremental upgrade path |
Open Issues Identified in TR 38.843
1. Encoder specification:
2. Decoder standardization:
Two approaches under discussion in Rel-18:
- Standardized decoder: A reference gNB decoder is specified in the standard. UEs must train their encoders against this reference. Ensures interoperability; limits decoder innovation.
- Signaled model pair: The network signals a Model ID pair (encoder ID + decoder ID) to the UE via RRC. The UE downloads the encoder; the gNB uses the paired decoder. Flexible but requires model management infrastructure.
3. Online learning and adaptation:
Models trained offline on synthetic CDL channels may mismatch real deployment channels. TR 38.843 proposes:
- Online fine-tuning: UE collects CSI samples during operation, performs gradient updates to the encoder using feedback from gNB (requires a new feedback loop for gradient or loss information).
- Model selection: UE chooses from a library of pre-trained encoders (indexed by environment type: indoor/outdoor, LoS/NLoS) based on detected channel statistics.
- Meta-learning: Encoder trained to be quickly fine-tunable with few environment-specific samples (MAML-style approach).
4. Quantization and entropy coding:
The continuous bottleneck vector \(\mathbf{c}\) must be quantized for transmission. Two approaches:
- Scalar quantization: Each element of \(\mathbf{c}\) quantized to b bits independently. Simple, adds ~1–2 dB NMSE loss vs unquantized.
- Vector quantization (learned codebook): VQ-VAE style — jointly optimize encoder + VQ codebook. Achieves near-unquantized performance but higher UE complexity.
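The scalar option can be sketched as a uniform mid-rise quantizer applied element by element. The value range [−1, 1], the bit width, and the function names are assumptions for illustration; a deployed scheme would choose them from the codeword statistics.

```python
def quantize(c, bits, lo=-1.0, hi=1.0):
    """Map each element of c to an integer index using 2**bits uniform cells."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    indices = []
    for x in c:
        x = min(max(x, lo), hi)                       # clip to the assumed range
        indices.append(min(int((x - lo) / step), levels - 1))
    return indices

def dequantize(indices, bits, lo=-1.0, hi=1.0):
    """Reconstruct each element at the centre of its quantization cell."""
    step = (hi - lo) / 2 ** bits
    return [lo + (i + 0.5) * step for i in indices]

c = [-0.8, -0.1, 0.0, 0.35, 0.99]     # toy codeword
idx = quantize(c, bits=3)
c_hat = dequantize(idx, bits=3)
payload_bits = len(c) * 3

print(idx, c_hat, payload_bits)
```

The reconstruction error per element is bounded by half the step size (0.125 here), and the payload grows linearly with the codeword length k.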
RRC Signaling for AI CSI
The following new information elements are under study for Rel-18/19:
| IE Name | Layer | Content |
|---|---|---|
| ai-CSI-Config | RRC | AI CSI enabled flag, model ID, encoder resolution (k) |
| ai-Model-Request | RRC | UE requests model download; specifies environment index |
| ai-CapabilityReport | RRC | UE reports max k, supported model IDs, inference latency |
| ai-CSI-Feedback | PUCCH/PUSCH | Quantized codeword c (B_target bits) |
[4] 3GPP TR 38.843 v18.0.0, §6.2 — CSI Feedback Enhancement Use Case, 2024.
[5] C.-K. Wen et al., "Deep Learning for Massive MIMO CSI Feedback," IEEE Wireless Communications Letters, vol. 7, no. 5, pp. 748–751, 2018.
§4.6 — Summary: §3–4 Key Takeaways
§3 Channel Estimation
- AI replaces MMSE interpolation; no waveform changes.
- NMSE gains: 2–4 dB at low SNR and high Doppler (ETU-70, v > 120 km/h).
- Model size feasible for UE: ~85K params, ~340 KB FP32, <0.5 ms inference.
- 3GPP TR 38.843 Use Case 3 standardizes I/O interface only; model architecture is implementation-defined.
- Mandatory fallback to MMSE/LS when NMSE exceeds threshold.
§4 CSI Feedback Compression
- AI autoencoder (CsiNet/TransNet) compresses 32Tx CSI to 80–100 bits vs 208–416 bits for Type II Enhanced.
- TransNet achieves −15 dB NMSE at η=1/4, matching or exceeding Type II Enhanced at 5× lower overhead.
- Core challenge: encoder (UE) and decoder (gNB) are from different vendors — model mismatch requires standardized decoder or Model ID negotiation.
- Online adaptation essential for non-stationary real environments.
- Rel-18/19 normative work targets first interoperable AI CSI mode.
References for §3–4
- 3GPP TR 38.843 v18.0.0, §6.3 — Channel Estimation Use Case, 2024.
- 3GPP TR 38.843 v18.0.0, §6.2 — CSI Feedback Enhancement Use Case, 2024.
- C.-K. Wen et al., "Deep Learning for Massive MIMO CSI Feedback," IEEE Wireless Communications Letters, vol. 7, no. 5, pp. 748–751, 2018.
- J. Guo et al., "Convolutional Neural Network-based Multiple-rate Compressive Sensing for Massive MIMO CSI Feedback: Design, Simulation, and Analysis," IEEE Transactions on Wireless Communications, vol. 19, no. 4, pp. 2827–2840, 2020.
CSI compression solves the feedback bottleneck on the uplink; the resulting beam selection and precoding are applied on the downlink. → §5 examines how AI-based beam management exploits this CSI to predict optimal beams and handle blockage events.
§5 AI-based Beam Management
§5.1 The Beam Management Challenge
Millimetre-wave (FR2) and sub-6 GHz massive-MIMO deployments in 5G NR rely on a codebook of narrow beams to overcome the high path-loss at these frequencies. The base station (gNB) and UE must continuously search for, and track, the best-aligned beam pair — a process that consumes significant time and energy as the number of antennas scales.
5G NR Beam Sweeping — How It Works Today
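- SSB burst: the gNB transmits Synchronization Signal Blocks (SSBs) across all beam directions in a fixed burst set. For FR2 the maximum is 64 SSBs per burst set; each burst set is confined to a 5 ms half-frame and repeats with a default periodicity of 20 ms. Each SSB spans 4 OFDM symbols, so a 32-beam grid occupies 128 OFDM symbols per sweep.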
- RSRP measurement: the UE measures L1-RSRP for every received SSB and reports the index of the strongest beam together with its measurement value.
- Overhead cost: with 32 beams, 32 consecutive SSB transmissions are required every 20 ms — even when the channel is stable and no beam change is needed. At 64 beams the sweep overhead grows proportionally.
- Beam failure: sudden LOS blockage (a pedestrian crossing, a vehicle turning a corner) causes the serving beam RSRP to drop below threshold. The UE must then trigger Beam Failure Recovery (BFR), adding 10–50 ms of outage before a new beam is acquired.
5G P1–P2–P3 Procedure
Beam management in 5G NR is a three-phase procedure defined in 3GPP TS 38.214:
P1 (Beam Sweeping): gNB transmits SSB burst across all beams; UE selects best TX beam and reports L1-RSRP. Periodicity: typically 20 ms.
P2 (Beam Refinement): Narrower CSI-RS beam refinement within the P1 winner neighbourhood. Finer angular resolution; periodicity 5–10 ms.
P3 (Beam Tracking): UE-side beam tracking using aperiodic CSI-RS triggered after detected mobility.
The full P1→P2→P3 chain is purely reactive: the system responds to a degraded measurement after blockage has occurred.
6G Target: Predictive Beam Management
6G beam management aims to anticipate blockage before it happens. By training a neural predictor on past RSRP time series — and optionally on sensing side-information — the system can pre-switch to the next best beam proactively, eliminating BFR latency and reducing sweep overhead.
§5.2 Beam Prediction — TR 38.843 Use Case 3
3GPP TR 38.843 (Rel-18) defines Use Case 3 (UC3) as AI/ML-assisted beam management. The study evaluates neural predictors that consume a window of past L1-RSRP measurements and output a predicted best-beam index one to four slots in the future.
Model Input / Output
Input vector at time t:
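\[\mathbf{x}_t = [\,\mathbf{r}(t-k),\ \mathbf{r}(t-k+1),\ \ldots,\ \mathbf{r}(t-1),\ \mathbf{r}(t)\,] \in \mathbb{R}^{N_B \times (k+1)}\]
where \(\mathbf{r}(t)\) is the vector of RSRP measurements across all \(N_B\) beams at slot t, and k is the look-back window (typically 4–8 slots).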
Output: predicted best-beam index at t+δ (δ = 1…4 slots).
LSTM-based Predictor Architecture
The baseline architecture in TR 38.843 evaluations is a single-layer LSTM followed by a linear classification head:
\[\mathbf{h}_t = \mathrm{LSTM}(\mathbf{x}_t), \qquad \hat{\mathbf{y}}_{t+\delta} = \mathrm{softmax}(\mathbf{W}\mathbf{h}_t + \mathbf{b})\]
The hidden state \(\mathbf{h}_t \in \mathbb{R}^{64}\) captures temporal correlation in the RSRP time series. A Transformer variant replaces the LSTM with multi-head self-attention over the input window, providing better long-range dependency modelling at slightly higher complexity.
Training Loss
A composite loss is used to simultaneously optimise beam-classification accuracy and RSRP prediction quality:
\[\mathcal{L} = -\sum_{b=1}^{N_B} y_b \log \hat{y}_b \;+\; \lambda\,\big\lVert \hat{\mathbf{r}}(t+\delta) - \mathbf{r}(t+\delta) \big\rVert_2^2\]
The first term is the standard cross-entropy classification loss over \(N_B\) beam classes. The second term penalises RSRP prediction error, encouraging the hidden representation to encode channel quality faithfully. λ ≈ 0.1 balances the two tasks.
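A minimal, framework-free sketch of such a composite loss for a single training sample is shown below. It assumes the classifier emits raw logits, that the true beam is given as an index, and that the RSRP term is an unnormalised sum of squared errors; these choices and the function name are illustrative.

```python
import math

def composite_loss(logits, true_beam, rsrp_pred, rsrp_true, lam=0.1):
    """Cross-entropy over beam classes plus lam times squared RSRP error."""
    # Cross-entropy from logits via a numerically stable log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    ce = log_z - logits[true_beam]
    # Sum of squared RSRP prediction errors across beams.
    sq_err = sum((p - t) ** 2 for p, t in zip(rsrp_pred, rsrp_true))
    return ce + lam * sq_err

# Uninformative logits over 4 beams and a perfect RSRP prediction:
# the loss reduces to log(4), about 1.386.
loss = composite_loss([0.0, 0.0, 0.0, 0.0], 0, [-80.0] * 4, [-80.0] * 4)
print(round(loss, 4))
```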
TR 38.843 Evaluation Results
| Metric | δ = 1 slot | δ = 2 slots | δ = 4 slots |
|---|---|---|---|
| Top-1 beam accuracy | 72–85 % | 65–78 % | 55–68 % |
| Top-3 beam accuracy | 90–95 % | 85–92 % | 78–87 % |
| UE energy saving | ~35 % | ~30 % | ~22 % |
| Latency saving (skip P1) | 2–4 ms | 1.5–3 ms | 0.5–1.5 ms |
| Model size (parameters) | ~10 K (LSTM) / ~25 K (Transformer) | same | same |
The top-3 accuracy of 90–95 % means that the true best beam is in the predicted shortlist with very high probability — allowing the system to skip the full P1 sweep and only test 3 candidate beams instead of 32, a 10× reduction in sweep overhead.
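The shortlist idea can be sketched in a few lines: rank the predictor's per-beam scores, keep the top K, and sweep only those. The score values below are made up for illustration and the helper names are hypothetical.

```python
def top_k_beams(scores, k=3):
    """Indices of the k highest-scoring beams, best first."""
    return sorted(range(len(scores)), key=lambda b: scores[b], reverse=True)[:k]

def sweep_reduction(n_beams, k):
    """Factor by which the number of swept beams shrinks."""
    return n_beams / k

# Hypothetical predictor output over a 32-beam codebook.
scores = [0.01] * 32
scores[7], scores[8], scores[20] = 0.5, 0.3, 0.1

shortlist = top_k_beams(scores, k=3)
print(shortlist, f"{sweep_reduction(32, 3):.1f}x fewer beams swept")
```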
§5.3 Blockage Prediction with Side Information
Pure RSRP-based prediction cannot anticipate blockage events that have not yet affected the measured channel — e.g. a fast-moving obstacle entering the Fresnel zone for the first time. Multi-modal fusion addresses this by incorporating sensing side-information alongside RF measurements.
Side Information Sources
- Radar point cloud: mmWave radar returns detected at gNB or UE side provide 3-D velocity vectors of nearby scatterers/obstacles.
- Camera / LiDAR: object detection bounding-boxes give object distance, trajectory, and classification (pedestrian, vehicle).
- GPS / IMU trajectory: UE-reported velocity and heading allow dead-reckoning of future UE position relative to beam grid.
- DeepMIMO dataset: publicly available ray-tracing dataset (Alkhateeb et al.) used in multiple 3GPP TR 38.843 evaluation contributions; provides ground-truth channel + UE position for supervised training.
Blockage Probability Model
A binary classifier predicts the probability that LOS will be blocked at time t+δ given the fused input sequence:
\[P_{\text{block}}(t+\delta) = \sigma\big(f_\theta(\mathbf{r}_{t-k:t},\ \mathbf{s}_{t-k:t})\big)\]
where \(\mathbf{r}_{t-k:t}\) is the windowed RSRP sequence, \(\mathbf{s}_{t-k:t}\) is the optional side-information vector (radar returns, GPS velocity, object-detection features), σ is the sigmoid activation, and \(f_\theta\) is the trained neural network (LSTM or Transformer body).
Joint Beam + Blockage Decision Logic
- At each slot, compute \(P_{\text{block}}(t+\delta)\) from multi-modal input.
- If \(P_{\text{block}} > \tau_B\) (e.g. 0.7): trigger proactive beam switch to top-3 predicted candidates without waiting for RSRP drop.
- If \(P_{\text{block}} < \tau_B\): remain on current beam; suppress P1 sweep (energy saving mode).
- If prediction confidence is low: fall back to classical P1–P2–P3 procedure.
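The decision rules above map onto a short function. In this sketch the confidence input, its threshold `min_confidence`, and the returned action labels are hypothetical; the text specifies only the blockage threshold (e.g. 0.7) and the three outcomes.

```python
def beam_action(p_block, confidence, tau_b=0.7, min_confidence=0.5):
    """Choose a beam-management action for the next slot.

    min_confidence is an assumed parameter; the source only says to fall
    back when prediction confidence is low.
    """
    if confidence < min_confidence:
        return "CLASSICAL_P1_P2_P3"       # fall back to the reactive procedure
    if p_block > tau_b:
        return "PROACTIVE_SWITCH_TOP3"    # pre-switch before the RSRP drops
    return "STAY_SUPPRESS_P1"             # keep the beam, skip the P1 sweep

print(beam_action(p_block=0.85, confidence=0.9))
```

Checking confidence first means an unreliable blockage estimate never triggers a proactive switch, which is one reasonable reading of the fallback rule.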
§5.4 Standardization Impact — TR 38.843
3GPP TR 38.843 was the primary Rel-18 study item for AI/ML-based air-interface enhancements. Beam management (UC3) was the first use case to reach conclusion status, providing key architectural decisions that will feed Rel-19 normative work.
Agreed Architectural Elements
Inference endpoint:
- UE-side inference: UE runs beam predictor locally; requires model transfer from gNB via PDSCH + RRC configuration IE.
- gNB-side inference: UE reports raw L1-RSRP measurements; gNB performs prediction and signals result via DCI or MAC-CE.
- Both options agreed for study; Rel-19 to decide normative split.
Input feature set:
- L1-RSRP (mandatory) — per-beam, per-slot window.
- L3 measurement reports (optional) — filtered RSRP/RSRQ.
- UE velocity (optional) — from UE capability reporting.
- Time-domain index within SSB burst (optional).
Output format:
- Top-K beam indices (K = 1, 2, or 3).
- Per-beam confidence score (quantised, e.g. 3-bit).
- Prediction horizon δ configured by gNB.
Signaling:
- No new over-the-air signal required — existing measurement reports and DCI formats reused.
- Model transfer: gNB pushes model binary via PDSCH; model ID referenced in RRC reconfiguration message.
- Model lifecycle: activate / deactivate / update controlled by gNB RRC.
3GPP TR 38.843 §6.2 — AI/ML for NR Air Interface, Beam Management Use Case (UC3). Study concluded Rel-18; normative impact expected Rel-19 (TS 38.214 amendments).
Beam Prediction Accuracy vs UE Velocity
[Figure: beam prediction accuracy vs UE velocity for three approaches: classical P2 (no AI), LSTM predictor, and Transformer predictor (δ = 1 slot).]
§5.5 6G Extensions — ISAC-aided Beam Management
6G introduces Integrated Sensing and Communications (ISAC) as a first-class physical layer function. The same OFDM waveform simultaneously carries data and performs radar sensing within the same time-frequency resource, enabling an entirely new category of AI beam management.
ISAC Frame Structure
- Dual-function waveform: PDSCH symbols carry both user data and radar-sensing phase codes; the gNB processes the reflected echo from the same transmission — no dedicated radar waveform needed.
- Sensing-aided beam predictor: the gNB constructs a point cloud of nearby objects from round-trip Doppler/range profiles, and feeds this into the AI beam predictor as the side-information vector s described in §5.3.
- Beamforming direction error: joint radar + ML achieves a predicted beam-direction error below 1° RMS in simulation studies, compared to ~5° for RSRP-only LSTM.
6G Beam Management Targets (IMT-2030)
| Parameter | 5G NR (achieved) | 6G Target (IMT-2030) |
|---|---|---|
| Beam sweep overhead | ~5–10 % of frame | < 1 % (AI-suppressed sweeps) |
| Beam failure recovery latency | 10–50 ms (reactive BFR) | < 1 ms (proactive switch) |
| Max supported UE velocity | 500 km/h (HST) | 1000 km/h |
| Beam direction prediction error | N/A | < 1° (ISAC + ML) |
| UE power saving vs 5G P1 | — | 35–50 % |
[6] 3GPP TR 38.843 v18.0.0, §6.2 — Beam Management Use Case (UC3): AI/ML-based beam prediction, model transfer, and inference endpoint architecture.
AI beam management naturally complements AI positioning. → §6 covers how fine-grained location estimates feed back into beam selection and link adaptation.
§6 AI-based Positioning Enhancement
§6.1 5G Positioning — Limitations and Baseline
3GPP Release 16 introduced a dedicated positioning layer for 5G NR, standardised in TS 38.305 and studied in TR 38.855. Release 17 further refined the techniques and tightened accuracy requirements. Despite these advances, the current architecture faces fundamental physical-layer challenges that AI/ML can substantially address.
5G NR Positioning Methods (Rel-16/17)
| Method | Principle | Direction | Accuracy (3GPP Req.) |
|---|---|---|---|
| DL-TDOA | Downlink Time Difference of Arrival across multiple TRPs | Downlink | 3 m outdoor / 0.5 m indoor |
| UL-TDOA | Uplink TDOA — network measures UE SRS across TRPs | Uplink | 3 m outdoor / 0.5 m indoor |
| DL-AoD | Downlink Angle of Departure from gNB antenna array | Downlink | 5 m typical |
| UL-AoA | Uplink Angle of Arrival at multi-antenna TRP | Uplink | 5 m typical |
| Multi-RTT | Round-Trip Time measurements from multiple TRPs | Bi-directional | 1–3 m |
The NLOS Problem
All geometry-based methods (TDOA, AoD, AoA, RTT) assume that signal propagation paths are line-of-sight or can be accurately modelled. In practice, rich multipath environments (indoor corridors, dense urban canyons) cause Non-Line-of-Sight (NLOS) propagation, where the first detectable signal path has bounced off walls, floors, or obstacles before reaching the receiver.
- NLOS timing error: excess path length Δd introduces a timing bias Δτ = Δd/c, translating to 3–15 m range error even with picosecond synchronisation.
- NLOS angle error: the apparent angle of arrival shifts by 10–30° from the true UE direction, rendering AoD/AoA unreliable in indoor factories.
- Geometry diversity: dense-urban deployments often have poor TRP geometry (all TRPs on rooftops in one direction), causing ill-conditioned TDOA hyperboloids.
§6.2 Fingerprinting-based AI Positioning
Radio fingerprinting treats the measured channel (CSI, RSRP, CIR) as a unique spatial signature of a physical location. A neural network is trained offline to map these signatures to 3D coordinates, implicitly encoding NLOS geometry into learned feature representations.
Two-Phase Fingerprinting Pipeline
Phase 1 — Offline Training (Database Construction):
- Survey team (or automated robot) visits \(N_{\text{cal}}\) calibration positions \(\{\mathbf{p}_1, \ldots, \mathbf{p}_{N_{\text{cal}}}\}\) with known ground-truth coordinates.
- At each position, collect M channel snapshots \(\{H(f_k, a)\}\) for k = 1…K subcarriers and a = 1…\(N_a\) antennas.
- Construct the dataset \(\{(|\mathbf{H}_i|^2,\ \mathbf{p}_i)\}_{i=1}^{N_{\text{cal}} \cdot M}\) and train the neural network.
Phase 2 — Online Inference:
- UE measures current channel \(|H(f_k)|^2\) and reports to gNB (or runs inference locally).
- Trained NN maps measured feature vector to estimated position (x̂, ŷ, ẑ).
- Optional: uncertainty quantification output flags low-confidence estimates for fallback to TDOA.
CNN Architecture for CSI Fingerprinting
The channel amplitude matrix \(|H(f, a)|^2 \in \mathbb{R}^{N_a \times K}\) (antennas × subcarriers) exhibits spatial correlation analogous to a 2D image: frequency selectivity along one axis, antenna-domain spatial variation along the other. A Convolutional Neural Network (CNN) exploits this structure:
- Input layer: \(|\mathbf{H}|^2 \in \mathbb{R}^{N_a \times K}\), the channel amplitude per antenna per subcarrier.
- Conv block 1: 32 filters, 3×3 kernel, ReLU — extract local frequency-antenna correlation patterns.
- Conv block 2: 64 filters, 3×3 kernel, ReLU + max-pool — spatial downsampling.
- Conv block 3: 128 filters, 3×3 kernel, ReLU — high-level spatial feature maps.
- Global average pooling + FC(256): reduce to 256-dim feature vector.
- Output FC(3): predict (x, y, z) in metres; linear activation.
Positioning Loss Function
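The training objective combines Euclidean distance minimisation with an NLOS regularisation penalty:
\[\mathcal{L}_{\text{pos}} = \frac{1}{N}\sum_{i=1}^{N} \big\lVert \hat{\mathbf{p}}_i - \mathbf{p}_i \big\rVert_2 \;+\; \alpha\,\mathcal{L}_{\text{NLOS}}\]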
The NLOS penalty term discourages the network from over-fitting to NLOS-corrupted training samples. It can be implemented as a consistency loss: samples from neighbouring calibration positions should produce smoothly varying predicted positions (Lipschitz regularisation on the output manifold). The weighting coefficient α ≈ 0.05–0.2 is tuned per deployment.
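One way to realise the consistency penalty is a hinge on the Lipschitz ratio between neighbouring calibration points, sketched below. The exact penalty form, the `lipschitz` constant, and the function name are assumptions; the text describes the idea only qualitatively.

```python
import math

def positioning_loss(pred, truth, neighbour_pairs, alpha=0.1, lipschitz=1.0):
    """Mean Euclidean error plus alpha times a smoothness (Lipschitz) penalty.

    neighbour_pairs lists index pairs (i, j) of adjacent calibration points.
    The penalty is non-zero only when two predictions lie further apart than
    lipschitz times the distance between their true positions.
    """
    dist_term = sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)
    penalty = 0.0
    for i, j in neighbour_pairs:
        excess = math.dist(pred[i], pred[j]) - lipschitz * math.dist(truth[i], truth[j])
        penalty += max(0.0, excess) ** 2
    if neighbour_pairs:
        penalty /= len(neighbour_pairs)
    return dist_term + alpha * penalty

truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]    # second point overshoots by 2 m
loss = positioning_loss(pred, truth, neighbour_pairs=[(0, 1)])
print(round(loss, 3))
```

In the toy case the mean distance error is 1 m and the neighbour pair is stretched by 2 m beyond its true spacing, so the penalty contributes 0.1 × 4 to the total.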
§6.3 TR 38.843 Positioning Use Case — UC2
3GPP TR 38.843 Use Case 2 (UC2) evaluates AI/ML-enhanced positioning in the 3GPP Indoor Factory (InF) scenario — a representative deployment for Industry 4.0 automation where sub-meter accuracy is operationally required.
Evaluation Scenario Parameters
| Parameter | Value | Notes |
|---|---|---|
| Numerology (μ) | 1 | 30 kHz SCS |
| Bandwidth | 100 MHz | FR1 n78 band |
| Allocated PRBs | 132 | ≈ 47.5 MHz occupied at 30 kHz SCS (a full 100 MHz carrier has 273 PRBs) |
| BS antenna ports | 32 (64 physical) | Dual-polarised 16×2 UPA |
| Channel feature input | \(\lvert \mathbf{H} \rvert^2 \in \mathbb{R}^{32 \times 132}\) | 4224 input features per snapshot |
| CNN output dimension | 256-dim vector | Before final position head |
| Calibration dataset | 1000 positions × 10 snapshots | 10 000 training samples total |
| Inference output | \((x, y, z) \in \mathbb{R}^3\) | 3-coordinate absolute position |
Positioning Accuracy Results
| Method | Scenario | Error @ CDF 50 % | Error @ CDF 90 % | vs Requirement |
|---|---|---|---|---|
| DL-TDOA (Rel-16 baseline) | InF LOS | 0.6 m | 1.2 m | Fails 0.5 m req. @90% |
| DL-TDOA (Rel-16 baseline) | InF NLOS | 1.8 m | 4.5 m | Fails by 9× |
| AI CNN Fingerprint | InF LOS | 0.15 m | 0.32 m | Meets 0.5 m req. |
| AI CNN Fingerprint | InF NLOS | 0.22 m | 0.48 m | Meets 0.5 m req. |
| Hybrid TDOA + AI | InF Mixed | 0.18 m | 0.41 m | Best overall |
Key finding: massive antenna arrays (32+ ports) provide sufficient angle diversity to make the fingerprint nearly unique at sub-50 cm resolution. The 4224-dimensional input feature space \(|\mathbf{H}|^2 \in \mathbb{R}^{32 \times 132}\) encodes both frequency-selective multipath structure and spatial beam-domain patterns, a combination that classical TDOA geometry cannot exploit.
§6.4 6G Positioning — Sub-centimetre Target
6G IMT-2030 requirements push positioning accuracy by an order of magnitude beyond 5G NR, targeting applications that 5G cannot support: factory robot arms, surgical tool tracking, and vehicular lane-level positioning.
6G Positioning Requirements
| Use Case | Environment | Accuracy Target | Dimension |
|---|---|---|---|
| Factory Automation | Indoor | < 10 cm | 3D + Orientation |
| Surgical Robotics | Indoor | < 1 cm | 6-DoF |
| V2X Lane-Level | Outdoor | < 0.5 m | 3D |
| UAV/Drone Traffic | Aerial | < 1 m | 3D + Heading |
| Extended Reality (XR) | Indoor | < 5 cm | 6-DoF |
Physical Layer Enablers for 6G Positioning
Sub-THz Bands (100–300 GHz):
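The spatial resolution of any angle-based positioning method is fundamentally limited by the wavelength λ. At 140 GHz (D-band), λ ≈ 2.1 mm, about 40× shorter than at 3.5 GHz (λ ≈ 86 mm). This translates directly to:
- Angular resolution for a fixed physical aperture N·d: Δθ ≈ λ/(N·d), so the same aperture resolves angles about 40× more finely at 140 GHz. For example, the 2.7 m aperture of a 64-element half-wavelength array at 3.5 GHz gives Δθ ≈ 1.8°, versus ≈ 0.045° for the same aperture at 140 GHz.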
- Range resolution: with 10 GHz bandwidth, ΔR = c/(2BW) = 1.5 cm.
- Shorter wavelength → denser spatial sampling in the fingerprint feature space.
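The wavelength and resolution figures above follow from two textbook relations, Δθ ≈ λ/D for an aperture D and ΔR = c/(2·BW). The sketch below evaluates them; the aperture is an assumption (64 elements at half-wavelength spacing, sized at 3.5 GHz and then reused at 140 GHz), and the function names are illustrative.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def wavelength_m(freq_hz):
    """Free-space wavelength in metres."""
    return C / freq_hz

def angular_resolution_deg(freq_hz, aperture_m):
    """Diffraction-limited estimate delta_theta ~ lambda / D, in degrees."""
    return math.degrees(wavelength_m(freq_hz) / aperture_m)

def range_resolution_m(bandwidth_hz):
    """Range resolution delta_R = c / (2 * BW), in metres."""
    return C / (2.0 * bandwidth_hz)

# Assumed aperture: 64 elements at half-wavelength spacing, sized at 3.5 GHz
aperture = 64 * wavelength_m(3.5e9) / 2.0

lam_140 = wavelength_m(140e9)                       # about 2.1 mm
res_3g5 = angular_resolution_deg(3.5e9, aperture)   # fixed aperture, 3.5 GHz
res_140 = angular_resolution_deg(140e9, aperture)   # fixed aperture, 140 GHz
delta_r = range_resolution_m(10e9)                  # about 1.5 cm
```

For a fixed aperture the ratio `res_3g5 / res_140` equals the frequency ratio, which is why moving to sub-THz sharpens angular resolution without a physically larger array.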
Large Intelligent Surfaces (LIS / RIS):
Reconfigurable Intelligent Surfaces (RIS) create a synthetic aperture effect by reflecting signals from hundreds of passive phase-shifting elements. For positioning, an RIS of area A at distance d from the UE achieves an effective angular resolution equivalent to a physical antenna array of aperture A. AI learns to optimise the RIS phase profile for maximum position-discriminability rather than signal strength.
Joint AI Positioning + ISAC:
The same ISAC waveform used for beam management (§5.5) simultaneously acts as a positioning radar. The gNB extracts both the communication channel estimate H (for fingerprinting) and the radar echo (for geometry estimation), fusing both into a joint AI positioning engine.
Cramér-Rao Lower Bound (CRLB) for AI Positioning
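The fundamental limit on position estimation error, regardless of algorithm, is set by the Fisher information of the observed signal. For a wideband channel with bandwidth BW and \(N_a\) receiving antennas at a given SNR, the bound on the ranging error \(\sigma_d\) scales as:
\[\sigma_d \;\propto\; \frac{c}{BW \cdot \sqrt{N_a \cdot \mathrm{SNR}}}\]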
This expression reveals three independent scaling levers for 6G:
- Bandwidth BW ↑: 10 GHz at sub-THz reduces the CRLB range floor to ~1.5 cm (vs ~150 cm for 100 MHz at sub-6 GHz).
- Antenna count Na ↑: 1024-element LIS array improves CRLB by ~5.7× vs 32-element 5G array (scales as √(Na): √1024/√32 ≈ 5.7).
- SNR ↑: improved link budget (beamforming gain, lower noise figure at sub-THz amplifiers) directly tightens the bound.
- AI role: the neural estimator approaches the CRLB in multipath/NLOS environments where classical MLE cannot, by learning the full posterior distribution of position given channel observations.
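The scaling levers listed above can be combined numerically. The sketch below assumes the bound is proportional to 1 / (BW · √(Na · SNR)), which matches the linear bandwidth and √Na scaling quoted in the bullets; the square-root SNR dependence and the 6 dB link-budget gain are assumptions made for illustration.

```python
import math

def crlb_tightening(bw_ratio, antenna_ratio, snr_gain_db=0.0):
    """Factor by which the bound shrinks, assuming it is proportional to
    1 / (BW * sqrt(N_a * SNR)). Ratios are new / reference."""
    snr_ratio = 10.0 ** (snr_gain_db / 10.0)
    return bw_ratio * math.sqrt(antenna_ratio) * math.sqrt(snr_ratio)

bw_factor = crlb_tightening(10e9 / 100e6, 1.0)    # bandwidth lever alone
ant_factor = crlb_tightening(1.0, 1024 / 32)      # antenna lever alone
# Hypothetical 6 dB SNR improvement on top of both levers
combined = crlb_tightening(10e9 / 100e6, 1024 / 32, snr_gain_db=6.0)
```

Under these assumptions the bandwidth lever dominates, since it enters linearly while antenna count and SNR enter only through a square root.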
Positioning Accuracy: 5G vs 6G Methods
The chart below compares the 90th-percentile positioning error (metres) across the key positioning paradigms from 5G to 6G, illustrating the progressive improvement enabled by AI and new physical-layer features.
Note the logarithmic y-axis: ISAC-AI achieves ~0.02 m (2 cm) in LOS — a 60× improvement over the DL-TDOA 5G baseline. Even in NLOS, the 6G ISAC-AI result (0.06 m) surpasses the 5G LOS DL-TDOA baseline (1.2 m) by 20×, demonstrating that AI + bandwidth + antenna aperture jointly break the classical NLOS barrier.
§6.5 Open Challenges in AI Positioning
Despite the impressive results from TR 38.843 and simulation studies, several engineering challenges remain before AI positioning can be deployed at 6G scale.
Calibration Data Burden
Fingerprinting requires an offline site survey: 1000 calibration positions is feasible for a single factory floor but scales poorly across tens of thousands of deployment cells. Active research directions:
- Transfer learning: pre-train on ray-tracing simulations (DeepMIMO, COST-2100 channel model) then fine-tune with only 50–100 real measurements per cell.
- Self-supervised learning: use UE-reported positioning from GPS / barometer as weak labels to continuously update the fingerprint database without explicit site surveys.
- Generative augmentation: train a conditional GAN/VAE to synthesise additional {channel, position} pairs from sparse calibration data.
Model Adaptation to Environment Change
Physical environments are not static: furniture is rearranged, new machinery is installed, seasonal changes alter multipath. A fingerprint trained in January may degrade by 50% by July in a dynamic factory. Online continual learning with forgetting prevention (Elastic Weight Consolidation, experience replay) is required to maintain accuracy without full retraining.
Privacy and Security
Latency and Signalling Overhead
Positioning inference must complete within the application latency budget:
- Factory robot control loop: < 5 ms end-to-end position update.
- CNN inference time: ~1–3 ms on embedded NPU; acceptable for Rel-19 targets.
- Feature reporting overhead: \(|H|^2 \in \mathbb{R}^{32 \times 132}\) = 4224 floats ≈ 16.5 kB per snapshot at 4 bytes per value. Compressed reporting (PCA projection to 64 dimensions) reduces this to ~0.25 kB at the same precision.
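The reporting-overhead arithmetic and the PCA projection can be checked with a short NumPy sketch. The calibration matrix here is random data standing in for measured |H|² snapshots, and 32-bit floats are assumed for both the raw and the compressed report; at 4 bytes per value, 64 coefficients occupy 256 bytes (8-byte values would double that to 512 bytes).

```python
import numpy as np

N_PORTS, N_PRBS, K = 32, 132, 64

rng = np.random.default_rng(0)
# Synthetic stand-in for 200 calibration snapshots of |H|^2 (not measured data)
X = rng.random((200, N_PORTS * N_PRBS)).astype(np.float32)

# PCA via SVD of the mean-centred data; keep the top-K principal directions
mean = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
basis = vt[:K]                                   # shape (64, 4224)

snapshot = X[0]
report = ((snapshot - mean) @ basis.T).astype(np.float32)  # shape (64,)

raw_bytes = snapshot.nbytes          # 4224 values * 4 bytes
compressed_bytes = report.nbytes     # 64 values * 4 bytes
```

The receiver needs the same `mean` and `basis` to interpret the report, so in practice the projection would be distributed once and reused across snapshots.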
[7] 3GPP TR 38.843 v18.0.0, §6.2.2 — Positioning Enhancement Use Case (UC2): AI/ML fingerprinting, CNN architecture, InF evaluation results.
[8] 3GPP TR 38.855 v16.0.0 — Study on NR Positioning Support: baseline positioning methods, accuracy requirements, NLOS analysis, Release 16/17 performance benchmarks.
Precise positioning feeds into network-level resource management. → §7 examines how AI applies these location and traffic insights to drive energy efficiency in the RAN.
§7 AI for Network Energy Efficiency
The proliferation of 5G base stations and the anticipated 10–100× traffic growth toward 6G have placed network energy consumption at the centre of both operator economics and global sustainability commitments. AI-driven energy management promises to decouple traffic growth from power consumption, a critical requirement for the next decade of wireless infrastructure.
§7.1 — The Energy Crisis in Mobile Networks
Global mobile networks consumed approximately 200 TWh per year in 2020, representing roughly 0.7 % of worldwide electricity usage. With 5G densification — smaller cells, massive MIMO antenna arrays, millimetre-wave deployments — energy consumption is on track to grow substantially unless counteracted by efficiency gains. The 6G vision sets an ambitious target:
gNB Power Consumption Breakdown
Understanding where power is consumed in a gNB is the first step toward AI-guided optimisation. The breakdown below is representative of a 5G massive-MIMO macro cell (3GPP TS 28.310):
| Component | Share of Total Power | Primary Cause | AI Lever |
|---|---|---|---|
| RF / Power Amplifier (PA) | ~65 % | Poor PA back-off efficiency at low load | Sleep modes, load-adaptive PA biasing |
| Digital Baseband Processing | ~15 % | FFT, MIMO detection, channel coding | Algorithm-off / clock-gating on idle symbols |
| Cooling / HVAC | ~10 % | Waste heat from PA and BBU | Indirectly reduced by PA/BBU savings |
| Other (power supply, backhaul, control) | ~10 % | Static overhead | Limited; centralised pooling helps |
The dominance of the RF/PA component immediately motivates sleep-mode strategies: even a short-duration shutdown of RF chains during traffic troughs translates directly into large absolute power savings.
3GPP Energy Efficiency Metric
3GPP TS 28.310 defines a standardised energy efficiency KPI for the RAN:
\[\eta_{EE} = \frac{\text{data volume delivered (bit)}}{\text{energy consumed (J)}}\]
The KPI is measured over a reference time window (e.g., one hour). For a macro cell with 100 Mbps average throughput and 500 W average power, \(\eta_{EE} = 100 \times 10^6 / 500 = 200\) kbit/J. Improving sleep-mode penetration raises this metric directly.
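The worked example above can be reproduced in a few lines. This is a sketch of the ratio only (bits delivered over joules consumed in the same window); the function name is illustrative and is not taken from any 3GPP specification.

```python
def energy_efficiency_bit_per_joule(data_volume_bits, energy_joules):
    """Energy efficiency as delivered bits divided by consumed joules."""
    if energy_joules <= 0:
        raise ValueError("energy_joules must be positive")
    return data_volume_bits / energy_joules

WINDOW_S = 3600            # one-hour reference window
throughput_bps = 100e6     # 100 Mbps average throughput
power_w = 500.0            # 500 W average power

eta = energy_efficiency_bit_per_joule(throughput_bps * WINDOW_S,
                                      power_w * WINDOW_S)
eta_kbit_per_joule = eta / 1e3
```

Because both numerator and denominator scale with the window length, the result depends only on average throughput and average power.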
§7.2 — BS Sleep Mode Prediction with AI
Base station sleep mode operation is perhaps the highest-impact single AI application in the RAN energy domain. The core challenge is prediction accuracy: waking up too late causes dropped calls; sleeping too aggressively causes coverage holes.
AI Model Architecture
Inputs:
- Traffic load history: T(t-k : t), where k may span 24–168 hours (daily/weekly periodicity)
- Time-of-day encoding (cyclic sine/cosine features for 24 h and 7 d periods)
- Day-of-week and public-holiday indicator
- Neighbouring-cell load (spatial correlation)
- Current CQI distribution across active UEs
Outputs:
- Predicted sleep duration \(\Delta t_{\text{sleep}}\) for each available sleep tier
- Confidence interval (enables risk-aware threshold setting)
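The cyclic time encoding mentioned in the feature list maps a timestamp to sine/cosine pairs so that 23:59 and 00:00 are close in feature space. A minimal sketch using only the standard library (the feature names are illustrative):

```python
import math
from datetime import datetime

def cyclic_time_features(ts):
    """Encode time of day (24 h) and time of week (7 d) as sine/cosine pairs."""
    hour = ts.hour + ts.minute / 60.0
    day = ts.weekday() + hour / 24.0   # 0.0 corresponds to Monday 00:00
    return {
        "hour_sin": math.sin(2 * math.pi * hour / 24.0),
        "hour_cos": math.cos(2 * math.pi * hour / 24.0),
        "week_sin": math.sin(2 * math.pi * day / 7.0),
        "week_cos": math.cos(2 * math.pi * day / 7.0),
    }

late = cyclic_time_features(datetime(2024, 1, 1, 23, 59))
early = cyclic_time_features(datetime(2024, 1, 2, 0, 0))
# Distance between the two time-of-day encodings (one minute apart)
gap = math.hypot(late["hour_sin"] - early["hour_sin"],
                 late["hour_cos"] - early["hour_cos"])
```

A raw hour-of-day integer would place these two timestamps 23 units apart, which is the discontinuity the encoding removes.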
Sleep Mode Types — 3GPP TS 38.300
3GPP TS 38.300 defines a hierarchy of sleep modes with different wake-up latencies and power savings:
| Sleep Mode | Wake-up Latency | Power Saving | AI Prediction Horizon | Use Case |
|---|---|---|---|---|
| Symbol Shutdown | <1 OFDM symbol (~71 µs) | 5–15 % | 1–10 ms | Idle OFDM symbols within a slot |
| Carrier Sleep | 1–10 ms | 30–50 % | 100 ms – 1 s | Low-load periods within minutes |
| Cell Off (Deep Sleep) | 10–30 s | 70–90 % | Minutes to hours | Night-time, stadium off-hours |
Energy Saving Ratio (ESR)
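The net energy saving when AI-controlled sleep is applied relative to an always-on baseline:
\[\text{ESR} = 1 - \frac{\left[t_{\text{sleep}} P_{\text{sleep}} + (1 - t_{\text{sleep}}) P_{\text{active}}\right] T}{E_{\text{baseline}}}\]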
where \(t_{\text{sleep}}\) is the fraction of time in sleep state, \(P_{\text{sleep}}\) is power during sleep (typically 10–30 % of \(P_{\text{active}}\)), and \(E_{\text{baseline}} = P_{\text{always\_on}} \cdot T\). A cell spending 40 % of time in carrier-sleep at 40 % of active power achieves: \(\text{ESR} = 1 - (0.4 \times 0.4 + 0.6 \times 1.0) = 1 - 0.76 = 24\%\) saving.
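The ESR arithmetic in the example can be wrapped in a small helper. The sketch assumes powers are expressed relative to the active power and that the always-on baseline draws the active power for the whole window, as in the worked example.

```python
def energy_saving_ratio(t_sleep_frac, p_sleep_rel, p_active_rel=1.0):
    """ESR for a cell that sleeps a fraction of the time.

    t_sleep_frac: fraction of the window spent asleep (0..1)
    p_sleep_rel:  sleep power relative to active power
    p_active_rel: active power (the baseline is assumed always at this level)
    """
    if not 0.0 <= t_sleep_frac <= 1.0:
        raise ValueError("t_sleep_frac must lie in [0, 1]")
    mean_power = t_sleep_frac * p_sleep_rel + (1.0 - t_sleep_frac) * p_active_rel
    return 1.0 - mean_power / p_active_rel

esr_carrier_sleep = energy_saving_ratio(0.4, 0.4)   # example from the text
esr_never_sleeps = energy_saving_ratio(0.0, 0.4)
```

The helper makes the two levers explicit: time spent asleep and the depth of the sleep state.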
Implementation Pipeline
- Data collection: PM counters (DL PRB utilisation, connected UE count) collected via O1 at 15-minute granularity.
- Model training: Offline on 90-day historical data; retrained weekly via Non-RT RIC rApp.
- Inference: Near-RT RIC xApp queries model every 100 ms; issues sleep/wake commands via E2 interface to gNB.
- Guard rails: Minimum coverage threshold enforced: if predicted load exceeds 20 % PRB utilisation in adjacent cell, cell-off mode is suppressed.
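The guard-rail step can be sketched as a tier-selection function. Only the 20 % neighbour PRB-utilisation limit comes from the pipeline above; the idle-time thresholds, the confidence threshold, and all names are hypothetical values chosen for illustration.

```python
NEIGHBOUR_LOAD_LIMIT = 0.20       # from the guard rail: 20 % PRB utilisation
CARRIER_SLEEP_MIN_IDLE_S = 0.1    # hypothetical threshold
CELL_OFF_MIN_IDLE_S = 60.0        # hypothetical threshold
MIN_CONFIDENCE = 0.9              # hypothetical threshold

def select_sleep_mode(predicted_idle_s, confidence, neighbour_prb_util):
    """Pick the deepest sleep tier that the prediction and guard rail allow."""
    # Low-confidence or very short idle predictions: stay in the lightest tier
    if confidence < MIN_CONFIDENCE or predicted_idle_s < CARRIER_SLEEP_MIN_IDLE_S:
        return "symbol_shutdown"
    # Cell-off only if idle long enough AND the neighbour can absorb traffic
    if (predicted_idle_s >= CELL_OFF_MIN_IDLE_S
            and neighbour_prb_util <= NEIGHBOUR_LOAD_LIMIT):
        return "cell_off"
    return "carrier_sleep"

night_quiet = select_sleep_mode(600.0, 0.95, 0.10)
night_busy_neighbour = select_sleep_mode(600.0, 0.95, 0.35)
```

Keeping the guard rail outside the learned model means a mispredicted idle period can at worst select carrier sleep when the neighbouring cell is loaded.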
§7.3 — 3GPP SON / Coverage and Capacity Optimisation
Self-Organising Network (SON) functions have been part of 3GPP specifications since LTE, but their AI incarnation in 5G-Advanced and 6G moves from rule-based heuristics to learned policies. 3GPP TR 37.816 defines the SON framework for 5G.
Coverage and Capacity Optimisation (CCO)
- AI model adjusts antenna tilt (mechanical or electrical remote tilt) and TX power.
- Optimisation target: maximise coverage while minimising pilot pollution and inter-cell interference.
- Input state: RSRP/RSRQ maps, UE distribution heat maps, handover failure rates.
Mobility Load Balancing (MLB)
- NN distributes UEs across cells/beams to equalise load.
- Adjusts cell individual offset (CIO) and A3/A5 handover thresholds.
- Reduces both overloaded cell outage and underloaded cell idle power.
Multi-Agent RL for SON
State \(s_t\): Local load, interference level, neighbour-cell loads, handover KPIs.
Action \(a_t\): {antenna tilt ±2°, TX power ±3 dB, CIO ±2 dB, beam index}.
Reward function:
Algorithm: Centralised training with decentralised execution (CTDE). Global reward shared during training; at inference each agent uses only local observations.
Interference Rejection via Beamforming Adaptation
Massive MIMO arrays (32–256 antenna elements) provide spatial degrees of freedom that AI can exploit for interference mitigation:
- AI predicts UE mobility trajectory → proactively steers beam to reduce inter-cell leakage.
- Null steering toward dominant interferers identified via uplink SRS sounding.
- Energy efficiency gain: focusing energy in the right direction reduces required TX power to maintain same SINR.
§7.4 — O-RAN Energy Optimisation Architecture
The O-RAN Alliance has defined an end-to-end architecture for AI-driven energy management that cleanly separates training (Non-RT RIC, minutes-to-hours timescale) from inference (Near-RT RIC, 10–100 ms timescale) and execution (O-DU/O-RU, symbol timescale).
Non-RT RIC (rApp)
- Collects energy KPMs via O1 interface (TS 28.311).
- Trains global energy optimisation model offline.
- Pushes AI policy to Near-RT RIC via A1 interface.
- Timescale: minutes to hours; model update: hourly or event-triggered.
Near-RT RIC (xApp)
- Receives policy from Non-RT RIC.
- Applies per-cell sleep decisions every 10–100 ms.
- Uses E2 interface to command gNB sleep/wake transitions.
- Feedback loop: reports decision outcomes back via O1.
O1 Energy KPMs
| KPM Name | Description | Typical Granularity |
|---|---|---|
| DL.PRB.UsedRatio | DL PRB utilisation fraction (0–1) | 15 min PM window |
| RF.TX.PowerAvg | Average DL TX power (dBm) | 15 min PM window |
| Cell.DowntimeRatio | Fraction of time cell was in sleep/off state | 15 min PM window |
| RRC.ConnSuccRate | UE connection success rate | 15 min PM window |
| Energy.Consumed.kWh | Energy meter reading (where available) | 1 hour |
Energy Optimisation — End-to-End Data Flow
- O-RU / O-DU: measures RF output power, temperature, and traffic load every slot.
- O-DU → SMO/O1: aggregates 15-min PM reports; energy meter if equipped.
- Non-RT RIC rApp: ingests PM data, trains LSTM forecaster and MARL policy.
- A1 policy push: serialised policy (e.g., sleep thresholds, cell-off schedule) pushed to Near-RT RIC.
- Near-RT RIC xApp: queries policy, evaluates current cell state, issues sleep/wake command via E2 SM (E2 Service Model for RAN control).
- gNB RRC: executes sleep transition; broadcasts updated SIB if coverage changes.
- Outcome reporting: xApp logs decision outcome; rApp updates model with new reward signal (online RL).
Quantitative Impact
| Mechanism | Energy Saving vs Always-On | AI Gain vs Fixed Schedule | Coverage Impact |
|---|---|---|---|
| Symbol Shutdown (AI-guided) | 5–15 % | +3–5 % | Negligible (PDCCH always on) |
| Carrier Sleep (AI-guided) | 30–50 % | +8–12 % | <1 dB RSRP degradation |
| Cell Off (AI-guided) | 70–90 % | +15–25 % | Neighbour cells compensate |
| CCO tilt optimisation | 5–10 % | +4–8 % | +0.5–1 dB coverage improvement |
| MLB load balancing | 8–15 % | +5–10 % | Improved edge UE throughput |
Figure 7.1 — Energy savings (%) for different sleep mode tiers, comparing fixed schedules versus AI-guided prediction. The AI gain over a fixed schedule grows with sleep depth, from +3–5 % for symbol shutdown to +15–25 % for cell off.
AI energy management optimises when and where the network transmits; → §8 tackles how spectrum resources are allocated efficiently — the radio resource management and scheduling problem.
§8 AI for Radio Resource Management and Scheduling
Radio Resource Management (RRM) is the real-time control plane of the RAN: allocating PRBs, selecting modulation and coding schemes, managing MIMO layers, and balancing load across cells. Classical schedulers, designed for single-cell optimisation with limited context, are inadequate for the densely heterogeneous, interference-coupled topology of 6G. AI-native schedulers promise to replace hand-crafted heuristics with learned policies that generalise across environments and adapt to changing channel conditions.
§8.1 — Classical Scheduling — Capabilities and Limitations
5G NR supports three canonical scheduler families, each representing a different trade-off between spectral efficiency and fairness:
| Scheduler | Objective | CQI Dependency | Key Limitation |
|---|---|---|---|
| Round Robin (RR) | Equal PRB time share | None | Spectral efficiency ignored; poor for mixed UE geometries |
| Max-C/I | Maximise instantaneous throughput | Full CQI required | Starves cell-edge UEs; fairness index ≈ 0.5 |
| Proportional Fair (PF) | \(\max \sum_k \log R_k\) | CQI feedback every 5–10 ms | Reactive only; no interference prediction; single-cell scope |
The Proportional Fair scheduler maximises the sum of logarithmic rates, which coincides with the Nash bargaining solution for the rate-allocation problem. However, it is strictly reactive:
- Cannot anticipate channel quality fluctuations caused by UE mobility or blocking events.
- Does not model inter-cell interference — the dominant impairment in dense deployments.
- CQI feedback latency (5–10 ms) introduces stale information for fast-fading channels.
- QoS differentiation (latency, reliability) must be implemented as ad-hoc overrides rather than as integrated optimisation objectives.
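For reference, the reactive behaviour described above is visible in the usual per-slot form of the PF rule: schedule the UE with the largest ratio of instantaneous to average rate, then update the averages. The sketch below uses a simple exponential average; the window length and function names are assumptions.

```python
def pf_select(inst_rates, avg_rates, eps=1e-9):
    """Return the index of the UE with the highest PF metric r_k / R_k."""
    metrics = [r / max(a, eps) for r, a in zip(inst_rates, avg_rates)]
    return max(range(len(metrics)), key=metrics.__getitem__)

def pf_update(avg_rates, inst_rates, scheduled, window=100.0):
    """Exponentially averaged throughput; unscheduled UEs are served 0."""
    beta = 1.0 / window
    return [
        (1.0 - beta) * a + beta * (r if k == scheduled else 0.0)
        for k, (a, r) in enumerate(zip(avg_rates, inst_rates))
    ]

inst = [10.0, 2.0]   # instantaneous achievable rates (e.g., from CQI)
avg = [10.0, 1.0]    # long-term average throughputs
chosen = pf_select(inst, avg)                       # UE 1: metric 2.0 vs 1.0
new_avg = pf_update(avg, inst, chosen, window=10.0)
```

Every input to the decision is a past or current measurement, which is exactly why the rule cannot anticipate blockage or interference changes.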
§8.2 — Deep Reinforcement Learning Scheduling
The scheduling problem maps naturally onto the Markov Decision Process (MDP) framework, enabling direct application of deep RL techniques.
MDP Formulation
State \(s_t\):
- CQI per UE per PRB group (wideband or subband, normalised to [0,1])
- Buffer occupancy per UE (bytes, normalised to maximum buffer size)
- QoS class indicator (QCI/5QI) and remaining latency budget per UE
- Historical throughput ratio \(R_k(t) / \bar{R}_k\) (PF score history)
- Inter-cell interference estimate (from neighbouring cell X2/Xn reports)
Action \(a_t\):
- PRB allocation mask per UE (which PRBs assigned to which UE)
- MCS index per UE (0–27 for 5G NR)
- MIMO rank and precoding matrix indicator (PMI)
- Beam index (for mmWave / massive MIMO)
DQN-Based Scheduler
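For discrete action spaces (quantised PRB allocations), a Deep Q-Network is trained by minimising the temporal-difference loss:
\[\mathcal{L}(\theta) = \mathbb{E}_{(s,a,r,s')}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta)\right)^2\right]\]
where \(\gamma\) is the discount factor and \(\theta^-\) denotes the frozen target network updated every \(C=100\) steps (DQN target network trick). Experience replay buffer size: \(10^5\) transitions. Mini-batch size: 64. The action space is factored by UE to avoid combinatorial explosion: each UE's PRB allocation is decided by a separate Q-head sharing a common state encoder.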
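The standard DQN temporal-difference loss with a frozen target network can be written directly on a mini-batch of Q-values. The sketch below uses NumPy arrays in place of network outputs; the discount factor value and the toy batch are assumptions for illustration.

```python
import numpy as np

def dqn_td_loss(q_online, q_target_next, actions, rewards, dones, gamma=0.99):
    """Mean squared TD error for a mini-batch.

    q_online:      (B, A) Q(s, .; theta) from the online network
    q_target_next: (B, A) Q(s', .; theta_minus) from the frozen target network
    actions:       (B,)   integer actions taken
    rewards:       (B,)   rewards received
    dones:         (B,)   1.0 where the episode ended, else 0.0
    """
    batch = np.arange(len(actions))
    q_sa = q_online[batch, actions]
    # Bootstrapped target; no bootstrap term after a terminal transition
    target = rewards + gamma * (1.0 - dones) * q_target_next.max(axis=1)
    return float(np.mean((target - q_sa) ** 2))

loss = dqn_td_loss(
    q_online=np.array([[1.0, 2.0], [3.0, 4.0]]),
    q_target_next=np.array([[0.0, 1.0], [2.0, 0.0]]),
    actions=np.array([1, 0]),
    rewards=np.array([1.0, 0.0]),
    dones=np.array([0.0, 1.0]),
    gamma=0.5,
)
```

In a factored scheduler of the kind described above, this loss would be evaluated per Q-head and summed, with each head's action being one UE's allocation choice.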
Simulated Performance vs PF Scheduler
| Metric | PF Scheduler | DQN Scheduler | Gain |
|---|---|---|---|
| 50th %ile UE throughput (Mbps) | 42.1 | 47.2 | +12.1 % |
| 5th %ile (cell-edge) UE throughput (Mbps) | 8.3 | 8.97 | +8.1 % |
| QoS satisfaction rate (%) | 88.0 | 95.0 | +7 pp |
| Latency violation rate (%) | 4.2 | 1.8 | −57 % |
| Jain's fairness index | 0.87 | 0.91 | +4.6 % |
Simulation: 132 PRBs, 30 kHz SCS, 32T32R massive MIMO, 20 UEs per cell, 3GPP 3D-UMa channel model, 10 MHz × 100 ms evaluation window.
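Jain's fairness index, reported in the last row of the table, is (Σx)² / (n·Σx²) over per-UE throughputs. A short sketch with toy inputs:

```python
def jain_index(throughputs):
    """Jain's fairness index: 1.0 is perfectly fair, 1/n is maximally unfair."""
    n = len(throughputs)
    total = sum(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero throughput")
    return (total * total) / (n * sum_sq)

equal_share = jain_index([10.0, 10.0, 10.0, 10.0])
single_winner = jain_index([40.0, 0.0, 0.0, 0.0])
```

The index is scale-invariant, so it can be compared across schedulers even when their absolute throughputs differ.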
Actor-Critic for Continuous Action Spaces
When the action space is treated as continuous (e.g., power fractions, beamforming vectors), Proximal Policy Optimisation (PPO) or Soft Actor-Critic (SAC) is preferred:
- PPO: On-policy; stable convergence with clipped surrogate objective; well-suited where training stability matters more than sample efficiency, since on-policy updates need fresh rollouts for each policy iteration.
- SAC: Off-policy; entropy-maximisation encourages exploration; handles stochastic channels well; sample-efficient.
- Both have been demonstrated in 5G NR simulations to exceed PF by 10–20 % system throughput.
§8.3 — Multi-Cell Coordination with Multi-Agent RL
Single-cell scheduling, even with deep RL, cannot resolve inter-cell interference because each gNB observes only its own UEs and local channel state. Multi-Agent RL (MARL) extends the framework to coordinate decisions across a cluster of base stations.
Problem: Inter-Cell Interference in Dense Networks
- Neighbouring cells reuse the same spectrum (full frequency reuse in 5G NR).
- A cell-edge UE served by cell A experiences strong interference from cells B and C on the same PRBs.
- If each cell individually maximises its own throughput, the resulting non-cooperative Nash equilibrium is far from the global optimum.
- Traditional approaches (ICIC, eICIC) use static partitioning — inefficient for dynamic traffic.
MARL Architecture — MADDPG
Agent: Each base station \(k \in \{1,\ldots,K\}\) maintains actor \(\pi_k(\cdot; \theta_k)\) and critic \(Q_k(\cdot; \phi_k)\).
Key CTDE (Centralised Training, Decentralised Execution) property:
- Training: Critic \(Q_k\) receives the joint action \(a_{1:K,t}\) and joint state \(s_{1:K,t}\) — full information available offline.
- Execution: Actor \(\pi_k\) uses only local observation \(o_{k,t}\) — practical for deployment.
- MADDPG (Lowe et al., NeurIPS 2017): off-policy, DDPG-based, good sample efficiency, can be unstable with many agents.
- MAPPO (Yu et al., NeurIPS 2021): on-policy, PPO-clipped, better training stability and cooperative task performance — recommended baseline for 6G deployments.
- MAAC (Iqbal & Sha, ICML 2019): attention-based centralised critic, handles dynamic K (variable number of active cells), suited for heterogeneous 6G topologies.
6G Ultra-Dense HetNet Context
6G deployments will combine macro cells, pico cells, femto cells, and Reconfigurable Intelligent Surfaces (RIS) in a single coordinated network:
Coordination Scale
- MARL coordinates 10–100 cells simultaneously.
- Hierarchy: cluster head (macro) aggregates local decisions from pico/femto agents.
- RIS controller is a zero-power agent (no active TX) optimising phase shifts.
Performance Gains
- +5–20% system throughput vs single-cell DRL; +17–43% vs Proportional Fair (load-dependent).
- −40–60 % inter-cell interference at cell edge.
- Fairness index improvement: 0.83 → 0.94 (Jain's index).
Graph Neural Network Extensions
Recent work models the cellular network as a graph: nodes are base stations, edges are interference links. A Graph Neural Network (GNN) encodes spatial interference structure directly into the RL state representation:
- Node features: local load, CQI distribution, beam configuration.
- Edge features: estimated inter-cell interference power (from UE reports or X2/Xn measurement sharing).
- GNN output: per-node policy logits for resource block assignment.
- Benefit: generalises to unseen network topologies (different number of cells) without retraining.
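The topology-generalisation property follows from weight sharing: a message-passing layer applies the same matrices at every node, so the number of cells never appears in the parameter shapes. The sketch below is one interference-weighted mean-aggregation layer in NumPy, with random features and weights standing in for a trained model; it is not the architecture of any specific cited work.

```python
import numpy as np

def gnn_layer(node_feats, interference, w_self, w_neigh):
    """One message-passing step over the interference graph.

    node_feats:   (N, F_in) per-cell features
    interference: (N, N) non-negative edge weights, zero diagonal
    w_self, w_neigh: (F_in, F_out) weights shared by all nodes
    """
    degree = interference.sum(axis=1, keepdims=True)
    # Interference-weighted mean of neighbour features
    neigh = (interference @ node_feats) / np.maximum(degree, 1e-9)
    return np.maximum(0.0, node_feats @ w_self + neigh @ w_neigh)  # ReLU

rng = np.random.default_rng(0)
F_IN, F_OUT = 3, 4
w_self = rng.normal(size=(F_IN, F_OUT))
w_neigh = rng.normal(size=(F_IN, F_OUT))

def random_cluster(n_cells):
    """Synthetic cluster: random features and interference weights."""
    feats = rng.random((n_cells, F_IN))
    adj = rng.random((n_cells, n_cells))
    np.fill_diagonal(adj, 0.0)
    return feats, adj

out_small = gnn_layer(*random_cluster(3), w_self, w_neigh)   # 3-cell cluster
out_large = gnn_layer(*random_cluster(7), w_self, w_neigh)   # 7-cell cluster
```

The same `w_self` and `w_neigh` process both clusters, which is the mechanism behind reuse on unseen topologies; whether accuracy carries over still depends on the training distribution.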
§8.4 — 6G Integrated Access and Backhaul (IAB) Scheduling
Integrated Access and Backhaul (IAB), studied in 3GPP TR 38.874, allows gNBs to serve both UEs (access link) and relay traffic to/from the core via wireless backhaul over the same NR spectrum. AI scheduling in IAB networks addresses the unique challenge of joint access–backhaul resource management.
IAB Topology Optimisation
- Parent selection: RL policy selects optimal parent node for each IAB relay, trading off backhaul capacity vs coverage.
- Dynamic re-parenting: AI triggers topology change when link quality degrades (blockage, mobility).
- Cycle prevention: Graph-constraint layer in RL policy ensures acyclic routing tree.
Backhaul Pre-Allocation
- Traffic prediction (LSTM on PM counters) forecasts access-link demand 100–500 ms ahead.
- Pre-allocates backhaul PRBs in advance to avoid buffering stalls.
- TDD frame structure adapted: backhaul slots and access slots proportioned dynamically by AI.
Hybrid TDD+FDD Policy with RL
Some 6G IAB deployments operate dual-band (sub-6 GHz FDD for access, mmWave TDD for backhaul). An RL policy manages cross-band resource coupling:
- Action: fraction of TDD downlink slots allocated to backhaul vs local access.
- State: per-relay buffer occupancy, backhaul SINR, access-link CQI histogram.
- Reward: end-to-end UE throughput (access + backhaul) minus relay queueing delay penalty.
- Convergence: PPO converges within ~500 episodes in a 5-relay IAB simulation.
Multi-Hop IAB Scheduling
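For multi-hop IAB chains (gNB-donor → IAB-1 → IAB-2 → UE), the scheduling problem becomes a pipeline optimisation:
\[R_{\text{UE}} = \min\left(R_{\text{donor} \to \text{IAB-1}},\; R_{\text{IAB-1} \to \text{IAB-2}},\; R_{\text{IAB-2} \to \text{UE}}\right)\]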
The effective UE rate is bottlenecked by the weakest link. An AI scheduler identifies the bottleneck hop and preferentially allocates resources to relieve it — behaviour that emerges naturally from an end-to-end reward but cannot be captured by hop-by-hop greedy schedulers.
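The weakest-link behaviour can be illustrated with a few lines that compute the end-to-end rate of a chain and locate the bottleneck hop. The per-hop rates are made-up numbers, and the model ignores half-duplex time sharing between hops.

```python
def end_to_end_rate(hop_rates):
    """Return (end-to-end rate, index of bottleneck hop) for a relay chain."""
    if not hop_rates:
        raise ValueError("need at least one hop")
    bottleneck = min(range(len(hop_rates)), key=hop_rates.__getitem__)
    return hop_rates[bottleneck], bottleneck

# Hypothetical rates in Mbps: donor->IAB-1, IAB-1->IAB-2, IAB-2->UE
hops = [900.0, 350.0, 600.0]
rate, weakest = end_to_end_rate(hops)

# Shifting resources to the weakest hop raises the end-to-end rate
rebalanced = [800.0, 500.0, 600.0]
rate_after, weakest_after = end_to_end_rate(rebalanced)
```

A hop-by-hop greedy scheduler would have no reason to give up capacity on the first hop, whereas an end-to-end objective sees the gain immediately.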
Figure 8.1 — System throughput (Gbps) vs number of active UEs for Proportional Fair, single-cell DRL, and MARL schedulers. Simulation: 132 PRBs, 30 kHz SCS, 32T32R massive MIMO, 3GPP 3D-UMa channel. MARL benefits increase with UE count due to greater inter-cell interference at higher load.
Deployment Path to Live Networks
Transitioning AI schedulers from simulation to live 5G/6G networks involves several practical steps governed by 3GPP and O-RAN specifications:
| Phase | Activity | 3GPP/O-RAN Anchor | Risk |
|---|---|---|---|
| Phase 1 — Shadow Mode | AI scheduler runs in parallel with PF; no actual resource assignment | O-RAN WG2 A1/E2 interfaces | Low — no UE impact |
| Phase 2 — A/B Test | Subset of cells (5–10 %) use AI scheduler; rest use PF | TS 28.552 PM framework | Medium — monitor KPIs closely |
| Phase 3 — Controlled Rollout | Expand to 50 % with automatic rollback trigger | TS 28.316 SON management | Medium — requires rollback automation |
| Phase 4 — Full Deployment | 100 % cells; continuous online learning via rApp | O-RAN ML workflow (WG2) | Ongoing model drift monitoring required |
§8.5 — AI-based HARQ Retransmission Prediction
The key insight is that HARQ failure probability is not uniformly distributed — it is strongly correlated with observable radio conditions (CQI trend, RSRP, interference level, UE velocity) in the immediately preceding slots. A classifier trained on these features can predict, before transmitting, whether a given transport block will likely require retransmission.
Proactive Resource Pre-positioning
When the AI predicts a high retransmission probability for a scheduled UE, the scheduler can:
- Pre-reserve HARQ IR resources: allocate incremental redundancy (IR) resources one slot ahead, eliminating the RTT delay on the expected retransmission.
- Select more robust MCS: proactively back off MCS by 1–2 steps to reduce the probability of first-transmission failure for latency-critical bearers.
- Early stopping: for predicted-failure blocks, trigger PDCCH signalling one slot early to release HARQ process buffers, reducing head-of-line blocking at the MAC layer.
| Approach | Latency impact | Complexity | 3GPP status |
|---|---|---|---|
| Classical HARQ (Chase / IR) | Full RTT (4–8 ms) per retransmission | Low (fixed protocol) | Normative (TS 38.212/213) |
| AI-HARQ prediction (proactive MCS back-off) | 0 ms added — avoids retransmission | Light classifier (MLP or GBT) | Under study (TR 38.843, Rel-19) |
| AI-HARQ prediction (pre-reserved IR) | RTT removed for predicted failures | Scheduler integration required | Research / O-RAN xApp proposals |
§8 Summary
- Classical PF scheduling is optimal for single-cell, single-step scenarios but cannot handle multi-cell interference or QoS heterogeneity.
- DQN/PPO-based schedulers achieve +8–12 % cell-throughput gain and +7 pp QoS improvement over PF in 5G NR simulations.
- MARL (MADDPG) with CTDE achieves a further +5–20 % system-throughput gain over single-cell DRL (+17–43 % vs PF) by resolving inter-cell interference through cooperative policy optimisation.
- IAB AI scheduling unifies access and backhaul resource management, essential for 6G ultra-dense relay topologies.
Open challenges:
- Sample efficiency: live network training is slow; sim-to-real transfer and meta-RL are active research areas.
- Action space explosion: 100+ UEs × 132 PRBs × MCS levels exceeds practical Q-table sizes; factored and hierarchical RL needed.
- Interpretability: regulators may require explainable scheduling decisions for QoS audit.
- Standardisation: E2 SM for RL-based schedulers is not yet finalised in O-RAN.
Optimising how bits move through the RAN is one dimension of AI-native 6G. → §9 takes a more radical step: rethinking what is transmitted, moving from bit-pipe semantics to goal-oriented and semantic communications.
§9 Semantic & Goal-Oriented Communications
§9.1 Beyond Shannon: The Semantic Layer
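Classical communications theory, as formulated by Shannon in 1948, defines a single objective: transmit bits reliably across a noisy channel. The celebrated capacity formula encodes that objective in one line:
\[C = B \log_2\left(1 + \mathrm{SNR}\right)\]
where C is the capacity in bit/s and B is the channel bandwidth in Hz.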
Every generation of cellular technology — from 2G voice codecs to 5G LDPC and polar codes — has pushed relentlessly toward this bound. Having nearly reached it for individual links, 6G asks a different question: is reliable bit delivery the right goal?
Semantic communications reframe the problem. If both transmitter and receiver share a background knowledge base K, the channel only needs to convey the difference between the message and what the receiver already knows — a concept pioneered by Weaver in 1949 but now made practical by large neural networks acting as shared world-models. In image and video trials this alone reduces effective transmitted information by 90 % while preserving communicative fidelity.
Three Levels of Communication (Shannon & Weaver, 1949)
| Level | Question answered | Metric | 6G support |
|---|---|---|---|
| 1 — Technical | Were bits transmitted correctly? | BER, BLER, Shannon capacity | Inherited from 5G-Advanced |
| 2 — Semantic | Was meaning transmitted correctly? | BLEU, BERTScore, semantic similarity | New in 6G SA1 study item |
| 3 — Effectiveness (Goal) | Was the intended effect achieved? | Task accuracy, control error, MOS | New in 6G SA1 study item |
5G and earlier generations operate exclusively at Level 1. 6G natively incorporates Levels 2 and 3 into the radio access network design, creating new KPIs and new protocol layers that did not exist in prior 3GPP releases.
§9.2 Joint Source–Channel Coding (JSCC)
Shannon's separation theorem guarantees that, in the limit of infinite block length, source compression and channel coding can be optimised independently without loss of optimality. In practice, block lengths are finite, latency is bounded, and the source is often correlated with a task at the receiver. Joint Source–Channel Coding (JSCC) breaks the abstraction barrier and co-designs both stages as a single end-to-end learned system.
DeepJSCC Architecture
Transmitter path
- Input: source signal x (image, audio, sensor)
- Encoder network \(f_e(\cdot; \theta_e)\)
- Output: complex baseband vector \(z \in \mathbb{C}^k\)
- Power normalisation to meet \(P_{TX}\)
Receiver path
- Input: noisy received vector \(\hat{y} = Hz + n\)
- Decoder network \(f_d(\cdot; \theta_d)\)
- Output: reconstruction x̂
- Optional: task head for downstream inference
The JSCC training loss jointly penalises reconstruction distortion, transmit power, and semantic distortion:
\[\mathcal{L} = d(x, \hat{x}) + \lambda_1 P_{TX} + \lambda_2 \mathcal{D}_{sem}\]
where:
- \(d(x, \hat{x})\): pixel-level distortion (MSE for signals, perceptual loss for images and video)
- \(P_{TX}\): average transmit power constraint
- \(\mathcal{D}_{sem}\): semantic distortion (task accuracy loss, BERT embedding distance, or MOS penalty depending on modality)
- \(\lambda_1, \lambda_2\): Lagrange multipliers trading off the three objectives
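The non-learned parts of the DeepJSCC pipeline (power normalisation, channel, and the three-term loss) can be sketched in NumPy. Several simplifications are assumed: the channel is AWGN with H equal to the identity, the encoder output is a random vector rather than a network output, distortion is plain MSE, and the semantic distortion is passed in as a number.

```python
import numpy as np

def power_normalise(z, p_tx=1.0):
    """Scale complex vector z so that its average per-symbol power equals p_tx."""
    return z * np.sqrt(z.size * p_tx) / np.linalg.norm(z)

def awgn(z, snr_db, rng):
    """Add complex Gaussian noise at the requested SNR (H = identity assumed)."""
    noise_var = np.mean(np.abs(z) ** 2) / 10.0 ** (snr_db / 10.0)
    noise = np.sqrt(noise_var / 2.0) * (
        rng.standard_normal(z.shape) + 1j * rng.standard_normal(z.shape)
    )
    return z + noise

def jscc_loss(x, x_hat, p_tx, d_sem, lam1, lam2):
    """Distortion (MSE) + lam1 * transmit power + lam2 * semantic distortion."""
    distortion = np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2)
    return float(distortion + lam1 * p_tx + lam2 * d_sem)

rng = np.random.default_rng(0)
z_raw = rng.standard_normal(64) + 1j * rng.standard_normal(64)  # stand-in encoder output
z = power_normalise(z_raw, p_tx=1.0)
y = awgn(z, snr_db=2.0, rng=rng)

example_loss = jscc_loss(x=np.zeros(2), x_hat=np.ones(2), p_tx=1.0,
                         d_sem=2.0, lam1=0.1, lam2=0.5)
```

Because the noise enters as a continuous perturbation of `z`, a decoder trained through this channel sees reconstruction quality vary smoothly with SNR instead of facing the all-or-nothing behaviour of a digital decoder.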
Performance: Classical Pipeline vs. DeepJSCC
The most dramatic difference appears at low SNR. A conventional pipeline (JPEG2000 source compression + LDPC channel coding) exhibits a cliff effect: image quality is high above a threshold SNR, then collapses catastrophically as the channel deteriorates below that threshold. DeepJSCC shows graceful degradation — quality reduces smoothly, never catastrophically.
| System | SNR = 10 dB (PSNR) | SNR = 2 dB (PSNR) | SNR = −2 dB (PSNR) | Cliff? |
|---|---|---|---|---|
| JPEG2000 + LDPC (0.1 bpp) | 34.2 dB | 33.8 dB | < 5 dB (collapse) | YES |
| DeepJSCC (same bandwidth) | 35.1 dB | 30.5 dB | 25.2 dB | NO |
| DeepJSCC + semantic head | 36.4 dB | 31.9 dB | 26.8 dB | NO |
§9.3 Task-Oriented Communications
In machine-to-machine 6G scenarios — factory automation, vehicular sensing, drone swarms — there is no human receiver interpreting the content. The communication exists solely to enable a downstream task: binary classification, object detection, state estimation, or control command generation. Optimising for bit fidelity in these scenarios is not just suboptimal — it is the wrong objective function.
Formulation
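Let x be a raw feature (sensor frame, LiDAR point cloud, RF sample). The downstream task model \(f_{task}\) maps a reconstruction \(\hat{x}\) to a predicted label \(\hat{y}\). The task-oriented transmission loss is:
\[\mathcal{L}_{task} = \mathcal{L}_{CE}(y, \hat{y}) + \beta R\]
where R is the transmission rate (bits per channel use), \(\mathcal{L}_{CE}\) is the cross-entropy classification loss against the true label y, and \(\beta\) is a weight on the rate term (the symbol is chosen here for illustration). The rate penalty drives the encoder to discard content that does not influence task accuracy, compressing beyond what a fidelity-oriented hand-designed codec targets.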
3GPP Alignment: TR 22.874 AI/ML Data Communication
Key quantified benefits from 3GPP feasibility studies:
| Scenario | Data type | Uplink reduction | Task accuracy |
|---|---|---|---|
| Factory camera QA | HD image (2 MP) | 70–80 % | 98.2 % vs. 98.5 % (raw) |
| Vehicular LiDAR | Point cloud (65k pts) | 85–92 % | 94.1 % vs. 94.8 % (raw) |
| Drone RF sensing | IQ samples (1 ms burst) | 90–95 % | 96.3 % vs. 97.0 % (raw) |
The accuracy loss of under one percentage point is traded for a roughly 3–20× reduction in uplink channel occupancy (a 70 % reduction is 3.3×, a 95 % reduction is 20×), a favourable exchange in dense IoT deployments where radio resources are the binding constraint.
§9.4 3GPP Semantic Communications Study Items
3GPP has formally opened study items on semantic and goal-oriented communications in Release 19 under SA1, with the intent to define new KPI classes and service requirements. This section maps academic concepts to 3GPP specification artefacts.
Key Specification References
| Reference | Scope | Release |
|---|---|---|
| TR 22.874 §5.5 | AI/ML data communication use case; feature compression; edge inference | Rel-18 |
| TR 22.874 §6.3 | AI/ML model lifecycle: training, transfer, update, versioning | Rel-18 |
| SA1 Rel-19 SI | Semantic and goal-oriented communications — KPI framework | Rel-19 |
| SA2 Rel-19 WI | Architecture for semantic layer (proposed): semantic entity, knowledge DB | Rel-19 (study) |
Proposed Semantic KPI Candidates
Language / Text modality
- BLEU score — n-gram overlap between transmitted and reconstructed text
- BERTScore — cosine similarity in transformer embedding space; captures paraphrase
- Semantic accuracy — intent classification accuracy after reconstruction
Image / Video / Sensor modality
- Task accuracy — object detection mAP, classification Top-1/Top-5
- MOS (Mean Opinion Score) — perceptual quality for media use cases
- Control error — RMS deviation for closed-loop actuation tasks
Figure 9-1 — Bandwidth–Fidelity Trade-off: Classical vs. JSCC vs. Semantic
- What is the fundamental difference between Level 1 (technical) and Level 2 (semantic) communication goals?
- Explain the cliff effect in classical coded transmission. Why does DeepJSCC avoid it?
- Write the task-oriented loss function and identify which term drives the encoder to discard task-irrelevant information.
- Name two proposed semantic KPIs from the 3GPP SA1 Rel-19 study item and explain how each is measured.
- A factory deploys 1000 cameras each transmitting 2 MP images at 30 fps. Using a 70 % uplink reduction figure, calculate the released capacity in Gbps if raw transmission requires 120 Mbps per camera.
Semantic communications redefine the purpose of the link; → §10 takes the final step — replacing the entire classical transceiver with an end-to-end learned system in the Native AI Interface paradigm.
§10 Native AI Interface in 6G
§10.1 The End-to-End Learning Paradigm
Every component in a classical communications chain — modulator, forward error correction encoder, channel estimator, equaliser, FEC decoder, demodulator — was designed independently, each individually optimal under idealised assumptions about the channel and the adjacent blocks. The assembled pipeline performs well only when those assumptions hold simultaneously.
End-to-end (E2E) learning abandons that decomposition entirely. The entire transmitter–channel–receiver is modelled as a differentiable autoencoder and trained jointly with stochastic gradient descent (SGD) on a single objective: minimise message error probability over the real channel distribution. O'Shea & Hoydis (2017) demonstrated the concept on AWGN and fading channels, showing that a learned autoencoder rediscovers classical constellations and can exceed them on short block lengths.
Autoencoder Transceiver Architecture
Transmitter (encoder)
- One-hot message vector m ∈ {0,1}^M
- Dense layer (M → 2N) + BatchNorm
- Power normalisation: s ← s / √(𝔼[‖s‖²])
- Complex baseband signal s ∈ ℂ^n, where n = N (the 2N real outputs are paired into N complex symbols)
Receiver (decoder)
- Received vector r = Hs + n
- Dense layers (2N → 2N → M)
- Softmax activation → class probabilities
- Hard decision: m̂ = argmax_i p_i
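The E2E training objective is the cross-entropy loss over the message alphabet ℳ:

$$\mathcal{L}_{\mathrm{E2E}} = -\,\mathbb{E}_{m,\,H,\,n}\Big[\sum_{i \in \mathcal{M}} \mathbb{1}[m = i]\,\log p_i\Big] = -\,\mathbb{E}_{m,\,H,\,n}\big[\log p_m\big]$$

Here p_i is the decoder's softmax output for message i, and the expectation runs over transmitted messages and channel realisations.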
Back-propagation through the decoder, then through a differentiable channel model (or a surrogate for non-differentiable hardware), and finally into the encoder adjusts both ends simultaneously. The trained encoder weights define a learned signal constellation; the trained decoder weights define a near-MAP detector for that constellation and channel.
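The forward pass of the architecture listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the reference design: the weights are random and untrained, BatchNorm is omitted, the channel is real-valued AWGN (H = I), the decoder is a single dense layer, and the sizes `M` and `N_REAL` are arbitrary example values. Training would additionally require an automatic-differentiation framework.

```python
import math
import random

M = 4        # message alphabet size (illustrative value)
N_REAL = 4   # 2N real dimensions, i.e. N = 2 complex channel uses

rng = random.Random(0)

# Encoder weights: a dense layer applied to a one-hot input is a row lookup.
enc_w = [[rng.gauss(0.0, 1.0) for _ in range(N_REAL)] for _ in range(M)]
# Decoder weights: one dense layer from 2N received reals to M logits.
dec_w = [[rng.gauss(0.0, 1.0) for _ in range(M)] for _ in range(N_REAL)]


def constellation():
    """Encoder output for every message, scaled to unit average energy."""
    avg_energy = sum(sum(v * v for v in row) for row in enc_w) / M
    scale = 1.0 / math.sqrt(avg_energy)
    return [[v * scale for v in row] for row in enc_w]


def awgn(s, sigma):
    """Add white Gaussian noise with standard deviation sigma per real dimension."""
    return [v + rng.gauss(0.0, sigma) for v in s]


def softmax(logits):
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(r):
    """Return class probabilities p_i for the received vector r."""
    logits = [sum(r[i] * dec_w[i][j] for i in range(N_REAL)) for j in range(M)]
    return softmax(logits)


def cross_entropy(p, m):
    """Cross-entropy for a one-hot target: minus log-probability of the sent message."""
    return -math.log(max(p[m], 1e-12))


def batch_loss(batch_size=256, sigma=0.3):
    """Monte Carlo estimate of the E2E loss over random messages and noise."""
    points = constellation()
    total = 0.0
    for _ in range(batch_size):
        m = rng.randrange(M)
        total += cross_entropy(decode(awgn(points[m], sigma)), m)
    return total / batch_size


def hard_decision(p):
    """Index of the most probable message."""
    return max(range(len(p)), key=lambda i: p[i])
```

Because a one-hot input selects a single row of the encoder matrix, the rows of `constellation()` are exactly the learned signal points that training would move around.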
Empirical Performance on AWGN
| Configuration | Rate | SNR at FER = 10⁻² | Required SNR vs. baseline |
|---|---|---|---|
| BPSK + Hamming(7,4) | 4/7 ≈ 0.57 (code rate) | 6.2 dB | baseline |
| Autoencoder (n=7, k=4) | 4/7 ≈ 0.57 (code rate) | 4.7 dB | −1.5 dB |
| QPSK, uncoded | 2 bpcu | 5.1 dB | baseline |
| Autoencoder (n=2, k=4) | 2 bpcu | 3.6 dB | −1.5 dB |
§10.2 Constellation Learning
One of the most striking outputs of E2E training is a learned signal constellation that adapts its geometry to the statistics of the channel — something no classical codebook can do at deployment time.
Channel-Specific Adaptation
| Channel type | Learned constellation shape | Reason |
|---|---|---|
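| AWGN | Hexagonal-like lattice (close to, but denser than, square QAM) | Minimum Euclidean distance maximised under the power constraint; hexagonal packing is the densest arrangement in two dimensions |
| Rayleigh flat fading | Near-PSK ring structure | Information carried in phase is robust to random amplitude fading; energy is concentrated on a circle |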
| ISI channel (multipath) | Pre-distorted / equalized | Encoder pre-inverts channel; constellation accounts for inter-symbol overlap |
| Adversarial (eavesdropper) | Obfuscated / non-standard | Encoder hides structure from passive observer; physical layer security |
Constrained Constellation Design
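The formal optimisation is a mutual-information maximisation under a power constraint:

$$\max_{\theta}\; I(s; r) \quad \text{subject to} \quad \mathbb{E}\big[\lVert s \rVert^2\big] \le P$$

where θ denotes the encoder parameters that define the constellation and P is the average transmit power budget.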
The mutual information I(s; r) is not analytically tractable for arbitrary channel distributions. Two practical approaches are used:
- Differentiable surrogate: Lower-bound I with a Gaussian auxiliary channel approximation; back-propagate through the bound. Fast convergence; slightly suboptimal for highly non-Gaussian channels.
- Reinforcement learning (RL) over hardware channel: Treat the real channel as a black-box reward function. Policy-gradient or evolutionary strategies optimise the constellation without a channel model. Enables optimisation directly on hardware — relevant for mmWave/sub-THz where accurate simulation is difficult.
3GPP RAN1 has begun feasibility studies on AI-based constellation shaping for 6G physical layer. The key open question is whether learned constellations can be represented compactly enough for standardised signalling — or whether each device pair must negotiate constellation parameters over an out-of-band channel.
§10.3 The Generalisation Challenge
E2E learning achieves strong results on the channel distribution it was trained on. The critical weakness is domain shift: performance degrades when the deployment channel differs from the training distribution. This is not a minor engineering concern — it is an existential challenge for standardised AI transceivers.
Sources of Domain Shift in 6G Deployments
- Site-specific multipath: CDL-A training, urban canyon deployment → different cluster geometry, angular spread
- Mobility: Trained on quasi-static channel; fast fading at highway speeds (v > 120 km/h) introduces Doppler unseen during training
- Hardware impairments: Phase noise, IQ imbalance, non-linear PA — unique to each device, not captured in simulation
- Interference topology: Number and positions of interferers change with cell load and UE density
Mitigation Strategies
- Domain randomisation: Train over a broad distribution of channel realisations: CDL-A/B/C/D/E, SNR ∈ [−10, 30] dB, velocity ∈ [0, 500] km/h. The model learns a robust policy and sacrifices peak performance on any single channel for resilience across all.
- Meta-learning (MAML): Model-Agnostic Meta-Learning trains a parameter initialisation θ* such that a small number of gradient steps on pilot symbols at the new site reaches good performance. Adaptation requires only ~10–50 pilots, which is compatible with 5G NR reference signal overhead budgets.
- Transfer learning: Pre-train on synthetic CDL channels; fine-tune on real over-the-air samples collected during initial deployment. Encoder/decoder lower layers (generic features) are frozen; only upper layers are updated. Reduces the fine-tuning data requirement by 10–100×.
- Online adaptation: Continue gradient updates during deployment using decoded-then-re-encoded symbols as pseudo-labels. Risk: catastrophic forgetting if the adaptation rate is too high. Mitigation: elastic weight consolidation (EWC) penalty.
3GPP Implication: Model Update Mechanism
§10.4 Hybrid AI + Model-Based Design
A fully learned transceiver is not standardisable in the 3GPP sense: each trained model produces a unique, deterministic mapping from bits to waveforms that cannot be replicated by a compliant implementation from another vendor without access to the exact same weights. Interoperability — the cornerstone of cellular standards — would be lost.
The hybrid approach resolves this tension by maintaining the standardised structural skeleton of OFDM while replacing computationally intensive DSP blocks with AI components at well-defined interfaces.
What Stays Classical vs. What Becomes AI
| Layer / Block | Classical (standardised) | AI-enhanced | 3GPP status |
|---|---|---|---|
| Waveform | OFDM + CP (TS 38.211) | — | Fixed for interoperability |
| Resource grid | Slot/subframe structure | — | Fixed for interoperability |
| Channel estimation | LS / MMSE on DMRS | NN interpolator (§5 of this document) | AI allowed (Rel-18 WI) |
| Equalization | ZF / MMSE-IRC | Deep unfolded LISTA | AI allowed (Rel-18 WI) |
| Symbol detection | Hard/soft ML demapper | NN demapper (learned LLRs) | AI allowed (Rel-18 WI) |
| FEC decoder | LDPC / Polar BP | Neural BP (learnable edge weights) | Under study (Rel-19) |
| HARQ | Chase combining / IR | — | Fixed (impacts scheduler) |
| MAC scheduler | Rule-based PF/RR | RL-based scheduler (§7 of this document) | AI allowed (non-standard) |
| RRC state machine | Defined per 3GPP ASN.1 | — | Fixed (protocol correctness) |
This stratification — standardise the interface, liberalise the implementation — is the architectural principle adopted in 3GPP Rel-18 for AI-assisted RAN functions. It permits competitive AI implementations from multiple vendors while guaranteeing interoperability at the waveform and protocol levels.
Deep Unfolding: The Principled Hybrid
Deep unfolding provides a theoretical justification for the hybrid approach. A classical iterative algorithm (e.g., belief propagation for LDPC decoding, ISTA for sparse channel estimation) is unrolled for a fixed number of iterations T, and the algorithm parameters at each iteration are made learnable:

$$x^{(t+1)} = \mathcal{S}_{\theta}\big(A^{(t)} x^{(t)} + b^{(t)}\big), \qquad t = 0, \dots, T-1$$

where x^(t) is the estimate after t iterations, 𝒮_θ is a learned shrinkage / activation function, and A^(t), b^(t) are iteration-specific learnable weights. The result is a network that:
- Has a known computational graph depth (T layers correspond to T iterations of the classical algorithm)
- Reduces exactly to T iterations of the classical algorithm when the weights are set to their analytical values, and therefore inherits its convergence behaviour as T → ∞
- Learns residual corrections that account for model mismatch (finite SNR, non-Gaussian noise, correlated interference)
- Can be quantised and deployed on fixed-point DSP/ASIC with less than 0.2 dB performance loss
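To make the unrolling concrete, the sketch below runs ISTA as a fixed stack of T layers with per-layer step sizes and thresholds. It is an illustration under assumptions: the toy matrix `A`, the observation `y`, and the values of `lam` and `mu` are invented for the example, and the per-layer parameters are set to their analytical values rather than learned. In ISTA the affine part of each layer is A(t) = I − μAᵀA and b(t) = μAᵀy, followed by soft-thresholding as the shrinkage function.

```python
import math


def soft_threshold(v, tau):
    """Shrinkage operator: move each entry towards zero by tau, clipping at zero."""
    return [math.copysign(max(abs(a) - tau, 0.0), a) for a in v]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def unfolded_ista(y, A, steps, thresholds):
    """Run len(steps) unrolled ISTA layers for min 0.5*||y - A x||^2 + lam*||x||_1.

    steps[t] and thresholds[t] are the per-layer parameters that deep
    unfolding would learn; passing the analytical values gives plain ISTA.
    """
    At = [list(col) for col in zip(*A)]
    x = [0.0] * len(A[0])
    for mu, tau in zip(steps, thresholds):
        residual = [yi - ri for yi, ri in zip(y, matvec(A, x))]
        grad_step = matvec(At, residual)
        x = soft_threshold([xi + mu * gi for xi, gi in zip(x, grad_step)], tau)
    return x


def lasso_objective(y, A, x, lam):
    residual = [yi - ri for yi, ri in zip(y, matvec(A, x))]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in x)


# Toy problem (hypothetical values): 2 observations, 3 unknowns, sparse truth [1, 0, 0].
A = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
y = [1.0, 0.0]
lam, mu = 0.1, 0.5   # mu is below 1 / lambda_max(A^T A) = 1 / 1.5
T = 10               # network depth = number of unrolled iterations
x_hat = unfolded_ista(y, A, [mu] * T, [mu * lam] * T)
```

Replacing the constant lists `[mu] * T` and `[mu * lam] * T` with trained per-layer values is what turns this fixed-depth loop into a learned network.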
Figure 10-1 — BLER vs. SNR: Classical, Hybrid AI, and E2E Autoencoder
- Draw the autoencoder transceiver block diagram and label the encoder, channel model, and decoder. Identify which blocks contain trainable parameters.
- Write the E2E training loss and explain why cross-entropy is the appropriate objective for constellation design.
- A Rayleigh fading channel causes random amplitude variations. Which constellation geometry does E2E learning converge to, and why?
- Explain the four domain-shift mitigation strategies. Which is most practical for a 6G base station that cannot communicate training labels back to the UE?
- Why is a fully E2E transceiver not standardisable in 3GPP? Describe the hybrid approach adopted in Rel-18 and identify two receiver blocks where AI is currently permitted.
E2E learning sets the vision for AI-native transceivers; deploying these models at scale requires a well-defined infrastructure. → §11 covers the AI/ML functional architecture — how 3GPP and O-RAN specify the training, inference, and model lifecycle management framework for production 6G networks.
§11 — AI/ML Architecture for 6G Networks
The evolution from 5G to 6G is not merely a capacity upgrade — it represents a fundamental architectural shift in which Artificial Intelligence and Machine Learning become first-class citizens of the radio standard. Where 5G treated AI as an optional optimisation bolt-on, 6G embeds AI/ML as a core functional layer: every major plane — RAN, core, OAM, UE — carries standardised AI service models, training pipelines, and model lifecycle interfaces.
§11.1 — The AI/ML Functional Architecture (TR 23.700-80)
3GPP SA2 Release 18 captured the baseline in TR 23.700-80 v18.0.0. The central construct is the AI/ML Network Function (AI/MLNF), a logical entity that may be instantiated at four physical loci:
- OAM (Management Plane) — global model training, policy authoring, network-wide analytics
- gNB / O-DU / O-CU — RAN-side inference for beam management, scheduler, link adaptation
- UE — on-device inference for channel estimation, CSI compression, positioning
- Edge / MEC Server — latency-sensitive tasks offloaded from UE; split-inference endpoint
Each instance may host some or all of the following logical functions:
- Model Training — supervised, unsupervised, or RL, operating on collected measurements
- Model Storage — versioned repository with metadata (scenario, SNR range, channel model)
- Model Inference — real-time application of a trained model to live inputs
- Model Performance Monitoring — statistical drift detection, accuracy tracking
- Model Transfer — delivery of model weights to the inference endpoint (O1/E2/Uu)
Standardised Interfaces Carrying AI/ML Traffic
| Interface | Endpoints | AI/ML Role | Typical Latency Budget |
|---|---|---|---|
| O1 | OAM → gNB / O-DU | Model delivery, KPM collection, performance report upload | seconds–minutes (management plane) |
| E2 | Near-RT RIC → gNB | xApp policy enforcement, per-UE/per-cell inference outputs | 10 ms – 1 s |
| A1 | Non-RT RIC → Near-RT RIC | AI policy objectives, enrichment information | 1 s – 1 min |
| R1 | rApp ↔ Non-RT RIC | rApp service registration, ML model lifecycle API | seconds |
| Uu / PC5 | gNB ↔ UE (Uu); UE ↔ UE sidelink (PC5) | On-device model delivery, feature upload (split inference) | sub-10 ms (6G target) |
§11.2 — AI/ML Data Pipeline
A production 6G AI pipeline is a closed loop, not a one-shot batch process. The canonical stages are:
- Data Collection — measurements harvested from UE, gNB, core
- Pre-processing — normalisation, feature engineering, outlier removal
- Training — model fitting (see modes below)
- Validation — held-out dataset evaluation; acceptance criteria check
- Deployment — model transfer to inference endpoint
- Monitoring — drift and accuracy tracking in production
- Re-training Trigger — if drift > threshold, restart from step 1
Data Sources
- RSRP, RSRQ, SINR per serving + neighbour cells
- CQI, RI, PMI feedback (CSI-RS based)
- Timing advance (positioning proxy)
- UE velocity estimate (Doppler / accelerometer)
- Battery / compute state (for offloading decisions)
- Per-PRB utilisation and interference maps
- Beam RSRP / SINR per beam index (NR SSB / CSI-RS)
- Handover success/failure rates, call drop rate
- Core: session throughput, mobility patterns, slice SLA compliance
- External context: weather (mmWave rain fade), crowd density, time-of-day
Training Modes
| Mode | Mechanism | Typical Use Case | Latency to Update |
|---|---|---|---|
| Offline Batch | Periodic retraining on accumulated historical data (daily/weekly cron) | Long-term mobility prediction, network capacity planning | Hours–days |
| Online Learning | Continuous gradient updates from streaming real-time data | Adaptive MCS selection, real-time interference map | Seconds–minutes |
| Transfer Learning | Adapt a pre-trained base model to a new cell or environment via fine-tuning | Rapid deployment in newly deployed cell sites | Minutes (few-shot) |
| Active Learning | Model queries for labels on high-uncertainty samples; reduces labelling effort | Anomaly detection with scarce labelled faults | Depends on labelling loop |
§11.3 — Model Lifecycle Management
3GPP TR 22.874 §6 codifies model metadata as a standard object, ensuring interoperability between vendor training systems and operator inference endpoints. A model package includes:
- Identity: model ID, version tag, creation timestamp, owning entity
- Training provenance: dataset descriptor (scenarios, SNR range, mobility class, 3GPP channel model used, geographic region)
- I/O specification: input tensor shape/dtype, output tensor shape/dtype, required feature pre-processing
- Performance claim: validated NMSE / accuracy / overhead ratio on held-out dataset
- Expiry condition: absolute timestamp, or event-triggered (e.g., >X% handover failure increase)
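The five metadata groups above map naturally onto a typed record. The sketch below is one possible representation, not the TR 22.874 format: every field name, the `is_expired` helper, and all example values are hypothetical and chosen only to mirror the bullet list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class ModelPackage:
    """Illustrative model metadata record; field names are hypothetical."""
    # Identity
    model_id: str
    version: str
    created_at: datetime
    owner: str
    # Training provenance
    scenarios: Tuple[str, ...]
    snr_range_db: Tuple[float, float]
    channel_model: str
    # I/O specification
    input_shape: Tuple[int, ...]
    output_shape: Tuple[int, ...]
    dtype: str
    # Performance claim on the held-out dataset
    validated_nmse_db: float
    # Expiry conditions (either may be left unset)
    expires_at: Optional[datetime] = None
    max_ho_failure_increase_pct: Optional[float] = None

    def is_expired(self, now: datetime, ho_failure_increase_pct: float = 0.0) -> bool:
        """True if the absolute deadline passed or the event trigger fired."""
        if self.expires_at is not None and now >= self.expires_at:
            return True
        if (self.max_ho_failure_increase_pct is not None
                and ho_failure_increase_pct > self.max_ho_failure_increase_pct):
            return True
        return False


# Placeholder values for illustration only.
pkg = ModelPackage(
    model_id="csi-compressor", version="1.2.0",
    created_at=datetime(2030, 1, 1, tzinfo=timezone.utc), owner="vendor-a",
    scenarios=("UMa", "UMi"), snr_range_db=(-10.0, 30.0), channel_model="CDL-A",
    input_shape=(32, 64, 2), output_shape=(32, 64, 2), dtype="float32",
    validated_nmse_db=-11.8,
    expires_at=datetime(2030, 7, 1, tzinfo=timezone.utc),
    max_ho_failure_increase_pct=5.0,
)
```

Making the record immutable (`frozen=True`) reflects the idea that a published model version is never edited in place; a change produces a new version.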
Model Drift Detection
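Drift is detected by comparing the current output distribution against the training-time distribution using KL divergence. Formally, for model output distribution p:

$$\mathrm{drift}(t) = D_{\mathrm{KL}}\big(p_t \,\Vert\, p_{\mathrm{train}}\big) = \sum_{y} p_t(y)\,\log\frac{p_t(y)}{p_{\mathrm{train}}(y)}$$

where p_t is the output distribution observed at time t and p_train is the distribution recorded at training time. When drift(t) > ε, the model lifecycle manager either:
(a) initiates a retraining job with fresh data, or
(b) rolls back to the previous version (fallback), while retraining proceeds asynchronously.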
[Deployed] → drift > ε → [Retraining]
[Deployed] → hard failure → [Fallback → prev version]
[Retraining] → complete → [Validation] → [Deployed]
[Validation] → FAIL → [Discarded / alert]
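The drift test and the state transitions above can be sketched together. This is a minimal illustration: the histograms, the threshold `epsilon = 0.1`, and the function names are assumptions, and the KL direction (current distribution against training distribution) is one reasonable reading of the text.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0)


def next_state(state, drift=0.0, epsilon=0.1, hard_failure=False, validation_passed=True):
    """Apply the lifecycle transitions listed above; unknown cases keep the state."""
    if state == "Deployed":
        if hard_failure:
            return "Fallback"
        if drift > epsilon:
            return "Retraining"
        return "Deployed"
    if state == "Retraining":
        return "Validation"   # entered when the retraining job completes
    if state == "Validation":
        return "Deployed" if validation_passed else "Discarded"
    return state


# Hypothetical histograms of a model's output classes.
p_train = [0.25, 0.25, 0.25, 0.25]
p_now = [0.55, 0.25, 0.15, 0.05]
drift = kl_divergence(p_now, p_train)
state = next_state("Deployed", drift=drift, epsilon=0.1)
```

In practice ε would be calibrated on validation data so that normal traffic variation does not trigger retraining.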
§11.4 — O-RAN AI/ML Architecture
The O-RAN Alliance WG2 specification O-RAN.WG2.AI-ML-v01.03 defines a three-layer AI/ML hierarchy aligned with control-loop latency:
| Layer | Component | Granularity | AI/ML Functions |
|---|---|---|---|
| Non-RT RIC | rApps | > 1 s | Global optimisation, policy authoring, model training, enrichment information |
| Near-RT RIC | xApps | 10 ms – 1 s | Per-UE/per-cell beam decisions, fast interference management, admission control |
| O-DU / O-RU | PHY-embedded | < 1 ms | Symbol-level beam tracking, low-latency channel estimation, CSI compression |
Inference Deployment Options — Comparison
| Location | Latency | Data freshness | Model complexity | Example use case |
|---|---|---|---|---|
| On-device (UE) | < 1 ms | Highest: live per-slot inputs | Low (< 1 MB, < 10⁷ FLOPs) | On-device channel estimation, CSI compression, positioning |
| Near-RT RIC (xApp) | 10 ms – 1 s | High: per-UE/cell KPMs streamed over E2 | Medium (10–100 MB, runs on x86/GPU server) | Per-UE beam decisions, fast interference management, admission control |
| Non-RT RIC (rApp) | 1 s – 1 min | Moderate: aggregated historical KPMs | High (100 MB+, cloud-class training and inference) | Global network optimisation, model training, policy authoring, enrichment |
| Cloud / MEC | Variable (10 ms – minutes) | Low: batch or periodic data uploads | Very high (multi-GB models, large-batch training) | Federated learning aggregation, digital twin training, long-horizon planning |
E2 Service Models Relevant to AI
- RC SM (RAN Control): xApp sends per-UE decisions such as target beam, scheduler hint, or MCS upper bound. The E2 node applies the policy within its local scheduler.
- KPM SM (Key Performance Measurement): streams per-PRB, per-UE, per-beam KPIs to the Near-RT RIC. Primary data source for AI feature construction.
- CCC SM (Cell Configuration and Control): cell-level configuration and admission control decisions driven by ML-predicted load; prevents SLA violations before congestion occurs.
- CU-level AI: embed a lightweight NN beam predictor directly in the gNB CU (not in the RIC), operating on per-slot CSI within the CU's own processing pipeline at sub-1 ms latency.
- Real-Time RIC (RT-RIC): O-RAN WG2 has begun studying a “Real-Time RIC” tier with <10 ms inference loop, deployed co-located with the O-DU to reduce transport latency. This is not yet standardised as of O-RAN Release 4.
O-RAN AI Data Flow
The centralized AI/ML pipeline described here sets the stage for distributed training. → §12 covers Federated Learning and Split Inference — the mechanisms that scale this architecture while preserving UE privacy.
§12 — Federated Learning & Split Inference
Two of the most consequential AI techniques for 6G are Federated Learning (FL) and Split Inference (SI). FL addresses the fundamental tension between AI's hunger for data and users' privacy rights. SI addresses the mismatch between UE compute budgets and the inference complexity demanded by 6G channel models. Together they define a practical architecture for distributed AI at the network edge.
§12.1 — Why Federated Learning for 6G
UE measurement sequences encode location trajectories, daily routines, social-graph patterns, and health-correlated mobility. Uploading raw measurements to a central server violates GDPR / regional privacy law and undermines user trust. 6G must achieve AI performance without centralising personal data.
A fleet of 10⁶ IoT/UE devices each generating 1 MB/day of channel measurements produces 1 TB/day of uplink traffic. At typical uplink rates this is impractical. Federated Learning shifts the bandwidth requirement from raw data to compressed model gradients, a reduction of 100× or more.
The Federated Learning Solution
In FL, training data never leaves the device. Each participating UE:
- Receives the current global model weights θ(t) from the server
- Runs several epochs of local SGD on its private dataset
- Uploads only the model gradient (or weight delta) — not raw data
- Server aggregates contributions via the FedAvg algorithm
The aggregation step computes a weighted average over K participating UEs:

$$\theta^{(t+1)} = \sum_{k=1}^{K} \frac{n_k}{N}\,\theta_k^{(t+1)}, \qquad N = \sum_{k=1}^{K} n_k$$

where θ_k^(t+1) is UE k's local model after E local SGD epochs on its n_k-sample dataset, and N is the total number of training samples across all participating UEs.
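The server-side aggregation is a sample-weighted average and fits in a few lines. The sketch below assumes each local model is a flat list of floats; the function name `fedavg` and the two-UE example are illustrative.

```python
def fedavg(local_models, sample_counts):
    """Weighted average of local weight vectors, with weights n_k / N."""
    if len(local_models) != len(sample_counts) or not local_models:
        raise ValueError("need one sample count per local model")
    total = sum(sample_counts)
    dim = len(local_models[0])
    return [
        sum(n / total * model[i] for model, n in zip(local_models, sample_counts))
        for i in range(dim)
    ]


# Two hypothetical UEs with 100 and 300 local samples.
theta_next = fedavg([[1.0, 2.0], [3.0, 4.0]], [100, 300])
```

The UE with three times as many samples pulls the global model three times as strongly, which is the purpose of the n_k / N weighting.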
§12.2 — FedAvg and Communication Efficiency
FedAvg (McMahan et al., 2017 [18]) is the foundational FL algorithm. Its key parameters are:
| Parameter | Symbol | Typical 6G Value | Effect |
|---|---|---|---|
| Fraction of UEs per round | C | 0.01 – 0.1 | Higher C → faster convergence, more uplink load |
| Local epochs | E | 1 – 5 | Higher E → fewer rounds, but client drift risk |
| Local mini-batch size | B | 16 – 128 | Smaller B → more stochastic, better generalisation |
| Global rounds | T | 50 – 500 | Convergence typically within 100–200 rounds for channel tasks |
Raw Communication Cost
Without compression, each FL round transfers the full model in both directions:
- Downlink (server → K UEs): sizeof(θ) × K ≈ 200 KB × 100 = 20 MB
- Uplink (K UEs → server): same 20 MB (gradients are the same size as weights)
- Per day with hourly rounds: 20 MB × 2 × 24 = 960 MB/day
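The figures above follow from simple arithmetic, reproduced in the sketch below so that other model sizes or round rates can be tried. The function names are assumptions, 1 MB is taken as 1000 KB, and the optional uplink compression ratio ignores any index or header overhead.

```python
def round_traffic_mb(model_kb, num_ues, compression_ratio=1.0):
    """One-direction traffic per FL round in MB (1 MB = 1000 KB)."""
    return model_kb * num_ues / compression_ratio / 1000.0


def daily_traffic_mb(model_kb, num_ues, rounds_per_day, uplink_compression=1.0):
    """Downlink (uncompressed) plus uplink (optionally compressed) per day."""
    downlink = round_traffic_mb(model_kb, num_ues)
    uplink = round_traffic_mb(model_kb, num_ues, uplink_compression)
    return (downlink + uplink) * rounds_per_day


# 200 KB model, 100 UEs per round, hourly rounds.
raw = daily_traffic_mb(200, 100, 24)
# Same setting with a 100x smaller uplink payload (illustrative ratio).
sparsified = daily_traffic_mb(200, 100, 24, uplink_compression=100.0)
```

With the uplink compressed by 100×, the downlink becomes the dominant cost, which is why model broadcast efficiency also matters.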
Compression Techniques
| Technique | Mechanism | Compression Ratio | Accuracy Impact |
|---|---|---|---|
| Gradient Sparsification | Transmit only top-1% largest-magnitude gradient entries; accumulate the rest locally (error feedback) | ~100× | < 1% accuracy loss with error feedback |
| Quantisation | Reduce gradient precision from 32-bit float to 8-bit int (QSGD), 4-bit, or in the extreme 1-bit (sign only) | 4× – 32× | 1–3% accuracy loss at 4-bit; negligible at 8-bit |
| Low-Rank Approximation | Decompose gradient matrix: G ≈ UVᵀ, transmit U, V separately | 10× – 50× | Depends on intrinsic rank of gradient |
| Local Differential Privacy | Add calibrated Gaussian noise before upload for an (ε,δ)-DP guarantee (Dwork et al., 2014); Laplacian noise for pure ε-DP | None (1×; a privacy mechanism, not compression) | Noise σ ∝ sensitivity/ε; accuracy–privacy trade-off; (ε,δ)-DP allows tighter noise at the cost of a δ failure probability |
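Gradient sparsification with error feedback, the first row of the table, can be sketched as follows. This is an illustration under assumptions: gradients are flat lists, the sparse payload is a dictionary from index to value, and the function name and example numbers are invented.

```python
def sparsify_with_error_feedback(gradient, residual, keep_fraction):
    """Keep the largest-magnitude entries; carry the rest over to the next round.

    Returns (sparse, new_residual) where sparse maps index -> value to upload
    and new_residual holds everything that was not transmitted.
    """
    # Add back what earlier rounds left untransmitted (error feedback).
    corrected = [g + r for g, r in zip(gradient, residual)]
    k = max(1, int(len(corrected) * keep_fraction))
    order = sorted(range(len(corrected)), key=lambda i: abs(corrected[i]), reverse=True)
    keep = set(order[:k])
    sparse = {i: corrected[i] for i in keep}
    new_residual = [0.0 if i in keep else corrected[i] for i in range(len(corrected))]
    return sparse, new_residual


# Round 1: only the dominant entry is uploaded.
sparse_1, residual_1 = sparsify_with_error_feedback(
    [0.1, -2.0, 0.3, 0.05], [0.0, 0.0, 0.0, 0.0], keep_fraction=0.25)
# Round 2: a zero gradient still uploads the largest leftover from round 1.
sparse_2, residual_2 = sparsify_with_error_feedback(
    [0.0, 0.0, 0.0, 0.0], residual_1, keep_fraction=0.25)
```

The second round shows why error feedback limits accuracy loss: small entries are delayed, not discarded.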
FedProx adds a proximal term (μ/2) · ‖θ − θ(t)‖² to each local objective, penalising excessive deviation from the global model.
§12.3 — 6G Federated Channel Estimation
Channel estimation is an ideal FL application: each UE accumulates its own channel measurement history (pilot observations vs. true channel), which is deeply personal (encodes the UE's physical location and environment) yet highly informative for a local model.
Architecture
- Global phase: FL rounds train a shared base model capturing universal channel statistics (power delay profile shape, angular spread statistics). This uses pilot-to-channel pairs from all participating UEs.
- Personalisation phase: After FL convergence, each UE fine-tunes its local copy on its own data for 10–20 additional epochs. The fine-tuned model captures that UE's specific multipath environment (e.g., reflections from a specific building along its daily commute).
Key Results (Literature)
| Method | NMSE (dB) | Raw Data Shared | Rounds to Converge |
|---|---|---|---|
| Centralised training (oracle) | −12.3 | All (100%) | N/A |
| Local-only (no federation) | −8.7 | None | N/A (local) |
| FedAvg (E=1) | −11.4 | None | 120 |
| FedAvg + personalisation | −11.8 | None | 120 + 15 local |
| FedProx + personalisation | −11.9 | None | 100 + 15 local |
§12.4 — Split Inference
When a UE lacks the compute budget to run a full inference model locally, the neural network is partitioned across the UE and an edge server. This is termed split inference (also: collaborative inference, device-edge co-inference). It is captured in 3GPP TR 22.874 §5.3 as a dedicated use case.
Split Point Optimisation
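Let layer k be the split point. The total latency is:

$$T_{\mathrm{total}}(k) = T_{\mathrm{UE}}(k) + T_{\mathrm{comm}}(k) + T_{\mathrm{edge}}(k)$$

where T_UE(k) is the UE compute time for layers 1 to k, T_comm(k) is the time to upload the layer-k feature tensor, and T_edge(k) is the edge compute time for the remaining layers. The optimal k minimises T_total subject to the 6G 1 ms E2E latency target. Practical constraints:
- Communication budget: T_comm ≤ 0.2 ms ⇒ feature tensor ≤ 0.2 ms × 1 Gbps = 200 kbit = 25 KB (at a conservative 1 Gbps uplink rate; 6G mmWave at 10 Gbps yields a 250 KB budget).
- UE compute budget: A typical 6G device at 1 TOPS can execute approximately 10⁹ MACs in 1 ms, enough for the first 3–5 convolutional layers of a ResNet-style model.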
- Privacy: Intermediate feature tensors at layer k may still leak raw input information. Techniques: noise injection into features (DPFE), adversarial perturbation to obfuscate raw signal reconstruction from features.
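The split-point search is a small enumeration once per-layer timings are known. The sketch below is a toy example: the latency and tensor-size profiles are invented, the function names are assumptions, and total latency is taken as UE compute plus uplink transfer plus edge compute.

```python
def feature_budget_bytes(uplink_bps, t_comm_s):
    """Largest feature tensor (bytes) that fits in the communication budget."""
    return uplink_bps * t_comm_s / 8.0


def best_split(ue_ms, edge_ms, feature_bytes, uplink_bps, budget_ms=1.0):
    """Return (k, total_ms) minimising UE + uplink + edge latency, or None.

    ue_ms[k]: UE time for layers 1..k; edge_ms[k]: edge time for the rest;
    feature_bytes[k]: size of the tensor sent at split k.
    """
    best = None
    for k in sorted(ue_ms):
        comm_ms = feature_bytes[k] * 8.0 / uplink_bps * 1e3
        total = ue_ms[k] + comm_ms + edge_ms[k]
        if total <= budget_ms and (best is None or total < best[1]):
            best = (k, total)
    return best


# Hypothetical per-split profile of a small four-layer network.
ue_ms = {1: 0.10, 2: 0.25, 3: 0.45, 4: 0.80}
edge_ms = {1: 0.30, 2: 0.20, 3: 0.10, 4: 0.02}
feature_bytes = {1: 200_000, 2: 50_000, 3: 20_000, 4: 5_000}

split_1g = best_split(ue_ms, edge_ms, feature_bytes, uplink_bps=1e9)
split_slow = best_split(ue_ms, edge_ms, feature_bytes, uplink_bps=1e6)
```

A return value of `None` corresponds to the last table row below: no split meets the latency target, so inference must run fully locally or without the 1 ms guarantee.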
6G mmWave Scenario Analysis
| Scenario | Uplink Rate | Feature Budget (0.2 ms) | Suitable Split |
|---|---|---|---|
| 6G mmWave (100 GHz) | 10 Gbps | 250 KB | Mid-network (layer 6–8 of 12) |
| 6G sub-7 GHz | 1 Gbps | 25 KB | Early split (layer 2–3) |
| 5G NR (FR2) | 500 Mbps | 12.5 KB | Very early split (layer 1–2) |
| IoT / NTN link | 10 Mbps | 0.25 KB | Full local inference or cloud only |
3GPP standardises what is communicated at the split point: the feature tensor format (shape, dtype, compression codec), the inference session ID (to correlate partial results across the split), and the QoS profile (latency class, reliability class). This enables multi-vendor split inference: a UE splitting inference to an edge MEC node, coordinated over a standard 6G Uu interface.
§12.5 — Knowledge Distillation for Model Compression
Pre-training large teacher models in the cloud and distilling them into compact student models for UE deployment is a key enabler of AI at the UE. Knowledge distillation (Hinton et al., 2015) trains the student to mimic the teacher's soft output distribution, not just the hard class labels.
KD Loss Function
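$$\mathcal{L}_{\mathrm{KD}} = \alpha\, T^2\, D_{\mathrm{KL}}\big(\sigma(z_t / T) \,\Vert\, \sigma(z_s / T)\big) + (1 - \alpha)\, \mathcal{L}_{\mathrm{CE}}\big(\sigma(z_s),\, y\big)$$

where:
- σ(·) = softmax function
- z_t = teacher logits (pre-softmax outputs)
- z_s = student logits
- y = ground-truth hard label
- T = temperature parameter (T > 1 softens the distribution, revealing inter-class relationships); the T² factor keeps the soft-target gradient scale comparable across temperatures
- α = balance weight (typically 0.5–0.9; higher α emphasises teacher mimicry)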
Why Temperature T Matters
At T = 1 (standard softmax), a confident teacher assigns probability ≈ 0.99 to the correct class and < 0.01 to all others — nearly indistinguishable from a hard one-hot label. At T = 4, the distribution is much softer (e.g., 0.6 / 0.2 / 0.1 / …), which encodes rich similarity structure. The student learns not just "class A is correct" but "class A is most likely, class B is somewhat similar, class C is dissimilar." This structured information transfer enables strong generalisation even from a tiny student.
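The effect of temperature can be seen numerically. The sketch below implements a distillation loss in the common form α·T²·KL(teacher ‖ student) + (1 − α)·CE, where the T² scaling follows the convention of Hinton et al. and is an assumption here; the logits are invented example values.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, label, temperature=4.0, alpha=0.7):
    """Soft-target KL term (scaled by T^2) plus hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0.0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


teacher = [6.0, 2.0, 1.0]   # hypothetical logits, class 0 is correct
student = [3.0, 1.5, 0.5]
sharp = softmax(teacher, 1.0)   # close to one-hot
soft = softmax(teacher, 4.0)    # reveals the relative ranking of the wrong classes
loss = kd_loss(teacher, student, label=0)
```

For these logits the top-class probability drops from about 0.98 at T = 1 to about 0.60 at T = 4, so the student sees how the teacher ranks the remaining classes.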
| Model | Parameters | Size | Inference Latency (UE) | NMSE / Accuracy |
|---|---|---|---|---|
| Teacher (cloud) | 10 M | 40 MB | Not deployed on UE | Baseline (100%) |
| Student (no KD) | 1 M | 4 MB | 2.1 ms | −4.2% accuracy |
| Student (KD T=4, α=0.7) | 1 M | 4 MB | 2.1 ms | −1.8% accuracy |
| Student (KD + quantisation 8-bit) | 1 M | 1 MB | 0.9 ms | −2.1% accuracy |
FL + KD Synergy
A powerful 6G architecture combines both techniques in a two-stage pipeline:
- Stage 1 — Federated Teacher Training: A large teacher model is trained via FL across a fleet of high-capability edge nodes (MEC servers, O-DU co-processors). No raw UE data leaves the edge nodes.
- Stage 2 — KD to UE Student: The federated teacher is used to distil a UE-deployable student model. The student is distributed to UEs over O1/Uu as a standard model package (per TR 22.874 model metadata format).
This pipeline provides: (a) privacy preservation via FL, (b) UE deployability via KD compression, (c) personalisation via post-deployment fine-tuning — the complete 6G on-device AI stack.
FL Convergence Comparison
Federated learning and split inference define how AI models are trained and executed across the network. The quality of these models depends critically on the accuracy of the channel models used for training and evaluation. → §13 surveys AI-based channel modeling — from generative models and neural ray tracing to digital twins.
§13 AI for 6G Channel Modeling
§13.1 — Why AI for Channel Modeling?
Classical 3GPP channel models defined in TR 38.901 (CDL / TDL families) are parameterized stochastic models: a fixed cluster-ray structure with empirically fitted path-loss, shadow-fading, and angular-spread parameters. They have served 5G NR design well, but they carry structural assumptions that break down for the scenarios envisioned in 6G.
- Fixed cluster / ray count — cannot capture site-specific geometry (building material, street canyons, vegetation).
- Parameterized up to 100 GHz — no validated model for sub-THz (100–300 GHz) where molecular absorption dominates.
- Stationary clusters — inadequate for RIS-assisted links where the scattering environment is actively re-configured per slot.
- Far-field plane-wave assumption — fails for Large Intelligent Surfaces (LIS) and Holographic MIMO operating in the near-field Fresnel region.
- No coupling with ISAC — radar sensing path and communication path share the same environment but are modelled independently.
- Static obstacle model — cannot represent human body blockage dynamics or moving vehicles in XR / V2X scenarios.
6G New Channel Modeling Requirements
- D-band: 110–170 GHz
- G-band: 140–220 GHz; 275–300 GHz under study
- Molecular absorption peaks at 60, 119, 183, 325 GHz
- Near-field threshold (Fraunhofer distance 2D²/λ) of order 10 m for 10 cm apertures (about 9 m at 140 GHz)
- RIS phase-reconfigurable scattering
- Spatially non-stationary channels (Holographic MIMO)
- High-Doppler: V > 500 km/h (HST), fD > 5 kHz
- Joint radar-communication bistatic paths (ISAC)
- Orbital angular momentum (OAM) mode coupling
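Two of the requirements above can be sanity-checked with textbook formulas: the Fraunhofer distance 2D²/λ for the near-field boundary and the maximum Doppler shift f_D = v·f_c/c. The sketch below uses these standard expressions; the function names and the example carrier frequencies are assumptions.

```python
C = 299_792_458.0  # speed of light in m/s


def fraunhofer_distance_m(aperture_m, freq_hz):
    """Near-field / far-field boundary 2 D^2 / lambda, with lambda = c / f."""
    return 2.0 * aperture_m ** 2 * freq_hz / C


def max_doppler_hz(speed_kmh, freq_hz):
    """Maximum Doppler shift f_D = v * f_c / c for a speed given in km/h."""
    return speed_kmh / 3.6 * freq_hz / C


near_field_140 = fraunhofer_distance_m(0.10, 140e9)   # 10 cm aperture at 140 GHz
near_field_300 = fraunhofer_distance_m(0.10, 300e9)   # same aperture at 300 GHz
doppler_28 = max_doppler_hz(500.0, 28e9)              # 500 km/h at 28 GHz
```

The boundary grows linearly with carrier frequency, so the same 10 cm aperture has roughly twice the near-field range at 300 GHz as at 140 GHz.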
§13.2 — Generative AI Channel Models
Two generative architectures dominate current AI channel modeling research: Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). Both learn the statistical distribution of measured channels and can synthesize unlimited additional samples matching that distribution.
GAN-based Channel Generation
A GAN for channel impulse response synthesis consists of two competing networks trained adversarially:
- Generator G: maps a latent noise vector z ∈ ℝ^d to a complex channel matrix H ∈ ℂ^(N_r × N_t × N_f) (receive antennas × transmit antennas × subcarriers).
- Discriminator D: maps H to a probability score estimating whether H is a real measured channel.
- Training objective: minimax game where G minimizes and D maximizes the value function V(G, D).
- Conditional GAN (cGAN): condition on SNR, delay spread, Doppler class to generate scenario-specific channels.
- Wasserstein GAN (WGAN-GP): a gradient penalty replaces the critic weight clipping of the original WGAN; more stable training.
- Progressive growing: generate a low-resolution channel (few subcarriers) first, then upscale; stabilises training and reduces the risk of mode collapse.
- GAN-generated channels for CDL-A/B/C match power-delay profile within 0.5 dB across all delays.
- Channel estimators trained on GAN data generalize 15% better than those trained on CDL-only data when deployed on real measured channels.
- Latency: a trained GAN generates 10⁴ channel realizations in < 1 s (vs. hours of ray tracing).
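For reference, the value function V(G, D) named in the training objective can be written in the standard form of the original GAN formulation (Goodfellow et al., 2014). The symbols p_data (distribution of measured channels) and p_z (latent prior) are notation introduced here; the conditional and Wasserstein variants listed above modify this objective.

```latex
\min_{G}\,\max_{D}\; V(G, D)
  = \mathbb{E}_{H \sim p_{\mathrm{data}}}\big[\log D(H)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```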
VAE-based Channel Compression and Representation
A Variational Autoencoder learns a low-dimensional latent representation of the channel, useful both for data augmentation and for compressed CSI feedback (replacing traditional codebooks).
- Encoder q_φ(z|H): maps channel H to Gaussian parameters (μ, σ) in a latent space of dimension d_z.
- Reparameterization trick: z = μ + σ ⊙ ε, ε ~ 𝒩(0, I), which enables backpropagation through the sampling step.
- Decoder p_θ(H|z): reconstructs channel Ĥ from latent code z.
Joint VAE + CsiNet design: VAE encoder runs at UE (compressed feedback), VAE decoder runs at gNB (channel reconstruction) — a principled replacement for Type II CSI codebooks.
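The reparameterization step and the accompanying KL regulariser can be sketched in a few lines of NumPy. This is a minimal illustration, not the whitepaper's implementation: the encoder is assumed to output a log-variance (a common parameterisation, not stated in the text), and the batch size and latent dimension are arbitrary.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(0)
mu = np.zeros((4, 32))        # batch of 4 channels, d_z = 32 (illustrative)
log_var = np.zeros((4, 32))   # sigma = 1 everywhere
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

Because the randomness enters only through eps, gradients with respect to mu and log_var are well defined, which is the point of the trick.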
§13.3 — Neural Ray Tracing
Classical deterministic ray tracing (RT) approximates the solution of Maxwell's equations with geometrical optics: launch rays from the transmitter, trace reflections, diffractions, and transmissions through a 3D CAD environment model, and collect them at the receiver. Accuracy is excellent (< 1 dB path loss error in validated scenarios), but computational cost is extreme: hours per scenario per frequency.
- Ray-object intersection: O(N_objects) per ray bounce.
- Diffraction: UTD coefficients at wedge edges — expensive for dense urban geometry.
- Sub-THz: many more dominant rays (specular reflection dominates); need 10–20 bounces for accuracy.
- Moving scenes: must re-run per snapshot (not real-time).
- Represent propagation environment as implicit neural field: f_θ(x,y,z,d̂) → {attenuation, phase delay}.
- Query: parameterize a ray as a sequence of 3D points along path; integrate neural field outputs → channel coefficient.
- Training: reconstruct scene from sparse channel measurements (few hundred snapshots suffice).
- 10,000× faster than classical RT post-training.
Input: (x, y, z, d̂_in, d̂_out) — 3D point + incoming direction + outgoing direction (6D).
Network: 8-layer MLP, 256 units/layer, ReLU activations, skip connection at layer 4 (same as original NeRF).
Output: {σ_extinction ∈ ℝ, φ_delay ∈ ℝ, polarization matrix T ∈ ℂ^(2×2)}.
Channel coefficient: integrate along ray path using numerical quadrature.
Performance vs classical RT: path loss within 2 dB, delay spread within 10 ns, angle spread within 3°. Inference: 1 ms/link.
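The quadrature step can be illustrated with a toy stand-in for the trained field. Everything below is an assumption made for illustration: `toy_field` is a hypothetical replacement for the MLP, it takes a single direction rather than an incoming/outgoing pair, extinction is treated as an amplitude attenuation rate, and the polarization matrix is omitted.

```python
import numpy as np

def toy_field(points, direction):
    """Hypothetical stand-in for the trained MLP.
    Returns per-point extinction [1/m] and phase-delay rate [rad/m]."""
    n = points.shape[0]
    return np.full(n, 0.05), np.full(n, 2.0)

def ray_coefficient(field, start, end, n_samples=256):
    """Midpoint-rule quadrature of the field along a straight ray segment."""
    start, end = np.asarray(start, float), np.asarray(end, float)
    length = np.linalg.norm(end - start)
    direction = (end - start) / length
    # Midpoints of n_samples equal sub-intervals along the ray
    t = (np.arange(n_samples) + 0.5) / n_samples
    points = start + t[:, None] * (end - start)
    sigma, phi = field(points, direction)
    ds = length / n_samples
    attenuation = np.exp(-np.sum(sigma) * ds)   # exp(-integral of sigma)
    phase = np.sum(phi) * ds                    # integral of phi
    return attenuation * np.exp(-1j * phase)

h = ray_coefficient(toy_field, [0, 0, 0], [10, 0, 0])
```

With a constant field the quadrature is exact, which makes the sketch easy to check; a trained field would vary along the ray and the sample count would trade accuracy against inference time.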
§13.4 — Digital Twin Channel Model
A Digital Twin (DT) of the radio environment combines a real-time 3D geometric replica of the physical environment (buildings, furniture, vehicles) with a neural RT engine to generate site-specific, time-varying channel predictions.
DT Architecture
- Sensing layer: LiDAR, RGB-D cameras, satellite imagery, plus GPS/IMU for moving objects (vehicles, pedestrians) — continuously updates 3D scene model.
- Neural scene representation: environment encoded as a neural implicit field (neural RT model, Sec. 13.3) — updated incrementally as scene changes.
- Channel prediction engine: given UE trajectory prediction, generate channel H(t+δ) before the UE arrives.
- AI training sandbox: AI/ML models for scheduling, beamforming, and estimation are trained in DT and then hot-deployed to live network.
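$$H(t+\delta) = f_\theta\big(\text{environment}(t+\delta),\ \text{UE\_pos}(t+\delta)\big)$$
where f_θ is the neural RT model of Sec. 13.3, environment(t+δ) is predicted from sensor fusion + object tracking, and UE_pos(t+δ) is the output of an AI trajectory predictor (LSTM or Kalman filter).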
- Proactive handover: predict link quality 500 ms ahead → zero-interruption HO.
- Beam pre-computation: compute optimal beam before UE moves into new sector.
- RIS configuration: pre-compute RIS phase profile for next time slot (RIS has no feedback channel).
- AI model transfer: train on DT, fine-tune on live network with minimal over-the-air overhead.
- SA2 RP-220162: Study on Digital Twin Network (DTN) — architecture and information model.
- TS 28.535: Management of network slices and digital twins.
- RAN discussions: DT channel for AI training data collection under TR 38.843 Rel-18 scope.
- 6G phase: DT elevated from study to normative in projected Rel-21 (2028+).
§13.5 — Sub-THz Channel Characteristics
Sub-THz bands (100–300 GHz) targeted for 6G backhaul, indoor XR, and sensing exhibit propagation physics qualitatively different from 5G mmWave. AI channel models must be calibrated to these effects.
Path Loss at Sub-THz
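$$PL(f, d)\,[\text{dB}] = 20\log_{10}\!\left(\frac{4\pi f d}{c}\right) + \alpha_{\text{mol}}(f)\, d$$
where the first term is free-space path loss, d is the link distance (expressed in km in the absorption term), and α_mol(f) is the molecular absorption coefficient (dB/km), with sharp resonance peaks at approximately 60, 119, 183, and 325 GHz due to O₂ and H₂O rotational transitions. At 140 GHz (a relatively clear window), α_mol ≈ 3–5 dB/km in typical atmosphere.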
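A short numerical sketch shows how the two loss contributions compare at indoor and outdoor distances. It assumes the common free-space-plus-absorption form of the path-loss model; the function name and the 4 dB/km absorption value (mid-range of the 3–5 dB/km quoted above) are illustrative choices.

```python
import math

C = 299_792_458.0  # speed of light [m/s]

def sub_thz_path_loss_db(freq_hz, dist_m, alpha_mol_db_per_km):
    """Free-space path loss plus molecular absorption, in dB (assumed model)."""
    fspl = 20.0 * math.log10(4.0 * math.pi * dist_m * freq_hz / C)
    absorption = alpha_mol_db_per_km * dist_m / 1000.0
    return fspl + absorption

pl_10m = sub_thz_path_loss_db(140e9, 10.0, 4.0)     # indoor-scale link
pl_1km = sub_thz_path_loss_db(140e9, 1000.0, 4.0)   # backhaul-scale link
```

At 10 m the absorption term contributes only a few hundredths of a dB, so spreading loss dominates indoors; absorption becomes relevant at backhaul distances or near the resonance peaks.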
Key Sub-THz Channel Parameters
| Parameter | 5G mmWave (28 GHz) | 6G D-band (140 GHz) | Notes |
|---|---|---|---|
| Coherence bandwidth B_c | ~200 MHz | ~1 GHz | B_c ∝ 1/σ_τ; fewer clusters at THz → larger B_c |
| Near-field threshold d_NF | 2D²/λ = 0.5 m (D=5 cm) | 2D²/λ = 10 m (D=10 cm) | Most indoor links are near-field at D-band! |
| Reflection loss (concrete) | ~5 dB/bounce | ~15 dB/bounce | Quasi-optical — 1–2 bounces max |
| Human body blockage | ~20 dB | ~35 dB | Near-complete link outage without diversity |
| Max practical range (indoor) | ~50 m | ~10–20 m | Absorption + high reflection loss limit range |
| RMS delay spread σ_τ | 50–100 ns | 10–20 ns | Fewer multipath components; sparser channel |
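The near-field thresholds in the table follow directly from the Fraunhofer distance 2D²/λ. The check below is a minimal sketch; the function name is illustrative.

```python
C = 299_792_458.0  # speed of light [m/s]

def fraunhofer_distance_m(aperture_m, freq_hz):
    """Near-field / far-field boundary 2 * D^2 / lambda."""
    wavelength = C / freq_hz
    return 2.0 * aperture_m**2 / wavelength

d_28 = fraunhofer_distance_m(0.05, 28e9)     # 5 cm aperture at 28 GHz
d_140 = fraunhofer_distance_m(0.10, 140e9)   # 10 cm aperture at 140 GHz
```

Both results round to the table entries (roughly 0.5 m and 10 m). The threshold grows with the square of the aperture, so doubling the array size at D-band pushes the boundary well beyond typical indoor link distances.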
AI Modeling Strategies for Sub-THz
- Sub-THz channels are inherently sparse in the delay-angle domain (few scatterers survive high reflection losses).
- Compressed Sensing + DNN: use ℓ₁-regularized sparse recovery as a physics-informed layer within the neural estimator.
- Result: 30% better NMSE than dense-network estimators by exploiting sparsity structure.
- Far-field: array response vector a(θ) = phase shifts only.
- Near-field: a(r,θ) = phase + amplitude taper (spherical wavefront).
- Polar domain transform: replace DFT with polar Fourier transform — creates sparse near-field representation.
- AI channel estimator trained in polar domain: 5 dB gain over FFT-domain estimator at d < d_NF.
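The ℓ₁-regularized recovery step mentioned above can be sketched with plain iterative soft thresholding (ISTA), the iteration that unrolled sparse estimators typically build on. This is a toy under stated assumptions: real-valued signals instead of complex channels, a random Gaussian sensing matrix instead of a delay-angle dictionary, and fixed rather than learned step size and threshold.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(A, y, lam=0.01, n_iter=1000):
    """Minimise 0.5*||y - A x||^2 + lam*||x||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x

rng = np.random.default_rng(0)
m, n = 40, 100                               # 40 measurements, 100 candidate taps
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = np.array([7, 42, 88])              # three surviving paths (illustrative)
x_true[support] = [1.5, -2.0, 1.0]
y = A @ x_true
x_hat = ista(A, y)
recovered = np.sort(np.argsort(np.abs(x_hat))[-3:])
```

In a learned variant each loop iteration becomes one network layer, with the step size and threshold trained per layer.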
Accurate channel models are the foundation for rigorous AI/ML evaluation. → §14 covers the KPI framework and evaluation methodology that 3GPP TR 38.843 defines for measuring AI performance gains against these channel models.
§14 Performance Evaluation KPIs
3GPP Rel-18 TR 38.843 defines the formal evaluation methodology for AI/ML use cases in NR. It establishes quantitative KPIs, reference scenarios, baseline algorithms, and calibration procedures — the evidentiary standard that AI proposals must meet for normative adoption. KPI evaluation is inherently tied to the channel model used; → §13 provides the AI-based channel modeling context that underpins the scenario definitions here.
§14.1 — AI/ML KPI Framework (3GPP TR 38.843 §7)
Channel Estimation KPIs
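NMSE (Normalized Mean Square Error) — primary accuracy metric:
$$\text{NMSE} = \frac{\mathbb{E}\big[\lVert H - \tilde{H} \rVert_F^2\big]}{\mathbb{E}\big[\lVert H \rVert_F^2\big]}$$
where H is the true channel and H̃ the estimate. Expressed in dB (10 log₁₀ NMSE). Lower is better. Genie-aided MMSE (oracle channel statistics) sets the theoretical lower bound; AI model target is within 1 dB of this bound.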
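A sample-average version of the NMSE metric is a few lines of NumPy. The sketch assumes the usual definition (error energy divided by channel energy, averaged over realizations); the array shapes are illustrative.

```python
import numpy as np

def nmse_db(h_true, h_est):
    """NMSE in dB: total error energy over total channel energy."""
    err = np.sum(np.abs(h_true - h_est) ** 2)
    ref = np.sum(np.abs(h_true) ** 2)
    return 10.0 * np.log10(err / ref)

rng = np.random.default_rng(0)
# 100 realizations, 2 RX antennas, 32 subcarriers (illustrative sizes)
H = rng.standard_normal((100, 2, 32)) + 1j * rng.standard_normal((100, 2, 32))
H_est = 1.1 * H            # 10 % amplitude error on every entry
value = nmse_db(H, H_est)
```

A uniform 10 % amplitude error corresponds to an error-energy ratio of 0.01, i.e. −20 dB, which gives a quick sanity check for any implementation.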
- BER / BLER: end-to-end link performance (MCS-specific) with AI estimator active — captures the impact on real throughput.
- Inference latency: time from last pilot symbol to H̃ available at receiver (ms) — must fit within L1 processing timeline (< 1 ms).
- Model size: KB of weights stored at UE — drives flash/SRAM cost.
- Pilot overhead: AI estimators may require additional pilots (must be accounted for in spectral efficiency).
CSI Feedback KPIs
- Channel NMSE at gNB: NMSE of reconstructed channel H̃ after AI feedback + decoding — analogous to channel estimation NMSE but measured at the transmitter.
- Beamforming gain loss: ΔG = G_ideal − G_AI (dB) — difference between ideal precoder gain (full channel knowledge) and AI-based precoder gain. Target: ΔG < 1 dB.
- Feedback overhead ratio: η = B_AI / B_TypeII — ratio of AI feedback bits to the basic 5G NR Type II baseline. Target: η < 0.5 (50% overhead reduction).
- Robustness: NMSE degradation when input SNR differs from training SNR (distribution shift) — captured by NMSE vs SNR curve across the full operating range.
Beam Management KPIs
- Top-1 beam accuracy: P(b̂ = b*) — probability that the AI-predicted beam equals the optimal beam (highest received power). Baseline: Procedure 2 (P2) exhaustive sweep.
- Top-K beam accuracy: P(b* ∈ {b̂_1,...,b̂_K}) — optimal beam is within the AI's top-K candidates (relevant for beam recovery / fallback).
- Beam recovery time: time from link failure to recovery with new beam (ms) — AI proactive prediction target: zero-interruption recovery.
- UE energy saving: reduction in beam-sweep reference signal overhead (RSRP measurements) that translates to UE battery saving (%).
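Top-1 and Top-K beam accuracy can be computed from predicted beam scores and ground-truth per-beam RSRP as follows. The data are made up for illustration, and the function name is not from any specification.

```python
import numpy as np

def top_k_accuracy(pred_scores, true_rsrp, k=1):
    """Fraction of samples whose strongest beam is among the k highest-scored beams."""
    best = np.argmax(true_rsrp, axis=1)
    top_k = np.argsort(pred_scores, axis=1)[:, -k:]
    return float(np.mean(np.any(top_k == best[:, None], axis=1)))

# Three samples, four candidate beams (RSRP in dBm)
true_rsrp = np.array([[-90., -80., -85., -95.],
                      [-70., -75., -72., -88.],
                      [-60., -65., -61., -59.]])
pred = np.array([[0.10, 0.70, 0.15, 0.05],   # best beam ranked first
                 [0.20, 0.10, 0.60, 0.10],   # best beam ranked second
                 [0.40, 0.30, 0.20, 0.10]])  # best beam ranked last

top1 = top_k_accuracy(pred, true_rsrp, k=1)
top2 = top_k_accuracy(pred, true_rsrp, k=2)
```

Reporting both values matters because a predictor with modest Top-1 accuracy can still shrink the sweep to K beams if its Top-K accuracy is high.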
Positioning KPIs
- Horizontal Positioning Error (HPE) at 90th percentile: distribution of 2D position error; 90%-ile target tightened from 5G NR (3 m, indoors) to 6G-AI (< 0.3 m indoors).
- Vertical Positioning Error (VPE) at 90th percentile: relevant for multi-story buildings and UAV scenarios.
- Time to First Fix (TTFF): latency from UE initialization to first valid position estimate (s).
- Integrity: probability that actual error exceeds Protection Level — mandatory for safety-critical verticals (autonomous vehicles, industrial automation).
§14.2 — Evaluation Scenarios and Baselines
TR 38.843 specifies a two-tier evaluation structure: (1) a calibration tier using CDL/TDL channels to verify baseline compliance, and (2) a performance tier using scenario-specific channels to demonstrate net gain over existing 5G mechanisms.
| Use Case | Primary Metric | 5G Baseline | 6G-AI Target | Spec Reference |
|---|---|---|---|---|
| Channel Estimation | NMSE (dB) | MMSE: −15 dB @ SNR = 5 dB | −18 dB @ SNR = 5 dB | TR 38.843 §7.1 |
| CSI Feedback | Feedback bits/SB | Type II: 16 bit/subband | 8 bit/subband (AI) | TR 38.843 §7.2 |
| Beam Management | Top-1 accuracy @ 120 km/h | P2 sweep: 65% | 85% | TR 38.843 §7.3 |
| Positioning | HPE 90%-ile (m) | DL-TDOA: 1.2 m | 0.3 m | TR 38.843 §7.4 |
| Energy Efficiency | Network power saving | SON-based: 20% | ML-SON: 40% | TS 28.310 |
| Scheduling | 5th-percentile UE rate | Proportional Fair: baseline | DRL scheduler: +15% | TR 37.817 |
Reference System Configuration
Subcarrier spacing: 30 kHz | Bandwidth: 100 MHz | PRBs: 273 | gNB: 32 TRX (8H × 4V × 2pol) | UE: 2 RX antennas
Carrier: 3.5 GHz | PDSCH: DMRS Type 1, 1 symbol | MCS: QPSK–256QAM adaptive
Channel: CDL-A (NLOS), CDL-C (NLOS), CDL-D (LOS) | Evaluation: 10⁴ channel realizations minimum per Monte Carlo point
Note: 6G evaluations will extend to FR3 (7–24 GHz) and sub-THz bands.
§14.3 — Evaluation Datasets
3GPP endorsed several open datasets for AI model training, validation, and benchmarking — moving toward reproducible, community-standard evaluation (analogous to ImageNet for computer vision).
DeepMIMO
- Ray-tracing based using Wireless InSite (commercial RT tool).
- Scenarios: O1 (outdoor street intersection), I3 (indoor office), O1_28B (28 GHz outdoor with blockage).
- Configurable: frequency, antenna size, UE grid resolution.
- Open source; widely used in 3GPP evaluations.
- deepmimo.net
QuaDRiGa
- QUAsi Deterministic RadIo channel GenerAtor: geometry-based stochastic model (GBSM) with spatially consistent mobility.
- Reference implementation used in several 3GPP TR 36.873 and TR 38.901 simulations.
- Handles base station cooperation (multi-cell), elevation, and 3D mobility trajectories.
COST 2100
- Multi-link MIMO channel dataset from European COST 2100 action.
- Measured scenarios: indoor hall, semi-urban outdoor.
- Publicly available; captures multi-user interference structure.
- Used for massive MIMO precoder learning benchmarks.
Raymobtime
- Outdoor V2X ray tracing with realistic vehicular mobility (SUMO traffic simulator + Wireless InSite).
- Temporal sequences of channels along vehicle trajectories — critical for high-Doppler AI model evaluation.
- Available in multiple frequency bands (5.9 GHz, 28 GHz, 60 GHz).
Reporting requirements for AI/ML evaluations:
- NMSE or BER vs. SNR curves — evaluated across the full operating SNR range (typically −10 to +30 dB), not a single operating point.
- Complexity in FLOPs and inference latency — reported for the target hardware tier (UE or gNB), demonstrating L1 timing budget compliance (< 1 ms).
- Overhead in bits — additional pilot or feedback overhead relative to the 5G NR baseline must be explicitly accounted for in the spectral efficiency comparison.
- Two deployment scenarios — at minimum one LOS (e.g., CDL-D) and one NLOS (e.g., CDL-A or CDL-C) scenario; results on a single channel type are insufficient.
- Comparison against classical baselines — both LS and MMSE (or equivalent for the use case) must be evaluated on identical channels and SNR points.
Calibration Procedure (CDL channels)
Before scenario-specific evaluation, AI models undergo a CDL calibration:
- Train model on CDL-A with v = 3 km/h (near-static, low Doppler).
- Evaluate on CDL-A/B/C/D/E at SNR ∈ {−10, 0, 5, 10, 20, 30} dB — verifies that model does not overfit to training channel type.
- Evaluate on CDL-A with v = 30, 120, 500 km/h — verifies Doppler robustness (or explicit retraining per velocity class).
- Compare NMSE vs MMSE (genie) and LS baselines on identical channels.
- Report: NMSE gap to MMSE at SNR = 5 dB (key operating point for coverage-limited UEs).
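The LS and genie-MMSE baselines used in the comparison step can be reproduced on a synthetic channel. The sketch below is not a CDL simulation: it assumes an exponential frequency-correlation model, unit-power pilots on every subcarrier, and perfect knowledge of the covariance and noise variance for the LMMSE filter.

```python
import numpy as np

def nmse_db(h, h_est):
    return 10 * np.log10(np.sum(np.abs(h - h_est) ** 2) / np.sum(np.abs(h) ** 2))

rng = np.random.default_rng(1)
n_sc, n_trials, snr_db = 64, 2000, 5.0
noise_var = 10 ** (-snr_db / 10)

# Assumed exponential correlation across subcarriers (not a CDL profile)
idx = np.arange(n_sc)
R = 0.95 ** np.abs(idx[:, None] - idx[None, :])
L = np.linalg.cholesky(R)

def cgauss(shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

h = L @ cgauss((n_sc, n_trials))                   # correlated Rayleigh channel
y = h + np.sqrt(noise_var) * cgauss(h.shape)       # received pilots

h_ls = y                                           # LS estimate (unit pilots)
W = R @ np.linalg.inv(R + noise_var * np.eye(n_sc))
h_mmse = W @ y                                     # genie LMMSE estimate

nmse_ls, nmse_mmse = nmse_db(h, h_ls), nmse_db(h, h_mmse)
```

The LS NMSE sits at roughly the negative of the SNR, while the LMMSE filter gains several dB by exploiting frequency correlation; an AI estimator is judged by where it lands between these two curves.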
§14.4 — Overhead and Complexity KPIs
AI/ML inference in real-time L1 processing imposes concrete hardware constraints. 3GPP TR 38.843 §7.5 defines complexity KPIs that prevent academically compelling but practically infeasible models from entering the specification.
| Complexity KPI | Typical Range (AI models) | Practical Target | Comparator |
|---|---|---|---|
| FLOPs per inference | 10⁶ – 10⁸ | < 10⁷ for UE | LDPC decoding: ~10⁷ FLOPs |
| Inference latency | 0.1 – 5 ms | < 1 ms (L1 budget) | OFDM symbol duration: 35.7 μs (μ=1) |
| Model size (weights) | 10 KB – 10 MB | < 1 MB UE, < 10 MB gNB | LTE turbo code LUT: ~100 KB |
| Memory bandwidth | 1–100 MB/inference | < 10 MB/inference | L2 cache hit rate critical for latency |
| Update frequency | per-slot / per-frame / offline | per-frame or slower for UE | Model update signaling via RRC |
| Online training cost | N/A (frozen) or incremental | No online training at UE | Gradient compute = 3× forward pass |
FLOPs Analysis — ChannelNet Family
The ChannelNet family of channel estimators (ReEsNet, InterpolateNet, and related variants) spans a wide complexity-accuracy trade-off:
- LS + 1-layer CNN: LS estimate → 1 conv layer denoiser. NMSE: −11 dB @ SNR = 5 dB. Near LS performance, minimal gain.
- CHEST-DNN: fully connected, 3 layers, 128 units. NMSE: −13 dB. Fast but misses spatial structure.
- ReEsNet (ResNet-based): 5 residual blocks, 3×3 conv. NMSE: −16 dB @ SNR = 5 dB. Near MMSE at 5.2×10⁶ FLOPs.
- InterpolateNet: LS on pilots → learned 2D interpolation kernel. NMSE: −15.5 dB.
- Transformer estimator: attention over OFDM subcarriers. NMSE: −17.5 dB. Computationally heavy; requires hardware accelerator.
- OAMP-Net: deep unrolled OAMP (orthogonal approximate message passing). NMSE: −18 dB. Converges in 5 iterations; each iteration ~2×10⁶ FLOPs.
- Diminishing returns above 10⁷ FLOPs: the transformer gains only ~1.5 dB over ReEsNet (−17.5 vs −16 dB) at 5× cost.
- Architecture-aware quantization: INT8 inference reduces compute cost and memory footprint by ~4× relative to FP32 (the operation count itself is unchanged) with < 0.3 dB NMSE loss.
- Pruning: 70% weight pruning with retraining recovers 95% of the original NMSE performance, reducing model size with only a small accuracy loss.
- Optimal operating point for UE: ~5×10⁶ FLOPs (ReEsNet-class) at < 500 μs latency on mobile SoC.
Model Lifecycle and Signaling Overhead
Beyond per-inference complexity, 3GPP must standardize the model lifecycle: how AI models are delivered to UE, updated, and version-managed:
- Model delivery: baseline model pre-loaded in UE firmware; delta updates via RRC (broadcast or dedicated) — target < 50 KB per update.
- Model activation: gNB signals which model ID to activate per cell / per UE via MAC-CE or RRC reconfiguration.
- Model monitoring: UE measures KPI metric (e.g., BLER) and reports to gNB when AI model performance degrades — triggers model switch or fallback to classical algorithm.
- Fallback: UE MUST always support classical baseline (LS, MMSE, P2 sweep) — AI model is an optional enhancement layer, not a replacement of baseline functionality.
- Model ID space: how many bits for model ID in MAC-CE? (Proposal: 4 bits → 16 models per cell.)
- Generalization specification: should standard mandate minimum cross-scenario NMSE (e.g., train on CDL-A, test on CDL-C must be within X dB)?
- Online adaptation: allowed at gNB side only (for Rel-18); UE-side online training deferred to Rel-19/20.
- Split inference: part of inference at UE (feature extraction), remainder at gNB — reduces UE complexity but introduces air-interface latency for intermediate tensor transmission.
Even with rigorous KPIs, significant challenges stand between current AI/ML results and live network deployment. → §15 examines the open research challenges — generalization, standardization gaps, computational constraints, and security — that must be resolved on the path to 6G.
§15 Open Research Challenges
Despite the substantial progress documented in §§3–14, the path from promising research results to production-grade 6G AI/ML remains blocked by a set of fundamental open problems. This section catalogues the five most critical challenge domains, examines their technical depth, and summarises the mitigation strategies currently under evaluation in 3GPP, O-RAN Alliance, and the broader academic community. None of these problems is fully solved; each represents an active frontier whose resolution will determine the pace and scope of AI integration in commercial networks.
§15.1 — The Generalization Problem
The single most consequential obstacle to deploying AI at the wireless physical layer is the generalization gap: AI/ML models trained on simulated or laboratory channels routinely underperform when transferred to real deployment sites. The gap has been reproduced across use-cases — channel estimation, CSI compression, beam prediction — and across hardware platforms, confirming that it is a structural property of how models are trained, not an artifact of any single implementation.
Taxonomy of generalization failure modes
- Distribution shift: The joint distribution P(x, y) differs between the training environment (CDL-A with TDL-A Doppler) and the deployment site (dense urban canyon with NLOS clusters). The model has never seen the deployment distribution and cannot extrapolate reliably.
- Covariate shift: The marginal input distribution P(x) changes while the conditional P(y|x) is unchanged. Examples include new UE types with different RF chain characteristics, or a new mobility pattern (e-scooter vs. pedestrian). Models that relied on implicit priors over P(x) fail silently.
- Concept drift: The underlying generative process changes over time. Seasonal foliage alters multipath delay spreads by 5–15 ns; new building construction introduces permanent scatterer clusters; infrastructure upgrades change antenna geometry. A static trained model degrades monotonically until retrained.
Quantifying the gap
Classical statistical learning theory provides a useful bound. Given a hypothesis class with VC dimension d_VC and n_train i.i.d. training samples, the generalization error is bounded, with probability at least 1 − δ, by:
$$R(h) \le \hat{R}_{\text{train}}(h) + \sqrt{\frac{d_{VC}\left(\ln\frac{2 n_{\text{train}}}{d_{VC}} + 1\right) + \ln\frac{4}{\delta}}{n_{\text{train}}}}$$
where R(h) is the true risk and R̂_train(h) is the empirical risk on the training set.
For deep networks, d_VC scales with the number of parameters (often millions), so this bound is vacuous without domain-specific inductive biases. In practice, NN channel estimators trained exclusively on CDL-A have been shown to lose 3–5 dB NMSE when evaluated on measured urban channels, a loss that completely eliminates the gain over LS estimation that motivated the AI approach in the first place.
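To see why the bound is vacuous at deep-network scale, the complexity term can be evaluated numerically. The sketch assumes the classical Vapnik form of the VC bound, sqrt((d_VC (ln(2n/d_VC) + 1) + ln(4/δ)) / n); the parameter values are illustrative.

```python
import math

def vc_gap(d_vc, n_train, delta=0.05):
    """Complexity term of the classical VC bound (holds with prob. >= 1 - delta)."""
    if n_train < d_vc:
        return math.inf   # bound is uninformative when samples < VC dimension
    num = d_vc * (math.log(2 * n_train / d_vc) + 1) + math.log(4 / delta)
    return math.sqrt(num / n_train)

small = vc_gap(d_vc=10, n_train=100_000)            # low-capacity model
large = vc_gap(d_vc=1_000_000, n_train=1_000_000)   # million-parameter scale
```

For the small model the gap is a few percent; at million-parameter scale it exceeds 1, which says nothing about a loss bounded in [0, 1]. This is the sense in which inductive biases, not sample count alone, have to carry generalization.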
Mitigation strategies under study
| Strategy | Description | Status in 3GPP |
|---|---|---|
| Domain adaptation | Fine-tune a pre-trained model on a small set of real-world samples collected at the deployment site (100–1000 pilots). Transfer learning dramatically reduces the number of site-specific samples required. 3GPP AI use-case UC1 (channel estimation) considers online adaptation as a key feature of the Rel-19 framework. | Active — TR 38.843 Rel-19 |
| Domain randomization | Train on a wide distribution of channel conditions — spanning CDL-A/B/C, multiple delay spreads, Doppler spreads, and site-specific ray-tracing augmentation. The model is forced to learn invariant features rather than distribution-specific shortcuts. Cost: larger models and longer training time. | Research phase |
| Causal learning | Instead of learning P(y|x) from observational data, learn the causal structure of the wireless channel: which physical phenomena (reflection, diffraction, scattering) cause which channel impulse response features. Causal models are by construction more robust to distribution shift because the underlying physics does not change. | Research phase |
| Physics-informed NNs (PINNs) | Embed Maxwell's equations or simplified propagation models as soft constraints in the loss function. The NN is free to learn from data but is penalised for solutions that violate physical laws. Demonstrated 1.8 dB NMSE improvement on out-of-distribution channels vs. unconstrained NNs in recent academic results. | Pre-standardization |
§15.2 — The Standardization Gap
AI improves performance in simulation, but improving performance is not sufficient for standardization: 3GPP standards must guarantee interoperability between equipment from different vendors. AI introduces new sources of non-interoperability that have no precedent in the existing specification framework.
Root causes of non-interoperability
- Non-deterministic output: Two UEs running nominally the same AI model on different hardware (different NPU architectures, different floating-point rounding) may produce subtly different CSI feedback vectors. The gNB's decoder, optimized for a specific encoder, may fail to reconstruct the channel accurately from the "wrong" encoder's output. This breaks the fundamental assumption of 3GPP CSI standardization, where a known codebook guarantees decoder-side reconstruction.
- Model versioning: A gNB deployed by Vendor A ships a specific encoder model. A UE deployed by Vendor B ships a different encoder trained on different data. Even if both claim "3GPP Rel-18 AI CSI", they are not interoperable. The standard currently has no mechanism for model identity negotiation.
- Inference endpoint negotiation: For CSI feedback, should the encoder run at the UE (on-device inference, low latency) or at the gNB (model fully known, no interoperability issue)? For channel estimation, should the model run at the UE receiver or be provided as a service by the network? These architectural questions remain open.
- Graceful fallback: When the AI procedure fails — due to model mismatch, distribution shift, hardware fault, or deliberate misconfiguration — what is the standardized fallback? Current proposals suggest reverting to legacy (non-AI) procedures, but the triggering condition and transition mechanism are unstandardized.
3GPP working items addressing the gap
| Document | Scope | Status (2024) |
|---|---|---|
| TR 22.874 §7 | AI/ML model transfer and lifecycle management requirements | Complete (Rel-18) |
| TR 38.843 §8 | Standardization impact analysis for UC1/UC2/UC3 | Complete (Rel-18) |
| SA2 WI (Rel-19) | AI model metadata format, versioning, fallback procedures | In progress |
| RAN1 WI (Rel-19) | Normative CSI feedback AI encoder/decoder interface | Study phase |
The open question that will define the architecture of 6G AI: should 3GPP standardize the interface (encoder output format, as in today's codebook feedback), the training procedure (dataset, loss function, evaluation metric), or the model itself (fixed binary weights delivered over OTA update)? Each choice has profoundly different implications for vendor differentiation, update agility, and certification burden.
§15.3 — Computational and Energy Constraints
AI inference at the PHY layer (L1) is not a background task; it sits on the critical timing path. A 5G NR slot lasts between 0.125 ms and 1 ms depending on numerology (1 ms at 15 kHz subcarrier spacing, 0.5 ms at 30 kHz, 0.125 ms at 120 kHz), and channel estimation or CSI feedback must complete within a fraction of that budget. This imposes hard real-time constraints on AI inference that are fundamentally different from the datacenter inference workloads for which most deep learning hardware is designed.
Compute requirements
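Consider a representative NN channel estimator: 3 dense layers, 200K parameters, Float32 arithmetic. Per-inference floating-point operations, counting one multiply and one add per weight:
$$\text{FLOPs} \approx 2 \times 2\times10^{5} = 4\times10^{5}$$
At a slot duration of 0.5 ms and assuming inference must complete in 50% of slot time:
$$\text{Throughput} = \frac{4\times10^{5}\ \text{FLOPs}}{0.25\ \text{ms}} = 1.6\ \text{GFLOPS per UE}$$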
This is well within current UE SoC capability (representative mobile SoC: 2–5 TOPS). However, the situation is more demanding for larger models or when multiple UEs share a gNB processing pool: with 256 simultaneous UE streams, gNB-side inference demand reaches 400+ GFLOPS, requiring dedicated AI accelerators.
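The compute budget can be reproduced with a few lines of arithmetic. The only assumption beyond the text is the conventional count of two floating-point operations (one multiply, one add) per weight.

```python
PARAMS = 200_000            # weights in the dense estimator (from the text)
FLOPS_PER_WEIGHT = 2        # one multiply + one add per weight (assumption)
SLOT_S = 0.5e-3             # slot duration [s]
BUDGET_FRACTION = 0.5       # share of the slot available for inference
N_STREAMS = 256             # simultaneous UE streams at the gNB

flops_per_inference = PARAMS * FLOPS_PER_WEIGHT
ue_gflops = flops_per_inference / (SLOT_S * BUDGET_FRACTION) / 1e9
gnb_gflops = ue_gflops * N_STREAMS
```

Under these assumptions a single UE needs about 1.6 GFLOPS sustained, and 256 streams multiply that to roughly 410 GFLOPS at the gNB, consistent with the "400+ GFLOPS" figure.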
Energy cost at the UE
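Current NPU efficiency (2024 generation): ~1 TOPS/W peak. For a 200K-parameter NN (about 4×10⁵ operations per inference), peak efficiency alone would imply only ~0.4 μJ per inference. The figures below are about 1000× higher, which corresponds to an effective efficiency near 1 GOPS/W; they are plausible only if memory access, low utilization at batch size 1, and accelerator wake-up overheads dominate the arithmetic cost: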
- Per-inference energy: ~0.4 mJ
- At 2000 inferences/s (one per slot): 800 mW
- UE power budget for radio: ~500–1000 mW total
This means AI inference at full rate consumes a significant fraction of the entire UE radio power budget — before accounting for RF chain, baseband DSP, and application processor. Models must be aggressively compressed.
Compression techniques:
- Quantization (INT4/INT8): 4–8× reduction in compute and memory; typically <2% accuracy loss for well-calibrated models
- Pruning: Remove >90% of weights with <1 dB NMSE penalty for structured channel estimators
- Knowledge distillation: Train a small "student" model to mimic a large "teacher"; student can be 10× smaller with 80% of the performance gain
- Early exit: 60% of inputs (easy channel conditions) exit at layer 3; only hard cases traverse full depth — reduces average compute by 2–3×
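Of the compression techniques listed, post-training quantization is the simplest to illustrate. The sketch below uses symmetric per-tensor INT8 quantization, one of several possible schemes; the weight matrix is random and stands in for a trained layer.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: w ~ scale * q with q in [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)   # stand-in for trained weights
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = float(np.max(np.abs(w - w_hat)))
```

Storage drops by 4× relative to Float32 and the worst-case rounding error is half a quantization step. Per-channel scales or quantization-aware training usually reduce the accuracy impact further, at the cost of a more involved pipeline.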
§15.4 — Privacy and Security
Federated learning and AI-assisted network operation introduce attack surfaces that have no analogue in classical wireless system design. The wireless channel itself can be used as both an attack vector and a side-channel for exfiltrating model information.
Federated learning threat model
| Attack type | Mechanism | Impact | Primary defence |
|---|---|---|---|
| Gradient inversion | Reconstruct private training data from shared gradient updates | User location, movement pattern, device identity leakage | Differential privacy (add Gaussian noise to gradients) |
| Model poisoning | Malicious UEs submit adversarial gradients to corrupt global model | Degraded global model performance, targeted misclassification | Byzantine-robust aggregation (Krum, coordinate-wise median) |
| Free-rider attack | UE downloads global model without contributing genuine updates | Unfair resource consumption; model quality degradation over time | Contribution verification via secure aggregation |
| Membership inference | Determine whether a specific UE's data was used in training | Privacy violation; regulatory non-compliance (GDPR) | (ε, δ)-differential privacy guarantee |
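The coordinate-wise median defence named in the table can be contrasted with plain averaging on a toy example. The update vectors are fabricated for illustration; a real deployment would aggregate model-sized gradient tensors.

```python
import numpy as np

def coordinate_median(updates):
    """Byzantine-robust aggregation: median of each coordinate across clients."""
    return np.median(np.stack(updates), axis=0)

# Four honest UEs with nearly identical updates, one poisoned update
honest = [np.array([1.0, -2.0, 0.5]) + 0.01 * k for k in range(4)]
poisoned = np.array([1e6, 1e6, -1e6])
updates = honest + [poisoned]

agg_mean = np.mean(np.stack(updates), axis=0)    # FedAvg-style mean
agg_median = coordinate_median(updates)          # robust alternative
```

A single malicious client drags the mean arbitrarily far, while the median stays inside the range of honest values as long as fewer than half of the clients are compromised.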
Differential privacy tradeoff
Adding Gaussian noise with variance σ² to each gradient provides (ε, δ)-DP with:
$$\sigma \ge \frac{\Delta f \sqrt{2\ln(1.25/\delta)}}{\varepsilon}$$
where Δf is the L2 sensitivity of the gradient (this classical calibration holds for ε < 1). Practical challenge: the noise level required for strong privacy (ε < 1) degrades model accuracy by 5–15% compared to non-private training. The privacy–utility tradeoff remains an open research problem in the wireless FL context, where gradient dimension is high (10⁴–10⁶) and UE participation is sparse.
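A minimal sketch of the clip-and-noise step follows. It assumes the classical Gaussian-mechanism calibration σ = Δf · sqrt(2 ln(1.25/δ)) / ε, and it takes the clipping norm as the L2 sensitivity (an add/remove-one-record assumption); function names and parameter values are illustrative.

```python
import numpy as np

def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise std of the classical Gaussian mechanism (stated for 0 < epsilon < 1)."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

def privatize_gradient(grad, clip_norm, epsilon, delta, rng):
    """Clip the gradient to L2 norm <= clip_norm, then add Gaussian noise."""
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / max(norm, 1e-12))
    sigma = gaussian_sigma(epsilon, delta, clip_norm)   # sensitivity = clip_norm (assumed)
    noisy = clipped + rng.normal(0.0, sigma, size=grad.shape)
    return noisy, clipped

rng = np.random.default_rng(0)
grad = rng.standard_normal(10_000) * 3.0      # high-dimensional UE gradient (toy)
sigma = gaussian_sigma(epsilon=0.5, delta=1e-5, sensitivity=1.0)
noisy, clipped = privatize_gradient(grad, clip_norm=1.0, epsilon=0.5, delta=1e-5, rng=rng)
```

With ε = 0.5 the per-coordinate noise standard deviation is nearly ten times the clipping norm, which illustrates why strong privacy is costly when only a few UEs contribute per round.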
AI model security beyond FL
- Adversarial perturbations: An attacker who can control or predict channel measurements (e.g., by using a reconfigurable intelligent surface as a signal reflector) may be able to perturb AI beam predictor inputs and induce wrong beam selection, causing service disruption without any packet injection.
- Model extraction attacks: An adversary queries the AI-assisted network with crafted channel probes and reverse-engineers the model parameters from the output — effectively stealing the vendor's intellectual property without access to training data.
- Physical adversarial attacks: Reflecting surfaces strategically placed near a base station can inject artificial multipath that systematically biases the training data for an online-learning channel estimator — a novel class of physical-layer data poisoning.
Adversarial examples — formal definition
The Fast Gradient Sign Method (FGSM) generates adversarial inputs by perturbing in the direction of the sign of the loss gradient:
$$x_{\text{adv}} = x + \varepsilon_{\text{adv}} \cdot \operatorname{sign}\big(\nabla_x \mathcal{L}(\theta, x, y)\big)$$
where θ denotes the model parameters, y is the true label, and ε_adv is the adversarial perturbation budget (L∞ norm); this symbol is distinct from the DP privacy budget ε used earlier in this section. For beam management AI, x corresponds to the pilot measurement vector and the attacker uses RIS phase control to inject the perturbation over the air.
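FGSM is easiest to verify on a model whose input gradient is available in closed form. The sketch below uses a toy logistic-regression classifier as a hypothetical stand-in for a beam predictor; the weights, input, and perturbation budget are random or arbitrary.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def bce_loss(w, b, x, y):
    """Binary cross-entropy for label y in {0, 1}, in a numerically stable form."""
    z = w @ x + b
    return float(np.logaddexp(0.0, -(2.0 * y - 1.0) * z))

def fgsm(w, b, x, y, eps_adv):
    """One FGSM step: move each input coordinate by eps_adv along the gradient sign."""
    grad_x = (sigmoid(w @ x + b) - y) * w     # dLoss/dx for logistic regression
    return x + eps_adv * np.sign(grad_x)

rng = np.random.default_rng(0)
w, b = rng.standard_normal(16), 0.1           # toy model parameters
x, y = rng.standard_normal(16), 1.0           # toy measurement vector and label
x_adv = fgsm(w, b, x, y, eps_adv=0.05)

loss_clean = bce_loss(w, b, x, y)
loss_adv = bce_loss(w, b, x_adv, y)
```

Every coordinate moves by exactly the budget, so the perturbation saturates the L∞ constraint while the loss increases. For a deep predictor the gradient would come from automatic differentiation rather than a closed form.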
§15.5 — Regulatory and Spectrum Considerations
AI-learned waveforms and constellations — particularly those generated end-to-end by an autoencoder — present a novel regulatory challenge. Regulatory bodies worldwide grant spectrum licenses for defined transmission formats; an AI that autonomously generates a new waveform may operate in a regulatory grey area even if its physical emission envelope complies with existing masks.
Core regulatory constraints
- ITU-R conformance: ITU-R Radio Regulations require that systems operating in allocated bands conform to spectral emission masks defined in ITU-R recommendations. An AI waveform must be constrained to stay within these envelopes; unconstrained autoencoder output frequently violates out-of-band emission limits unless spectral masks are explicitly included in the training loss.
- Per-jurisdiction approval: A learned waveform not conforming to a recognized standard (IEEE 802.11, 3GPP NR) typically requires type approval in each regulatory jurisdiction separately. The approval timescales (6–18 months per jurisdiction) are incompatible with the iterative retraining cycles of deployed AI models.
- Interpretability requirements: Some regulators (FCC Rule 2.983, ETSI EN 301 893) require that manufacturers be able to demonstrate that their emission control mechanisms operate as specified. A black-box AI power controller cannot trivially satisfy this requirement — interpretable AI models may be mandated.
3GPP approach: constrained AI output
The working consensus in 3GPP RAN1 is to constrain AI outputs to comply with existing spectral emission masks rather than seeking new regulatory approvals. This means:
- AI models that generate waveform parameters (e.g., constellation symbols, precoding vectors) must have their output projected onto a feasible set defined by the RF mask.
- AI-controlled transmit power must remain bounded by existing maximum power levels (TS 38.101).
- AI beam management must not generate beams directed outside the equipment's regulatory geographic area.
These constraints limit some of the theoretically achievable performance gains from unconstrained AI but are necessary for regulatory tractability. The interpretability challenge — demonstrating to regulators that constrained AI reliably stays within bounds — remains an open problem.
[Figure: Challenge maturity landscape]
This whitepaper has surveyed the state of AI/ML integration across every major function of the 5G NR air interface and charted the trajectory toward a 6G that is AI-native by design. We close with a structured summary of achievements, a realistic timeline to 2030, and a perspective on what the transition to AI-native mobile networks means at the level of standardization methodology and systems engineering practice.
§16.1 — Summary of Achievements and 3GPP Status
The table below consolidates the key findings across §§3–14, mapping each AI/ML domain to its 3GPP standardization status, the primary specification reference, and the headline performance gain demonstrated over legacy (non-AI) baselines in 3GPP-defined evaluation scenarios.
| Domain | 3GPP Status | Key Specification | Performance Gain |
|---|---|---|---|
| Channel Estimation AI | Study complete (Rel-18) | TR 38.843 UC1 | +2–4 dB NMSE over LS/MMSE baselines |
| CSI Feedback AI | Study complete (Rel-18) | TR 38.843 UC2 | 50% overhead reduction at equal reconstruction quality |
| Beam Management AI | Study complete (Rel-18) | TR 38.843 UC3 | 85% top-1 accuracy; 30% beam sweeping reduction |
| Positioning AI | Under study (Rel-19) | TR 38.843 UC4 | <30 cm indoor (vs. <1 m for legacy RSTD) |
| Energy Efficiency AI | Ongoing (TS 28.310) | TR 37.816 | 40% RAN energy saving in low-load scenarios |
| Semantic Comms | Research phase (pre-normative) | TR 22.874 | 10–100× effective bandwidth for task-oriented data |
| Federated Learning | Study item (SA2) | TR 23.700-80 | Privacy-preserving model training without raw data sharing |
| 6G AI-Native | Vision document (Rel-20+) | SP-221500 | Full AI integration as first-class PHY/MAC function |
The arc is clear: AI began as an external optimisation layer applied to fixed 5G NR procedures (Rel-18 study items), is moving toward standardised interfaces and model management (Rel-19 work items), and will eventually become the primary design paradigm for the air interface itself in 6G. Each stage builds on the infrastructure — data collection, model transfer, monitoring — laid in the previous stage.
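Two of the headline metrics in the table, NMSE in dB and top-1 beam accuracy, are simple to compute once reference and predicted values are available. The following is a minimal sketch with made-up toy data; the function names and the sign convention for the gain (legacy NMSE minus AI NMSE, so that a positive number favours the AI estimator) are assumptions made for this example.

```python
import math


def nmse_db(h_true, h_est):
    """Normalised mean squared error in dB; lower (more negative) is better.

    Assumes h_true is not all zeros.
    """
    err = sum(abs(t - e) ** 2 for t, e in zip(h_true, h_est))
    ref = sum(abs(t) ** 2 for t in h_true)
    if err == 0:
        return float("-inf")  # perfect estimate
    return 10.0 * math.log10(err / ref)


def top1_accuracy(predicted, best):
    """Fraction of samples where the predicted beam equals the true best beam."""
    hits = sum(1 for p, b in zip(predicted, best) if p == b)
    return hits / len(best)


# Toy channel coefficients and two estimates with different error levels.
h_true = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
h_legacy = [h + 0.2 for h in h_true]  # stand-in for an LS/MMSE estimate
h_ai = [h + 0.1 for h in h_true]      # stand-in for a learned estimate

gain_db = nmse_db(h_true, h_legacy) - nmse_db(h_true, h_ai)  # about 6.02 dB

accuracy = top1_accuracy(predicted=[3, 1, 2, 0, 5], best=[3, 1, 4, 0, 5])  # 0.8
```

Halving the estimation error amplitude reduces the error power by a factor of four, which is why the toy gain comes out near 6 dB.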
16.2 The Road to 2030
Translating the current research and study-item landscape into a deployment timeline requires accounting for both 3GPP normative timelines and the typical 2–3 year lag between specification completion and commercial network deployment.
Milestone summary
2024, Release 18 (frozen): AI is plugged into existing 5G NR as optional enhancements. Three use cases (channel estimation, CSI feedback, beam management) are fully studied, with evaluation methodology, performance benchmarks, and standardisation impact analysis. There are no normative AI procedures; legacy codebooks and reference signals remain the mandatory fallback.
2025, Release 19 (work items active): AI-enhanced procedures move from study items to work items. The model management framework (metadata, versioning, lifecycle) is standardised. First normative signalling for AI CSI feedback is expected. The O-RAN WG2 AI/ML workflow specification reaches v2.0 with normative xApp APIs.
2026–2027, 6G Phase 1 study / Release 20: 3GPP begins 6G Phase 1 specification (Rel-20 target: ~2027). AI-native air interface design is a primary architectural theme. The key question is whether channel coding, MIMO precoding, and waveform generation can be partially replaced by end-to-end learned procedures while maintaining spectrum compatibility and regulatory compliance.
2028, 6G Phase 1 standard frozen: The first 6G specification treats AI as a first-class PHY/MAC function. AI-assisted initial access, channel estimation, and beam management are normative. Legacy procedures remain as a fallback for the transition decade.
2030, commercial 6G deployment: The first commercial 6G networks launch. New deployments operate in fully AI-native mode, while existing 5G infrastructure is upgraded over a 5–10 year cycle. AI model management infrastructure (OTA update, performance monitoring, fallback triggering) is fully operational in commercial deployments.
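The relationship between a freeze date and the earliest plausible commercial launch is simple arithmetic using the 2–3 year lag stated at the start of this subsection. The sketch below only makes that arithmetic explicit; the function name and the dictionary of milestones are assumptions made for this example.

```python
def deployment_window(freeze_year, lag_years=(2, 3)):
    """Return (earliest, latest) commercial deployment year for a frozen release."""
    shortest, longest = lag_years
    return freeze_year + shortest, freeze_year + longest


# Freeze years taken from the milestone summary above.
milestones = {"Rel-18": 2024, "6G Phase 1": 2028}
windows = {name: deployment_window(year) for name, year in milestones.items()}
# windows == {"Rel-18": (2026, 2027), "6G Phase 1": (2030, 2031)}
```

Read this way, the 2030 commercial date sits at the early edge of the window implied by a 2028 freeze, which is consistent with the timeline being described as optimistic.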
Pacing factors
The timeline above is optimistic and depends on resolution of the challenges documented in §15. The two most likely schedule-limiting factors are:
- Standardization gap resolution (§15.2): Until the model versioning and fallback problems are solved normatively, AI cannot become a mandatory component of the air interface. Rel-19 model management work items are the critical path.
- Generalization validation methodology (§15.1): 3GPP requires performance claims to be verified against a defined set of evaluation scenarios. For AI, this requires agreement on training datasets, evaluation channel models, and performance metrics. Reaching this consensus is likely to take 2–3 years of normative debate.
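To make the fallback problem concrete, the sketch below shows one possible shape for a monitor that switches from an AI procedure to the legacy procedure when a monitored KPI degrades. It is an illustration under stated assumptions, not a standardised mechanism: the class name `FallbackMonitor`, the sliding-window mean, the threshold, and the latching behaviour are all hypothetical choices made for this example.

```python
from collections import deque


class FallbackMonitor:
    """Track a KPI over a sliding window and latch into legacy mode on degradation."""

    def __init__(self, threshold, window=3):
        self.threshold = threshold           # minimum acceptable windowed mean KPI
        self.samples = deque(maxlen=window)  # most recent KPI observations
        self.use_ai = True                   # False means the legacy procedure is active

    def update(self, kpi):
        """Record one KPI sample and return whether the AI model may still be used."""
        self.samples.append(kpi)
        window_full = len(self.samples) == self.samples.maxlen
        if self.use_ai and window_full:
            mean = sum(self.samples) / len(self.samples)
            if mean < self.threshold:
                # Latch: stay on the legacy procedure until reset() is called.
                self.use_ai = False
        return self.use_ai

    def reset(self):
        """Re-enable the AI model, for example after a model update."""
        self.samples.clear()
        self.use_ai = True


monitor = FallbackMonitor(threshold=0.8, window=3)
kpi_trace = [0.9, 0.85, 0.9, 0.6, 0.6, 0.95]  # e.g. hypothetical top-1 beam accuracy
states = [monitor.update(k) for k in kpi_trace]
# states == [True, True, True, False, False, False]
```

The latch is a deliberate design choice in this sketch: a single good sample after a degradation (the final 0.95) does not re-enable the model. Which KPI to monitor, over what window, and who is allowed to call `reset()` are exactly the points that would need normative agreement.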
16.3 Final Perspective
The body of work surveyed in this whitepaper — from §3's foundations in channel estimation through §14's vision of AI-native 6G — represents only the first chapter of a long story. The study items of Release 18 will be remembered, in retrospect, the way we remember the first digital modems: as the moment when the trajectory changed, even if the full transformation was still decades away.
What is clear today is that the wireless industry has made an irreversible commitment. The investment in AI/ML standardisation infrastructure — the data collection frameworks (TR 37.817), the model management architectures (TR 22.874), the evaluation methodologies (TR 38.843) — creates institutional momentum that will carry AI-native design principles into every generation of wireless standards from this point forward. The research challenges of §15 are formidable but finite; the direction of the field is not in question.
References
- [1] ITU-R M.2160 (12/2023) — Framework and overall objectives of IMT for 2030 and beyond
- [2] ITU-R M.2150 (02/2022) — Detailed specifications of the terrestrial radio interfaces of IMT-2020
- [3] 3GPP TR 38.843 v18.0.0 — Study on AI/ML for NR Air Interface (Rel-18), 2024
- [4] 3GPP TR 22.874 v18.0.0 — Study on traffic characteristics and performance requirements for AI/ML model transfer
- [5] 3GPP TR 23.700-80 v18.0.0 — Study on AI/ML architecture enhancements
- [6] 3GPP TR 38.901 v17.0.0 — Study on channel model for frequencies from 0.5 to 100 GHz
- [7] 3GPP TS 28.310 v18.0.0 — Energy efficiency of 5G
- [8] 3GPP TR 37.817 v17.1.0 — Study on enhancement for data collection for NR and EN-DC
- [9] 3GPP TR 37.816 v16.0.0 — Study on SON and the O&M aspects
- [10] O-RAN.WG2.AI-ML-v01.03 — AI/ML Workflow Description and Requirements
- [11] C.-K. Wen et al., "Deep Learning for Massive MIMO CSI Feedback," IEEE Wireless Commun. Lett., vol. 7, no. 5, 2018
- [12] T. O'Shea and J. Hoydis, "An Introduction to Deep Learning for the Physical Layer," IEEE Trans. Cognit. Commun. Netw., 2017
- [13] H. B. McMahan et al., "Communication-Efficient Learning of Deep Networks from Decentralized Data," AISTATS, 2017
- [14] E. Bourtsoulatze et al., "Deep Joint Source-Channel Coding for Wireless Image Transmission," IEEE Trans. Cognit. Commun. Netw., 2019
- [15] 3GPP SP-221500 — 6G Vision Document (3GPP SA Plenary, 2022)
- [16] ITU-R IMT-2030 Focus Group Technical Report FG-IMT2030, 2022
- [17] 3GPP TR 38.855 v16.0.0 — Study on NR positioning support
- [17b] 3GPP TR 38.859 v18.0.0 — Study on Expanded and Improved NR Positioning (Rel-18 AI/ML-enhanced positioning study item)
- [18] 3GPP TR 38.874 v16.0.0 — Study on Integrated Access and Backhaul
- [19] A. Zappone et al., "Wireless Networks Design in the Era of Deep Learning," IEEE Trans. Commun., 2019
- [20] DeepMIMO Dataset — deepmimo.net (A. Alkhateeb et al.)
- [21] C. Chaccour et al., "Seven Defining Features of Terahertz (THz) Wireless Systems," IEEE Commun. Surveys Tuts., 2022
- [22] M. Chen et al., "A Joint Learning and Communications Framework for Federated Learning over Wireless Networks," IEEE Trans. Wireless Commun., 2021
- [23] 3GPP TR 38.843 v18.0.0, §8 — Standardization impact analysis for AI/ML NR air interface use cases
- [24] 3GPP TR 22.874 v18.0.0, §7 — AI/ML model management requirements, lifecycle, and transfer procedures
Academic Research References (illustrative examples — not 3GPP-standardised architectures)
- [A1] M. Soltani, V. Pourahmadi, A. Mirzaei, and H. Sheikhzadeh, "Deep Learning-Based Channel Estimation," IEEE Commun. Lett., vol. 23, no. 4, pp. 652–655, Apr. 2019. (Representative DNN-based channel estimator; ChannelNet-class architecture.)
- [A2] C.-K. Wen, W.-T. Shih, and S. Jin, "Deep Learning for Massive MIMO CSI Feedback," IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 748–751, Oct. 2018. (CsiNet autoencoder for CSI feedback compression — addresses TR 38.843 UC2.)
- [A3] A. Vaswani et al., "Attention Is All You Need," in Advances in Neural Information Processing Systems (NeurIPS), 2017. (Transformer / multi-head attention mechanism underlying TransNet and similar CSI models.)
- [A4] E. Bourtsoulatze, D. B. Kurka, and D. Gündüz, "Deep Joint Source-Channel Coding for Wireless Image Transmission," IEEE Trans. Cognit. Commun. Netw., vol. 5, no. 3, pp. 567–579, Sep. 2019. (DeepJSCC — foundational JSCC paper; no cliff effect at low SNR.)
- [A5] T. O'Shea and J. Hoydis, "An Introduction to Deep Learning for the Physical Layer," IEEE Trans. Cognit. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017. (End-to-end autoencoder transceiver; learned constellations. Not adopted in any 3GPP release.)
- [A6] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-Efficient Learning of Deep Networks from Decentralized Data," in Proc. AISTATS, 2017. (FedAvg — canonical federated learning aggregation algorithm.)
- [A7] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Smola, and V. Smith, "Federated Optimization in Heterogeneous Networks," in Proc. ICLR, 2020. (FedProx — addresses non-IID client drift in federated settings.)
- [A8] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, pp. 529–533, Feb. 2015. (DQN — deep Q-network algorithm used for energy scheduling examples in §7.)
- [A9] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments," in Advances in NeurIPS, 2017. (MADDPG — CTDE multi-agent RL used for multi-cell scheduling in §8.)
- [A10] C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu, "The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games," in Advances in NeurIPS, 2021. (MAPPO — on-policy cooperative MARL; recommended baseline for 6G multi-cell coordination.)
- [A11] C. Dwork and A. Roth, "The Algorithmic Foundations of Differential Privacy," Foundations and Trends in Theoretical Computer Science, vol. 9, nos. 3–4, pp. 211–407, 2014. (Foundational DP theory; Gaussian mechanism σ ≥ Δf√(2ln(1.25/δ))/ε.)
- [A12] M. Abadi et al., "Deep Learning with Differential Privacy," in Proc. ACM CCS, 2016. (DP-SGD — gradient clipping + Gaussian noise for differentially private federated training.)
- [A13] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and Harnessing Adversarial Examples," in Proc. ICLR, 2015. (FGSM adversarial perturbation; ε_adv notation in §15.4 refers to perturbation budget, distinct from DP privacy budget ε.)