Building 6G Radio Foundation Models with Transformer Architectures
Foundation deep learning (DL) models are designed to learn general, robust, and adaptable representations of their target modality, enabling fine-tuning across a range of downstream tasks. These models are pretrained on large, unlabeled datasets using self-supervised learning (SSL). Foundation models have demonstrated better generalization than traditional supervised approaches, a critical requirement for wireless communications, where the dynamic environment demands model adaptability. In this work, we propose and demonstrate the effectiveness of a Vision Transformer (ViT) as a radio foundation model for spectrogram learning. We introduce a Masked Spectrogram Modeling (MSM) approach to pretrain the ViT in a self-supervised fashion. We evaluate the ViT-based foundation model on two downstream tasks: Channel State Information (CSI)-based human activity sensing and spectrogram segmentation. Experimental results demonstrate competitive performance with supervised training while generalizing across diverse domains. Notably, the pretrained ViT model outperforms a four-times-larger model trained from scratch on the spectrogram segmentation task, while requiring significantly less training time, and achieves competitive performance on the CSI-based human activity sensing task. This work demonstrates ViT pretraining with MSM as a promising technique for scalable foundation model development in future 6G networks.
https://arxiv.org/abs/2411.09996

6G WavesFM: A Foundation Model for Sensing, Communication, and Localization
This paper introduces WavesFM, a novel wireless foundation model (WFM) capable of supporting communication, sensing, and localization tasks. The model processes image-like wireless modalities, such as spectrograms and channel state information (CSI), as well as in-phase and quadrature (IQ) signals arranged as orthogonal frequency-division multiplexing (OFDM) resource grids.
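The masked spectrogram modeling idea described above can be sketched in a few lines (my own illustration under assumed patch sizes and masking ratio, not the authors' code): split a spectrogram into patches, hide a random subset, and score a reconstruction by mean-squared error computed only on the masked patches.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patches(spec, patch=16, mask_ratio=0.75):
    """Split an (H, W) spectrogram into non-overlapping patches and
    zero out a random subset; return masked input, targets, and mask."""
    h, w = spec.shape
    ph, pw = h // patch, w // patch
    patches = spec.reshape(ph, patch, pw, patch).transpose(0, 2, 1, 3)
    patches = patches.reshape(ph * pw, patch * patch)  # (N, patch*patch)
    n_mask = int(mask_ratio * len(patches))
    idx = rng.choice(len(patches), n_mask, replace=False)
    mask = np.zeros(len(patches), dtype=bool)
    mask[idx] = True
    visible = patches.copy()
    visible[mask] = 0.0  # masked patches are hidden from the encoder
    return visible, patches, mask

def msm_loss(pred_patches, true_patches, mask):
    """Mean-squared reconstruction error on masked patches only."""
    diff = pred_patches[mask] - true_patches[mask]
    return float(np.mean(diff ** 2))

spec = rng.standard_normal((128, 128)).astype(np.float32)
visible, target, mask = mask_patches(spec)
# A perfect reconstruction would drive the loss to zero:
assert msm_loss(target, target, mask) == 0.0
```

In a real pipeline the encoder sees only the visible patches and a decoder predicts the masked ones; the sketch above only fixes the masking and loss conventions.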
We demonstrate the strong generalization capabilities of WavesFM through extensive experiments on four downstream tasks: Fifth Generation New Radio (5G NR) positioning; multiple-input multiple-output OFDM (MIMO-OFDM) channel estimation; human activity sensing; and radio-frequency (RF) signal classification. Compared to supervised baselines trained individually, our approach achieves superior performance while sharing 80% of its parameters across tasks. Furthermore, we show that pretraining on domain-relevant data not only boosts performance but also accelerates convergence, reducing training time by up to 5x. Additionally, we incorporate Low-Rank Adaptation (LoRA) fine-tuning, which enables full parameter sharing across tasks, significantly reducing memory overhead without compromising performance. These results demonstrate that our unified WFM can support diverse tasks and deliver significant gains in both performance and efficiency, highlighting the transformative potential of WFMs to drive AI-native paradigms in future sixth-generation (6G) networks.
https://ieeexplore.ieee.org/iel8/8782661/10829557/11131142.pdf

Tiny Federated Wireless Foundation Models for Resource Constrained Devices
Deploying large-scale foundation models (FMs) on resource-constrained devices presents critical challenges due to their substantial computational and memory requirements. This is particularly relevant for multi-task wireless sensing FMs running on sensors. To overcome these limitations, we propose a tiny federated wireless foundation model (WFM) framework that combines spectrogram-guided, structured block-wise pruning with federated learning (FL) for efficient on-device deployment. Our approach prunes non-essential encoder blocks in vision transformers (ViTs) by leveraging the masked spectrogram modeling (MSM) pretraining loss as an importance indicator, ensuring that only the most structurally significant components are retained.
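The loss-guided block selection just described can be illustrated with a toy sketch (an illustration under my own assumptions, not the paper's implementation): score each encoder block by how much the MSM reconstruction loss degrades when that block is skipped, then retain only the highest-scoring blocks.

```python
def rank_blocks_by_msm_loss(base_loss, loss_without_block):
    """Score each encoder block by the MSM loss increase observed when
    the block is skipped; a larger increase means a more important block."""
    scores = {b: loss - base_loss for b, loss in loss_without_block.items()}
    return sorted(scores, key=scores.get, reverse=True)

def select_blocks(loss_without_block, base_loss, keep_ratio=0.5):
    """Keep the top keep_ratio fraction of blocks by importance score."""
    ranked = rank_blocks_by_msm_loss(base_loss, loss_without_block)
    n_keep = max(1, int(keep_ratio * len(ranked)))
    return sorted(ranked[:n_keep])

# Hypothetical ablation results: MSM loss with each of 6 ViT blocks skipped.
base = 0.10
ablation = {0: 0.45, 1: 0.12, 2: 0.30, 3: 0.11, 4: 0.25, 5: 0.13}
kept = select_blocks(ablation, base, keep_ratio=0.5)
# Blocks 0, 2, and 4 cause the largest loss jumps, so they are retained.
```

The ablation numbers are made up for illustration; in practice the per-block losses would come from evaluating the pretrained ViT on held-out spectrograms with one block bypassed at a time.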
This enables federated adaptation with frozen backbones and lightweight, task-specific heads, minimizing both computational burden and communication overhead. The pruning strategy preserves the integrity of spectrogram reconstruction, while federated fine-tuning supports decentralized learning across clients with heterogeneous data distributions. Experimental results on human activity sensing and radio signal identification tasks confirm the efficacy of our approach. Specifically, the pruned ViT-based WFMs achieve up to 93% reduction in multiply-accumulate operations (MACs), 85% lower CPU inference time, and 49% reduction in communication overhead, all while maintaining high task accuracy. Our method demonstrates strong generalization and robustness across varying pruning ratios and data heterogeneity levels while substantially reducing communication overhead, making it highly suitable for real-world industrial IoT deployments.
https://ieeexplore.ieee.org/abstract/document/11087489/

IQFM: A Wireless Foundational Model for I/Q Streams in AI-Native 6G
Foundation models have shown remarkable potential in natural language processing and computer vision, yet remain in their infancy in wireless communications. While a few efforts have explored image-based modalities such as channel state information (CSI) and frequency spectrograms, foundation models operating directly on raw in-phase/quadrature (IQ) signals remain largely unexplored. This paper presents IQFM, the first foundation model for wireless communications built on raw IQ data. IQFM supports diverse tasks including modulation classification, angle-of-arrival (AoA) estimation, beam prediction, and RF fingerprinting, without handcrafted features or heavy preprocessing. We introduce a task-aware augmentation strategy that categorizes transformations into core augmentations, such as cyclic time shifting, and task-specific augmentations, enabling structured representation learning within a contrastive self-supervised framework.
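The core augmentation named above, cyclic time shifting, is simple to sketch for a complex IQ capture (an illustration under my own assumption of an antennas-by-samples array layout): roll the samples along time so signal content is preserved while the temporal origin moves, and pair two independent shifts of the same capture as a contrastive positive pair.

```python
import numpy as np

rng = np.random.default_rng(1)

def cyclic_time_shift(iq):
    """Cyclically roll an (n_antennas, n_samples) complex IQ array along
    time; content is preserved, only the temporal origin moves."""
    shift = int(rng.integers(0, iq.shape[-1]))
    return np.roll(iq, shift, axis=-1)

def make_views(iq):
    """Two independently shifted views of the same capture form a
    positive pair for contrastive pretraining."""
    return cyclic_time_shift(iq), cyclic_time_shift(iq)

iq = rng.standard_normal((4, 1024)) + 1j * rng.standard_normal((4, 1024))
v1, v2 = make_views(iq)
# Total energy is invariant under a cyclic shift:
assert np.isclose(np.abs(v1).sum(), np.abs(iq).sum())
```

Task-specific augmentations (the other category in the taxonomy) would be chosen per downstream task; this sketch covers only the core, task-agnostic case.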
Using this approach, a lightweight encoder pretrained on over-the-air multi-antenna IQ data achieves 99.67% accuracy on modulation classification and 65.45% on AoA classification with only one labeled sample per class through linear probing, far outperforming supervised baselines. With minimal Low-Rank Adaptation (LoRA) updates, IQFM generalizes to out-of-distribution tasks, showing cross-dataset generalization on modulation classification and strong performance on unseen tasks such as beam prediction and RF fingerprinting. These results establish raw IQ-based foundation models as efficient, reusable encoders for multi-task learning in AI-native 6G systems.
https://arxiv.org/abs/2506.06718

Self-supervised Radio Representation Learning: Can We Learn Multiple Tasks?
Artificial intelligence (AI) is anticipated to play a pivotal role in 6G. However, a key challenge in developing AI-powered solutions is the extensive data collection and labeling effort required to train supervised deep learning models. To overcome this, self-supervised learning (SSL) approaches have recently demonstrated remarkable success across various domains by leveraging large volumes of unlabeled data to achieve near-supervised performance. In this paper, we propose an effective SSL scheme for radio signal representation learning using momentum contrast. By applying contrastive learning, our method extracts robust, transferable representations from a large real-world dataset. We assess the generalizability of these learned representations across two wireless communications tasks: angle of arrival (AoA) estimation and automatic modulation classification (AMC). Our results show that carefully designed augmentations and diverse data enable contrastive learning to produce high-quality, invariant latent representations.
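The defining mechanic of momentum contrast (a sketch of the general MoCo-style technique, not this paper's training code) is that the key encoder's weights are an exponential moving average of the query encoder's weights, so the keys it produces drift only slowly and stay consistent across batches.

```python
import numpy as np

def momentum_update(query_params, key_params, m=0.999):
    """EMA update used in momentum contrast:
    key <- m * key + (1 - m) * query, applied parameter-wise."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]

rng = np.random.default_rng(2)
query = [rng.standard_normal((8, 8)) for _ in range(3)]
key = [q.copy() for q in query]  # key encoder starts as a copy

# After a gradient step changes the query encoder, the key encoder
# only drifts toward it by a factor of (1 - m).
query = [q + 0.1 for q in query]
key = momentum_update(query, key, m=0.999)
drift = float(np.max(np.abs(key[0] - (query[0] - 0.1))))
assert abs(drift - 0.1 * (1 - 0.999)) < 1e-9
```

Only the query encoder receives gradients; the EMA update above is the entire training rule for the key side.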
These representations are effective even with frozen encoder weights, and fine-tuning further enhances performance, surpassing supervised baselines. To the best of our knowledge, this is the first work to propose and demonstrate the effectiveness of self-supervised learning for radio signals across multiple tasks. Our findings highlight the potential of self-supervised learning to transform AI for wireless communications by reducing dependence on labeled data and improving model generalization, paving the way for scalable foundational 6G AI models and solutions.
https://arxiv.org/pdf/2509.03077