Deep Reinforcement Learning-based Power Allocation in Uplink Cell-Free Massive MIMO
Related papers
Deep Reinforcement Learning-Based Sum Rate Fairness Trade-Off for Cell-Free mMIMO
IEEE Transactions on Vehicular Technology
The uplink of a cell-free massive multiple-input multiple-output (MIMO) system with maximum-ratio combining (MRC) and zero-forcing (ZF) schemes is investigated. A power allocation optimization problem is considered, where two conflicting metrics, namely the sum rate and fairness, are jointly optimized. As there is no closed-form expression for the achievable rate in terms of the large-scale fading (LSF) components, the sum rate fairness trade-off optimization problem cannot be solved using known convex optimization methods. To alleviate this problem, we propose two new approaches. In the first approach, a use-and-then-forget scheme is utilized to derive a closed-form expression for the achievable rate; the fairness optimization problem is then solved iteratively through the proposed sequential convex approximation (SCA) scheme. In the second approach, we exploit the LSF coefficients as inputs to a twin delayed deep deterministic policy gradient (TD3) agent, which efficiently solves the non-convex sum rate fairness trade-off optimization problem. Next, the complexity and convergence properties of the proposed schemes are analyzed. Numerical results demonstrate the superiority of the proposed approaches over conventional power control algorithms in terms of the sum rate and minimum user rate for both the ZF and MRC receivers. Moreover, the proposed TD3-based power control achieves better performance than both the proposed SCA-based approach and the fractional power control scheme.
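The TD3 agent mentioned above rests on two stabilizing mechanisms, clipped double-Q targets and target-policy smoothing, which can be sketched in isolation. The following is a minimal illustrative sketch, not the paper's implementation; the function names, the power bound `p_max`, and the noise parameters are all assumptions introduced here.

```python
import numpy as np

def td3_targets(rewards, gamma, q1_next, q2_next):
    """Bellman targets using the minimum of two target critics,
    which curbs the overestimation bias of a single critic."""
    return rewards + gamma * np.minimum(q1_next, q2_next)

def smoothed_target_action(mu_next, sigma=0.2, clip=0.5, p_max=1.0):
    """Target-policy smoothing: perturb the target actor's power
    allocation with clipped Gaussian noise, then project it back
    onto the feasible power range [0, p_max]."""
    noise = np.clip(np.random.normal(0.0, sigma, mu_next.shape), -clip, clip)
    return np.clip(mu_next + noise, 0.0, p_max)
```

In the paper's setting the state would be the LSF coefficients and the action the per-user transmit powers; here both critics and the actor are abstracted away as arrays.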
Exploiting Deep Learning in Limited-Fronthaul Cell-Free Massive MIMO Uplink
IEEE JSAC special issue on Multiple Antenna Technologies for Beyond 5G, 2020
A cell-free massive multiple-input multiple-output (MIMO) uplink is considered, where quantize-and-forward (QF) refers to the case in which both the channel estimates and the received signals are quantized at the access points (APs) and forwarded to a central processing unit (CPU), whereas in combine-quantize-and-forward (CQF), the APs send the quantized version of the combined signal to the CPU. To solve the non-convex sum rate maximization problem, a heuristic sub-optimal scheme is exploited to convert the power allocation problem into a standard geometric programme (GP). We exploit knowledge of the channel statistics to design the power elements. Employing the large-scale fading (LSF) coefficients with a deep convolutional neural network (DCNN) enables us to determine a mapping between the LSF coefficients and the optimal power obtained by solving the sum rate maximization problem using the quantized channel. Four possible power control schemes are studied, which we refer to as i) small-scale fading (SSF)-based QF; ii) LSF-based CQF; iii) LSF use-and-then-forget (UatF)-based QF; and iv) LSF deep learning (DL)-based QF, according to where channel estimation is performed and exploited and how the optimization problem is solved. Numerical results show that for the same fronthaul rate, the throughput significantly increases thanks to the mapping obtained using the DCNN.
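The core idea of the DL-based scheme is a supervised mapping from LSF coefficients to pre-computed optimal powers. As a stand-in for the paper's DCNN, the sketch below fits a single dense layer by plain gradient descent; it only illustrates the learn-the-mapping principle, and every name, shape, and hyperparameter here is an assumption.

```python
import numpy as np

def train_lsf_to_power(beta_db, p_opt, lr=0.05, epochs=500):
    """Least-squares fit of p_opt ~ beta_db @ w + b, where beta_db
    holds per-sample LSF coefficients (e.g., in dB) and p_opt holds
    the GP-optimized power coefficients used as training labels."""
    n, k = beta_db.shape
    w = np.zeros((k, p_opt.shape[1]))
    b = np.zeros(p_opt.shape[1])
    for _ in range(epochs):
        pred = beta_db @ w + b
        err = pred - p_opt          # gradient of 0.5 * mean squared error
        w -= lr * beta_db.T @ err / n
        b -= lr * err.mean(axis=0)
    return w, b
```

Once trained, inference is a single forward pass, which is what makes LSF-based power control attractive relative to re-solving the GP every coherence interval.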
Power Allocation in Multi-User Cellular Networks: Deep Reinforcement Learning Approaches
IEEE Transactions on Wireless Communications
Model-based power allocation algorithms have been investigated for decades, but they require the mathematical models to be analytically tractable and usually have high computational complexity. Recently, data-driven, model-free machine learning approaches have been rapidly developed to obtain near-optimal performance with affordable computational complexity, and deep reinforcement learning (DRL) is regarded as having great potential for future intelligent networks. In this paper, DRL approaches are considered for power control in multi-user wireless cellular networks. Considering cross-cell cooperation, offline/online centralized training, and distributed execution, we present a mathematical analysis for the DRL-based top-level design. The concrete DRL design is further developed on this foundation, and policy-based REINFORCE, value-based deep Q-learning (DQL), and actor-critic deep deterministic policy gradient (DDPG) algorithms are proposed. Simulation results show that the proposed data-driven approaches outperform state-of-the-art model-based methods in sum-rate performance, with good generalization power and faster processing speed. Furthermore, the proposed DDPG outperforms REINFORCE and DQL in terms of both sum-rate performance and robustness, and can be incorporated into existing resource allocation schemes due to its generality.
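Of the three algorithm families compared above, the policy-based REINFORCE update is the simplest to show concretely. The toy below uses a softmax policy over discrete power levels and the log-likelihood-ratio gradient; it is a generic textbook sketch under assumed names and dimensions, not the paper's network architecture.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(theta, action, reward, baseline, lr=0.1):
    """One REINFORCE step: theta holds one logit per discrete power
    level; the gradient of log pi(action) is one_hot(action) - pi,
    scaled by the baseline-corrected reward."""
    pi = softmax(theta)
    grad = -pi
    grad[action] += 1.0
    return theta + lr * (reward - baseline) * grad
```

A positive reward for a chosen power level raises that level's probability on the next decision, which is the mechanism the value-based and actor-critic variants refine with learned critics.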
Deep Learning-Aided Finite-Capacity Fronthaul Cell-Free Massive MIMO with Zero Forcing
IEEE ICC, 2020
We consider a cell-free massive multiple-input multiple-output (MIMO) system where the channel estimates and the received signals are quantized at the access points (APs) and forwarded to a central processing unit (CPU). The zero-forcing technique is used at the CPU to detect the signals transmitted from all users. To solve the non-convex sum rate maximization problem, a heuristic sub-optimal scheme is proposed to convert the problem into a geometric programme (GP). Exploiting a deep convolutional neural network (DCNN) allows us to determine a mapping between the large-scale fading (LSF) coefficients and the optimal power obtained by solving the optimization problem using the quantized channel. Depending on how the optimization problem is solved, different power control schemes are investigated: i) small-scale fading (SSF)-based power control; ii) LSF use-and-then-forget (UatF)-based power control; and iii) LSF deep learning (DL)-based power control. The SSF-based power control problem needs to be solved for each coherence interval of the SSF, which is practically impossible in real-time systems. Numerical results reveal that the proposed LSF-DL-based scheme significantly outperforms the practical and well-known LSF-UatF-based power control, thanks to the mapping obtained using the DCNN.
Optimal Power Allocation for Rate Splitting Communications With Deep Reinforcement Learning
IEEE Wireless Communications Letters, 2021
This letter introduces a novel framework to optimize the power allocation for users in a Rate Splitting Multiple Access (RSMA) network. In the network, messages intended for users are split into a single common part and respective private parts. This mechanism enables RSMA to flexibly manage interference and thus enhance energy and spectral efficiency. Despite these outstanding advantages, optimizing power allocation in RSMA is very challenging under channel uncertainty, when the transmitter has only limited knowledge of the channel information. To solve the problem, we first develop a Markov Decision Process framework to model the dynamics of the communication channel. A deep reinforcement learning algorithm is then proposed to find the optimal power allocation policy for the transmitter without requiring any prior information about the channel. Simulation results show that the proposed scheme outperforms baseline schemes in terms of average sum-rate under different power and QoS requirements.
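The Markov channel model underlying such an MDP formulation can be illustrated with a two-state (good/bad) chain. The transition probabilities below are purely assumed values for illustration; the letter does not specify them, and a real fading channel would typically use more states.

```python
import numpy as np

# Assumed transition matrix of a two-state Markov fading channel:
# row = current state (0 = good, 1 = bad), column = next state.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

def step(state, rng):
    """Sample the next channel state given the current one."""
    return rng.choice(2, p=P[state])

def stationary_distribution(P):
    """Long-run state occupancy: the left eigenvector of P for
    eigenvalue 1, normalized to sum to one."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()
```

In the letter's setting the transmitter never observes these states directly; the DRL agent infers a good power-splitting policy purely from the reward feedback such a chain generates.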
Distributed Uplink Beamforming in Cell-Free Networks Using Deep Reinforcement Learning
arXiv (Cornell University), 2020
In a cell-free network, a large number of mobile devices are served simultaneously by several base stations (BSs)/access points (APs) using the same time/frequency resources. However, this creates high signal processing demands (e.g., for beamforming) at the transmitters and receivers. In this work, we develop centralized and distributed deep reinforcement learning (DRL)-based methods to optimize beamforming at the uplink of a cell-free network. First, we propose a fully centralized uplink beamforming method (i.e., centralized learning) that uses the Deep Deterministic Policy Gradient (DDPG) algorithm for an offline-trained DRL model. We then enhance this method, in terms of convergence and performance, by using distributed experiences collected from different APs based on the Distributed Distributional Deterministic Policy Gradients (D4PG) algorithm, in which the APs represent the distributed agents of the DRL model. To reduce the complexity of signal processing at the central processing unit (CPU), we propose a fully distributed DRL-based uplink beamforming scheme that divides the beamforming computations among the distributed APs. The proposed schemes are then benchmarked against two common linear beamforming schemes, namely minimum mean square estimation (MMSE) and the simplified conjugate symmetric scheme. The results show that the D4PG scheme with distributed experience achieves the best performance irrespective of the network size. Furthermore, although the proposed distributed beamforming technique reduces the complexity of centralized learning in the DDPG algorithm, it performs better than the DDPG algorithm only for
2021
Heterogeneous networks (HetNets) are now considered a promising technique for enhancing coverage and reducing the transmit power consumption of next-generation 5G systems. Deploying small cells such as femtocells in current macrocell networks achieves great spatial reuse at the cost of severe cross-tier interference from concurrent transmissions. In this situation, two novel energy-efficient power control and resource allocation schemes, namely EE-fairness and EE-maximum, are investigated in this paper. In the EE-fairness scheme, we aim to maximize the minimum energy efficiency (EE) of the femtocell base stations (FBSs). The generalized Dinkelbach's algorithm (GDA) is utilized to tackle this optimization problem, and a distributed algorithm is proposed to solve the subproblem in GDA with limited inter-cell coordination, in which only a few scalars are shared among FBSs. In the EE-maximum scheme, we aim to maximize the global EE of all femtocells, defined as the aggregate capacity over the aggregate power consumption in the femtocell networks. Leveraging a lower bound on the logarithmic function, a centralized algorithm with limited computational complexity is proposed to solve the global EE maximization problem. Simulation results show that the proposed algorithms outperform previous schemes in terms of the minimum EE, fairness, and global EE.
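Dinkelbach-type algorithms turn a fractional objective such as EE(p) = R(p) / (P_c + p) into a sequence of parametric subproblems max_p R(p) - lam * (P_c + p), updating lam to the ratio at the subproblem's solution until the two agree. The single-link model and grid-search subproblem solver below are illustrative stand-ins, not the paper's multi-femtocell formulation.

```python
import numpy as np

def dinkelbach_ee(snr_gain, p_c, p_max, tol=1e-9, iters=100):
    """Dinkelbach iteration for maximizing a single-link energy
    efficiency log2(1 + snr_gain * p) / (p_c + p); the inner
    parametric problem is solved by grid search for simplicity."""
    lam = 0.0
    p_grid = np.linspace(1e-6, p_max, 10001)
    for _ in range(iters):
        obj = np.log2(1 + snr_gain * p_grid) - lam * (p_c + p_grid)
        p = p_grid[np.argmax(obj)]
        f = np.log2(1 + snr_gain * p)   # achieved rate
        g = p_c + p                     # total power consumption
        if abs(f - lam * g) < tol:      # fixed point: lam = f / g
            break
        lam = f / g
    return p, lam
```

At convergence lam equals the maximum EE, which is the property the GDA exploits in the max-min fairness subproblems.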
Deep Learning based Multi-User Power Allocation and Hybrid Precoding in Massive MIMO Systems
ArXiv, 2022
This paper proposes a deep learning-based power allocation (DL-PA) and hybrid precoding technique for multi-user massive multiple-input multiple-output (MU-mMIMO) systems. We first utilize an angular-based hybrid precoding technique to reduce the number of RF chains and the channel estimation overhead. Then, we develop the DL-PA algorithm via a fully-connected deep neural network (DNN). DL-PA has two phases: (i) offline supervised learning with the optimal allocated powers obtained by a particle swarm optimization-based PA (PSO-PA) algorithm, and (ii) online power prediction by the trained DNN. In comparison to the computationally expensive PSO-PA, it is shown that DL-PA greatly reduces the runtime by 98.6%-99.9%, while closely achieving the optimal sum-rate capacity. This makes DL-PA a promising algorithm for real-time online applications in MU-mMIMO systems.
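The label-generation side of this pipeline, a PSO loop, is easy to sketch generically. The hyperparameters, box constraints, and fitness function below are assumptions chosen for illustration; the paper's PSO-PA would maximize sum-rate under a power budget instead.

```python
import numpy as np

def pso_minimize(fitness, dim, n_particles=30, iters=200, seed=0,
                 w=0.7, c1=1.5, c2=1.5, lo=0.0, hi=1.0):
    """Plain particle swarm optimization over a box [lo, hi]^dim:
    each particle tracks its personal best, the swarm tracks a
    global best, and velocities blend inertia with both attractors."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return g
```

Running such a loop per channel realization is what makes PSO-PA expensive, and why amortizing it into a one-shot DNN prediction yields the reported runtime savings.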
Deep Reinforcement Learning for Distributed Dynamic Power Allocation in Wireless Networks
arXiv (Cornell University), 2018
This work demonstrates the potential of deep reinforcement learning techniques for transmit power control in wireless networks. Existing techniques typically find near-optimal power allocations by solving a challenging optimization problem. Most of these algorithms are not scalable to large networks in real-world scenarios because of their computational complexity and instantaneous cross-cell channel state information (CSI) requirement. In this paper, a distributively executed dynamic power allocation scheme is developed based on model-free deep reinforcement learning. Each transmitter collects CSI and quality of service (QoS) information from several neighbors and adapts its own transmit power accordingly. The objective is to maximize a weighted sum-rate utility function, which can be particularized to achieve maximum sum-rate or proportionally fair scheduling. Both random variations and delays in the CSI are inherently addressed using deep Q-learning. For a typical network architecture, the proposed algorithm is shown to achieve near-optimal power allocation in real time based on delayed CSI measurements available to the agents. The proposed scheme is especially suitable for practical scenarios where the system model is inaccurate and CSI delay is non-negligible.
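The per-transmitter learning rule at the heart of this scheme is the standard Q-learning update over discretized local state and a finite set of power levels. The tabular form below is a deliberate simplification of the paper's deep Q-network, with assumed table shapes and hyperparameters.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step run independently by each transmitter:
    s is a discretized local CSI/QoS observation, a is the index of
    the chosen discrete power level, r is the local reward derived
    from the weighted sum-rate utility."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

Because each agent updates only from its own (possibly delayed) observations, the scheme executes distributively while still chasing a network-wide utility.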