Deep Reinforcement Learning-Based Sum Rate Fairness Trade-Off for Cell-Free mMIMO

Deep Reinforcement Learning-based Power Allocation in Uplink Cell-Free Massive MIMO

2022 IEEE Wireless Communications and Networking Conference (WCNC)

A cell-free massive multiple-input multiple-output (MIMO) uplink is investigated in this paper. We address a power allocation design problem that considers two conflicting metrics, namely the sum rate and fairness. Different weights are allocated to the sum rate and fairness of the system, based on the requirements of the mobile operator. The knowledge of the channel statistics is exploited to optimize power allocation. We propose to employ large-scale fading (LSF) coefficients as the input of a twin delayed deep deterministic policy gradient (TD3) agent. This enables us to solve the non-convex sum rate fairness trade-off optimization problem efficiently. Then, we exploit a use-and-then-forget (UatF) technique, which provides a closed-form expression for the achievable rate. The sum rate fairness trade-off optimization problem is subsequently solved through a sequential convex approximation (SCA) technique. Numerical results demonstrate that the proposed algorithms outperform conventional power control algorithms in terms of both the sum rate and minimum user rate. Furthermore, the TD3-based approach increases the median sum rate by 16%-46% and the median minimum user rate by 11%-60% compared to the proposed SCA-based technique. Finally, we investigate the complexity and convergence of the proposed scheme. Index Terms: Cell-free massive MIMO, deep reinforcement learning, fairness, power control, sequential convex approximation.
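
To make the role of the LSF input concrete, the following is a minimal sketch, assuming M APs, K users, and a per-user power budget P_MAX (all illustrative, not the paper's values), of a TD3-style actor that maps the flattened LSF coefficients (the state) to per-user uplink powers (the action); the twin critics and delayed policy updates of full TD3 are omitted for brevity.

```python
# A minimal sketch (not the authors' code) of a TD3-style actor that maps
# large-scale fading (LSF) coefficients to uplink transmit powers.
import torch
import torch.nn as nn

M, K, P_MAX = 32, 8, 0.1  # assumed: 32 APs, 8 users, 100 mW budget per user

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(M * K, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, K), nn.Sigmoid(),  # power fractions in [0, 1]
        )

    def forward(self, lsf):
        # lsf: (batch, M*K) flattened LSF coefficients -- the TD3 state
        return P_MAX * self.net(lsf)

actor = Actor()
beta = torch.rand(1, M * K)   # placeholder LSF realization
powers = actor(beta)          # one transmit power per user -- the TD3 action
```

Because the state is built from channel statistics rather than instantaneous channel realizations, the actor only needs to be re-evaluated when the LSF changes, which is what makes the approach attractive in practice.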

Exploiting Deep Learning in Limited-Fronthaul Cell-Free Massive MIMO Uplink

IEEE JSAC special issue on Multiple Antenna Technologies for Beyond 5G, 2020

A cell-free massive multiple-input multiple-output (MIMO) uplink is considered, where quantize-and-forward (QF) refers to the case where both the channel estimates and the received signals are quantized at the access points (APs) and forwarded to a central processing unit (CPU), whereas in combine-quantize-and-forward (CQF), the APs send the quantized version of the combined signal to the CPU. To solve the non-convex sum rate maximization problem, a heuristic sub-optimal scheme is exploited to convert the power allocation problem into a standard geometric programme (GP). We exploit the knowledge of the channel statistics to design the power-control coefficients. Employing large-scale fading (LSF) coefficients with a deep convolutional neural network (DCNN) enables us to determine a mapping from the LSF coefficients to the optimal power through solving the sum rate maximization problem using the quantized channel. Four possible power control schemes are studied, which we refer to as i) small-scale fading (SSF)-based QF; ii) LSF-based CQF; iii) LSF use-and-then-forget (UatF)-based QF; and iv) LSF deep learning (DL)-based QF, according to where channel estimation is performed and exploited and how the optimization problem is solved. Numerical results show that for the same fronthaul rate, the throughput significantly increases thanks to the mapping obtained using the DCNN.
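
As an illustration of the LSF-to-power mapping, here is a minimal sketch, under assumed dimensions (M APs, K users) and illustrative layer sizes, of a DCNN that treats the M x K LSF matrix as a one-channel image and outputs normalized per-user powers; in the paper's setting such a network would be trained on power allocations produced by the GP solver.

```python
# A hedged sketch of a DCNN mapping the LSF matrix to per-user powers.
# Layer sizes and dimensions are illustrative, not those of the paper.
import torch
import torch.nn as nn

M, K = 32, 8  # assumed numbers of APs and users

dcnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * M * K, K), nn.Sigmoid(),  # normalized power per user
)

beta = torch.rand(1, 1, M, K)  # LSF coefficients as a one-channel "image"
p = dcnn(beta)                 # (1, K) power-control coefficients
```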

Optimal Power Allocation for Rate Splitting Communications With Deep Reinforcement Learning

IEEE Wireless Communications Letters, 2021

This letter introduces a novel framework to optimize the power allocation for users in a Rate Splitting Multiple Access (RSMA) network. In the network, messages intended for users are split into a single common part and respective private parts. This mechanism enables RSMA to flexibly manage interference and thus enhance energy and spectral efficiency. Despite these advantages, optimizing power allocation in RSMA is very challenging when the communication channel is uncertain and the transmitter has limited knowledge of the channel information. To solve the problem, we first develop a Markov Decision Process (MDP) framework to model the dynamics of the communication channel. A deep reinforcement learning algorithm is then proposed to find the optimal power allocation policy for the transmitter without requiring any prior channel information. Simulation results show that the proposed scheme outperforms baseline schemes in terms of average sum rate under different power and QoS requirements.
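
The following toy environment sketches the kind of MDP the letter describes (the channel model, dimensions, and reward shaping here are assumptions, not the letter's exact formulation): the state is the channel realization, the action splits the power budget between the common stream and the K private streams, and the reward is the resulting RSMA sum rate.

```python
# Toy RSMA downlink MDP: one common stream plus K private streams.
# Fading model, K, P_TOT and NOISE are assumptions for illustration.
import numpy as np

K, P_TOT, NOISE = 2, 1.0, 0.1

class RsmaPowerEnv:
    def reset(self):
        self.g = np.random.rayleigh(1.0, size=K) ** 2  # channel gains
        return self.g.copy()                           # state

    def step(self, action):
        # action: K+1 nonnegative power shares [common, private_1..K]
        a = np.clip(np.asarray(action, dtype=float), 1e-9, None)
        a = P_TOT * a / a.sum()
        p_c, p = a[0], a[1:]
        # every user decodes the common part first; privates interfere
        sinr_c = p_c * self.g / (self.g * p.sum() + NOISE)
        r_common = np.log2(1 + sinr_c.min())  # limited by the weakest user
        # after SIC of the common part, only the other privates interfere
        interference = self.g * (p.sum() - p)
        r_private = np.log2(1 + p * self.g / (interference + NOISE))
        reward = r_common + r_private.sum()   # sum rate as the reward
        return self.reset(), reward           # i.i.d. block fading
```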

Power Allocation in Multi-User Cellular Networks: Deep Reinforcement Learning Approaches

IEEE Transactions on Wireless Communications

Model-based power allocation algorithms have been investigated for decades, but they require the mathematical models to be analytically tractable and usually have high computational complexity. Recently, data-driven, model-free machine learning approaches have been rapidly developed to obtain near-optimal performance with affordable computational complexity, and deep reinforcement learning (DRL) is regarded as having great potential for future intelligent networks. In this paper, DRL approaches are considered for power control in multi-user wireless cellular networks. Considering cross-cell cooperation, offline/online centralized training, and distributed execution, we present a mathematical analysis for the DRL-based top-level design. The concrete DRL design is further developed on this foundation, and policy-based REINFORCE, value-based deep Q-learning (DQL), and actor-critic deep deterministic policy gradient (DDPG) algorithms are proposed. Simulation results show that the proposed data-driven approaches outperform state-of-the-art model-based methods in sum-rate performance, with good generalization ability and faster processing speed. Furthermore, the proposed DDPG outperforms REINFORCE and DQL in terms of both sum-rate performance and robustness, and can be incorporated into existing resource allocation schemes due to its generality.
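
To fix ideas, below is a generic, minimal DDPG update step (hyperparameters, sizes, and the batch interface are assumptions) showing the actor-critic structure behind continuous power control: the critic regresses onto a bootstrapped Bellman target, the actor ascends the critic's value of its own actions, and target networks are Polyak-averaged.

```python
# A minimal, generic DDPG update -- a sketch, not the paper's implementation.
import copy
import torch
import torch.nn as nn

S, A, GAMMA, TAU = 16, 4, 0.99, 0.005  # assumed state/action sizes

actor = nn.Sequential(nn.Linear(S, 64), nn.ReLU(), nn.Linear(64, A), nn.Tanh())
critic = nn.Sequential(nn.Linear(S + A, 64), nn.ReLU(), nn.Linear(64, 1))
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_step(s, a, r, s2):
    # s, a, s2: (batch, S/A) tensors; r: (batch, 1) rewards
    # critic: regress onto the bootstrapped Bellman target
    with torch.no_grad():
        q_target = r + GAMMA * critic_t(torch.cat([s2, actor_t(s2)], -1))
    q = critic(torch.cat([s, a], -1))
    loss_c = nn.functional.mse_loss(q, q_target)
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    # actor: ascend the critic's value of the actor's own actions
    loss_a = -critic(torch.cat([s, actor(s)], -1)).mean()
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    # Polyak-average the target networks
    for tgt, net in [(actor_t, actor), (critic_t, critic)]:
        for pt, pn in zip(tgt.parameters(), net.parameters()):
            pt.data.mul_(1 - TAU).add_(TAU * pn.data)
```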

Deep Reinforcement Based Power Allocation for the Max-Min Optimization in Non-Orthogonal Multiple Access

IEEE Access, 2020

NOMA is a radio access technique that multiplexes several users over the same frequency resource and provides high throughput and fairness among different users. Maximization of the minimum data rate, also known as max-min, is a popular approach to ensure fairness among the users. NOMA optimizes the transmission powers (or power coefficients) of the users to perform max-min. The problem is a constrained non-convex optimization for more than two users. We propose to solve this problem using the Double Deep Q-Learning (DDQL) technique, a popular method of reinforcement learning. The DDQL technique employs a Deep Q-Network (DQN) to learn to choose optimal actions to optimize users' power coefficients. The model of the Markov Decision Process (MDP) is critical to the success of the DDQL method, and helps the DQN learn to take better actions. An MDP model is proposed in which the state consists of the power-coefficient values, the data rates of users, and vectors indicating which of the p...
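
The core of DDQL is the Double DQN target, sketched below in a generic form (the function names are illustrative): the online network selects the greedy next action over the discrete power-coefficient choices, while the target network evaluates it, which curbs the overestimation bias of plain Q-learning.

```python
# A hedged sketch of the Double DQN target used by DDQL-style methods.
import torch

def ddqn_target(reward, next_state, q_online, q_target, gamma=0.99):
    # reward: (batch, 1); next_state: (batch, S); q_*: state -> (batch, |A|)
    with torch.no_grad():
        # online net picks the argmax over discrete power-coefficient actions
        a_star = q_online(next_state).argmax(dim=1, keepdim=True)
        # target net evaluates that chosen action
        q_next = q_target(next_state).gather(1, a_star)
    return reward + gamma * q_next
```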

Distributed Uplink Beamforming in Cell-Free Networks Using Deep Reinforcement Learning

arXiv, 2020

In a cell-free network, a large number of mobile devices are served simultaneously by several base stations (BSs)/access points (APs) using the same time/frequency resources. However, this creates high signal processing demands (e.g., for beamforming) at the transmitters and receivers. In this work, we develop centralized and distributed deep reinforcement learning (DRL)-based methods to optimize beamforming at the uplink of a cell-free network. First, we propose a fully centralized uplink beamforming method (i.e., centralized learning) that uses the Deep Deterministic Policy Gradient (DDPG) algorithm for an offline-trained DRL model. We then enhance this method, in terms of convergence and performance, by using distributed experiences collected from different APs based on the Distributed Distributional Deterministic Policy Gradients (D4PG) algorithm, in which the APs represent the distributed agents of the DRL model. To reduce the complexity of signal processing at the central processing unit (CPU), we propose a fully distributed DRL-based uplink beamforming scheme. This scheme divides the beamforming computations among the distributed APs. The proposed schemes are then benchmarked against two common linear beamforming schemes, namely the minimum mean-square error (MMSE) and simplified conjugate symmetric schemes. The results show that the D4PG scheme with distributed experience achieves the best performance irrespective of the network size. Furthermore, although the proposed distributed beamforming technique reduces the complexity of centralized learning in the DDPG algorithm, it performs better than the DDPG algorithm only for...
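
The distributed-experience idea can be sketched as follows (a toy illustration, not the D4PG implementation; the buffer size, class names, and environment interface are assumptions): each AP runs a local actor and pushes its transitions into one shared replay buffer, from which a central learner samples its updates.

```python
# Toy sketch of distributed experience collection in a cell-free uplink.
import random
from collections import deque

replay = deque(maxlen=100_000)          # shared replay buffer at the CPU

class ApAgent:
    """One distributed agent per AP, each with its own local actor."""
    def __init__(self, ap_id, actor):
        self.ap_id, self.actor = ap_id, actor

    def collect(self, env):
        s = env.reset()
        a = self.actor(s)               # local beamforming decision
        s2, r = env.step(a)             # assumed env interface
        replay.append((s, a, r, s2))    # experience shared with the learner

def learner_step(batch_size=64):
    # the central learner trains on experiences pooled from every AP
    if len(replay) >= batch_size:
        batch = random.sample(replay, batch_size)
        # ... update the centralized critic/actor from the pooled batch
        return batch
```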

Deep Learning-Aided Finite-Capacity Fronthaul Cell-Free Massive MIMO with Zero Forcing

IEEE ICC, 2020

We consider a cell-free massive multiple-input multiple-output (MIMO) system where the channel estimates and the received signals are quantized at the access points (APs) and forwarded to a central processing unit (CPU). A zero-forcing technique is used at the CPU to detect the signals transmitted from all users. To solve the non-convex sum rate maximization problem, a heuristic sub-optimal scheme is proposed to convert the problem into a geometric programme (GP). Exploiting a deep convolutional neural network (DCNN) allows us to determine a mapping from the large-scale fading (LSF) coefficients to the optimal power by solving the optimization problem using the quantized channel. Depending on how the optimization problem is solved, different power control schemes are investigated: i) small-scale fading (SSF)-based power control; ii) LSF use-and-then-forget (UatF)-based power control; and iii) LSF deep learning (DL)-based power control. The SSF-based power control scheme needs to be solved for each coherence interval of the SSF, which is practically impossible in real-time systems. Numerical results reveal that the proposed LSF-DL-based scheme significantly improves performance compared to the practical and well-known LSF-UatF-based power control, thanks to the mapping obtained using the DCNN.
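
For reference, the zero-forcing detection step at the CPU amounts to applying the pseudo-inverse of the channel estimate to the received signal; the sketch below uses an unquantized channel for clarity, and the dimensions are assumptions.

```python
# Minimal numpy sketch of zero-forcing detection at the CPU.
import numpy as np

M, K = 32, 8                                   # assumed APs and users
H = (np.random.randn(M, K) + 1j * np.random.randn(M, K)) / np.sqrt(2)
x = (np.random.randn(K) + 1j * np.random.randn(K)) / np.sqrt(2)
y = H @ x + 0.05 * (np.random.randn(M) + 1j * np.random.randn(M))

# zero-forcing: the pseudo-inverse nulls inter-user interference
# at the cost of noise enhancement
x_hat = np.linalg.pinv(H) @ y
```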

Learning-based Precoding-aware Radio Resource Scheduling for Cell-free mMIMO Networks

Communication by jointly precoded transmission from many distributed access points (APs), called cell-free massive multiple-input multiple-output (CF mMIMO), is a promising concept for beyond-5G systems. One of the challenging aspects of CF mMIMO is efficient management of the radio resources. We propose both reinforcement learning (RL)-based and heuristic precoding-aware radio resource scheduling (RRS) algorithms aiming at maximizing the sum spectral efficiency (SE). The proposed algorithms allocate resources for Maximum Ratio Transmission (MRT), Zero-Forcing (ZF) and Regularised Zero-Forcing (RZF) precoders. For the resource allocation, both the set of serving APs and the Physical Resource Blocks are considered. In high-noise scenarios, the proposed RL-based RRS algorithm combined with the MRT precoder shows 2.4 times higher sum SE than the standard Round Robin scheduler. Moreover, we demonstrate that the proposed heuristic algorithms offer similar sum SE while significantly reduci...
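
The three precoders the scheduler allocates resources for can be sketched in a few lines; this is a minimal sketch assuming a K x M downlink channel matrix H, an illustrative regularization constant, and a common per-beam normalization convention.

```python
# Hedged numpy sketch of MRT, ZF and RZF precoders for K users, M antennas.
import numpy as np

K, M, reg = 4, 16, 0.1                         # assumed sizes and regularizer
H = (np.random.randn(K, M) + 1j * np.random.randn(K, M)) / np.sqrt(2)

W_mrt = H.conj().T                                    # maximum ratio
W_zf = H.conj().T @ np.linalg.inv(H @ H.conj().T)     # zero-forcing
W_rzf = H.conj().T @ np.linalg.inv(H @ H.conj().T + reg * np.eye(K))

# column-normalize so each user's beam has unit power (one common convention)
for W in (W_mrt, W_zf, W_rzf):
    W /= np.linalg.norm(W, axis=0, keepdims=True)
```

MRT maximizes the received signal power per user, ZF nulls inter-user interference, and RZF trades the two off via the regularizer, which is why the best choice depends on the noise regime the abstract highlights.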

Joint Energy-efficient and Throughput-sufficient Transmissions in 5G Cells with Deep Q-Learning

IEEE International Mediterranean Conference on Communications and Networking (MeditCom), Athens, Greece, 2021

As a consequence of 5G network densification and heterogeneity, there is a competitive relationship between sufficient satisfaction of the cell users and the power efficiency of 5G transmissions. This paper proposes a Deep Q-Learning (DQL)-based power configuration algorithm that jointly optimizes the energy efficiency (EE) and throughput adequacy (JET) of 5G cells. The algorithm exploits the user demands to effectively learn and improve the user fulfillment rate, while ensuring cost-efficient power adjustment. To evaluate the potency of the developed methodology, several validation setups were conducted, comparing the outcomes of JET-DQL with those derived from conventional power control schemes, namely a water-filling (WF) algorithm, a weighted minimum mean squared error (WMMSE) method, a heuristic solution, and three fixed power allocation policies. The JET-DQL algorithm exhibits a remarkable trade-off, ensuring high user satisfaction rates and near-average total allocated throughput relative to the baselines while settling on low (almost minimum) power configurations. In particular, even for strict demand scenarios, JET-DQL outperforms the other baselines with respect to EE, showing a gain of 2.9-4.5 relative to the others, although it does not provide the optimal sum-rate utility or the minimum power levels.
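
A joint reward of the kind a JET-style DQL agent could optimize is sketched below; the weights, the demand model, and the units are assumptions rather than the paper's exact formulation. It credits satisfied user demands (throughput adequacy) and penalizes the total transmit power (energy efficiency).

```python
# Hedged sketch of a joint throughput-adequacy / power-cost reward.
import numpy as np

def jet_reward(rates, demands, powers, w_thr=1.0, w_pow=0.5):
    # credit only the demanded portion of each rate, then charge for power
    satisfied = np.minimum(rates, demands).sum()   # throughput adequacy
    return w_thr * satisfied - w_pow * powers.sum()

rates = np.array([10.0, 6.0, 3.0])    # Mbps achieved (illustrative)
demands = np.array([8.0, 8.0, 4.0])   # Mbps requested (illustrative)
powers = np.array([0.5, 0.8, 0.2])    # W allocated (illustrative)
r = jet_reward(rates, demands, powers)
```

Capping the credited rate at the demand is what steers the agent away from maximizing raw sum rate and toward the low, near-minimum power configurations the abstract reports.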

Deep Reinforcement Learning for Joint Spectrum and Power Allocation in Cellular Networks

2021 IEEE Globecom Workshops (GC Wkshps), 2021

A wireless network operator typically divides the radio spectrum it possesses into a number of subbands. In a cellular network, those subbands are then reused in many cells. To mitigate co-channel interference, a joint spectrum and power allocation problem is often formulated to maximize a sum-rate objective. The best-known algorithms for solving such problems generally require instantaneous global channel state information and a centralized optimizer; in fact, those algorithms have not been implemented in practice in large networks with time-varying subbands. Deep reinforcement learning algorithms are promising tools for solving complex resource management problems. A major challenge here is that spectrum allocation involves discrete subband selection, whereas power allocation involves continuous variables. In this paper, a learning framework is proposed to optimize both discrete and continuous decision variables. Specifically, two separate deep reinforcement learning algorithms are designed to be executed and trained simultaneously to maximize a joint objective. Simulation results show that the proposed scheme outperforms both the state-of-the-art fractional programming algorithm and a previous solution based on deep reinforcement learning.
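
The discrete/continuous split can be sketched as two cooperating networks (the sizes and architectures are assumptions): a DQN-style head picks the subband via an argmax over discrete actions, while a DDPG-style actor outputs the continuous transmit power, with both trained toward the same sum-rate reward.

```python
# Hedged sketch of the joint discrete (subband) + continuous (power) design.
import torch
import torch.nn as nn

S, N_SUBBANDS, P_MAX = 16, 4, 1.0   # assumed state size, subbands, max power

subband_q = nn.Sequential(nn.Linear(S, 64), nn.ReLU(),
                          nn.Linear(64, N_SUBBANDS))       # Q-value per subband
power_actor = nn.Sequential(nn.Linear(S, 64), nn.ReLU(),
                            nn.Linear(64, 1), nn.Sigmoid())

state = torch.rand(1, S)
subband = subband_q(state).argmax(dim=1)    # discrete subband selection
power = P_MAX * power_actor(state)          # continuous power allocation
```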