Albert Qu | UC Berkeley
Papers by Albert Qu
bioRxiv (Cold Spring Harbor Laboratory), May 19, 2024
Dopamine release in the nucleus accumbens has been hypothesized to signal reward prediction error, the difference between observed and predicted reward, suggesting a biological implementation for reinforcement learning. Rigorous tests of this hypothesis require assumptions about how the brain maps sensory signals to reward predictions, yet this mapping is still poorly understood. In particular, the mapping is non-trivial when sensory signals provide ambiguous information about the hidden state of the environment. Previous work using classical conditioning tasks has suggested that reward predictions are generated conditional on probabilistic beliefs about the hidden state, such that dopamine implicitly reflects these beliefs. Here we test this hypothesis in the context of an instrumental task (a two-armed bandit), where the hidden state switches repeatedly. We measured choice behavior and recorded dLight signals reflecting dopamine release in the nucleus accumbens core. Model compari...
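The reward-prediction-error computation described in this abstract can be made concrete with a small sketch. The Python snippet below is illustrative only: it assumes a simple Bayesian belief update over which bandit arm is currently the "good" one, with a fixed switch (hazard) rate and reward probability that are not taken from the paper, and it computes the prediction error against the belief-weighted reward prediction.

```python
# Hypothetical sketch of a belief-state prediction-error model for a two-armed
# bandit whose hidden state (which arm is "good") switches occasionally.
# All names and parameter values are illustrative assumptions, not the authors' model.

HAZARD = 0.05      # assumed per-trial probability that the hidden state switches
P_REWARD = 0.8     # assumed reward probability for the currently "good" arm

def update_belief(belief_arm0_good, chosen_arm, rewarded):
    """Bayesian update of the belief that arm 0 is currently the good arm."""
    # Likelihood of the observed outcome if the chosen arm is good vs. bad
    p_if_good = P_REWARD if rewarded else 1 - P_REWARD
    p_if_bad = (1 - P_REWARD) if rewarded else P_REWARD
    like_arm0_good = p_if_good if chosen_arm == 0 else p_if_bad
    like_arm1_good = p_if_bad if chosen_arm == 0 else p_if_good
    post = like_arm0_good * belief_arm0_good
    post /= post + like_arm1_good * (1 - belief_arm0_good)
    # Allow for a hidden-state switch before the next trial
    return post * (1 - HAZARD) + (1 - post) * HAZARD

def prediction_error(belief_arm0_good, chosen_arm, reward):
    """Reward prediction error against the belief-weighted prediction."""
    p_chosen_good = belief_arm0_good if chosen_arm == 0 else 1 - belief_arm0_good
    predicted = p_chosen_good * P_REWARD + (1 - p_chosen_good) * (1 - P_REWARD)
    return reward - predicted
```

Under this kind of model, the dopamine signal is read as tracking the prediction error computed from the belief state rather than from the raw sensory cue.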
Materials Today Communications, 2018
Nano-structuring, especially surface nano-structuring, has been widely utilized to obtain stronger and tougher metallic bulk materials, and it provides energy-efficient routes to advanced materials. Among these methods, surface mechanical attrition treatment and shot peening are considered powerful and energy-saving ways of generating nano-crystalline surface layers on engineering and/or functional materials, such as iron-based alloys and superalloys. In this study, a new numerical simulation was developed to analyze the microstructure and the stress-strain relationship under surface mechanical attrition treatment and shot peening, supported by experimental results.
During motor learning, as well as during neuroprosthetic learning, animals learn to control motor cortex activity in order to generate behavior. Two different populations of motor cortex neurons, intra-telencephalic (IT) and pyramidal tract (PT) neurons, convey the resulting cortical signals within and outside the telencephalon. Although a large body of evidence demonstrates contrasting functional organization between the two populations, it is unclear whether the brain can learn to control the activity of either class of motor cortex neurons equally well. To answer this question, we used a calcium-imaging-based brain-machine interface (CaBMI) and trained different groups of mice to modulate the activity of either IT or PT neurons in order to receive a reward. We found that animals learn to control PT neuron activity faster and better than IT neuron activity. Moreover, our findings show that the advantage of PT neurons is the result of characteristics inherent to this population as well as t...
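For readers unfamiliar with the paradigm, the following Python sketch shows one common way a CaBMI-style closed loop can map imaged activity to feedback and reward: the difference between two small target ensembles drives a cursor, and crossing a threshold triggers reward. The ensemble split, normalization, and threshold here are assumptions for illustration, not the settings used in this study.

```python
import numpy as np

# Hypothetical CaBMI-style closed-loop step: the difference between two small
# ensembles of imaged neurons (E1 minus E2) drives a feedback cursor, and reward
# is delivered when the cursor crosses a target value.

def cabmi_step(dff_frame, e1_idx, e2_idx, baseline_std, target=2.0):
    """Map one frame of dF/F activity to a cursor value and a reward decision.

    dff_frame: 1-D array of dF/F values for all imaged neurons at this frame.
    e1_idx, e2_idx: indices of the neurons assigned to the two target ensembles.
    baseline_std: per-neuron baseline variability used for crude z-scoring.
    """
    z = dff_frame / np.maximum(baseline_std, 1e-6)   # normalize each neuron
    cursor = z[e1_idx].mean() - z[e2_idx].mean()     # E1 - E2 ensemble drive
    rewarded = cursor >= target                      # threshold crossing
    return cursor, rewarded
```

In such a setup, restricting e1_idx and e2_idx to neurons of a single identified class (IT or PT) is what allows learning rates for the two populations to be compared.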
ArXiv, 2019
This paper investigates methods for estimating the optimal stochastic control policy for a Markov Decision Process with unknown transition dynamics and an unknown reward function. This form of model-free reinforcement learning encompasses many real-world systems, such as playing video games, simulated control tasks, and real robot locomotion. Existing methods for estimating the optimal stochastic control policy rely on high-variance estimates of the policy gradient. However, these methods are not guaranteed to find the optimal stochastic policy, and the high-variance gradient estimates make convergence unstable. In order to resolve these problems, we propose a technique using Markov Chain Monte Carlo to generate samples from the posterior distribution of the parameters conditioned on being optimal. Our method provably converges to the globally optimal stochastic policy and empirically exhibits variance similar to that of policy gradient methods.
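The core idea, treating optimality as an observed variable and sampling policy parameters from the induced posterior, can be sketched as follows. This is a minimal, hypothetical Metropolis-Hastings example assuming the unnormalized log-posterior is the estimated return divided by a temperature; it is a sketch of the general technique, not the authors' actual algorithm.

```python
import numpy as np

# Hypothetical sketch: run Metropolis-Hastings over policy parameters theta,
# with p(theta | optimal) proportional to exp(return(theta) / temperature).
# The environment, proposal scale, and temperature are illustrative assumptions.

def mh_policy_search(estimate_return, dim, n_steps=5000,
                     proposal_std=0.1, temperature=1.0, seed=0):
    """Sample policy parameters whose density increases with estimated return."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    log_p = estimate_return(theta) / temperature   # log unnormalized posterior
    samples = []
    for _ in range(n_steps):
        proposal = theta + proposal_std * rng.standard_normal(dim)
        log_p_new = estimate_return(proposal) / temperature
        # Accept with the standard Metropolis acceptance ratio
        if np.log(rng.uniform()) < log_p_new - log_p:
            theta, log_p = proposal, log_p_new
        samples.append(theta.copy())
    return np.array(samples)
```

Because each step only needs a (possibly noisy) return estimate rather than a gradient, this style of sampler sidesteps the high-variance gradient estimates mentioned in the abstract, at the cost of many return evaluations.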