Optimal Combination of Imitation and Reinforcement Learning for Self-driving Cars

2019, Revue d'Intelligence Artificielle

Human intelligence develops in two steps, mimicking and the tentative application of expertise, which are mirrored in artificial intelligence (AI) by imitation learning (IL) and reinforcement learning (RL). However, the RL process does not always improve the skills learned from expert demonstrations or enhance the algorithm's performance. To solve this problem, this paper puts forward a novel algorithm called optimal combination of imitation and reinforcement learning (OCIRL). First, the concept of deep Q-learning from demonstrations (DQfD) was introduced into the advantage actor-critic (A2C) model, creating the A2CfD model. Then, a threshold was estimated from a trained IL model with the same inputs and reward function as the DQfD, and applied to the A2CfD model. The threshold represents the minimum reward that conserves the learned expertise. The resulting A2CfDoC model was trained and tested on self-driving cars in both discrete and continuous environments. The results show that the model outperformed several existing algorithms in terms of speed and accuracy.
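
As a rough illustration of the threshold idea described in the abstract, the sketch below treats the threshold as a reward floor derived from a trained IL policy and keeps an RL update only when its evaluation reward stays at or above that floor. The abstract does not say how the threshold is estimated or applied, so the use of the mean episode return and the function names `estimate_threshold` and `keep_rl_update` are assumptions made purely for illustration, not the paper's procedure.

```python
from statistics import mean


def estimate_threshold(il_episode_returns):
    """Assumed estimator: mean episode return of the trained IL policy.

    The source only says a threshold is estimated from a trained IL model;
    using the mean is a hypothetical choice for this sketch.
    """
    if not il_episode_returns:
        raise ValueError("need at least one IL episode return")
    return mean(il_episode_returns)


def keep_rl_update(candidate_returns, threshold):
    """Accept an RL update only if its mean evaluation return
    does not fall below the IL-derived threshold (assumed rule)."""
    return mean(candidate_returns) >= threshold


# Made-up numbers, for illustration only.
il_returns = [10.0, 12.0, 14.0]
threshold = estimate_threshold(il_returns)

print(keep_rl_update([13.0, 12.5], threshold))  # update preserves expertise
print(keep_rl_update([9.0, 11.0], threshold))   # update degrades expertise
```

Under these assumptions, the first candidate update would be kept and the second rejected, which matches the stated role of the threshold as the minimum reward that conserves the learned expertise.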
