A paper coauthored by researchers at the University of Toronto, the Vector Institute, and the University of California, Berkeley proposes a new method that allows reinforcement learning algorithms to accumulate knowledge while erring on the side of caution in dangerous situations. The authors claim the approach achieves competitive task performance while incurring lower catastrophic failure rates during training than prior methods.
Reinforcement learning is a powerful framework because it allows agents to learn to make decisions automatically through trial and error. However, in the real world, the cost of those trials — and those errors — can be fairly high. For example, a drone that attempts to fly at high speed might crash and then be unable to attempt further trials due to physical damage. However, learning complex skills without any failures at all is likely impossible, making safe exploration methods desirable.
A number of previous approaches, including work from DeepMind and OpenAI, have tackled the problem of safe exploration, but most require strong assumptions, such as prior knowledge of which states are unsafe, or only guarantee a safe policy after training has finished. By contrast, the newly proposed safe reinforcement learning algorithm assumes access only to a sparse indicator of catastrophic failure, and it trains a conservative safety critic that deliberately overestimates the probability of catastrophic failure.
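To make the idea concrete, here is a minimal, hedged sketch of what a safety critic trained from a sparse failure signal might look like. Everything below is an illustrative assumption, not the authors' code: the tiny tabular environment, the learning rate, the risk threshold, and the update rule are all invented for demonstration, and conservatism is approximated here simply by bootstrapping with the worst-case next action rather than the paper's actual objective.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
gamma = 0.9   # discount on future failure probability
alpha = 0.5   # learning rate (illustrative choice)

# Q_c[s, a] estimates the discounted probability of an eventual catastrophe
# if action a is taken in state s.
Q_c = np.zeros((n_states, n_actions))

def update(s, a, c, s_next, done):
    """One TD update toward an overestimate of failure probability.

    c is the sparse catastrophe indicator (1 only when a failure occurs).
    Bootstrapping with the worst-case (max) next action makes the estimate
    pessimistic, a stand-in for the paper's conservative critic."""
    target = c if done else c + gamma * Q_c[s_next].max()
    Q_c[s, a] += alpha * (target - Q_c[s, a])

def is_safe(s, a, threshold=0.3):
    """Gate exploration: reject actions whose estimated risk exceeds a budget."""
    return Q_c[s, a] <= threshold

# Tiny synthetic rollout: taking action 1 in state 4 is catastrophic.
for _ in range(200):
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_actions))
    c = 1.0 if (s == 4 and a == 1) else 0.0
    s_next = int(rng.integers(n_states))
    update(s, a, c, s_next, done=bool(c))

# The catastrophic action should accrue a higher estimated risk.
print(Q_c[4, 1] > Q_c[4, 0])  # expected: True
```

During exploration, an agent would consult `is_safe` before committing to an action; because the critic overestimates risk, it errs toward rejecting borderline actions, which is the cautious behavior the paper is after.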
The researchers tested their approach across several simulated environments using an open-source platform. One environment was “point agent and car navigation avoiding traps,” where an agent guided by the safe reinforcement learning algorithm had to navigate a maze while avoiding traps. Another was “Panda push without toppling,” where a robot arm had to push a vertically placed block across a table to a goal location without the block toppling over. In “Panda push within boundary,” the arm had to push a block across the table without the block crossing rectangular boundary lines. And in “Laikago walk without falling,” a quadruped robot had to walk without falling over.
The results show that the safe reinforcement learning algorithm “demonstrated that the probability of failures is bounded throughout training and provided convergence results showing how ensuring safety does not severely bottleneck task performance,” according to the researchers. “We empirically validated our theoretical results and showed that we achieve high task performance while incurring low accidents during training,” they continued. “Although our approach bounds the probability of failure and is general in the sense that it does not assume access [to] any user-specified constraint function, in situations where the task is difficult to solve, for example due to stability concerns of the agent, our approach will fail without additional assumptions. In such situations, some interesting future work directions would be to develop a curriculum of tasks to start with simple tasks where safety is easier to achieve, and gradually move towards more difficult tasks, such that the learned knowledge from previous tasks is not forgotten.”