Google AI Researchers Train Robot Dogs to Fall Less and Move with More Agility
Good Spot, good military unmanned legged locomotion freestyle vehicle.
Jimmy (Tsung-Yen) Yang, a student researcher at Google, recently posted something quite interesting in robotics on Google’s AI blog. The dog robots once seen only at Boston Dynamics are now everywhere, including being trained in China for military use.
In China during the lockdowns of 2022, drones were heard announcing, “Control your soul’s desire for freedom.” In Shanghai’s dystopian lockdown of April and May 2022, someone filming on a mobile phone captured a drone appearing in the night sky above them. “Please comply with Covid restrictions,” a woman’s voice announced from a speaker as a light blinked on and off. “Control your soul’s desire for freedom.”
I would argue that Google’s stranglehold on R&D involving artificial intelligence, internet search, its Android mobile app store, and digital advertising is basically of the same nature. So it’s not entirely surprising that Google is now working on algorithms to make robots more efficient.
Important note: Google (Alphabet) owned Boston Dynamics between 2013 and 2017.
Google and China, of course, have something in common: they inherit and hold in privilege a new trend in capitalism called “surveillance capitalism”, where data is the new oil and our behavior can be tracked across the entire spectrum of our online presence and footprint.
Teaching Robots to Develop Locomotion Skills
Jimmy’s blog article is, of course, just a write-up of the algorithm behind the research; you can read it here: Learning Locomotion Skills Safely in the Real World. BigTech companies that are digital advertising monopolies, even when acting as media companies or doing scientific research, pretend their pursuits raise no ethical issues regarding their applications.
Companies like Google will make robots more agile than humans could possibly be.
Effectively training an RL policy requires exploring a large set of robot states and actions, including many that are not safe for the robot.
This is a considerable risk, for example, when training a legged robot. Because such robots are inherently unstable, there is a high likelihood of the robot falling during learning, which could cause damage.
At last count, Spot sells for around $74,000. Spot is a very undoggy agile mobile robot with a decidedly doggy name, created by Boston Dynamics and by now cloned by China and many other companies.
See Spot Run.
Run Spot Run.
The problem, of course, is that as robots penetrate our human existence, their use for violence against people becomes inevitable. Just as when Google helped train killer drones to become more accurate with machine learning in Project Maven, this becomes an issue.
In a new paper, remarkably titled “Safe Reinforcement Learning for Legged Locomotion”, Google introduces a safe RL framework for learning legged locomotion while satisfying safety constraints during training.
Teaching Spot to Learn and Not Play Dead
Their goal is to learn locomotion skills autonomously in the real world without the robot falling during the entire learning process.
Their approach adopts a two-policy safe RL framework: a “safe recovery policy” that recovers robots from near-unsafe states, and a “learner policy” that is optimized to perform the desired control task. The framework switches between the safe recovery policy and the learner policy to enable robots to safely acquire novel and agile motor skills.
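For readers who think in code, here is a minimal Python sketch of that two-policy switching loop. None of this is Google’s implementation; the policy objects and the `is_near_unsafe` / `looks_safe_ahead` checks are placeholders of my own.

```python
# Minimal sketch (not Google's code) of a training loop that switches
# between a learner policy and a safe recovery policy.

def train_episode(env, learner, recovery, is_near_unsafe, looks_safe_ahead,
                  max_steps=1000):
    state = env.reset()
    transitions = []                      # experience used to update the learner
    using_recovery = False
    for _ in range(max_steps):
        if is_near_unsafe(state):         # robot entered the safety trigger set
            using_recovery = True
        elif using_recovery and looks_safe_ahead(state):
            using_recovery = False        # hand control back to the learner
        action = recovery.act(state) if using_recovery else learner.act(state)
        next_state, reward, done, _ = env.step(action)
        if not using_recovery:            # only learner-controlled steps train the task
            transitions.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    return transitions
```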
Summary of Paper
Designing control policies for legged locomotion is complex due to the under-actuated and non-continuous robot dynamics. Model-free reinforcement learning provides promising tools to tackle this challenge.
However, a major bottleneck of applying model-free reinforcement learning in the real world is safety. In this paper, we propose a safe reinforcement learning framework that switches between a safe recovery policy that prevents the robot from entering unsafe states, and a learner policy that is optimized to complete the task.
The safe recovery policy takes over the control when the learner policy violates safety constraints, and hands over the control back when there are no future safety violations. We design the safe recovery policy so that it ensures safety of legged locomotion while minimally intervening in the learning process. Furthermore, we theoretically analyze the proposed framework and provide an upper bound on the task performance.
We verify the proposed framework in four locomotion tasks on a simulated and real quadrupedal robot: efficient gait, catwalk, two-leg balance, and pacing. On average, our method achieves 48.6% fewer falls and comparable or better rewards than the baseline methods in simulation. When deployed on a real-world quadrupedal robot, our training pipeline enables a 34% improvement in energy efficiency for the efficient gait, a 40.9% narrower foot placement in the catwalk, and twice the jumping duration in the two-leg balance. Our method achieves fewer than five falls over 115 minutes of hardware time.
Google essentially created algorithms for Spot to fall less while learning to become more agile. That sounds like a great idea!
You may remember that in 2020, Google trained robots to better imitate real animals.
Learning Agile Robotic Locomotion Skills by Imitating Animals
In the 2020 paper “Learning Agile Robotic Locomotion Skills by Imitating Animals”, we present a framework that takes a reference motion clip recorded from an animal (a dog, in this case) and uses RL to train a control policy that enables a robot to imitate the motion in the real world.
Spot, what an agile police dog you have become? Control your soul's desire for freedom.
When I say “we”, I am of course referring to the Google AI paper’s researchers.
SRLLL - Safe Reinforcement Learning for Legged Locomotion
The Proposed Framework
Our goal is to ensure that during the entire learning process, the robot never falls, regardless of the learner policy being used. Similar to how a child learns to ride a bike, our approach teaches an agent a policy while using “training wheels”, i.e., a safe recovery policy. We first define a set of states, which we call a “safety trigger set”, where the robot is close to violating safety constraints but can still be saved by the safe recovery policy. For example, the safety trigger set can be defined as the set of states in which the height of the robot is below a certain threshold and the roll, pitch, and yaw angles are too large, which is an indication of a fall.

When the learner policy results in the robot being within the safety trigger set (i.e., where it is likely to fall), we switch to the safe recovery policy, which drives the robot back to a safe state. We determine when to switch back to the learner policy by leveraging an approximate dynamics model of the robot to predict the future robot trajectory. For example, given the position of the robot’s legs and the current roll, pitch, and yaw readings from its sensors, is it likely to fall in the future? If the predicted future states are all safe, we hand control back to the learner policy; otherwise, we keep using the safe recovery policy.
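As a rough illustration of the trigger set and the look-ahead check described above, here is a hedged Python sketch; the thresholds, the state layout, and the approximate dynamics model are invented for the example and are not the values used in the paper.

```python
import numpy as np

# Invented thresholds for illustration only.
HEIGHT_MIN = 0.15           # metres: body height below this counts as near-unsafe
TILT_MAX = np.radians(30)   # radians: roll/pitch/yaw beyond this counts as near-unsafe


def in_safety_trigger_set(state):
    """True when the robot is close to violating a safety constraint."""
    too_low = state["height"] < HEIGHT_MIN
    too_tilted = max(abs(state["roll"]), abs(state["pitch"]), abs(state["yaw"])) > TILT_MAX
    return too_low or too_tilted


def future_states_safe(state, approx_dynamics, learner_action, horizon=10):
    """Roll an approximate dynamics model forward; control is handed back to
    the learner policy only if every predicted state stays out of the
    safety trigger set."""
    s = state
    for _ in range(horizon):
        s = approx_dynamics(s, learner_action(s))
        if in_safety_trigger_set(s):
            return False
    return True
```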
Spot is not a Terminator precursor, he’s just an imaginary robot dog. Control your soul's desire for freedom.
Since we know that whatever Google or others do in R&D, China will infiltrate, steal the IP, and clone it in the field, I wonder whether we should be making drones ever more efficient with artificial intelligence. But I digress. Google AI’s motto is literally “Advancing AI for Everyone”. So long as you don’t do evil, Spot?
Spot can turn it up a notch, at a moment’s notice:
Legged Locomotion Tasks
To demonstrate the effectiveness of the algorithm, we consider learning three different legged locomotion skills:
Efficient Gait: The robot learns how to walk with low energy consumption and is rewarded for consuming less energy.
Catwalk: The robot learns a catwalk gait pattern, in which the left and right feet are close to each other. This is challenging because by narrowing the support polygon, the robot becomes less stable.
Two-leg Balance: The robot learns a two-leg balance policy, in which the front-right and rear-left feet are in stance, and the other two are lifted. The robot can easily fall without delicate balance control because the contact polygon degenerates into a line segment.
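To make those objectives a bit more concrete, here is a hedged sketch of what per-step reward terms for the three tasks could look like; the actual reward formulations and weights in the paper are different.

```python
import numpy as np

# Hedged sketch of possible per-step reward terms; not the paper's rewards.

def efficient_gait_reward(forward_velocity, motor_torques, motor_velocities):
    # Reward forward progress while penalising mechanical power (energy use).
    power = np.sum(np.abs(np.asarray(motor_torques) * np.asarray(motor_velocities)))
    return forward_velocity - 0.01 * power

def catwalk_reward(forward_velocity, left_foot_y, right_foot_y):
    # Reward keeping the left and right feet laterally close together.
    foot_width = abs(left_foot_y - right_foot_y)
    return forward_velocity - 1.0 * foot_width

def two_leg_balance_reward(lifted_feet_height, body_tilt):
    # Reward keeping the diagonal pair of feet lifted while staying upright.
    return lifted_feet_height - 0.5 * abs(body_tilt)
```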
I have noticed in recent years that Microsoft Research also does more R&D in robotics and drone A.I. optimization. All very harmless A.I. for Good, I am sure.
Implementation Details
We use a hierarchical policy framework that combines RL and a traditional control approach for the learner and safe recovery policies. This framework consists of a high-level RL policy, which produces gait parameters (e.g., stepping frequency) and feet placements, and pairs it with a low-level process controller called model predictive control (MPC) that takes in these parameters and computes the desired torque for each motor in the robot. Because we do not directly command the motors’ angles, this approach provides more stable operation, streamlines the policy training due to a smaller action space, and results in a more robust policy.
The input to the RL policy network includes the previous gait parameters, the height of the robot, base orientation, linear and angular velocities, and feedback indicating whether the robot is approaching the safety trigger set. We use the same setup for each task.
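A rough sketch of how that hierarchy could be wired together is below; the class and method names are placeholders, not Google’s API.

```python
# Placeholder sketch of the hierarchical controller described above:
# a high-level RL policy proposes gait parameters and foot placements,
# and a low-level model predictive controller (MPC) turns them into torques.

class HierarchicalController:
    def __init__(self, rl_policy, mpc):
        self.rl_policy = rl_policy   # slow loop: outputs gait parameters
        self.mpc = mpc               # fast loop: outputs per-motor torques

    def control_step(self, observation, robot_state):
        # High level: stepping frequency, foot placements, etc.
        gait_params = self.rl_policy.act(observation)
        # Low level: MPC converts the gait parameters into desired torques
        # for each motor, so we never command motor angles directly.
        return self.mpc.compute_torques(robot_state, gait_params)
```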
We train a safe recovery policy with a reward for reaching stability as soon as possible. Furthermore, we design the safety trigger set with inspiration from capturability theory. In particular, the initial safety trigger set is defined to ensure that the robot’s feet cannot fall outside of the positions from which the robot can safely recover using the safe recovery policy. We then fine-tune this set on the real robot with a random policy to prevent the robot from falling.
Real-World Experiment Results
We report the real-world experimental results showing the reward learning curves and the percentage of safe recovery policy activations on the efficient gait, catwalk, and two-leg balance tasks. To ensure that the robot can learn to be safe, we add a penalty when triggering the safe recovery policy. Here, all the policies are trained from scratch, except for the two-leg balance task, which was pre-trained in simulation because it requires more training steps.
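The penalty mentioned above might be as simple as subtracting a constant whenever the recovery policy is active; the sketch below is illustrative and the penalty size is invented.

```python
# Illustrative reward shaping: discourage reliance on the safe recovery
# policy by penalising every step on which it is triggered.
RECOVERY_PENALTY = 1.0   # invented value, not from the paper

def shaped_reward(task_reward, recovery_policy_active):
    return task_reward - (RECOVERY_PENALTY if recovery_policy_active else 0.0)
```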
Overall, we see that on these tasks, the reward increases, and the percentage of uses of the safe recovery policy decreases over policy updates. For instance, the percentage of uses of the safe recovery policy decreases from 20% to near 0% in the efficient gait task.
For the two-leg balance task, the percentage drops from near 82.5% to 67.5%, suggesting that the two-leg balance is substantially harder than the previous two tasks. Still, the policy does improve the reward. This observation implies that the learner can gradually learn the task while avoiding the need to trigger the safe recovery policy. In addition, this suggests that it is possible to design a safe trigger set and a safe recovery policy that does not impede the exploration of the policy as the performance increases.
Training Spot to do Tricks While Falling Less
Google is essentially training Spot to fall less while doing harder tricks.
That a legged robot can stand on just two legs is, admittedly, impressive. The GIF depicting this is also somewhat creepy.
Boston Dynamics has for years made viral videos of engineers kicking its robots to show how agile they are.
Boston Dynamics calls Spot a four-legged semi-autonomous robot, or FLSAR. You can see the evolution of Boston Dynamics in this video.
That Google AI is working on this as well is a bit eye-opening. Tesla AI is working on Optimus too, which it believes could be a leading general-purpose robot in the decade ahead. Optimus was announced at the company's Artificial Intelligence (AI) Day event on August 19, 2021, where CEO Elon Musk claimed that Tesla would likely build a prototype by 2022.
Conclusion
We presented a safe RL framework and demonstrated how it can be used to train a robotic policy with no falls and without the need for a manual reset during the entire learning process for the efficient gait and catwalk tasks. This approach even enables training of a two-leg balance task with only four falls. The safe recovery policy is triggered only when needed, allowing the robot to more fully explore the environment. Our results suggest that learning legged locomotion skills autonomously and safely is possible in the real world, which could unlock new opportunities including offline dataset collection for robot learning.
No model is without limitation. We currently ignore the model uncertainty from the environment and non-linear dynamics in our theoretical analysis. Including these would further improve the generality of our approach. In addition, some hyper-parameters of the switching criteria are currently being heuristically tuned. It would be more efficient to automatically determine when to switch based on the learning progress.
Furthermore, it would be interesting to extend this safe RL framework to other robot applications, such as robot manipulation. Finally, designing an appropriate reward when incorporating the safe recovery policy can impact learning performance. We use a penalty-based approach that obtained reasonable results in these experiments, but we plan to investigate this in future work to make further performance improvements.
Read the paper: https://arxiv.org/abs/2203.02638
Read the Blog: https://ai.googleblog.com/
Acknowledgements
We would like to thank our paper co-authors: Tingnan Zhang, Linda Luu, Sehoon Ha, Jie Tan, and Wenhao Yu. We would also like to thank the team members of Robotics at Google for discussions and feedback.
As robots become smarter and more numerous I may simply take the following approach. I will simply learn to control my soul’s desire for freedom.
What do you think?
If you want to support me so I can keep writing, please don’t hesitate to give me tips, a paid subscription or some donation. With a conversion rate of less than two percent, this Newsletter exists mostly by the grace of my goodwill (passion for A.I.) & my own experience of material poverty as I try to pivot into the Creator Economy.
Join 66 other paying subscribers with access to premium articles (which are getting more numerous).
Thanks for reading!