Quality Criteria for Evaluating Robot Research
Introduction
Evaluating robot research presents a unique challenge within computer science, particularly with regard to comparability and reproducibility. While tools like Docker have enabled researchers to easily test and evaluate software solutions, similar tooling is much harder to achieve in robotics. Simulation environments can give an idea of the functionality and usability of a presented approach, but they are usually unable to confirm the real-world results that robotics research typically reports, and they are time-consuming to create. In this article we outline the problems and challenges researchers face, as well as approaches and solutions for better evaluating robot research.
Reproducibility
Reproducibility is the biggest hurdle when comparing results of robotics research. Different labs have different robot hardware: platforms with no manipulator or with two manipulators, pincer-like grippers or grippers shaped like human hands, differential drives, omnidirectional drives or even bipedal robots, robots with human-like features or warehouse robots. On top of that, the environments set up for these platforms to operate in vary just as much, from very static, standardized setups with fixed robots and environment features tagged with QR markers, to non-standardized scenarios with a variable environment. All of these different platforms and setups require markedly different software solutions, which makes testing and comparison difficult. For example, a newly presented mapping and navigation-planning approach developed by researchers with an omnidirectional platform and long-range LiDAR sensors is difficult or even impossible to evaluate for researchers whose robot has a differential drive and short-range sonar sensors. Likewise, an approach for generating and executing different grasps presented by a lab with a tightly controlled setup will always outperform, in terms of traditional metrics, another lab that tests its manipulation pipeline in varied, non-standardized environments. But because the two setups are so different, it is very hard to say definitively which approach works better.
Although the differences in platform and setup present problems for reproducibility, they can be overcome by porting and adapting an approach to different platforms. This is often very time-consuming, however, and frequently considered not worth the effort. One way to alleviate this is cooperation: it should be in researchers' interest to help others adapt their approach to different platforms, and conversely to seek help in testing their own approaches on other platforms.
The second solution is to create standard scenarios so that multiple research groups worldwide test and work in the same environment. Probably the most prominent current example of this is the RoboCup competition. Through its different leagues, researchers can test different hardware in standardized environments and challenges, or even identical hardware in the so-called standard leagues. Creating such challenges and setups for researchers to benchmark their approaches in should become a higher-priority goal in order to facilitate reproducibility and comparability in robotics.
Reinforcement Learning for Robotics
Ongoing research is concerned with learning-based approaches to accomplishing complex robotic manipulation tasks. The goal of learning control is to derive a policy π that maps an observation vector o to an action vector a such that the desired behavior is achieved. Recent developments focus on policy search methods, which are well suited to robotics because, in contrast to value-based methods, they scale to high-dimensional state and action spaces.
The Reinforcement Learning (RL) framework provides a powerful way to learn from interaction with the environment. A policy π maps state observations gathered by sensors onto motor commands. Executing those motor commands changes the environment state, because the robot and the manipulated parts move. A reward function encodes the task objectives: it assigns a real-valued number to each state-action pair that rates how “good” that pair is with regard to the task objectives. During the reinforcement learning process, the policy is optimized to maximize the expected sum of rewards.
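Written out in the same spirit (a standard episodic formulation with horizon T; a discount factor is omitted here for brevity), the objective is to find

π* = argmax_π E[ Σ_{t=0..T} r(s_t, a_t) ]

where s_t and a_t denote the state and action at time step t and r is the reward function.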
Trial Efficiency
In recent years, applications in robotics have evolved rapidly with the advent of model-free deep RL policy gradient methods that scale to high-dimensional search spaces and have the potential to solve a wide range of tasks, including dexterous manipulation. A deep neural network represents the policy, maps state observations to actions and generalizes over the high-dimensional search space. The main drawbacks limiting industrial application are low trial efficiency, sensitivity to hyperparameter configuration, and the tendency of these algorithms to get stuck in locally optimal solutions.
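To illustrate the basic mechanics of such methods, the following is a minimal REINFORCE-style policy gradient sketch in Python/PyTorch. It assumes a Gymnasium-style environment with continuous observation and action vectors; the class and function names are illustrative and not taken from any specific robotics framework, and the low trial efficiency discussed above stems precisely from the many rollouts such a loop needs.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Deep network mapping state observations o to a Gaussian distribution over actions a."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        mean = self.net(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())

def train_episode(env, policy, optimizer, gamma=0.99):
    """Collect one rollout and take a single policy-gradient step."""
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        dist = policy(torch.as_tensor(obs, dtype=torch.float32))
        action = dist.sample()
        log_probs.append(dist.log_prob(action).sum())
        obs, reward, terminated, truncated, _ = env.step(action.numpy())
        rewards.append(reward)
        done = terminated or truncated
    # Discounted reward-to-go for each time step.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.as_tensor(returns, dtype=torch.float32)
    # Policy gradient loss: increase log-probability of actions in proportion to their return.
    loss = -(torch.stack(log_probs) * returns).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```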
Manipulation Primitives as Robot-Agnostic Actions
Finkemeyer et al. describe MPs as a tuple of (1) an atomic hybrid motion $HM$ describing a hybrid force or position control policy for each translational and rotational direction based on the Task Frame Formalism (TFF), (2) a tool command τ and (3) a stop condition λ:
MP := {HM, τ, λ}
Following the TFF approach, the control set-points for each direction of the hybrid motion HM (three translational directions d and three rotational directions θ) are given with respect to a task frame TF. There are three types of controllers: force/torque control, position control and velocity control. Each direction is controlled by exactly one controller, and the controller type can differ between directions. For example, the movement along the x-axis can be force controlled while, simultaneously, the movement along the z-axis is position controlled.
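To make this concrete, a hybrid motion could be represented roughly as follows. This is a Python sketch of the idea, not the interface of Finkemeyer et al.; the class and field names are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict

class ControllerType(Enum):
    FORCE = "force"        # force/torque controlled direction
    POSITION = "position"  # position controlled direction
    VELOCITY = "velocity"  # velocity controlled direction

@dataclass
class AxisSetpoint:
    """Controller type and set-point for one direction of the task frame."""
    controller: ControllerType
    setpoint: float  # N/Nm for force, m/rad for position, m/s or rad/s for velocity

@dataclass
class HybridMotion:
    """Hybrid motion HM: one controller per translational (x, y, z) and rotational (rx, ry, rz) direction."""
    axes: Dict[str, AxisSetpoint]

# Example: force controlled along x, position controlled along z.
hm = HybridMotion(axes={
    "x": AxisSetpoint(ControllerType.FORCE, 5.0),      # push with 5 N along x
    "z": AxisSetpoint(ControllerType.POSITION, 0.10),  # hold z at 0.10 m in the task frame
})
```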
Finkemeyer et al. extended the TFF by a reference frame that allows the task frame to be coupled to an arbitrary frame in the work cell. On the one hand, the task frame can be attached to the end-effector; this setup is comparable to the classical TFF approach, and the task frame is adapted accordingly during the robot motion. On the other hand, the task frame can be attached to a fixed or moving frame in the work cell, e.g. a conveyor belt that carries the workpieces. How the task frame is tracked during execution is defined by a tracking mode.
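Putting the pieces together, a manipulation primitive could then be modeled along these lines. Again, this is an illustrative sketch that reuses the HybridMotion class from the snippet above; it is not the original implementation, and the tracking-mode and reference-frame values are placeholders.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TaskFrame:
    """Task frame with the reference-frame extension: the frame the set-points refer to,
    plus how it is tracked during execution (e.g. attached to the end-effector or to a
    moving frame such as a conveyor belt)."""
    reference_frame: str   # e.g. "end_effector" or "conveyor_belt"
    tracking_mode: str     # e.g. "fixed" or "track_reference"

@dataclass
class ManipulationPrimitive:
    """MP := {HM, τ, λ}: hybrid motion, tool command and stop condition, plus the task frame."""
    hybrid_motion: HybridMotion
    tool_command: str                        # τ, e.g. "close_gripper"
    stop_condition: Callable[[dict], bool]   # λ, evaluated on the current state
    task_frame: TaskFrame

# Example: move force-controlled along x until a contact force threshold is exceeded.
mp = ManipulationPrimitive(
    hybrid_motion=hm,
    tool_command="hold",
    stop_condition=lambda state: state.get("f_x", 0.0) > 4.5,
    task_frame=TaskFrame(reference_frame="end_effector", tracking_mode="fixed"),
)
```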
Learning based on manipulation primitives in the operational Cartesian space can boost sample efficiency and at the same time makes it possible to learn robot-agnostic policies.