Visual servoing has been a viable method of robot manipulator control for more than a decade. It involves the use of one or more cameras and a computer vision system to control the position of the robot's end-effector relative to the workpiece as required by the task. It is a multi-disciplinary research area spanning computer vision, robotics, kinematics, dynamics, control and real-time systems.
Although this discipline has developed considerably in recent years, the different methods offer trade-offs in performance, and none of them can solve every task that may confront a robot. There are a number of recurring questions, such as "Is 2D better than 3D visual servoing?" (replace 2D and/or 3D with 2 1/2 D, stereo, point/line/circle features, ...), for which some theoretical arguments exist. In practice, however, people seem to focus mainly on the overall behaviour of their method rather than on its effective accuracy or on a quantification of that behaviour. For instance, "my method yields straighter trajectories", but what does a straight trajectory mean in visual servoing?
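One way to make a claim such as "straighter trajectories" measurable, offered here only as an illustrative assumption rather than an agreed metric, is to compare the length of a sampled path with the straight-line distance between its endpoints. The same function can be applied to the camera's 3D trajectory or to a feature's image-plane trajectory, and the two can disagree, which is precisely why the question matters. The names `path_length` and `straightness_ratio` are hypothetical.

```python
import math
from typing import Sequence

# A sample is a 2D image point (pixels) or a 3D camera position (metres).
Sample = Sequence[float]


def path_length(path: Sequence[Sample]) -> float:
    """Sum of Euclidean distances between consecutive samples."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def straightness_ratio(path: Sequence[Sample]) -> float:
    """Hypothetical metric: travelled length / start-to-end distance.

    Equals 1.0 for a perfectly straight path and grows as the trajectory
    deviates from the straight segment joining its endpoints.
    """
    chord = math.dist(path[0], path[-1])
    if chord == 0.0:
        raise ValueError("start and end samples coincide")
    return path_length(path) / chord


if __name__ == "__main__":
    straight = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
    detour = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (1.0, 0.0, 0.0)]
    print(straightness_ratio(straight))          # 1.0
    print(round(straightness_ratio(detour), 3))  # 1.414
```

Such a ratio says nothing about speed profile or overshoot, so it would complement, not replace, accuracy measures.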
To date, there has been little research exploring the relative strengths and weaknesses of these methods. Experts therefore agree that defining benchmarks, metrics, and measurement procedures for visual servoing would be very useful, since many research papers present new visual features or new control schemes to achieve the same basic task. Comparing these methods on a well-defined and universally recognized benchmark would help to judge the practical value of these contributions.
However, they are also aware that it would be very difficult, if not impossible, to specify a benchmark that proves useful for all past and future work in visual servoing. Benchmarking would instead be more tractable on free, web-available simulation software, since complete visual servoing systems are rather complex. Indeed, the control law itself may be fairly simple to derive and implement once the geometric primitives have been extracted from the image, but obtaining those primitives may not be easy. There are many trackers, pose estimators and image-processing techniques, not to mention robots and applications.
So, the problem is mainly to know what we want to measure:
Between these two extreme positions, we may want to split the benchmark into sub-benchmarks, but according to which criterion? By application, such as mobile robots or manipulators (which can be split again into anthropomorphic arms, Cartesian arms, parallel robots, etc.)? By subtask (i.e. detection, tracking, control law, numerical implementation)?
There is no clear answer, since this is a multicriteria problem with strong overlap and coupling between the criteria. For instance, detection algorithms depend on the application (outdoor mobile robotics or microrobotics).
It is debatable whether a benchmark for visual servoing can be set up that is completely general. We might be forced to restrict the benchmark to a specific application, but then the comparison would only hold for that application.
In a visual servoing system all the parts of the system should be considered, namely:
For each of these parts, one should identify which information is measurable and which is not, and then find suitable metrics to measure the performance of each subsystem and of the whole system.
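For the control subsystem, two commonly used measurable quantities are the residual image error at the end of the task and the time needed to settle below a tolerance. The sketch below computes both from logged data; the function names, the pixel units, the tolerance and the sampling period are assumptions for illustration, not metrics prescribed by the experts.

```python
import math
from typing import Optional, Sequence, Tuple

Pixel = Tuple[float, float]


def rms_feature_error(current: Sequence[Pixel], desired: Sequence[Pixel]) -> float:
    """Root-mean-square distance (pixels) between matched image points."""
    if not current or len(current) != len(desired):
        raise ValueError("need the same non-zero number of points")
    squared = [math.dist(c, d) ** 2 for c, d in zip(current, desired)]
    return math.sqrt(sum(squared) / len(squared))


def settling_index(errors: Sequence[float], tol: float) -> Optional[int]:
    """First sample index from which the error stays below tol until the end.

    Returns None if the last sample is still at or above tol.
    """
    start: Optional[int] = None
    for i, e in enumerate(errors):
        if e < tol:
            if start is None:
                start = i
        else:
            start = None  # error rose again: the run below tol is broken
    return start


if __name__ == "__main__":
    log = [10.0, 4.0, 0.8, 1.5, 0.6, 0.3]  # hypothetical per-iteration errors (px)
    period = 0.04                          # assumed 25 Hz camera
    k = settling_index(log, tol=1.0)
    print(k, k * period)
```

Multiplying the settling index by the sampling period turns it into a settling time, which makes runs recorded at different camera rates comparable.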
Benchmarks should include datasets (specifying at least the target and the robot configuration) stored in a publicly available repository, together with free simulation software.
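A dataset entry of the kind described above could be stored as plain JSON so that any simulator can read it. The schema below (field names, the pose convention of a translation plus a rotation vector, and seeded random sampling of initial poses) is a hypothetical sketch, not an existing format.

```python
import json
import random
from dataclasses import asdict, dataclass, field
from typing import List

# Assumed pose convention: [tx, ty, tz, rx, ry, rz] = pose of the target frame
# in the camera frame, translation in metres then rotation vector (radians).
Pose = List[float]


@dataclass
class ServoingDataset:
    name: str
    target_points: List[List[float]]  # 3D target points in the target frame (m)
    desired_pose: Pose                # pose at convergence
    initial_poses: List[Pose] = field(default_factory=list)
    seed: int = 0                     # seed used to draw initial_poses

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @staticmethod
    def from_json(text: str) -> "ServoingDataset":
        return ServoingDataset(**json.loads(text))


def sample_initial_poses(desired: Pose, n: int, max_t: float,
                         max_r: float, seed: int) -> List[Pose]:
    """Draw n poses by perturbing each component of the desired pose.

    Translations are offset by up to +/-max_t metres and rotation-vector
    components by up to +/-max_r radians. This is a simple parameter-space
    perturbation, not a uniform distribution over rotations. Publishing the
    seed lets anyone regenerate exactly the same set.
    """
    rng = random.Random(seed)
    poses = []
    for _ in range(n):
        offsets = [rng.uniform(-max_t, max_t) for _ in range(3)]
        offsets += [rng.uniform(-max_r, max_r) for _ in range(3)]
        poses.append([p + o for p, o in zip(desired, offsets)])
    return poses


if __name__ == "__main__":
    desired = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    square = [[0.1, 0.1, 0.0], [-0.1, 0.1, 0.0], [-0.1, -0.1, 0.0], [0.1, -0.1, 0.0]]
    ds = ServoingDataset("planar-square", square, desired,
                         sample_initial_poses(desired, 5, 0.1, 0.2, seed=42), seed=42)
    print(ds.to_json())
```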
Researchers could use benchmarks and datasets to improve their algorithms. To avoid ad hoc algorithms that merely solve the benchmark problems, the benchmarks should evolve: researchers should be able to propose new datasets that defeat the standard algorithms.
What a suitable dataset should be is also a difficult question, since visual servoing is inherently a continuous process over the whole workspace. Quantizing this workspace would generate non-negligible artifacts. For a benchmark on control only, using the Java simulator or the Matlab Toolbox (see proposals below) would be a solution. However, this brings us back to the relevance of the comparison and to how realistic the simulation is.
Some experts believe that well-defined tasks or rules should be preferred. However, at present there is no clear idea of what they could be.
It is widely agreed that it does not make sense to compare different approaches or algorithms running on different hardware (robots, hands, etc.). The hardware should be the same to allow a fair comparison among algorithms. To make benchmarks easy for the whole community to use, it is suggested that benchmarks, metrics and measurement procedures be applied to simulated tasks.
The hardware might differ only if the comparison is application-oriented, but then the benchmark is no longer restricted to visual servoing. It turns into a kind of robot competition under the constraint that vision is the only exteroceptive sensor, and opens up to robotics in general. If one wants to restrict the comparison to algorithms, the hardware should be exactly the same, i.e. one single physical instance. It is not sufficient to say "the hardware should be a Mitsubishi PA-10 with a Sony XCD-X700 camera", since the hand-eye position is not defined and the lighting conditions may differ. Even two robots of the same model may have different accuracies depending on their age, mechanical assembly, low-level control algorithm version, and so on.
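The hand-eye point can be made concrete: a velocity computed in the camera frame must be expressed in the end-effector frame before the robot executes it, and the result depends on the hand-eye transform. The sketch below applies the standard velocity twist transformation, assuming (R, t) gives the camera frame expressed in the end-effector frame; the function names and argument order are hypothetical.

```python
from typing import List, Sequence

Mat3 = Sequence[Sequence[float]]


def mat_vec(m: Mat3, v: Sequence[float]) -> List[float]:
    """3x3 matrix times 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def cross(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Cross product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def camera_to_effector_twist(r_ec: Mat3, t_ec: Sequence[float],
                             twist_c: Sequence[float]) -> List[float]:
    """Express a camera twist (vx, vy, vz, wx, wy, wz) in the end-effector frame.

    r_ec, t_ec: rotation and translation of the camera frame expressed in
    the end-effector frame (the hand-eye calibration result).
    Computes  w_e = R w_c  and  v_e = R v_c + t x w_e.
    """
    v_c, w_c = twist_c[:3], twist_c[3:]
    w_e = mat_vec(r_ec, w_c)
    v_e = [a + b for a, b in zip(mat_vec(r_ec, v_c), cross(t_ec, w_e))]
    return v_e + w_e


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    offset = [0.0, 0.0, 0.1]  # hypothetical: camera 10 cm along the flange z axis
    print(camera_to_effector_twist(identity, offset, [0, 0, 0, 0.5, 0, 0]))
    # [0.0, 0.05, 0.0, 0.5, 0.0, 0.0]
```

With the camera 10 cm ahead of the flange, a pure camera rotation of 0.5 rad/s about x requires an end-effector translation of 0.05 m/s along y; mounting the same camera differently changes the commanded motion, and hence the behaviour being benchmarked.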
However, if a single hardware setup is proposed, the comparison may be biased in the sense that an algorithm may be optimized (consciously or not) for this precise hardware and lose its generality, which for most experts is the beauty of visual servoing.
Because of the problems of using physical setups, an open simulation tool, which could be extended with all available algorithms, would be necessary. One could then define a series of benchmarks to compare algorithms that take the same input data and produce the same output.
To start with a minimal benchmark, one should consider the most basic task (6-dof control for a positioning task) with different possible targets (planar, non-planar), different initial and desired poses, and different sources of perturbation. Other tasks, as well as more complex situations, could be addressed afterwards.
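As a possible baseline for such a minimal benchmark, the sketch below simulates the classical image-based law v = -λ L^+ (s - s*) for four point features on a planar target, in the spirit of the eye-in-hand task described next. It is one generic controller, not a method prescribed by this document. Assumptions: ideal matching and tracking, normalized image coordinates, depths known exactly from the simulation, L^+ e computed through the normal equations, and a rigid first-order update of the camera motion. All names and numerical values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]  # point in the camera frame (m)
Feature = Tuple[float, float]         # normalized image coordinates (x, y)


def interaction_matrix(x: float, y: float, z: float) -> List[List[float]]:
    """2x6 interaction matrix of a normalized image point (x, y) at depth z."""
    return [[-1.0 / z, 0.0, x / z, x * y, -(1.0 + x * x), y],
            [0.0, -1.0 / z, y / z, 1.0 + y * y, -x * y, -x]]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a square system a u = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        u[r] = (m[r][n] - sum(m[r][c] * u[c] for c in range(r + 1, n))) / m[r][r]
    return u


def ibvs_velocity(pts: Sequence[Point3], desired: Sequence[Feature],
                  gain: float = 0.5) -> List[float]:
    """Camera twist (vx, vy, vz, wx, wy, wz) = -gain * L^+ (s - s*).

    L^+ e comes from the normal equations (L^T L) u = L^T e, which requires
    L to have full column rank (e.g. four coplanar points in general position).
    """
    rows: List[List[float]] = []
    err: List[float] = []
    for (X, Y, Z), (xd, yd) in zip(pts, desired):
        x, y = X / Z, Y / Z
        rows += interaction_matrix(x, y, Z)
        err += [x - xd, y - yd]
    ltl = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    lte = [sum(r[i] * e for r, e in zip(rows, err)) for i in range(6)]
    return [-gain * u for u in solve(ltl, lte)]


def rotate(p: Point3, r: Sequence[float]) -> Point3:
    """Rotate p by the rotation vector r (Rodrigues' formula)."""
    theta = math.sqrt(sum(c * c for c in r))
    if theta < 1e-12:
        return p
    k = [c / theta for c in r]
    kxp = (k[1] * p[2] - k[2] * p[1], k[2] * p[0] - k[0] * p[2], k[0] * p[1] - k[1] * p[0])
    kdp = sum(a * b for a, b in zip(k, p))
    c, s = math.cos(theta), math.sin(theta)
    return tuple(p[i] * c + kxp[i] * s + k[i] * kdp * (1.0 - c) for i in range(3))


def move_points(pts: Sequence[Point3], twist: Sequence[float], dt: float) -> List[Point3]:
    """Rigid first-order update of static points seen by a camera moving with (v, w):
    rotate by -w*dt, then translate by -v*dt (approximates dP/dt = -v - w x P)."""
    v, w = twist[:3], twist[3:]
    r = [-c * dt for c in w]
    moved = []
    for p in pts:
        q = rotate(p, r)
        moved.append((q[0] - v[0] * dt, q[1] - v[1] * dt, q[2] - v[2] * dt))
    return moved


def feature_error(pts: Sequence[Point3], desired: Sequence[Feature]) -> float:
    """Euclidean norm of s - s* over all points."""
    return math.sqrt(sum((X / Z - xd) ** 2 + (Y / Z - yd) ** 2
                         for (X, Y, Z), (xd, yd) in zip(pts, desired)))


if __name__ == "__main__":
    desired = [(0.1, 0.1), (-0.1, 0.1), (-0.1, -0.1), (0.1, -0.1)]  # 0.2 m square at 1 m
    pts = [(0.15, 0.07, 1.2), (-0.05, 0.07, 1.2), (-0.05, -0.13, 1.2), (0.15, -0.13, 1.2)]
    print("initial error:", feature_error(pts, desired))
    for _ in range(300):
        pts = move_points(pts, ibvs_velocity(pts, desired), dt=0.1)
    print("final error:", feature_error(pts, desired))
```

The rigid (Rodrigues) update is chosen over a plain Euler step so that the simulated target keeps its shape; otherwise integration error alone could leave a residual image error and bias the comparison.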
Assuming that matching and tracking algorithms are available and provide a set of image points, and that a reference image is available, the problem is to position an eye-in-hand camera starting from a random initial position. The benchmarks could be:
Existing software packages could possibly be customized to allow such benchmarks, e.g. the MATLAB Robotics Toolbox, ViSP (C++) from Irisa, or JaViSS (Java) from Universitat Jaume I.
The MATLAB Robotics Toolbox has been developed by Peter Corke (CSIRO, Australia) [Corke 96]. It provides many functions that are useful in robotics, such as kinematics, dynamics, and trajectory generation. The Toolbox is useful for simulation as well as for analyzing results from experiments with real robots. The current release includes a Simulink block library, which brings Toolbox functionality into the Simulink environment. There are also a number of demos, ranging from simple forward dynamics to image-based visual servoing.
ViSP (Visual Servoing Platform) is modular software that allows fast development of eye-in-hand image-based visual servoing applications [Marchand 99]. Various issues have to be considered in the design of such applications, among them the control of camera motions and the tracking of visual features. The ViSP environment features a wide class of control skills as well as a library of real-time tracking processes.
If physical robots were to be considered in a later step, visual servoing schemes should be defined explicitly taking into account the robot kinematics. It would then be possible to test various algorithms on various robots through a kind of plug-and-play interface, so that one can "plug" a "foreign" robot into the PC running the visual servoing as easily as one's own robot. This would essentially become a visual servoing interconnection standard.
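A plug-and-play interface of this kind could be as small as three operations that every robot adapter implements, with the visual servoing code relying only on the robot's Jacobian. The sketch below is hypothetical throughout (class and method names, the square 6x6 Jacobian assumption, and the toy robot used for the demonstration); it illustrates the idea of such a standard, not an existing one.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence

Matrix = List[List[float]]


class ServoRobot(ABC):
    """Hypothetical plug-and-play interface a robot adapter would implement."""

    @abstractmethod
    def joint_positions(self) -> List[float]:
        """Current joint configuration q."""

    @abstractmethod
    def jacobian(self, q: Sequence[float]) -> Matrix:
        """6x6 Jacobian mapping joint velocities to the end-effector twist at q."""

    @abstractmethod
    def send_joint_velocities(self, qdot: Sequence[float], dt: float) -> None:
        """Apply joint velocities qdot for dt seconds."""


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def servo_step(robot: ServoRobot, effector_twist: Sequence[float], dt: float) -> List[float]:
    """Turn an end-effector twist into joint velocities using only the interface.

    Assumes a square, non-singular Jacobian (6 joints, away from singularities).
    """
    qdot = solve(robot.jacobian(robot.joint_positions()), effector_twist)
    robot.send_joint_velocities(qdot, dt)
    return qdot


class ToyRobot(ServoRobot):
    """Demonstration robot with a constant diagonal Jacobian (hypothetical)."""

    def __init__(self, gains: Sequence[float]):
        self.gains = list(gains)
        self.q = [0.0] * len(self.gains)

    def joint_positions(self) -> List[float]:
        return list(self.q)

    def jacobian(self, q: Sequence[float]) -> Matrix:
        n = len(self.gains)
        return [[self.gains[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

    def send_joint_velocities(self, qdot: Sequence[float], dt: float) -> None:
        self.q = [qi + dt * v for qi, v in zip(self.q, qdot)]


if __name__ == "__main__":
    robot = ToyRobot([2.0, 1.0, 1.0, 1.0, 1.0, 1.0])
    print(servo_step(robot, [0.2, 0.0, 0.0, 0.0, 0.0, 0.0], dt=0.1))
    print(robot.joint_positions())
```

Swapping robots then means writing a new adapter class; the visual servoing code that produces the end-effector twist stays unchanged.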
In this context, a visual servoing simulation environment called JaViSS has been developed at Universitat Jaume I. It is written in Java, with graphical rendering making extensive use of the Java 3D API, and its calculus engine is implemented with the Colt libraries. Although JaViSS currently runs on a single computer, it has been designed with distribution in mind, using the agent-based JADE platform. The 3D models have been created with AC3D and are loaded with the Java 3D loader J3D-VRML97. The manipulator kinematics code is based on the Robotics Toolbox for Matlab by Peter Corke. JaViSS is intended as a tool to simplify the testing and comparison of different visual servoing approaches, since it models free-motion cameras as well as cameras attached to a manipulator with its kinematics. Different visual primitives can be modeled, in order to compare the behaviour of the visual task with respect to the choice of primitives.
This initiative is available at: http://www.robot.uji.es/research/projects/javiss. A complete account can be found in the paper "Cross-Platform Software for Benchmarks on Visual Servoing" by Enric Cervera, included in the deliverable document Lecture Notes of the IROS'06 Workshop on Benchmarks in Robotics Research.
Similarly, the University of Siena in Italy has developed an Automatic Control Telelab intended to support real-time configuration and observation of experiments, as well as playback access to acquired data, from remote computers linked to a collection site through the Internet. The underlying philosophy is that distributed data acquisition makes it possible to collect data from remote environments and can also improve collaboration among geographically dispersed scientific communities by distributing scientific results more quickly and less expensively than most other methods. This ongoing initiative offers the resources already available at the Automatic Control Telelab (both software and hardware) to the research community, so that they can be extended to implement visual servoing benchmarks that are run remotely. Since everyone uses the same hardware setup, comparing different methods is easier.
At present, five diverse experiments for remote control are available and two competitions based on this framework are proposed as a way of benchmarking.
This initiative can be found at: http://act.dii.unisi.it. Additional technical information is also included in this document.