Benchmarks

Defining a standard benchmark for mobile service robots


Introduction

Before starting with definitions, the state of the art, and meetings, it is important to clarify the idea behind this approach and this website. As RoSta in general aims to define action plans for standardisation in the area of mobile service robots, the benchmarking section aims to define community-accepted benchmarks.
Today nothing like this is available. There are competitions in some areas, e.g. RoboCup, but they operate on a more abstract level, showing some capabilities of robots for soccer playing, search and rescue, or home use. And even winning these competitions does not necessarily mean that the winning robot has the best navigation or object recognition. There is a lack of comparable numbers that help identify good scientific approaches in all service-robotics-related areas.
In order to arrive at a standard benchmark for mobile service robots which is accepted by the community, this benchmark needs to be a benefit for the community. This can only be achieved if, on the one hand, everyone is heard and can state requirements while the benchmarks are being defined and, on the other hand, the benchmark becomes a living process. This website is the platform where the community can contribute to the state of the art, state and discuss requirements, build the first structure of an accepted and beneficial benchmark, and let the benchmarking approach become a living activity.

Definitions

Basic definitions

There is more than one definition for benchmarks and benchmarking available. The most basic definitions are as follows:

Benchmark:
1. A standard by which something is evaluated or measured.
2. A surveyor's mark made on some stationary object and shown on a map; used as a reference point.
Benchmarking:
1. To measure the performance of an item relative to another similar item in an impartial scientific manner.

(source: http://en.wiktionary.org/wiki/benchmark)

These definitions make two important points. First, a benchmark is itself a standard; second, benchmarking is a comparative measurement of performance.

Detailed definition

A more detailed definition, which also includes four main steps of benchmarking, is the following:

Basic definition of benchmarking
Benchmarking is a powerful technique that provides practical learning through comparing measurements, policies or outcomes, across industries, sectors, policies, products or services. The essence of benchmarking is the process of identifying the highest standards of excellence for products, services or processes and then making the improvements necessary to reach those standards. There are four basic steps to the benchmark process:

Step 1 - Gathering Information:
The first step is defining and collecting indicators from official sources. In the case that data are not available some ad hoc surveys could be designed. The process is facilitated by past experiences at international level.

Step 2 - Comparing & Understanding:
The data are stored in a database, which immediately generates a Benchmark Index report. The report provides comprehensive and quantifiable performance indicators, highlighting the country’s strengths and weaknesses against those of the comparison group chosen.

Step 3 - Analysing the Information:
A preliminary analysis is carried out in order to start the process of identifying the key improvement areas on which action should be focused.

Step 4 - Implementation:
The final step consists of a critical review of the results and a compilation of the final report. This process is invaluable and provides the catalyst for change and improvement. A clear action plan is developed in order to ensure that strategic decisions are implemented on a controlled and systematic basis.

(source: MLP Workshop on Regional Benchmarking, Brussels 25. November 2005, Workshop report)

Even though this definition was written for regional benchmarking, it shows that benchmarking is applied across a wide range of fields. It also indicates that benchmarking is not merely the comparison of two measurements but the whole process from defining the measured values to developing an action plan that brings the measured performance up to the standard.
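
As an illustration only, the following minimal sketch walks through the four steps for a set of invented performance indicators. The indicator names and numbers are not taken from the workshop report, and higher values are assumed to be better.

```python
# Minimal sketch of the four benchmarking steps; all indicator names and
# numbers are invented, and higher values are assumed to be better.
from statistics import mean

def gather():
    """Step 1: collect indicator values (hard-coded here instead of surveyed)."""
    own = {"coverage_pct": 90.0, "uptime_pct": 98.0}
    comparison_group = [{"coverage_pct": 95.0, "uptime_pct": 96.0},
                        {"coverage_pct": 88.0, "uptime_pct": 99.0}]
    return own, comparison_group

def compare(own, group):
    """Step 2: express each indicator as a gap against the group average."""
    return {name: own[name] - mean(g[name] for g in group) for name in own}

def analyse(gaps):
    """Step 3: identify key improvement areas (indicators below the group average)."""
    return [name for name, gap in gaps.items() if gap < 0]

def implement(weaknesses):
    """Step 4: turn the analysis into an action plan."""
    return [f"define actions to improve {name}" for name in weaknesses]

print(implement(analyse(compare(*gather()))))
```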

Mobile service robotics

Before talking about the benchmarks in more detail it is important to clarify the domain under consideration. The field of robotics can be divided into industrial and service robots. A clear separation is given by the following definition of a service robot:
“A robot which operates semi or fully autonomously to perform services useful to the well being of humans and equipment, excluding manufacturing operations.” [3]
The applications of service robots split into two domains, which are as follows: [4]

  • Professional service robots
    • Field robotics
    • Professional cleaning
    • Inspection and maintenance systems
    • Construction and demolition
    • Logistic systems
    • Medical robotics
    • Defence, rescue & security applications
    • Underwater systems
    • Public relation robots
  • Domestic service robots
    • Vacuuming, floor cleaning
    • Lawn mowing
    • Pool cleaning
    • Handicap assistance
    • Home security & surveillance

Examples of professional service robots are shown in Figure 1. The different shapes, sizes, and looks of these robots illustrate the challenge of standardising a benchmark for service robots. But even though service robots are used in a wide range of applications, they share several key technologies:

  • Perception
  • Action Planning
  • Trajectory Planning & Control
  • Manipulation & Grasping
  • Control Architecture / Middleware
  • Intuitive Communication
  • Teaching by Demonstration & Learning

Besides these key technologies which can be treated as different components there are characteristics which apply to the whole system such as:

  • Autonomy
  • Dependability
  • Safety

It is essential that these characteristics are also subject to benchmarking.

Figure 1: Examples of different service robots for “professional services”

Benchmarks for mobile service robots

There was an expert meeting in Stuttgart on the 4th and 5th of July. The subject of the meeting was to collect the state of the art in benchmarking of mobile service robots. The agenda of the meeting can be found here: Agenda Expert Meeting Stuttgart July 4th and 5th
The attending experts were:

Matteo Matteucci Presentation
Fabio Bonsignorio Presentation
Malachy Eaton Presentation
Jens Kubacki Presentation

The experts were first asked to give a short presentation of their previous experience and current work with mobile service robots and with benchmarking. The presentations can be downloaded by clicking on "Presentation" next to the participant's name.
The detailed outcome of this meeting can be found here: Outcome State of the Art Expert Meeting
The state of the art regarding benchmarking of mobile service robots can be summed up as follows:

  • There are a few benchmarks in related technologies.
  • There are competitions for mobile robots.
  • Benchmark results are not easily comparable due to different sensors, actuators, and approaches.
  • There is no accepted benchmark for mobile service robots.

During this expert meeting there were also discussions about the requirements for benchmarks for mobile service robots. One key point mentioned was that a benchmark has to be accepted by the community, and therefore the community needs to feed its thoughts and requirements into the process of defining benchmarks. The experts will work out a questionnaire for the community to obtain this information. This questionnaire will be uploaded to this page in the coming weeks.

Benchmarks in related areas

Mobile service robotics involves quite a few research areas which developed earlier and independently. These areas are nowadays closely related to mobile service robotics but have developed their own ways to measure and compare algorithms and approaches against each other. With some effort these benchmarks could be used to some extent to compare mobile service robots on a component level. They can also serve as examples of what good, community-wide accepted benchmarks look like.

Navigation

Rawseeds

The following information on Rawseeds is taken from [19].
One interesting new approach in the field of navigation, and especially in SLAM, is the Rawseeds Project. The aim is to stimulate and support progress in autonomous robotics by providing a comprehensive, high-quality benchmarking toolkit. The Rawseeds Project will also perform all the actions needed for a rapid and thorough dissemination of its results through the academic and industrial domains (e.g. setup of a website, documentation and support actions, workshops, competitions, publications). The benchmarking toolkit that Rawseeds will create includes:

  • high-quality multisensorial data sets,
  • benchmark problems based on them,
  • state-of-the-art solutions to these problems in the form of algorithms and software, and
  • methodologies for the assessment of algorithms (a minimal sketch of one possible metric follows below).
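
To give a flavour of what such an assessment methodology might look like, the following minimal sketch scores a SLAM result by comparing an estimated trajectory against ground truth using a mean position error. Rawseeds defines its own metrics and file formats; the simple "timestamp x y" layout and the file names used here are assumptions made purely for illustration.

```python
# Hedged sketch of a trajectory-based assessment metric; the file format
# (one "timestamp x y" triple per line) and the file names are assumptions,
# not the actual Rawseeds formats.
import math

def load_trajectory(path):
    """Read 'timestamp x y' lines into a {timestamp: (x, y)} dictionary."""
    poses = {}
    with open(path) as f:
        for line in f:
            t, x, y = map(float, line.split()[:3])
            poses[t] = (x, y)
    return poses

def mean_position_error(estimate, ground_truth):
    """Average Euclidean error over the timestamps present in both trajectories."""
    common = sorted(set(estimate) & set(ground_truth))
    errors = [math.dist(estimate[t], ground_truth[t]) for t in common]
    return sum(errors) / len(errors)

estimate = load_trajectory("slam_estimate.txt")      # hypothetical file names
ground_truth = load_trajectory("ground_truth.txt")
print(f"mean position error: {mean_position_error(estimate, ground_truth):.3f} m")
```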

Objectives
The Rawseeds Project will achieve its objectives through the following actions:

  • Definition of a set of high-quality benchmarks and methodologies for the assessment of algorithms and software for autonomous robotics applications. The benchmarks will be focused on the problems of sensorial data analysis, sensor fusion, localization, mapping and Simultaneous Localization And Mapping (SLAM).
  • Creation of a website from which researchers and companies will be able to download these benchmarks, contribute new material and communicate with each other.
  • Dissemination of knowledge about the Rawseeds Project benchmarks and website.

Tasks
The Rawseeds Project will spend considerable effort on the definition of measurable standards for assessing the quality of the provided data. This definition aims to ensure that the datasets are as close as possible to typical robotics application scenarios, preserving the main characteristics of sensing and movement. The presence of typical errors/noise in perception (e.g., reflections in images or non-stationary light conditions) and the proper synchronization of acquisitions will be checked in the validation phase, as well as the absence of artefacts.
Documentation
Full documentation of sensors, settings and experimental procedures will be provided, increasing the quality of the benchmarks and the reproducibility of results. This is also possible because of the choice of public and permanent sites, e.g., the buildings and nearby outdoor locations of the participating partners.
Involved institutions
The Rawseeds Project is a Specific Support Action (FP6-045144) under the Sixth Framework Programme of the European Community; its partners are:

  • Politecnico di Milano (POLIMI - Coordinator) - Dipartimento di Elettronica ed Informazione, Italy.
  • Università di Milano-Bicocca (UNIMIB) - Dipartimento di Informatica, Sistemistica e Comunicazione, Italy.
  • University of Freiburg (ALU-FR) - Institut für Informatik, Germany
  • University of Zaragoza (UNIZAR) - Depto. Informática e Ingeniería de Sistemas, Spain.

Radish

The following information on Radish is taken from [20].
The American initiative Radish (Robotics Data Set Repository), launched by Nick Roy and Andrew Howard, provides a collection of standard robotics data sets. These are sorted into the following four categories:

  • Logs of odometry, laser and sonar data taken from real robots
  • Logs of all sorts of sensor data taken from simulated robots
  • Environment maps generated by robots
  • Environment maps generated by hand (i.e., re-touched floor-plans)

The aim of Radish is to facilitate the development, evaluation and comparison of robotics algorithms by making these data sets available to the whole community. With the initial focus on localization and mapping, the idea was that Radish would expand to reflect the interests of the complete robotics community. As this is still pending, Radish has collected 37 datasets with 73 files so far.
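
As an illustration of how such logs can be consumed, the sketch below extracts laser scans from a log file. Many Radish contributions use the CARMEN logfile format; the FLASER field layout assumed here, as well as the file name, should be checked against the documentation of the specific dataset before use.

```python
# Hedged sketch of reading laser scans from a CARMEN-style log, as used by
# many (but not all) Radish datasets. Assumed FLASER layout: the keyword,
# the number of range readings, the readings, then a robot pose (x y theta).
def read_flaser_scans(path):
    """Yield (ranges, (x, y, theta)) for every FLASER line in the log."""
    with open(path) as log:
        for line in log:
            fields = line.split()
            if not fields or fields[0] != "FLASER":
                continue
            n = int(fields[1])
            ranges = [float(r) for r in fields[2:2 + n]]
            x, y, theta = map(float, fields[2 + n:5 + n])
            yield ranges, (x, y, theta)

# hypothetical file name; print the first scan only
for ranges, pose in read_flaser_scans("example_radish.log"):
    print(len(ranges), "readings at pose", pose)
    break
```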

OpenSLAM

The following information on OpenSLAM is taken from [21].
In January 2007 the OpenSLAM.org website was put online by Cyrill Stachniss, Udo Frese and Giorgio Grisetti. The idea to set up this website came from the fact that SLAM is the subject of intense study, but only a few approaches are available to the community as implementations. The intention of OpenSLAM.org is to provide a platform for SLAM researchers which gives them the possibility to publish their algorithms. After submission, every author gets access to a subversion repository of all published work. Submitting source code does not mean giving away the copyright for commercial applications or redistribution. It only means that other researchers can use the source code for their own research.

Rat's Life

In January 2008 the Rat's Life benchmark and robot programming contest was started. The benchmark and its associated programming contest are developed within the ICEA project (funded by the IST Cognitive Systems Unit, European Commission). Rat's Life defines a full benchmark including a Webots-based simulation and a real-world setup based on the e-puck robot and LEGO bricks. The benchmark addresses autonomous robot navigation (including visual landmarks and energy management). The performance metric is a relative evaluation in which one solution is compared to another in terms of performance; hence a performance ranking of solutions can be established (see the hall of fame section on the Rat's Life web site). The programming contest runs online every day. It is open to anyone free of charge and the winners are awarded prizes (see the web site for details). At the time of writing (May 20th, 2008), 31 teams from 12 different countries are competing in this programming contest, contributing to the establishment of reference results for the benchmark.
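
To illustrate the idea of a relative metric, the sketch below orders solutions purely by their number of head-to-head wins. The actual contest applies its own match and ranking rules, and the team names and results here are invented.

```python
# Hedged sketch of a relative ranking built from pairwise match results;
# team names and outcomes are invented and the real ranking rules differ.
from collections import Counter

match_results = [("team_a", "team_b"), ("team_a", "team_c"),
                 ("team_b", "team_c"), ("team_c", "team_a")]   # (winner, loser)

wins = Counter(winner for winner, _ in match_results)
ranking = sorted(wins, key=wins.get, reverse=True)
print(ranking)   # ['team_a', 'team_b', 'team_c']
```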

Object recognition

PASCAL

View-based object recognition aims at the 3D recognition of objects with 2D sensors. Given a set of different 2D views of an object, both the object's type and its 3D pose should be recognised. In order to compare existing algorithms against each other, datasets of objects were collected into databases. The Columbia Object Image Library (COIL-20 and COIL-100) is a commonly used one. It can be downloaded at:
http://www.cs.columbia.edu/CAVE/databases/SLAM_coil-20_coil-100/coil-100/coil-100.zip
The COIL-100 database consists of colour images of 100 different objects (COIL-20, accordingly, of 20 objects). 72 views per object have been taken at 5° increments. The images have been normalised to a size of 128x128 pixels.
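
As a hedged illustration of a view-based recognition experiment on COIL-100, the sketch below holds out every second view of each object and classifies the held-out views with a nearest-neighbour search over the remaining ones. The file naming scheme (obj<ID>__<angle>), the image format and the use of Pillow and NumPy are assumptions, not part of the database's official tooling.

```python
# Hedged nearest-neighbour sketch on COIL-100; file naming and format are
# assumed (obj<ID>__<angle>.png), images are downscaled to keep memory low.
import glob, re
import numpy as np
from PIL import Image

def load_coil(pattern="coil-100/obj*__*.png"):
    """Return (features, object ids, view angles) for all matching images."""
    feats, ids, angles = [], [], []
    for path in glob.glob(pattern):
        m = re.search(r"obj(\d+)__(\d+)", path)
        if m is None:
            continue
        img = Image.open(path).convert("L").resize((32, 32))
        feats.append(np.asarray(img, dtype=np.float32).ravel() / 255.0)
        ids.append(int(m.group(1)))
        angles.append(int(m.group(2)))
    return np.array(feats), np.array(ids), np.array(angles)

feats, ids, angles = load_coil()
train = angles % 10 == 0                     # keep every second 5-degree view for training
test = ~train
# squared Euclidean distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
d2 = (np.sum(feats[test] ** 2, axis=1)[:, None]
      + np.sum(feats[train] ** 2, axis=1)[None, :]
      - 2.0 * feats[test] @ feats[train].T)
predicted = ids[train][d2.argmin(axis=1)]
print("recognition rate:", (predicted == ids[test]).mean())
```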
The following information about PASCAL is taken from [22].
PASCAL is an object recognition database collection. The objectives are to compile a standardised collection of object recognition databases, to provide standardised ground truth object annotations across all databases, and to provide a common set of tools for accessing and managing the database annotations. The databases are standardised so that all images are available in PNG format and are annotated with ground truth, which includes a bounding box for the objects of interest and might also include pixel segmentation masks or polygonal boundaries.
Nine different databases have been collected so far; the links on the PASCAL homepage [22] lead directly to information about each.

In parallel there is a visual object classes challenge, which is now in its third year. The goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). It is fundamentally a supervised learning problem in which a training set of labelled images is provided. The twenty object classes that have been selected for the 2007 competition are:

  • Person: person
  • Animal: bird, cat, cow, dog, horse, sheep
  • Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
  • Indoor: bottle, chair, dining table, potted plant, sofa, TV/monitor

There will be two main competitions and two smaller-scale "taster" competitions. The first main competition is classification: predicting the presence or absence of an example of each of the twenty classes in an image. The second competition is detection, in which a labelled bounding box around each object instance in an image has to be generated.
The taster competitions are, first, pixel-wise segmentation of the object class versus the background and, second, person layout, in which a labelled bounding box for each part of a person (head, hands, feet) has to be predicted.
More information about the VOC 2005-2007 competition can be found at [23].
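
As a hedged illustration of the kind of ranking-based scoring used in the classification competition, the sketch below computes a basic (uninterpolated) average precision for one class from a ranked list of confidences. The official challenge uses its own evaluation code, which in these years applied an interpolated form of average precision; the confidences and labels here are invented.

```python
# Hedged sketch of (uninterpolated) average precision for one object class;
# the scores and ground-truth labels below are invented toy values.
import numpy as np

def average_precision(scores, labels):
    """labels: 1 if the image contains the class, 0 otherwise."""
    order = np.argsort(-scores)                       # rank by descending confidence
    labels = labels[order]
    cum_tp = np.cumsum(labels)
    precision = cum_tp / np.arange(1, len(labels) + 1)
    return float((precision * labels).sum() / labels.sum())

scores = np.array([0.9, 0.8, 0.75, 0.3, 0.1])         # classifier confidences
labels = np.array([1, 0, 1, 0, 1])                    # ground truth
print(f"AP = {average_precision(scores, labels):.3f}")
```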

PEIPA (Pilot European Image Processing Archive) and PCCV (Performance Characterization in Computer Vision)

The following information about PEIPA and PCCV is taken from [24].
PEIPA is an archive relating to image processing and analysis, with emphasis on computer vision. Its principal aim is to provide information, datasets and software that allow the effectiveness of algorithms to be measured and compared. PEIPA distributes test datasets that may be used by researchers to evaluate the performance of their algorithms in order to compare it with the performances of algorithms of others. The archive also has an overview of benchmarking and several tutorials which shall help researchers to evaluate the performance of their own algorithms. The PEIPA web pages provide a comprehensive resource concerning all aspects of performance characterization in computer vision. In this context, performance does not mean how quickly a piece of software runs but how well it performs its task.
The principal aim of the PCCV web-site is to show researchers how to carry out benchmarking exercises themselves, both by providing tutorials that describe the principles and in-depth case studies that illustrate the practice. There are also examples of people's work and a comprehensive bibliography of relevant publications.
There is also a test harness provided to automate the testing process. The same test harness also allows the researchers to compare the performances of techniques, both those they have developed themselves and others available on the Internet.
The development and maintenance of the PEIPA web-site is currently funded under a European project entitled Performance Characterization in Computer Vision (PCCV). This project was set up by Patrick Courtney while at Visual Automation in Manchester (UK), though it has been managed by Neil Thacker of ISBE, University of Manchester since Patrick moved to Perkin-Elmer. The other partners in PCCV are Henrik Christensen of KTH in Stockholm (Sweden) and Adrian Clark of the University of Essex in Colchester (UK), where the PCCV pages reside.

The Facial Recognition Technology (FERET) Database

The following information about FERET is taken from [25].
Even though this document collects the current state of the art in benchmarking, FERET, which ran from 1993 through 1997, should be mentioned. Sponsored by the Department of Defense's Counterdrug Technology Development Program through the Defense Advanced Research Projects Agency (DARPA), its primary mission was to develop automatic face recognition capabilities that could be employed to assist security, intelligence and law enforcement personnel in the performance of their duties.
The FERET image corpus was assembled to support government monitored testing and evaluation of face recognition algorithms using standardized tests and procedures. The final corpus, presented at [14], consists of 14051 eight-bit greyscale images of human heads with views ranging from frontal to left and right profiles. In October 2003 the Colour FERET database was released, which supersedes the earlier greyscale release of the database.

Speech recognition

The following general information about evaluation of speech recognition is taken from [26].
Similar to the approach for vision-based object recognition, large speech datasets have been developed. These speech corpora were meant for system development, training, and testing. Some of these corpora are designed for acoustic phonetic research, while others are highly task specific. Nowadays, it is not uncommon to have tens of thousands of sentences available for system training and testing. These corpora permit researchers to quantify the acoustic cues important for phonetic contrasts and to determine parameters of the recognizers in a statistically meaningful way. While many of these corpora (e.g., TIMIT, RM, ATIS, and WSJ) were originally collected under the sponsorship of DARPA to spur human language technology development among its contractors, they have nevertheless gained world-wide acceptance (e.g., in Canada, France, Germany, Japan, and the U.K.) as standards on which to evaluate speech recognition.
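
Although it is not named in the survey excerpt above, recognisers evaluated on such corpora are commonly scored with the word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference transcription, divided by the length of the reference. The sketch below shows the standard dynamic-programming formulation; the example sentences are invented.

```python
# Word error rate via edit distance over words; example sentences invented.
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # classic dynamic-programming edit distance
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("show me flights to boston", "show flights to austin"))  # 0.4
```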

TIMIT

The following information about TIMIT is taken from [27] and Wikipedia.
The DARPA TIMIT speech database was designed to provide acoustic phonetic speech data for the development and evaluation of automatic speech recognition systems. It consists of utterances of 630 speakers that represent the major dialects of American English. Each transcribed element has been delineated in time. TIMIT was designed to further acoustic-phonetic knowledge and automatic speech recognition systems. It was commissioned by DARPA and worked on by many sites, including Texas Instruments (TI) and Massachusetts Institute of Technology (MIT), hence the corpus' name. There is also a telephone bandwidth version called NTIMIT (Network TIMIT).

RM (Resource Management)

The following information about RM is taken from [28].
The DARPA Resource Management Continuous Speech Corpora (RM) consists of digitized and transcribed speech for use in designing and evaluating continuous speech recognition systems. There are two main sections, often referred to as RM1 and RM2. RM1 contains three sections, Speaker-Dependent (SD) training data, Speaker-Independent (SI) training data and test and evaluation data. RM2 has an additional and larger SD data set, including test material. All RM material consists of read sentences modelled after a naval resource management task. The complete corpus contains over 25,000 utterances from more than 160 speakers representing a variety of American dialects.

ATIS (Air Travel Information System)

The following information about ATIS is taken from [29].
This speech database is the first in a series of recordings of "natural speech", in the Air Travel Information System (ATIS) domain. Queries collected for these corpora are spoken, without scripts or other constraints, to ATIS, a computerized simulation of a database system that includes a simplified version of the Official Airline Guide (OAG). A human "wizard" simulating the speech recognizer of the future gives the impression of a speech-recognizing computer system.

WSJ (Wall Street Journal)

The following information about WSJ is taken from [30].
The MC-WSJ-AV corpus offers an intermediate task between simple digit recognition and large vocabulary conversational speech recognition. The corpus consists of read Wall Street Journal sentences taken from the test set of the WSJCAM0 database, recorded in the instrumented meeting rooms constructed for the recording of the AMI Meetings Corpus. The sentences are read by a range of speakers (some 45 in total) with varying accents (including a number of non-native English speakers). Sentences are read according to a number of scenarios including a single stationary speaker, a single moving speaker, and multiple concurrent speakers. During recordings, all speakers wear lapel and headset microphones, and audio from two eight-element microphone arrays is also captured. The rooms also provide synchronised video recordings including close-up views of the speakers' faces, as well as wide-angle views of the entire room. The data is suitable for a wide variety of research tasks including:

  • development of microphone array ASR front-end processing systems
  • audio-visual ASR
  • audio-visual person tracking
  • integration of audio-visual person tracking with microphone array ASR processing
  • recognition of accented and non-native English speech
  • recognition of overlapped speech

Computer industry and science

TPC (Transaction Processing Performance Council)

The following information about TPC is taken from [31].
History
In the 1980s, transaction processing performance was tested with the TP1 and DebitCredit benchmarks. Without a standards body to supervise testing and publishing, vendors published extraordinary marketing claims for both TP1 and DebitCredit. They often omitted key requirements of DebitCredit to improve their performance results. This reduced the credibility of these benchmarks with the press, market researchers, and users.
The TPC was founded on August 10, 1988 by Omri Serlin, an industry analyst, together with eight computer companies, in order to create a standards body that supervises testing and publishing. One year later, the first benchmark, TPC-A, was published by the TPC. The first TPC-A results were announced in July 1990. Four years later, at the peak of its popularity, 33 companies were publishing TPC benchmark results and 115 different systems had published TPC-A results. In total, about 300 TPC-A benchmark results were published.
Mission
The TPC is a non-profit corporation founded to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry.
Scope
The term transaction is often applied to a wide variety of business and computer functions. Looked at as a computer function, a transaction could refer to a set of operations including disk read/writes, operating system calls, or some form of data transfer from one subsystem to another.
While TPC benchmarks certainly involve the measurement and evaluation of computer functions and operations, the TPC regards a transaction as it is commonly understood in the business world: a commercial exchange of goods, services, or money. A typical transaction, as defined by the TPC, would include the updating of a database system for such things as inventory control (goods), airline reservations (services), or banking (money).
In these environments, a number of customers or service representatives input and manage their transactions via a terminal or desktop computer connected to a database. Typically, the TPC produces benchmarks that measure transaction processing (TP) and database (DB) performance in terms of how many transactions a given system and database can perform per unit of time, e.g., transactions per second or transactions per minute.
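
The sketch below illustrates only this basic notion of throughput, counting how many (simulated) transactions complete per second. Real TPC benchmarks add strict requirements on transaction mix, response times, data scaling and independent auditing that such a toy loop does not capture.

```python
# Toy throughput measurement: transactions completed per second. The
# "transaction" here is a trivial in-memory update, not a TPC workload.
import time

def transactions_per_second(transaction, seconds=1.0):
    """Run `transaction` repeatedly for `seconds` and return the rate."""
    done, deadline = 0, time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        transaction()
        done += 1
    return done / seconds

inventory = {"widgets": 1_000_000}

def toy_transaction():          # stand-in for an order-entry update
    inventory["widgets"] -= 1

print(f"{transactions_per_second(toy_transaction):.0f} tps")
```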
Organisational Structure
It can be beneficial to look at already established benchmarks and benchmarking bodies and their organisational structure when laying the groundwork for new benchmarks. The structure of the TPC is as follows:

  • Full Council

All major decisions are made by the Full Council. Each member company of the TPC has one vote, and a two-thirds vote is required to pass any motion.

  • Steering Committee

Consists of five representatives, elected annually, from member companies. The Steering Committee is responsible for overseeing TPC administration and support activities and for providing overall direction and recommendations to the Full Council. The Full Council, however, decides all substantive TPC matters. To expedite the work of the TPC, the Full Council has created two types of subcommittees: standing and technical subcommittees. Standing subcommittees are permanent committees that supervise and manage administrative, public relations and documentation issues for the TPC. The technical subcommittees are formed to develop a benchmark proposal and maintain and evolve the benchmark after development work is complete.
Standing Subcommittees:

  • Steering Committee (SC): See description above.
  • Technical Advisory Board (TAB): This subcommittee is tasked with maintaining document and change control over the complex benchmark proposals and methodologies. In addition, the TAB studies issues involving interpretation/compliance of TPC specifications and makes recommendations to the Council.
  • Public Relations Committee (PRC): This subcommittee is tasked with promoting the TPC and establishing the TPC benchmarks as industry standards.

Technical Subcommittees

  • TPC-C Subcommittee: This subcommittee owns the TPC-C benchmark, which addresses OLTP environments.
  • TPC-H Subcommittee: This subcommittee owns the TPC-H benchmark which addresses decision support applications (with complex query models).
  • TPC-App Subcommittee: This subcommittee owns the TPC-App benchmark, which addresses web based e-commerce environments.

SPEC (Standard Performance Evaluation Corporation)

The following information about SPEC is taken from [32].
The Standard Performance Evaluation Corporation (SPEC) is a non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers. SPEC develops benchmark suites and also reviews and publishes submitted results from its member organizations and other benchmark licensees.
The System Performance Evaluation Cooperative, now named the Standard Performance Evaluation Corporation (SPEC), was founded in 1988 by a small number of workstation vendors who realized that the marketplace was in desperate need of realistic, standardized performance tests. The key realization was that an ounce of honest data was worth more than a pound of marketing hype.
SPEC has grown to become one of the more successful performance standardization bodies with more than 60 member companies. SPEC publishes several hundred different performance results each quarter spanning across a variety of system performance disciplines.
The goal of SPEC is to ensure that the marketplace has a fair and useful set of metrics to differentiate candidate systems. The path chosen is an attempt to balance between requiring strict compliance and allowing vendors to demonstrate their advantages. The belief is that a good test that is reasonable to utilize will lead to a greater availability of results in the marketplace.
The basic SPEC methodology is to provide the benchmarker with a standardized suite of source code based upon existing applications that has already been ported to a wide variety of platforms by its membership. The benchmarker then takes this source code, compiles it for the system in question and then can tune the system for the best results. The use of already accepted and ported source code greatly reduces the problem of making apples-to-oranges comparisons.
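
As a hedged illustration of how SPEC-style results are typically aggregated, the sketch below expresses each benchmark's run time as a ratio against a reference machine and reports the geometric mean of those ratios. The benchmark names and times are invented, and the real suites define their own run and reporting rules.

```python
# Hedged sketch of a SPEC-style aggregate: per-benchmark ratios against a
# reference machine, combined with a geometric mean. All numbers invented.
from math import prod

def spec_style_score(measured, reference):
    """Geometric mean of reference_time / measured_time over all benchmarks."""
    ratios = [reference[name] / measured[name] for name in measured]
    return prod(ratios) ** (1.0 / len(ratios))

reference = {"compress": 100.0, "fft": 80.0, "raytrace": 120.0}   # seconds on reference machine
measured  = {"compress": 25.0,  "fft": 16.0,  "raytrace": 40.0}   # seconds on system under test
print(f"score: {spec_style_score(measured, reference):.2f}")
```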
Organisational Structure
Originally just a group of people from workstation vendors devising CPU metrics, SPEC has evolved into an umbrella organization encompassing three diverse groups. These are:

  • The Open System Group (OSG)

The OSG is the original SPEC committee. This group focuses on benchmarks for desktop systems, high-end workstations and servers running open systems environments. The OSG is divided into subcommittees benchmarking CPU, Java, Mail, Power and Performance, SIP, SFS, Virtualisation, and Web.

  • The High Performance Group (HPG)

The HPG is a forum for establishing, maintaining and endorsing a suite of benchmarks that represent high-performance computing applications for standardized, cross-platform performance evaluation. These benchmarks target high performance system architectures, such as symmetric multiprocessor systems, workstation clusters, distributed memory parallel systems, and traditional vector and vector parallel supercomputers.

  • The Graphics and Workstation Performance Group (GWPG)

SPEC/GWPG is the umbrella organization for project groups that develop consistent and repeatable graphics and workstation performance benchmarks and reporting procedures. SPEC/GWPG benchmarks are worldwide standards for evaluating performance in a way that reflects user experiences with popular applications. The GWPG has two project groups: SPECapc (application performance characterisation) and SPECgpc (graphics performance characterisation).

Manipulation and grasping

Visually-Guided 3-Finger Grasping Experiments

The following information about the Visually-Guided 3-Finger Grasping Experiments is taken from [33].
The Visually-Guided 3-Finger Grasping Experiments are an initiative of Universitat Jaume I. The initiative offers a description of a set of experiments on visually-guided grasping of planar objects with a Barrett hand. They are made available to the community as a set of standard experiments on which benchmarks and associated performance metrics can be proposed. A first experimental protocol for such a benchmark is also proposed.
The goal of the experiments is to test different implemented procedures for visually-guided grasping. That is, using only visual information about planar objects, a set of feasible 3-finger grasps is determined and executed to lift the object. The experimental setup consists of:

  • A BiSight head consisting of four mechanical DOF (a pan/tilt head and two independent vergence axes).
  • Two grey-scale cameras enhanced with two motorized lenses that provide three additional optical DOF (zoom, iris, focus) per camera.
  • A seven-DOF Whole Arm Manipulator (WAM) robot arm from Barrett Technology.
  • A three-fingered Barrett Hand. The hand has 4 DOF and a tactile load cell (ATI) integrated in each fingertip. These cells provide 6-dimensional force/torque information.

It is worth noting that these experiments involve the solution of several problems. The main one is constraining a general three-finger grasp to the particular kinematics of the 4-DOF Barrett Hand. Another relevant problem is the use of stereo images to locate an object placed in front of the robot and to obtain the contour of that object.

Competitions

There are a few competitions worth mentioning in the context of mobile service robots. The aim of these competitions is to identify the best robot-team combination to fulfil a certain task. This does not always mean that the winning team has the best algorithms. Winning also depends on how the team itself works, the strategy to achieve the goal, the conditions under which the competition takes place, and the behaviour of the competing teams. Therefore these competitions could be considered benchmarks on a system level, where the system includes more than the robot itself, as described before. In the following, the main competitions are described in more detail.

RoboCup

The following information on RoboCup is taken from [9].
RoboCup is an international research and education initiative. Its goal is to foster artificial intelligence and robotics research by providing a standard problem where a wide range of technologies can be examined and integrated.
The concept of soccer-playing robots was first introduced in 1993. Following a two-year feasibility study, an announcement was made in August 1995 on the introduction of the first international conferences and football games. In July 1997, the first official conference and games were held in Nagoya, Japan. They were followed by Paris, Stockholm, Melbourne and Seattle, where the annual events attracted many participants and spectators. The 6th RoboCup 2002 was held in Fukuoka, Japan in cooperation with Busan, Korea; the 7th edition in 2003 took place in Padua, Italy, followed by Lisbon, Portugal in 2004, Osaka, Japan in 2005, Bremen, Germany in 2006, and now Atlanta, USA in 2007. The events were covered by national and international media all over the world.

RoboCup – Soccer

The main focus of the RoboCup activities is competitive football. The games are important opportunities for researchers to exchange technical information. They also serve as a great opportunity to educate and entertain the public. RoboCupSoccer is divided into the following leagues:

Simulation league

Independently moving software players (agents) play soccer on a virtual field inside a computer. Matches have 5-minute halves. This is one of the oldest leagues in RoboCupSoccer. As a physical visualization sub-league, a visualization using thumb-sized robots (the Eco-be system) will be demonstrated in 2007.

Small-size robot league (f-180)

Small robots of no more than 18 cm in diameter play soccer with an orange golf ball in teams of up to 5 robots on a field somewhat larger than a ping-pong table. Matches have 10-minute halves. This league focuses on the issues of multi-agent cooperation with a hybrid centralized/distributed system.

Middle-size robot league (f-2000)

Middle-sized robots of no more than 50 cm diameter play soccer in teams of up to 4 robots with an orange soccer ball on a field of 12 x 8 metres. Matches are divided into 15-minute halves. All sensors are on-board. Robots can use wireless networking to communicate.

Four-legged robot league

Teams of 4 four-legged entertainment robots (SONY's AIBO) play soccer on a 3 x 5 metre field. Matches have 10-minute halves. The robots use wireless networking to communicate with each other and with the game referee. Challenges include vision, self-localization, planning, and multi-agent coordination.

Standard Platform league

Teams of 4 Nao humanoid robots (produced by Aldebaran Robotics) play soccer on a field. Matches have 10-minute halves. The robots use wireless networking to communicate with each other and with the game referee. Challenges include vision, self-localization, planning, and multi-agent coordination. This league is intended to replace the four-legged robot league, as the AIBO robot has been discontinued by Sony.

Humanoid league

This league was introduced in 2002 and the robots will have their third appearance ever in this year's RoboCup. Biped autonomous humanoid robots play in "penalty kick" and "2 vs. 2" matches and "Technical Challenges". This league has two subcategories: Kid-size and Teen-size.

Robotstadium league (simulated humanoids)

This league was introduced in 2008. It is a realistic simulation based on the Standard Platform league (based on the Nao robot) and using the Webots simulation software. Robots can be programmed in URBI and/or Java. The competition is currently running online every day. It is open to anyone free of charge and the winners will be awarded with a Nao robot and a software package including Webots PRO and URBI for Webots.

RoboCup – Rescue

Disaster rescue is one of the most serious issues involving very large numbers of heterogeneous agents in a hostile environment. The intention of the RoboCupRescue project is to promote research and development in this significant domain by involving multi-agent teamwork coordination, physical robotic agents for search and rescue, information infrastructures, personal digital assistants, standard simulators and decision support systems, and evaluation benchmarks for rescue strategies and robotic systems, all to be integrated into a comprehensive system in the future. RoboCupRescue is divided into two leagues:

RoboCupRescue Robot League

Robots explore a specially constructed disaster site, including mannequins with various signs of life, such as waving hands, shouting noises and heat, hidden amongst stairs, platforms and building rubble. The robots, some under human control, must find and approach the victims, identify their signs of life and produce a map of the site showing where the victims are located. The aim is to provide human rescuers with enough information to safely perform a rescue. Each team is scored based on the quality of its maps, the accuracy of the victim information and the number of victims found.

RoboCupRescue Simulation League

The league is composed of three competitions: the virtual robot competition, the agent competition, and the infrastructure competition.
In a virtual robot competition run, a team of simulated robots has to explore, map and clear a block-sized disaster area, featuring carefully modelled indoor and outdoor environments. The robots and sensors used in this competition closely mirror the platforms and devices currently used on physical robots.
The Agent competition involves scoring competing agent coordination algorithms on different maps of the RobocupRescue simulation platform. The challenge in this case involves developing coordination algorithms that will enable teams of Ambulances, Police forces, and Fire Brigades to save as many civilians as possible and extinguish fires in a city where an earthquake has just happened.
The Infrastructure competition involves evaluating tools and simulators developed for the simulation platform and for simulating disaster management problems in general. Here, the intent is to build up realistic simulators and tools that could be used to enhance the basic RobocupRescue simulator and expand upon it.

RoboCup – @Home

RoboCup@Home focuses on real-world applications and human-machine interaction with autonomous robots. The aim is to promote the development of robots that will aid humans in everyday life.
The scenario is the home itself. Participants are given an environment consisting of a kitchen, a living room, and possibly more. Contestants then demonstrate their robots' abilities in this environment. There are six specific tests, which are divided into two phases: Proof of Concept and General Applicability. For each specific test a scoring system is also defined. The six tests are:

  • Follow and Guide a Human
  • Navigate
  • Manipulate
  • Who is Who
  • CopyCat Game
  • Lost & Found

More information about the competition and the six tests can be found in the rule book [10]. The first demonstration was held in 2006 in Bremen.

RoboCup – Junior

RoboCupJunior is a project-oriented educational initiative that sponsors local, regional and international robotic events for young students. It is designed to introduce RoboCup to primary and secondary school children, as well as undergraduates who do not have the resources to get involved in the senior leagues yet. The focus of the Junior league lies on education. The tournament offers participants the chance to take part in international exchange programmes and to share the experience of meeting peers from abroad.
RoboCupJunior offers several challenges, each emphasizing both cooperative and competitive aspects. For young students, RoboCupJunior provides an exciting introduction to the field of robotics, a new way to develop technical abilities through hands-on experience with electronics, hardware and software, and a highly motivating opportunity to learn about teamwork while sharing technology with friends. In contrast to the one-child-one-computer scenario typically seen today, RoboCupJunior provides a unique opportunity for participants with a variety of interests and strengths to work together as a team to achieve a common goal. The current challenges are:

  • Soccer Challenge
  • Dance Challenge
  • Rescue Challenge

Darpa Grand Challenge

The DARPA Grand Challenge [14] is a prize competition for driverless cars, sponsored by the Defense Advanced Research Projects Agency (DARPA), the central research organization of the United States Department of Defense. Congress has authorized DARPA to award cash prizes to further DARPA's mission to sponsor revolutionary, high-payoff research that bridges the gap between fundamental discoveries and their use for national security. The challenge aims to foster the technologies needed to create the first fully autonomous ground vehicles capable of completing a substantial off-road course within a limited time. The third event, the DARPA Urban Challenge, scheduled to take place on November 3, 2007, further advances vehicle requirements to include autonomous operation in a mock urban environment.
The Grand Challenge was the first long-distance competition for robot cars in the world; to date, there have been other competitions for semi-autonomous and autonomous vehicles, but none on the scale of the Grand Challenge. The U.S. Congress authorized DARPA to offer prize money ($1 million) for the first Grand Challenge to facilitate robotic development, with the ultimate goal of making one-third of ground military forces autonomous by 2015. Following the 2004 event, Dr. Tony Tether, the director of DARPA, announced that the prize money had been increased to US$2 million for the next event, which was claimed on October 9, 2005. The first, second and third places in the 2007 Urban Challenge will receive US$2 million, US$1 million, and US$500,000, respectively.
The competition is open to teams and organizations from around the world, with the proviso that they have at least one U.S. citizen on the roster. Teams have participated from high schools, universities, businesses and other organizations. More than 100 teams registered in the first year, bringing a wide variety of technological skills to the race. In the second year, 195 teams from 36 US states and 4 foreign countries entered the race.

Darpa Urban Challenge

For 2007, DARPA introduced a new challenge, which it named the "Urban Challenge". The Urban Challenge will take place on November 3, 2007 at the site of the now-closed George Air Force Base (currently used as Southern California Logistics Airport) in Victorville, California [15]. The course involves 60 miles (96 km) of urban roads, to be completed in less than 6 hours. Rules include obeying all traffic regulations while negotiating with other traffic and obstacles and merging into traffic. While the 2004 and 2005 events were more physically challenging for the vehicles, the robots operated in isolation and did not encounter other vehicles on the course. The Urban Challenge requires designers to build vehicles able to obey all traffic laws while they detect and avoid other robots on the course. This is a particular challenge for vehicle software, as vehicles must make "intelligent" decisions in real time based on the actions of other vehicles. Unlike previous autonomous vehicle efforts, which focused on structured situations such as highway driving with little interaction between the vehicles, this competition operates in a more cluttered urban environment and requires the cars to perform sophisticated interactions with each other, such as maintaining precedence at a 4-way stop intersection [15].
The event is being followed closely by auto manufacturers for the implications it holds for smarter cars and safer highways in the future.
Unlike the past two challenges, DARPA has announced that some teams will receive development funding, based on proposals submitted to DARPA. Eleven teams can receive up to US$1 million apiece under this special program track (Track A) [16]. These 11 teams largely represent major universities and large corporate interests, such as CMU teaming with GM, Stanford teaming with Volkswagen, Oshkosh Truck, Honeywell, Raytheon, Caltech, Autonomous Solutions, Virginia Tech, Cornell, and MIT. One of the few independent entries in Track A is the Golem Group.
On December 8, 2006, DARPA reinstated the cash prizes. The winner will receive US$2 million, the second-place finisher US$1 million, and the third-place finisher US$500,000 [17].
In June/July 2007, 53 teams were notified that they had qualified for DARPA site visits. If successful in these evaluations, some or all of these teams will move on to a national qualifying event taking place in October 2007.
On August 9, 2007, after completing the site visits, DARPA announced [18] the 36 semi-finalists selected to participate in the Urban Challenge National Qualification Event (NQE) scheduled for October 26-31, 2007. The top 20 teams from that event will proceed to the final competition on November 3.

ELROB

European Land-Robot Trial (ELROB) is a European contest to demonstrate the capabilities of modern robots.
ELROB is not a competition like the American DARPA Grand Challenge, but a demonstration of the state of the art of European robotics. The first trial, ELROB 2006, was organised by the German army from 15 to 18 May 2006 at the training area near Hammelburg. Its goal was to gain an impression of the unmanned ground vehicles that can be realised in the near future. The focus was on military applications.
According to Henrik I. Christensen, coordinator of the European robotics network Euron and leader of the ELROB jury, ELROB is supposed to be held yearly, with the scenarios alternating between civil and military applications. The next event, focusing on civil aspects, is planned to be held from 13 to 16 August in Monte Ceneri, Switzerland.
Academic as well as commercial participants from European countries are allowed to take part in this contest [11].

C-ELROB (Civilian Elrob)

ELROB aims to bridge the gap between users, industry and research in the field of robotics. Nowadays industry plays the "leading" role in robotics in the sense that it often defines the robotic capabilities available. Many times this results in more or less "standard" robots being introduced into operations without very thoroughly defined functional requirements. This might be an appropriate approach for those cases where the users have no vision themselves on how to use robotics in their domains. But many users do have an idea of their own on their possible use of robotics. These views are often characterised by short-term objectives and a limited understanding of technical issues.
Research, in contrast, usually has no short-term objectives and often fails to provide practical applications. Ignoring scientific results, however, prevents interesting and innovative developments. Industry, especially medium-sized enterprises, could use research to fertilise their strategic development process.
So there is often a significant gap between industry's understanding of the robots needed, the application of research results, and the users' understanding of robot technology and of the technical feasibility of their requirements.
ELROB as an event aims to resolve this situation by bringing together users, industry and researchers. Therefore ELROB is set up as a cooperation between representatives from the users' community, industry and the research community. It is open to:

  • Users:

These are (future) professional users of robots.

  • Industry:

These are designers and manufacturers of integrated ground robots focusing on considered domains.

  • Research:

These are universities and other research institutes focusing on (partial) solutions relevant to the considered domains (e.g. sensor technology or outdoor navigation). From the user's perspective, the European Land-Robot Trial (ELROB) is organised in order to provide an overview of the European state of the art in the field of UGVs/UAVs, with a focus on robot systems realisable in the short term.
ELROB is explicitly designed to assess current technology to solve problems at hand, using whatever strategy to achieve it. ELROB is presenting cutting edge robotics technology, applied to real world applications, which can save lives now and shape the direction of research for the short and medium term.
With regard to available capabilities, the organisers seek to promote innovative technical approaches that will enable the operation of unmanned vehicles.
Keep in mind that the theme of ELROB still is:

  • ELROB is conducted to provide an overview of the European state-of-the-art in the field of unmanned vehicles with focus on short-term realisable robot systems!
  • ELROB is explicitly designed to assess current technology to solve real world problems at hand!
  • ELROB in addition is an opportunity to bring together users, researchers and industry to build a community!

This information is taken from the C-ELROB website [12].

M-ELROB (Military ELROB)

The German Army and the Directorate General of Armaments will again organise a European Land-Robot Trial. ELROB 2008 will take place from 30 June to 3 July at the Infantry School in Hammelburg, Germany.
Servicemen and -women of the Bundeswehr are targets of ambushes and attacks while on missions abroad. Opposing forces are not seeking a decisive military victory on the battlefield but are trying to erode the political firmness of the parliament and to demoralise the German population's will to support Germany's commitment to global security. The protection of German soldiers against death and injury enables the completion of military orders and ensures political freedom of action. By definition, unmanned robot systems provide maximum protection for soldiers, because the soldiers are no longer exposed to the threat.
ELROB 2008 focuses on the question of whether robot systems can support soldiers effectively in the near future. The purpose of the trial is to identify robotic systems that are able to master daily military tasks. Four scenarios form the test bed for this. The organisers' aim is to gain insights for future armament projects and research & technology activities.
Developers of robotic systems are asked to demonstrate the performance of their systems at ELROB 2008. A comprehensive fair provides further insights into other aspects of robot systems.
Representatives of the German Armed Forces and allied/friendly armed forces, police and other security agencies will attend.
The proven philosophy of ELROB will continue in 2008:

  • ELROB is conducted with a focus on short-term realisable robot systems!
  • ELROB is explicitly designed to assess current technology to solve real world problems at hand!
  • ELROB in addition is an opportunity to bring together users, researchers and industry to build a community!

More information can be retrieved on the M-ELROB website [13].

References

[1] http://en.wiktionary.org/wiki/benchmark
[2] http://www.innovating-regions.org/download/MLP Benchmarking Workshop_25 November 2005_report.pdf
[3] IFR (International Federation of Robotics)
[4] Lecture of Service Robots 2007, Fraunhofer IPA, Stuttgart, Germany
[5] Autonomy Measures for Robots; H. Huang, E. Messina, R. Wade, R. English, B. Nova, J. Albus; Proceedings of IMECE: International Mechanical Engineering Congress, November 13-19, 2004
[6] Benchmarks for evaluating socially assistive robotics; D. Feil-Seifer, K. Skinner, M.J. Matarić; Interaction Studies: Psychological Benchmarks of Human-Robot Interaction, 8(3), 423-429, Oct 2007
[7] Internal work document; Service robotic initiative: DESIRE; http://www.service-robotik-initiative.de/
[8] Workshop: „Presentation of Draft Methodology Report“; CARE; M. Hägele, T. Laube; 12 April, Rome
[9] http://www.robocup.org/Intro.htm
[10] RoboCup@Home - Rules & Regulations; Version: 1.0 Revision: 27 Date: 2007-06-12; http://www.ai.rug.nl/robocupathome/documents/rulebook.pdf
[11] http://www.elrob.org
[12] http://www.c-elrob.eu
[13] http://www.m-elrob.eu
[14] http://www.darpa.mil/grandchallenge
[15] http://www.darpa.mil/grandchallenge/docs/urb_challenge_announce.pdf
[16] http://cimar.mae.ufl.edu/grand_challenge/pages/docs/Grand%20Challenge%20Wrap%20Up.doc
[17] http://www.darpa.mil/GrandChallenge/overview.asp
[18] http://www.darpa.mil/grandchallenge/docs/PR_UC_Semifinalist_Announcement.pdf
[19] http://www.rawseed.org
[20] http://radish.sourceforge.net
[21] http://openslam.org
[22] http://www.pascal-network.org/challenges/VOC/databases.html
[23] http://www.pascal-network.org/challenges/VOC/
[24] http://peipa.essex.ac.uk/benchmark/index.html
[25] http://www.itl.nist.gov/iad/humanid/feret/feret_master.html
[26] http://cslu.cse.ogi.edu/HLTsurvey/ch1node4.html
[27] http://www.mpi.nl/world/tg/corpora/timit/timit.html
[28] http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S3B
[29] http://www.ldc.upenn.edu/Catalog/readme_files/atis/sspcrd.readme.html
[30] http://www.idiap.ch/mmm/corpora/ami_wsj
[31] http://www.tpc.org
[32] http://www.spec.org
[33] http://www.robot.uji.es/benchmarks/competition/visually.html
[34] http://www.eu-nited-robotics.net/node/86
