Autonomous Space Robotics Lab

Speeded Up SURF


Speeded Up Speeded Up Robust Features

A GPU implementation of the SURF algorithm


Image credit: NASA/JPL/University of Arizona. Image from the edge of the south polar residual cap on Mars.



This is an implementation of the SURF algorithm [1] using the NVIDIA CUDA API [2] and released under a BSD license.

In implementing this algorithm, the focus was on accuracy, not speed. That is, we attempted to match the results of SURF where possible; where that was not possible, we made educated guesses and validated the algorithm against standard datasets. Additionally, a Matlab implementation is available, which is useful for understanding the algorithm and the implementation. The library is under active development, so further functionality is forthcoming. Further details on the implementation (some of which are out of date) are available in a report on the original code [pdf].


As a library

Compiling the source code produces a dynamic link library (.dll on Windows, .dylib on OSX, .so on Linux). High-level use of the library is provided through the GpuSurfDetector object. An example demonstrating all of the functionality of the detector object is provided in the file gpusurf_engine.cpp.

As a command line utility

The gpusurf_engine example program can be run from the command line. Help on syntax is provided if the program is run without any input options. An illustration of how the input filter parameters are used can be found in the documentation for the GpuSurfOctave class. To process a series of images, provide all of the image file names in a single command. This is faster than running the program once per image, as re-initialization is not required between images.

The output for each image is a .key text file in which each row contains one keypoint and its associated descriptor. For visualization purposes, these .key files can be loaded into Matlab using the load_gpu_keypoints() function.

From within Matlab

The parallel implementation coded in Matlab can be run by using the surf_find_keypoints() function. The output keypoints can be sorted by strength using surf_best_n_keypoints(), and plotted using surf_plot_keypoints().
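For readers working outside Matlab, the ranking step performed by surf_best_n_keypoints() can be approximated in C++ as below. This is a minimal sketch, not a port of the Matlab function: the Keypoint struct, its field names, and bestNKeypoints() are hypothetical, and "strength" is assumed to be the detector response used for ranking.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical keypoint record; the field names are illustrative only.
struct Keypoint {
  float x, y;      // image position in pixels
  float scale;     // detection scale
  float strength;  // detector response used for ranking (assumed)
};

// Returns the n strongest keypoints, strongest first. If fewer than n
// keypoints are given, all of them are returned, sorted.
std::vector<Keypoint> bestNKeypoints(std::vector<Keypoint> kps, std::size_t n) {
  n = std::min(n, kps.size());
  const auto stronger = [](const Keypoint& a, const Keypoint& b) {
    return a.strength > b.strength;
  };
  const auto mid = kps.begin() + static_cast<std::ptrdiff_t>(n);
  // Orders only the first n elements; the rest are left in unspecified order.
  std::partial_sort(kps.begin(), mid, kps.end(), stronger);
  kps.erase(mid, kps.end());
  // Debug check: the kept prefix is ordered strongest first.
  assert(std::is_sorted(kps.begin(), kps.end(), stronger));
  return kps;
}
```

std::partial_sort is used instead of a full sort because only the first n elements need to be ordered, which is cheaper when n is much smaller than the number of detections.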

Deviations from SURF

Because the original SURF distribution [1] is closed source, we were not able to reproduce its results exactly. However, to the best of our knowledge, no other available open-source implementation has accomplished this either. The results of the Mikolajczyk [3] repeatability test on the Graffiti dataset for our implementation and other available detectors can be seen below.
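To give a feel for what the repeatability score measures, the sketch below computes a simplified, point-based version: keypoints from one image are mapped into the other with the ground-truth homography supplied with the test sequence, paired one-to-one with detections within a pixel tolerance, and the number of pairs is divided by the smaller keypoint count. This is a stand-in under stated assumptions, not the protocol of [3], which matches elliptical regions by their overlap error and counts only points in the scene area visible in both images. The names Point, Homography, project(), and repeatability() are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };
using Homography = std::array<double, 9>;  // 3x3, row-major

// Map a point from image A into image B with the ground-truth homography.
Point project(const Homography& H, const Point& p) {
  const double w = H[6] * p.x + H[7] * p.y + H[8];
  return {(H[0] * p.x + H[1] * p.y + H[2]) / w,
          (H[3] * p.x + H[4] * p.y + H[5]) / w};
}

// Simplified point-based repeatability (an assumption, not the region-overlap
// criterion of [3]): each keypoint of A is projected into B and greedily paired
// with the nearest unused keypoint of B within `tol` pixels.
// Score = matches / min(|A|, |B|), so it always lies in [0, 1].
double repeatability(const std::vector<Point>& a, const std::vector<Point>& b,
                     const Homography& H, double tol) {
  assert(tol >= 0.0);
  if (a.empty() || b.empty()) return 0.0;
  std::vector<bool> used(b.size(), false);
  std::size_t matches = 0;
  for (const Point& p : a) {
    const Point q = project(H, p);
    std::size_t best = b.size();  // b.size() means "no partner found"
    double bestDist = tol;
    for (std::size_t j = 0; j < b.size(); ++j) {
      const double d = std::hypot(q.x - b[j].x, q.y - b[j].y);
      if (!used[j] && d <= bestDist) { best = j; bestDist = d; }
    }
    if (best != b.size()) { used[best] = true; ++matches; }
  }
  return static_cast<double>(matches) /
         static_cast<double>(std::min(a.size(), b.size()));
}
```

Pairing one-to-one (the `used` flags) keeps several nearby keypoints in A from all counting against a single keypoint in B, which would otherwise push the score above 1.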


The deviations we have identified are:

  • The GPU hardware produces slightly different numerical results because computations are performed in single-precision floating point.
  • Due to the custom-dimensioned box filters, the reported keypoint scale is somewhat larger than the scale SURF reports for a similarly sized blob; for the default parameters, it is approximately 1.3 times larger. As a result, the orientation and descriptor are also computed at a larger scale.
  • The Dxy filter gap grows with scale, unlike the illustration in [1].
  • The computed descriptors use a Gaussian weighting with σ = 3.3s (where s is the keypoint scale), as described in [1], but descriptors obtained from the original SURF distribution suggest that a different value is used.
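The Gaussian weighting in the last item can be made concrete with a short sketch: a Gaussian with σ = 3.3s centred on the keypoint, evaluated over the 20s by 20s descriptor window of [1] (4 by 4 subregions of 5 by 5 samples, spaced s apart). This illustrates the weighting as described in [1] and is not code taken from this library; the function names and the placement of samples at cell centres are assumptions, and rotation to the keypoint orientation is omitted. Because σ is proportional to s, the roughly 1.3 times larger reported scale noted above also widens this weighting.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Weight applied to a wavelet response sampled at offset (dx, dy), in pixels,
// from a keypoint of scale s: a Gaussian with sigma = 3.3 s centred on the
// keypoint, as described in [1].
double descriptorWeight(double dx, double dy, double s) {
  assert(s > 0.0);
  const double sigma = 3.3 * s;
  return std::exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma));
}

// Weights for a 20x20 grid of sample points spaced s apart and centred on the
// keypoint, stored row-major. Sample positions at cell centres are assumed.
std::vector<double> descriptorWeightGrid(double s) {
  std::vector<double> w;
  w.reserve(400);
  for (int i = 0; i < 20; ++i) {
    for (int j = 0; j < 20; ++j) {
      const double dy = (i - 9.5) * s;
      const double dx = (j - 9.5) * s;
      w.push_back(descriptorWeight(dx, dy, s));
    }
  }
  return w;
}
```

A sample at distance 3.3s from the keypoint receives weight exp(-0.5), about 0.61, so changing the 3.3 factor directly changes how strongly the outer subregions contribute to the descriptor.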

Known limitations

  • The default GPU thread counts should be optimized for specific hardware.
  • Integral image computation can likely be sped up by using the 2D parallel prefix sum by Terriberry et al. [4].
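The integral image referred to here can be computed as two passes of one-dimensional prefix sums: a scan along each row, then a scan down each column. The sketch below does this sequentially on the CPU to show the data flow; it is neither the library's CUDA kernel nor the 2D scheme of Terriberry et al. [4], and the names integralImage() and boxSum() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Integral image with a zero top row and left column:
// ii[(y+1)*(w+1) + (x+1)] = sum of img over rows 0..y and columns 0..x.
// Accumulates in double; single-precision accumulation, as on the GPU,
// can lose precision on large images.
std::vector<double> integralImage(const std::vector<float>& img,
                                  std::size_t w, std::size_t h) {
  assert(img.size() == w * h);
  const std::size_t W = w + 1;
  std::vector<double> ii(W * (h + 1), 0.0);
  for (std::size_t y = 0; y < h; ++y)  // pass 1: independent row scans
    for (std::size_t x = 0; x < w; ++x)
      ii[(y + 1) * W + (x + 1)] = ii[(y + 1) * W + x] + img[y * w + x];
  for (std::size_t y = 1; y <= h; ++y)  // pass 2: independent column scans
    for (std::size_t x = 1; x <= w; ++x)
      ii[y * W + x] += ii[(y - 1) * W + x];
  return ii;
}

// Sum of the pixels in the box [x0, x1) x [y0, y1) from four lookups,
// independent of the box size.
double boxSum(const std::vector<double>& ii, std::size_t w,
              std::size_t x0, std::size_t y0, std::size_t x1, std::size_t y1) {
  const std::size_t W = w + 1;
  assert(x0 <= x1 && x1 <= w && y0 <= y1 && y1 * W + x1 < ii.size());
  return ii[y1 * W + x1] - ii[y0 * W + x1] - ii[y1 * W + x0] + ii[y0 * W + x0];
}
```

Each row scan, and each column scan, is independent of the others, which is why the computation maps onto parallel prefix-sum (scan) primitives such as those provided by CUDPP. Once the integral image is built, any axis-aligned box sum costs four lookups regardless of box size, which is what keeps SURF's box filters inexpensive at every scale.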

Building the library on different platforms

The bulk of our development has been on MacBook Pro computers that dual boot into Windows XP. As such, the build instructions are best suited to those platforms. However, no platform-specific libraries were used, so compiling on other platforms should be straightforward, provided they are supported by CUDA. The main difficulty in compiling the library arises from its dependencies: OpenCV, Boost, and CUDPP. We have included the source of CUDPP in our download and provided links and instructions for obtaining the other libraries on each platform.

Build instructions for Windows

Build instructions for OSX

Build instructions for Linux


Release 0.2.0 [download] [release notes]


Projects using this code

Please send us links to your project!

About the Authors

Paul Furgale is a PhD candidate in the Autonomous Space Robotics Lab at the University of Toronto Institute for Aerospace Studies. His research aims to extend the basic Visual Odometry pipeline to enable long-range autonomous rover navigation. [website]

Chi Hay Tong is a PhD student in the Autonomous Space Robotics Lab at the University of Toronto Institute for Aerospace Studies. His research interests involve robust 3D worksite mapping using laser rangefinders. [website]

Contact Information

Inquiries about the project can be directed to (


This code is released under a BSD license.


Many people have helped us in the development of this code, and we would like to thank some of them here. Gaetan Kenway helped with the initial development of the code during an excellent course run by Dr. A. Moshovos at the University of Toronto. Thanks to Mikolajczyk et al. [3] and Bay et al. [1] for providing libraries that let us evaluate whether we were on the right track. Thanks also to those who have provided feedback so far: Alastair Harrison and David McKinnon. Finally, thanks to our supervisor, Tim Barfoot, for providing equipment and support, even though we were supposed to be working on our theses.


[1] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3):346-359, 2008.

[2] NVIDIA Corporation, 2701 San Tomas Expressway, Santa Clara, CA. NVIDIA CUDA Programming Guide, 2.3.1 edition.

[3] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. A comparison of affine region detectors. International Journal of Computer Vision, 65(1-2):43-72, November 2005.

[4] T. Terriberry, L. French, and J. Helmsen. GPU accelerating speeded-up robust features. In Proceedings of the 4th International Symposium on 3D Data Processing, Visualization and Transmission, pages 355-362, June 2008.

Generated on Fri Apr 30 20:06:19 2010 for gpusurf by doxygen 1.6.2