Photo Tourism - Exploring Photo Collections in 3D

The full paper can be found here, and more information can be found here.

Introduction:
The authors create a novel photo explorer for browsing and organizing large collections of photographs by exploiting the 3D geometry of the underlying scene. Their reason for building the system is that the existing technology for browsing and organizing large amounts of photos is dated and will generally just show them as thumbnails, and I also feel this is true.
First Impression:
At first I thought that some of the things the system was trying to accomplish were either not that hard to do, like assigning a location, or already done, like scene visualization, which I thought to be similar to Google's Street View. Some features, however, did sound very interesting, like object-based photo browsing, which retrieves more images of a particular object or item in the scene (in my experience Google's image search does not always give perfect results), or annotations in an image that give information about a particular object in the photo, and the transfer of these annotations to other images that contain the same object.
Scope of the Paper:
The paper talks about an end-to-end photo organizer and explorer. The authors divide their work into three main categories: image-based modeling, image-based rendering, and image browsing, retrieval, and annotation. The paper defines and explains these categories in the "Related Work" section, which helps to build a basic understanding of what they are trying to implement, and discusses some similar work. Sections 4 through 7 cover the design choices and how the various functionalities are implemented, in good detail. In the end they discuss the limitations of the explorer and talk about some possible future work.
Reconstruction of Cameras and 3D Geometry (Image-Based Modeling):
This basically refers to creating 3D models from images. Their system requires the camera pose (location, orientation, field of view), the intrinsic camera parameters, and also absolute locations. They obtain these values in the following manner (without relying on GPS or EXIF tags):
  1. Detect feature points in each image (using SIFT)
  2. Match features between pairs of images
  3. Run a robust, iterative structure-from-motion (SfM) procedure to recover the camera parameters
  4. SfM only gives relative information; find the absolute information by overlaying the recovered cameras on an overhead map
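Steps 1 and 2 above can be sketched with Lowe's ratio test, the standard filter for SIFT descriptor matches (a minimal toy version with made-up 2-D "descriptors"; the real system matches 128-dimensional SIFT descriptors with approximate nearest neighbors):

```python
import math

def match_features(desc_a, desc_b, ratio=0.8):
    """Match descriptors from image A to image B using Lowe's ratio test:
    accept a match only if the nearest neighbor is significantly closer
    than the second-nearest, which filters out ambiguous matches."""
    matches = []
    for i, da in enumerate(desc_a):
        # Euclidean distance from descriptor i to every descriptor in B
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

# Toy descriptors: a_0 clearly matches b_1; a_1 is ambiguous (two
# near-identical candidates) and is correctly rejected.
desc_a = [(0.0, 0.0), (5.0, 5.0)]
desc_b = [(9.0, 9.0), (0.1, 0.1), (5.1, 4.9), (4.9, 5.1)]
print(match_features(desc_a, desc_b))  # → [(0, 1)]
```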
The fact that they don't have to rely on GPS for location is really great, as this allows them to include an already huge database of photos which might not have GPS locations. One major drawback is that SfM has a really long running time, ranging from a few hours to as much as two weeks. Another aspect that I find really interesting is that geo-registering with a satellite image, floor plan, or digital elevation map requires a certain amount of human computation to determine the correct transformation.
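That geo-registration step boils down to finding a similarity transform (rotation, uniform scale, translation) mapping the recovered camera positions onto map coordinates. A least-squares sketch, assuming the human has supplied a few point correspondences (the points here are made up for illustration), using the complex-number form w ≈ a·z + b:

```python
def fit_similarity(recon_pts, map_pts):
    """Least-squares 2-D similarity transform w ≈ a*z + b, where points
    are complex numbers and a encodes rotation and scale together."""
    n = len(recon_pts)
    z = [complex(x, y) for x, y in recon_pts]
    w = [complex(x, y) for x, y in map_pts]
    sz, sw = sum(z), sum(w)
    szz = sum(abs(zi) ** 2 for zi in z)                      # Σ|z|²
    szw = sum(zi.conjugate() * wi for zi, wi in zip(z, w))   # Σ z̄·w
    a = (n * szw - sz.conjugate() * sw) / (n * szz - abs(sz) ** 2)
    b = (sw - a * sz) / n
    return a, b

# Three hypothetical camera positions and their hand-placed map locations
# (here the map is the reconstruction rotated 90°, scaled by 2, shifted).
recon = [(0, 0), (1, 0), (0, 1)]
mapped = [(10, 0), (10, 2), (8, 0)]
a, b = fit_similarity(recon, mapped)
print(a, b)  # a = 2j (90° rotation, scale 2), b = 10 (translation)
```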
Photo Rendering and Navigation:
I feel that the authors have done a good job of explaining their interface in the following video - https://vimeo.com/30584674 - and hopefully this covers the interface details in the best way possible.
The navigation techniques that they have used are really good and deserve a mention. There is free-flight navigation, which can be used to move the virtual camera back, left, right, up, and down; when the user clicks on a different area, the camera moves there and fills the view with that image. The ability to view objects beyond the field of view is also a great feature, and the explorer additionally lets you find similar photos, zoom out, or bring up high-resolution photos of details in an image.
Object-based navigation is basically asking for photos of a particular object by selecting the object with a 2D box (very intuitive). There are various challenges the authors have dealt with, like 3D points falling inside the user's selection that were not intended to be included.
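A simplified sketch of how such a query could be answered (not the authors' exact criterion, which also accounts for how prominently the object appears): rank photos by how many of the selected 3D points each one observes.

```python
def rank_photos(selected_points, visibility):
    """Rank photos by how many of the user-selected 3-D points each one
    sees (a simplified stand-in for the object-selection query)."""
    scores = {
        photo: len(selected_points & pts)
        for photo, pts in visibility.items()
    }
    # Keep only photos that see at least one selected point, best first.
    return sorted(
        (p for p, s in scores.items() if s > 0),
        key=lambda p: -scores[p],
    )

# Hypothetical scene: the set of 3-D point ids visible in each photo.
visibility = {
    "img_01.jpg": {1, 2, 3, 4},
    "img_02.jpg": {3, 4, 5},
    "img_03.jpg": {6, 7},
}
print(rank_photos({3, 4, 5}, visibility))  # → ['img_02.jpg', 'img_01.jpg']
```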
Enhancing Photos:
A user can register their own photos after the initial photos have been registered. The way in which new photos are added is particularly interesting: the user has to drop the photo at an approximate location on the map, making it a form of human computation, which allows the system to run an abbreviated version of SfM.
Annotating an image and then propagating this annotation to all the other images that contain the same object is really great. The annotation transfer is also scale invariant and can handle partial and full occlusions.
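Since annotations are attached to the reconstructed 3D points, transfer can be sketched roughly like this (a simplified, hypothetical version: take the points whose projections fall inside the annotated box and bound their projections in the other image):

```python
def transfer_annotation(region, src_points, dst_points):
    """Transfer an annotation box from a source image to a target image:
    collect the 3-D points whose source projections fall inside the box,
    then bound those points' projections in the target image."""
    x0, y0, x1, y1 = region
    inside = [
        pid for pid, (x, y) in src_points.items()
        if x0 <= x <= x1 and y0 <= y <= y1
    ]
    # Keep only points that are also observed in the target image.
    xs = [dst_points[p][0] for p in inside if p in dst_points]
    ys = [dst_points[p][1] for p in inside if p in dst_points]
    if not xs:
        return None  # object not visible here (e.g. fully occluded)
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical 2-D projections of shared 3-D points in two images.
src = {1: (10, 10), 2: (20, 15), 3: (90, 90)}
dst = {1: (110, 40), 2: (130, 50)}
print(transfer_annotation((5, 5, 30, 30), src, dst))
# → (110, 40, 130, 50)
```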
Conclusion:
The system boasts a number of features. While I felt initially that some features had already been implemented before, like Google Street View, I later found out that Google Street View in fact came out in 2007, and this paper was published in 2006. I am not sure how efficiently the system would work if it were given a mixed set of images (of different sites). Another issue is that SfM takes a long time to run, which would need to be sped up if the system were to be used in everyday life. While there are other limitations that the authors mention, I feel that these features were fairly advanced for 2006.
