Fast robust large-scale mapping from video and internet photo collections
Related papers
Modeling the world from internet photo collections
2008
Abstract There are billions of photographs on the Internet, comprising the largest and most diverse photo collection ever assembled. How can computer vision researchers exploit this imagery? This paper explores this question from the standpoint of 3D scene modeling and visualization.
Towards Urban 3D Reconstruction from Video
2006
The paper introduces a data collection system and a processing pipeline for automatic geo-registered 3D reconstruction of urban scenes from video. The system collects multiple video streams, as well as GPS and INS measurements, in order to place the reconstructed models in geo-registered coordinates. Besides high quality in terms of both geometry and appearance, we aim at real-time performance. Even though our processing pipeline is currently far from real-time, we select techniques and design processing modules that can achieve fast performance on multiple CPUs and GPUs, with real-time operation as a near-term goal. We present the main considerations in designing the system and the steps of the processing pipeline. We show results on real video sequences captured by our system.
City-Scale Reality Modeling from Community Photo Collection
Many applications in Augmented Reality (AR) require the use of Reality Models: 3D information tailored for use in AR systems. Available virtual information obtained from CAD models and GIS data is not always trustworthy for registration in AR, as it often has not been verified and might not accurately represent reality. Nowadays, images of landmarks available on the Internet allow for automatic 3D modeling, and we propose to deploy these datasets to model reality. This paper is the first to use city-scale models reconstructed from an Internet photo collection in an AR framework. Our method augments the information of the generated Reality Models, which are typically represented through texture images and 3D geometry (sparse or dense). We automatically annotate the 3D models by analyzing the rich information contained in user-provided tags of the original images. Additionally, we introduce an automatic procedure to align new uncalibrated images to our reality model, thus providing a method for the transfer of augmentation from the 3D model to an image.
Geo-registered 3D models from crowdsourced image collections
Geo-spatial Information Science, 2013
In this article we present our system for scalable, robust, and fast city-scale reconstruction from Internet photo collections (IPC), obtaining geo-registered dense 3D models. The major achievement of our system is the efficient use of coarse appearance descriptors combined with strong geometric constraints to reduce the computational complexity of the image overlap search. This unique combination of recognition and geometric constraints allows our method to reduce the complexity from quadratic in the number of images to almost linear in the IPC size. Accordingly, our 3D-modeling framework is inherently more scalable than other state-of-the-art methods and is currently the only method to support modeling from millions of images. In addition, we propose a novel mechanism to overcome the inherent scale ambiguity of the reconstructed models by exploiting geotags of the Internet photo collection images and readily available StreetView panoramas for fully automatic geo-registration of the 3D model. Moreover, our system exploits image appearance clustering to tackle the challenge of computing dense 3D models from an image collection with significant variation in illumination between images, along with a wide variety of sensors and their differing radiometric camera parameters. Our algorithm exploits the redundancy of the data to suppress estimation noise through a novel depth map fusion, which simultaneously enforces surface and free-space constraints while fusing a large number of depth maps. Cost volume compression during the fusion lowers the memory requirements for high-resolution models. We demonstrate our system on a variety of scenes from an Internet photo collection of Berlin containing almost three million images, from which we compute dense models in under a day on a single computer.
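The key scalability idea above, shortlisting candidate image pairs with a cheap global descriptor before any expensive geometric verification, can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the toy descriptor (a normalized downsampled thumbnail) stands in for the coarse appearance features, and the neighbour count `k` is an assumed parameter.

```python
import numpy as np

def coarse_descriptor(img, size=8):
    """Toy global descriptor: a normalized, downsampled grayscale thumbnail.
    (A stand-in for coarse appearance features; an assumption, not the
    paper's exact descriptor.)"""
    h, w = img.shape
    ys = (np.arange(size) * h) // size
    xs = (np.arange(size) * w) // size
    d = img[np.ix_(ys, xs)].astype(np.float64).ravel()
    n = np.linalg.norm(d)
    return d / n if n > 0 else d

def candidate_pairs(images, k=3):
    """Shortlist each image's k nearest neighbours in descriptor space, so
    expensive geometric verification runs on O(k*n) pairs instead of the
    O(n^2) pairs of exhaustive matching."""
    D = np.stack([coarse_descriptor(im) for im in images])
    sims = D @ D.T                   # cosine similarity (unit-norm descriptors)
    np.fill_diagonal(sims, -np.inf)  # exclude self-matches
    pairs = set()
    for i in range(len(images)):
        for j in np.argsort(sims[i])[::-1][:k]:
            pairs.add((min(i, int(j)), max(i, int(j))))
    return sorted(pairs)
```

With `n` images the shortlist contains at most `k * n` pairs, which is how the quadratic overlap search collapses to near-linear cost in the collection size.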
Enhancing Large Urban Photo Collections with 3D Ladar and GIS Data
Recent work in computer vision has demonstrated the potential to automatically recover camera and scene geometry from large collections of uncooperatively-collected photos. At the same time, aerial ladar and Geographic Information System (GIS) data are becoming more readily accessible. In this paper, we present a system for fusing these data sources in order to transfer 3D and GIS information into outdoor urban imagery. Applying this system to 1000+ pictures shot of the lower Manhattan skyline and the Statue of Liberty, four proof-of-concept examples of geometry-based photo enhancement are presented which are difficult to perform via conventional image processing: feature annotation, image-based querying, photo segmentation and city image retrieval. In each example, high-level knowledge projects from 3D world-space into georegistered 2D image planes and/or propagates between different photos. Such automatic capabilities lay the groundwork for future real-time labeling of imagery shot in complex city environments by mobile smart phones.
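The core operation described above, projecting high-level knowledge from georegistered 3D world space into a photo's 2D image plane, is the standard pinhole projection. A minimal sketch under the usual assumptions (known intrinsics `K` and a world-to-camera pose `R`, `t`; names are illustrative, not from the paper):

```python
import numpy as np

def project_point(X_world, K, R, t):
    """Project a georegistered 3D point into a calibrated camera's image.
    K: 3x3 intrinsics, R: 3x3 world-to-camera rotation, t: 3-vector
    translation. Returns pixel (u, v), or None if the point lies behind
    the camera and cannot receive an annotation."""
    X_cam = R @ np.asarray(X_world, dtype=float) + t
    if X_cam[2] <= 0:
        return None
    u, v, w = K @ X_cam
    return (u / w, v / w)
```

Once a GIS feature projects to a valid pixel in one georegistered photo, the same 3D point can be projected into every other registered photo, which is how annotations propagate between images in the examples above.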
Real-Time Video-Based Reconstruction of Urban Environments
2007
We present an approach for automatic 3D reconstruction of outdoor scenes using computer vision techniques. Our system collects video, GPS and INS data which are processed in real-time to produce geo-registered, detailed 3D models that represent the geometry and appearance of the world. These models are generated without manual measurements or markers in the scene and can be used for visualization from arbitrary viewpoints, documentation and archiving of large areas. Our system consists of a data acquisition system and a processing system that generates 3D models from the video sequences off-line but in real-time. The GPS/INS measurements allow us to geo-register the pose of the camera at the time each frame was captured. The following stages of the processing pipeline perform dense reconstruction and generate the 3D models, which are in the form of a triangular mesh and a set of images that provide texture. By leveraging the processing power of the GPU, we are able to achieve faster...
BUILDUP: interactive creation of urban scenes from large photo collections
Multimedia Tools and Applications, 2016
We propose a system for creating images of urban scenes composed of the large structures typical in such environments. Our system provides the user with a precomputed library of image-based 3D objects, such as roads, sidewalks and buildings, obtained from a large collection of photographs. When the user picks the 3D location of a new object to insert, the system retrieves objects that have all the required properties (location, orientation and lighting). Then, the user interface guides the user to add more objects enabling non-experts to make a new composition in a fast and intuitive way. Unlike prior work, the entire image composition process is done in the 3D space of the scene, therefore inconsistent scale or perspective distortion does not arise, and occlusions are properly handled.
Building streetview datasets for place recognition and city reconstruction
Google Maps API combined with Street View images can serve as a powerful tool for place recognition or city reconstruction tasks. In this paper, we present a way to build geotagged datasets of perspective views from Google Maps. Given the initial GPS coordinates, the algorithm can build a list of panoramas in a certain area, download the corresponding panoramas, and generate perspective views. In more detail, each panorama on Google Maps Street View contains metadata from which the GPS location and the direction of the view can be extracted. Moreover, the information about neighbouring panoramas can be obtained as well; hence, a list of panoramas covering a certain area can be built. Downloading panoramas from the list and combining them with the metadata, each downloaded panorama is cut into a set of overlapping perspective views and stored, while the camera GPS location, yaw, and pitch are coded in the filename of the perspective view. The geotagged database is subsequently used for place recognition and structure-from-motion 3D reconstruction.
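The cutout step described above, turning an equirectangular panorama into a perspective view for a given yaw, pitch, and field of view, can be sketched with plain numpy. This is a simplified illustration (nearest-neighbour sampling, assumed axis conventions with y pointing down), not the paper's implementation:

```python
import numpy as np

def perspective_cutout(pano, yaw_deg, pitch_deg, fov_deg=90.0, out_size=256):
    """Cut a perspective view out of an equirectangular panorama.
    pano: HxW (or HxWx3) equirectangular image; yaw/pitch in degrees.
    For each output pixel, cast a ray, rotate it by pitch then yaw,
    convert to longitude/latitude, and sample the panorama."""
    H, W = pano.shape[:2]
    f = (out_size / 2) / np.tan(np.radians(fov_deg) / 2)  # virtual focal length
    j, i = np.meshgrid(np.arange(out_size), np.arange(out_size))
    x = (j - out_size / 2) / f          # camera coords: z forward, y down
    y = (i - out_size / 2) / f
    z = np.ones_like(x)
    p, q = np.radians(pitch_deg), np.radians(yaw_deg)
    y2 = y * np.cos(p) - z * np.sin(p)  # rotate about x-axis (pitch)
    z2 = y * np.sin(p) + z * np.cos(p)
    x3 = x * np.cos(q) + z2 * np.sin(q)  # rotate about y-axis (yaw)
    z3 = -x * np.sin(q) + z2 * np.cos(q)
    lon = np.arctan2(x3, z3)
    lat = np.arcsin(np.clip(y2 / np.sqrt(x3**2 + y2**2 + z3**2), -1.0, 1.0))
    u = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W   # wrap horizontally
    v = np.clip(((lat / np.pi + 0.5) * H).astype(int), 0, H - 1)
    return pano[v, u]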
Geolocalization of Crowdsourced Images for 3-D Modeling of City Points of Interest
IEEE Geoscience and Remote Sensing Letters, 2015
Geolocalization of crowdsourced images is a challenging task that is getting increased attention nowadays due to the rise in popularity of geotagging and its applications. Among these applications, 3-D modeling from Internet photograph collections is a very active research topic with great promise and potential. In order to automate and optimize the crowdsourced 3-D modeling process, this letter proposes a novel framework that can be used for automatic 3-D modeling of city points of interest (POIs), such as statues, buildings, and temporary artworks. Crowdsourced images related to the POI and its location are collected using a geographical Web search process based on geotags and semantic geodata. Subsequently, panoramic Google Street View (SV) images are used to geolocalize the images. If enough feature matches are found between the image and one of the SV images, the image is annotated with the location metadata of the best matching image from the SV database. Otherwise, when too few matches are found, the image most probably does not contain the POI in its field of view (FOV), and it is filtered out. For optimal performance, the equirectangular panoramic SV images are transformed into an SV data set of perspective cutouts facing the POI with different pitches and FOVs. From this data set, a basic 3-D model of the POI and its environment is generated. Finally, the geolocalized crowdsourced images refine and optimize the 3-D model using the matching matrix that is generated from the geolocalization results. Experiments show the feasibility of our approach on different types of city POIs. Our main contribution is that we can decrease the 3-D modeling computation time by more than half and significantly improve the model completeness. Finally, it is important to remark that the applicability of the proposed framework is not limited to 3-D modeling but can also be used in other domains, such as geoaugmented reality and location-based media annotation.
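The geolocalization decision described above, annotate a query image with the geotag of its best-matching Street View cutout if enough feature matches are found, otherwise filter it out, can be sketched as follows. This is a hedged illustration: it counts mutual nearest-neighbour matches between precomputed, L2-normalised descriptor sets, and the `min_matches` threshold is an assumed parameter, not the paper's value.

```python
import numpy as np

def match_count(desc_a, desc_b):
    """Count mutual nearest-neighbour matches between two descriptor sets
    (rows are L2-normalised local feature descriptors)."""
    sims = desc_a @ desc_b.T
    ab = sims.argmax(axis=1)   # best b-descriptor for each a-descriptor
    ba = sims.argmax(axis=0)   # best a-descriptor for each b-descriptor
    return int(np.sum(ba[ab] == np.arange(len(desc_a))))

def geolocalize(query_desc, sv_dataset, min_matches=20):
    """Assign the query image the geotag of its best-matching SV cutout,
    or None when too few matches suggest the POI is not in its FOV.
    sv_dataset: list of (descriptors, geotag) pairs."""
    best = max(((match_count(query_desc, d), tag) for d, tag in sv_dataset),
               key=lambda s: s[0])
    return best[1] if best[0] >= min_matches else None
```

Queries that return `None` are exactly the images the framework filters out, which is what keeps images without the POI from polluting the 3-D model.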