Scalable rendering on PC clusters
Related papers
Hybrid sort-first and sort-last parallel rendering with a cluster of PCs
2000
We investigate a new hybrid of the sort-first and sort-last approaches to parallel polygon rendering, using a cluster of PCs as the target platform. Unlike previous methods that statically partition the 3D model and/or the 2D image, our approach performs dynamic, view-dependent and coordinated partitioning of both the 3D model and the 2D image. Using a specific algorithm that follows this approach, we show that it performs better than previous approaches and scales better with both processor count and screen resolution. Overall, our algorithm is able to achieve interactive frame rates with efficiencies of 55.0% to 70.5% during simulations of a system with 64 PCs. While it does have potential disadvantages in client-side processing and in dynamic data management (which also stem from its dynamic, view-dependent nature), these problems are likely to diminish with technology trends in the future.
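To put the quoted efficiency figures in context, the following is a minimal sketch that assumes the conventional definition of parallel efficiency (speedup divided by processor count). The abstract does not state which definition the authors use, so this is an assumption, and the function names are hypothetical. Under that assumption, efficiencies of 55.0% to 70.5% on 64 PCs would correspond to speedups of roughly 35x to 45x.

```python
def parallel_efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Efficiency = speedup / n_procs, with speedup = t_serial / t_parallel.

    Assumption: this is the textbook definition, not necessarily the paper's.
    """
    if t_serial <= 0 or t_parallel <= 0 or n_procs <= 0:
        raise ValueError("times and processor count must be positive")
    return (t_serial / t_parallel) / n_procs


def implied_speedup(efficiency: float, n_procs: int) -> float:
    """Invert the definition above: speedup = efficiency * n_procs."""
    return efficiency * n_procs


if __name__ == "__main__":
    # Efficiency range quoted in the abstract for a simulated 64-PC system.
    for eff in (0.550, 0.705):
        print(f"{eff:.1%} on 64 PCs -> speedup of about {implied_speedup(eff, 64):.1f}x")
```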
Performance Analysis of a 3D Parallel Volume Rendering Application on Scalable Tiled Displays
Current high-speed general-purpose networks, such as 1 Gigabit/10 Gigabit networks, are fast enough to handle the demanding task of routing streams of graphics primitives. Systems built with such networks and off-the-shelf GPUs and PCs are being used to provide graphics clusters, which are more economical than expensive graphics supercomputers. Further, the current trend in building high-resolution display systems is to tightly couple inexpensive LCD/TFT monitors to provide a large-scale high-resolution display system for detailed scientific visualizations with an increased pixel density, such as the GeoWall (3) or Lightning-2 (9). The graphics clusters can drive the large-scale high-resolution display systems to possibly replace the limited output resolution of standard devices such as monitors and video projectors. A graphics cluster with new display technology and off-the-shelf, inexpensive components has made possible affordable large-scale high-resolution display systems. In t...
Equalizer: a scalable parallel rendering framework
IEEE Transactions on Visualization and Computer Graphics, 2008
Continuing improvements in CPU and GPU performance as well as increasing multi-core processor and cluster-based parallelism demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware-accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application-specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture and the basic API, discuss its advantages over previous approaches, and present example configurations and usage scenarios as well as scalability results.
A Distributed Rendering System for Scientific Visualization
2002
Parallel, real-time rendering using clusters of commodity components has rapidly become a topic of significant interest within the scientific visualization community. This paper describes the design and implementation of a very large scale, distributed system that renders 6144 × 3072 pixel images and projects them across a 14 × 7 display wall at 35 frames per second.
Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs
Lecture Notes in Computer Science, 2003
An image-space-parallel, ray-casting-based direct volume rendering algorithm is developed for rendering of unstructured data grids on distributed-memory parallel architectures. For efficiency in screen workload calculations, a graph-partitioning-based tetrahedral cell clustering technique is used. The main contribution of the work is at the proposed model, which formulates the screen partitioning problem as a hypergraph partitioning problem. It is experimentally verified on a PC cluster that, compared to the previously suggested jagged partitioning approach, the proposed approach results in both better load balancing in local rendering and less communication overhead in data migration phases.
2004 IEEE Symposium on Volume Visualization and Graphics
Hardware-accelerated image composition for sort-last parallel rendering has received increasing attention as an effective solution to increased performance demands brought about by the recent advances in commodity graphics accelerators. So far, several different hardware solutions for alpha and depth compositing have been proposed and a few of them have become commercially available. They share impressive compositing speed and high scalability. However, the cost makes it prohibitively expensive to build a large visualization system. In this paper, we used a hardware image compositor marketed by Mitsubishi Precision Co., Ltd. (MPC) which is now available as an independent device enabling the building of our own visualization cluster. This device is based on a binary compositing tree architecture, and the scalable cascade interconnection makes it possible to build a large visualization system. However, we focused on a minimal-configuration PC cluster using only one compositing device while taking cost into consideration. In order to emulate this cascade interconnection of MPC compositors, we propose and evaluate a hybrid hardware-assisted image composition method which uses the OpenGL alpha blending capability of the graphics boards to assist the hardware image composition process. Preliminary experiments show that the use of graphics boards diminished the performance degradation when using an emulation based on image feedback through the available interconnection network. We found that the proposed method is an important alternative for providing high-performance image composition at a reasonable cost.
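As an illustration of alpha compositing along a binary compositing tree, here is a minimal software sketch. It is not the MPC device's implementation or the authors' OpenGL method. It assumes premultiplied RGBA pixels, layers already sorted front to back, and the standard Porter-Duff "over" operator. All names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float, float]  # premultiplied (r, g, b, a)
Image = List[Pixel]                        # flat list of pixels


def over(front: Pixel, back: Pixel) -> Pixel:
    """Porter-Duff 'over' for premultiplied colors: front + (1 - a_front) * back."""
    k = 1.0 - front[3]
    return tuple(f + k * b for f, b in zip(front, back))


def composite_pair(front: Image, back: Image) -> Image:
    """Composite two equally sized images pixel by pixel."""
    return [over(f, b) for f, b in zip(front, back)]


def tree_composite(layers: List[Image]) -> Image:
    """Reduce layers (ordered front to back) by merging adjacent pairs.

    Each pass halves the number of images, mirroring the levels of a
    binary compositing tree. Adjacent pairing preserves visibility order.
    """
    if not layers:
        raise ValueError("need at least one layer")
    level = list(layers)
    while len(level) > 1:
        merged = [composite_pair(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd layer out moves up unchanged
            merged.append(level[-1])
        level = merged
    return level[0]
```

Because "over" is associative, the tree-shaped reduction yields the same image (up to floating-point rounding) as compositing the layers one after another, which is what lets the work be spread over log2(N) tree levels.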
Network aware parallel rendering with PCs
2004
Interactive rendering of complex models has many applications in the Virtual Reality Continuum. The oil and gas industry uses interactive visualizations of huge seismic data sets to evaluate and plan drilling operations. The automotive industry evaluates designs based on very detailed models. Unfortunately, many of these very complex geometric models cannot be displayed at interactive frame rates on graphics workstations. This is due to the limited scalability of their graphics performance. Recently there has been a trend toward using networked standard PCs to solve this problem. Care must be taken, however, because clustered PCs have no shared memory. All data and commands have to be sent across the network. It turns out that removing the network bottleneck is a challenging problem in this context.
Object-space parallel polygon rendering on hypercubes
Computers & Graphics, 1998
This paper presents algorithms for object-space parallel polygon rendering on hypercube-connected multicomputers. A modified scanline z-buffer algorithm is proposed for the local rendering phase. The proposed algorithm avoids message fragmentation by packing local foremost pixels in consecutive memory locations efficiently, and it eliminates the initialization of the scanline z-buffer for each scanline. Several algorithms, utilizing different communication strategies and topological embeddings, are proposed for global z-buffering of local foremost pixels during the pixel merging phase. A performance comparison of these pixel merging algorithms is presented based on the communication overhead incurred in each scheme. Two adaptive screen subdivision heuristics are proposed for load balancing in the pixel merging phase. These heuristics utilize the distribution of foremost pixels on the screen for the subdivision. Experimental results obtained on an Intel iPSC/2 hypercube multicomputer and a Parsytec CC system are presented. Rendering rates of 300K to 700K triangles per second are attained on 16 processors of the Parsytec CC system in the rendering of datasets from the publicly available SPD database.
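The pixel merging phase described above can be sketched as a global z-buffer merge of each processor's local foremost pixels. The following is a minimal single-process illustration, not the paper's hypercube algorithms. It assumes each processor reports a sparse map from screen coordinate to its nearest fragment, and that a smaller depth means closer to the viewer. All names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Coord = Tuple[int, int]        # (x, y) screen position
Fragment = Tuple[float, str]   # (depth, color); smaller depth is closer


def merge_foremost(local_buffers: Iterable[Dict[Coord, Fragment]]) -> Dict[Coord, Fragment]:
    """Keep, for every covered screen position, the fragment with the smallest depth."""
    merged: Dict[Coord, Fragment] = {}
    for buf in local_buffers:
        for xy, frag in buf.items():
            best = merged.get(xy)
            if best is None or frag[0] < best[0]:
                merged[xy] = frag
    return merged


if __name__ == "__main__":
    # Two hypothetical processors, each holding only the pixels it rendered.
    proc0 = {(0, 0): (0.8, "red"), (1, 0): (0.3, "red")}
    proc1 = {(0, 0): (0.5, "blue"), (2, 0): (0.9, "blue")}
    print(merge_foremost([proc0, proc1]))
```

Storing only covered pixels keeps the exchanged data proportional to the number of foremost pixels rather than to the full screen size, which is the property the packing step in the abstract relies on.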
FlexRender: A distributed rendering architecture for ray tracing huge scenes on commodity hardware
2000
As the quest for more realistic computer graphics marches steadily on, the demand for rich and detailed imagery is greater than ever. However, the current "sweet spot" in terms of price, power consumption, and performance is in commodity hardware. If we desire to render scenes with tens or hundreds of millions of polygons as cheaply as possible, we need a way of doing so that maximizes the use of the commodity hardware that we already have at our disposal. We propose a distributed rendering architecture based on message-passing that is designed to partition scene geometry across a cluster of commodity machines, allowing the entire scene to remain in-core and enabling parallel construction of hierarchical spatial acceleration structures. The design uses feed-forward, asynchronous messages to allow distributed traversal of acceleration structures and evaluation of shaders without maintaining any suspended shader execution state. We also provide a simple method for throttling work generation to keep message queueing overhead small. The results of our implementation show roughly an order of magnitude speedup in rendering time compared to image plane decomposition, while keeping memory overhead for message queuing around 1%.
High-Performance Rendering on Clusters of Workstations
Geometric Modeling and Imaging--New Trends (GMAI'06)
The computer-graphics aspects of the visualization of large data sets, in particular digital models of real or planned solid objects, in a heterogeneous distributed environment are investigated. It is demonstrated that binary-swap compositing does not scale well on networks of workstations. A multi-server system, based on scanline algorithms and using Java technology, is proposed. The proposed system is efficient, as servers only need to solve a problem of growth rate n log n; it is fault tolerant, as both lost messages and server failures are tolerated; and it has negligible hardware costs, as it runs on existing networks of workstations. The system is also scalable, as data sets are sent to all servers in the same packets, regardless of the number of servers, and the amount of data sent back by servers only depends on the resolution of the final image.
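For readers unfamiliar with binary-swap compositing, the scheme the abstract argues against, the following is a minimal single-process simulation. It is not the paper's multi-server system. It assumes a power-of-two number of processors, images stored as flat lists of (depth, color) pixels, and depth compositing where the smaller depth wins. All names are hypothetical, and network transfers are stood in for by a buffer snapshot.

```python
from typing import List, Tuple

Pixel = Tuple[float, str]  # (depth, color); smaller depth is closer


def binary_swap(images: List[List[Pixel]]) -> List[Pixel]:
    """Simulate binary-swap depth compositing over len(images) processors."""
    p = len(images)
    if p == 0 or p & (p - 1):
        raise ValueError("processor count must be a power of two")
    n = len(images[0])
    bufs = [list(img) for img in images]
    regions = [(0, n)] * p          # pixel range each rank is responsible for
    step = 1
    while step < p:                 # log2(p) rounds
        snapshot = [list(b) for b in bufs]   # stands in for data sent to partners
        for rank in range(p):
            partner = rank ^ step   # partners share the same region this round
            lo, hi = regions[rank]
            mid = (lo + hi) // 2
            # One partner keeps the lower half, the other the upper half.
            keep = (lo, mid) if rank & step == 0 else (mid, hi)
            for i in range(*keep):
                theirs = snapshot[partner][i]
                if theirs[0] < bufs[rank][i][0]:
                    bufs[rank][i] = theirs
            regions[rank] = keep
        step *= 2
    # Final gather: every rank contributes its fully composited slice.
    final: List[Pixel] = [None] * n  # type: ignore[list-item]
    for rank, (lo, hi) in enumerate(regions):
        final[lo:hi] = bufs[rank][lo:hi]
    return final
```

In each round a processor exchanges half of its current region with its partner, so per-processor traffic shrinks geometrically, but every processor still takes part in log2(p) synchronized exchanges. That per-round messaging is the kind of cost that can matter on a general-purpose workstation network.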