Andrew Kirmse's Page - Topographic Prominence

by Andrew Kirmse

May, 2017, updated January 2020

January, 2023: An updated analysis with higher resolution data is available here.

This site describes my effort to compute the prominence of every mountain in the world down to 100 feet.

Prominence, also known as rise or re-ascent, has several equivalent definitions:

In many ways prominence is a better measure of the "interestingness" of a mountain than its elevation. The highest mountains in the world are all in the Himalayas, and a list of the highest peaks shows little variety. In contrast, the most prominent peaks of the world (see Wikipedia) are much more interesting.

Consider every walk from a peak P to a higher peak, and choose the walk whose lowest point is as high as possible. That lowest point is known as the key saddle (or sometimes key col). The prominence of P is the difference in elevation between P and the key saddle. It can be shown that each key saddle is unique; that is, it corresponds to only one peak.

Illustration of isolation from Wikipedia.

The three most important objective measures of a mountain are its elevation, isolation, and prominence. Finding the height of mountains has a storied history going back to the development of trigonometry for surveying. Today we have very accurate methods based on satellites, and the elevation of every point on Earth is known to a reasonable degree of accuracy. Once the elevation of every point is known, finding isolation (the distance to the nearest higher point) can be done by studying topographic maps, and this was done for selected large mountains as the Internet became popular. Prominence was also initially determined by looking at maps, trying to find the key saddle for each big mountain. This could be very difficult if the key saddle was far away. Starting in the 1990s, computers began to attack the problem, with the ultimate goal of finding the prominence of every mountain in the world.

From about 2005-2010 I worked on Google Earth, where one of my tasks was to build up a terrain database for Earth to use. We gathered terrain data from many different sources, merged them, and corrected many of their errors. In 2014, as I was getting more into peakbagging as a hobby, I noticed that the peakbagger.com Web site was missing prominence values for some of the minor mountains I had climbed in Europe. I wrote an email to Edward Earl, who I had heard had calculated many of the prominence values on the site. I soon learned that Edward had written a program called WinProm that could take terrain data and generate prominence values for all of the peaks within a given area. I offered to run Edward's algorithms over the Google Earth terrain database using thousands of computers in parallel to find the prominence of every little hill. Over the next few months Edward worked frantically to adapt his code to Google's environment. He was also working on some improvements he had had in mind for a while.

By the time I left Google in early 2015, I had sent Edward the output of our first global run. He still had some changes he wanted to make, and he had substantial work to do to glue together the pieces of output I had sent him. Work continued sporadically over the next few months with another Google engineer, until we received the shocking news in June that Edward had died on a mountaineering trip in Alaska.

Several months later, I got in contact with Edward's brother Jim, who had found the WinProm code on Edward's computer. I was able to get it running again, and we released it as open source (here). The program was Windows only, and it hadn't been updated in quite a while, so it didn't work with more modern terrain data formats. I figured a few curiosity seekers might take a look at it, but that it had no real future.

Over the next year I periodically returned to the problem of computing prominence for the whole world. I discussed potential algorithms with colleagues, and we even built a few prototypes and ran them over small land masses like Hawaii. These approaches were all limited by the size of the region they could examine, so they wouldn't be able to handle the entire world. This limitation matters when computing prominence, because a key saddle can be very far away from its corresponding peak.

At some point I stumbled across an appendix Edward had written in Adam Helman's book "The Finest Peaks: Prominence and Other Mountain Measures". Although the appendix was short, it went into enough detail about how Edward had developed WinProm that I thought I might be able to puzzle it out. In particular, Edward discussed how he initially implemented, and then later replaced, the algorithm I then thought was most favorable. As a dry run for a prominence calculation, I spent a few months calculating the isolation of every peak in the world from digital terrain data. With this experience in hand, I was ready to implement a prominence analysis, based partly on what I was able to learn from studying WinProm.

As with my earlier isolation calculation, I started with Jonathan de Ferranti's global terrain data set at viewfinderpanoramas.org. Jonathan's data is based on the Shuttle Radar Topography Mission (SRTM), which used radar from the Space Shuttle to measure elevations. Voids in the SRTM data have been filled in from topographic maps and other digital sources. Coverage has been extended beyond SRTM's +/- 60 degrees latitude to cover the whole world. Jonathan's data has 90m spacing (3 arcseconds) at the equator. In Antarctica, the source data is 200m resolution, resampled to 3 arcseconds. This data comes in 1 degree square tiles, with 1201 pixels on a side.

Because prominence has been determined accurately by hand for many thousands of peaks in North America, I had to use higher resolution terrain data if I hoped to compare against existing known values. In Canada and Mexico, the U.S. government supplies 1 arcsecond (30m) data. For the lower 48 states and Hawaii, 1/3 arcsecond (10m) data is available. These latter tiles are quite large, at 10812 pixels on a side.

Some of these data sets are advertised as "seamless", meaning that neighboring tiles share exactly the same pixels along their common border. In practice I found this not always to be the case. Because my prominence algorithm requires the edges to match exactly, I had to enforce this myself by copying the shared edge pixels from one tile to another. This required loading extra data from disk, which slowed down the USA computation considerably.
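As a rough illustration of what enforcing a shared edge means (this is not the C++ code used for the analysis), the sketch below copies the easternmost column of one tile over the westernmost column of its eastern neighbor. The function name `enforce_shared_edge`, the list-of-rows tile layout, and the sample elevations are all assumptions made for the example.

```python
def enforce_shared_edge(west_tile, east_tile):
    """Make the west column of east_tile identical to the east column of west_tile.

    Tiles are lists of rows of elevations (a hypothetical layout).
    Returns the number of pixels that had to be overwritten.
    """
    if len(west_tile) != len(east_tile):
        raise ValueError("tiles must have the same number of rows")
    changed = 0
    for west_row, east_row in zip(west_tile, east_tile):
        if east_row[0] != west_row[-1]:
            east_row[0] = west_row[-1]  # copy the shared pixel across the border
            changed += 1
    return changed


# Two tiny made-up tiles whose shared column disagrees in the middle row.
west = [[10, 12, 15], [11, 13, 16], [12, 14, 18]]
east = [[15, 17, 19], [17, 18, 20], [18, 19, 21]]
fixed = enforce_shared_edge(west, east)
```

After the call, the first column of `east` equals the last column of `west`, which is the property the merge step depends on.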

Digital terrain data that includes Mt. Elbert in Colorado

Edward's main advance with WinProm was to discover a topological algorithm for computing prominence. In this approach, the pixels of the terrain data are first transformed into a network of peaks and saddles (known as a graph in computer science). Graph theory is a well-understood area of mathematics and much is known about how to manipulate such networks. Because the graph is very sparse---there are many fewer peaks and saddles than there are terrain pixels---once the graph is built, any further operations are very fast. This is important because of the huge size of the terrain data, which is in the billions of pixels.

I wrote new computer programs in C++ to compute prominence. The general scheme of the calculation was this:

The divide tree for part of Nevada

Here's some more information on how each step was performed:

1: Find peaks and saddles

A peak is a flat area that is higher than all of its neighboring pixels. A saddle is a flat area that has at least two independent, higher areas in its border. In other words, in order to walk from one higher area of the saddle's border to the other, you must go through the saddle. This part of the algorithm was by far the slowest, and it required considerable optimization. One huge saddle in southern Louisiana had a border over 5 million pixels in size!

Each peak or saddle was assigned a number, and every pixel of the peak or saddle was filled in with that number in a parallel array called a domain map. Note that it is possible for a flat area to be neither a peak nor a saddle. These areas were also filled in the domain map using a special number.
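The peak half of this step can be sketched in a few lines. The following is a minimal illustration, not the optimized C++ implementation: it flood-fills each flat area, marks it as a peak if no bordering pixel is higher, and records the result in a domain map. The 8-pixel neighborhood, the `NOT_PEAK` marker value, and the function names are assumptions; saddle detection and tile-edge handling are left out.

```python
from collections import deque

NOT_PEAK = -1  # hypothetical marker for flat areas that are not peaks


def neighbors(r, c, rows, cols):
    """Yield the (up to 8) grid neighbors of pixel (r, c)."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield r + dr, c + dc


def find_peaks(grid):
    """Return (peaks, domain_map) where peaks is a list of (height, pixels)."""
    rows, cols = len(grid), len(grid[0])
    domain = [[None] * cols for _ in range(rows)]
    peaks = []
    for r0 in range(rows):
        for c0 in range(cols):
            if domain[r0][c0] is not None:
                continue  # already part of a flat area we visited
            height = grid[r0][c0]
            area, queue, is_peak = [], deque([(r0, c0)]), True
            domain[r0][c0] = NOT_PEAK
            while queue:  # flood fill the flat area at this height
                r, c = queue.popleft()
                area.append((r, c))
                for nr, nc in neighbors(r, c, rows, cols):
                    if grid[nr][nc] > height:
                        is_peak = False  # a higher pixel borders the area
                    elif grid[nr][nc] == height and domain[nr][nc] is None:
                        domain[nr][nc] = NOT_PEAK
                        queue.append((nr, nc))
            if is_peak:
                peak_id = len(peaks)
                peaks.append((height, area))
                for r, c in area:
                    domain[r][c] = peak_id
    return peaks, domain


grid = [
    [1, 1, 1, 1, 1],
    [1, 5, 5, 1, 1],
    [1, 1, 2, 1, 3],
    [1, 1, 1, 1, 1],
]
peaks, domain = find_peaks(grid)
```

In this made-up grid the two-pixel plateau at height 5 and the single pixel at height 3 are peaks; the pixel at height 2 is not, because it touches the plateau.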

Saddles with more than two independent higher areas are split up into multiple saddles at this point, until each remaining saddle has only two independent higher areas. The way pairs of higher areas are selected is arbitrary, in that it doesn't affect the final prominence results. However, for aesthetic reasons, we may not want to see paths across the flat region cross each other, and we may not want to see large, artificial-looking detours. With its emphasis on graphical display, WinProm goes to great lengths to produce nice looking paths across saddles. Since I expected most of these paths to be pruned away later anyway, I simply paired the highest higher area with each other higher area, and placed the saddle location as close to the midpoint between them as possible.

2: Build a divide tree

This is the critical step of the algorithm. The divide tree determines which peaks are connected to each other by walks along ridges. Staying on a ridge implies staying as high as possible, which meets the definition of prominence as the minimum vertical distance one must descend.

To build the tree, we examine each saddle in turn. From the saddle, we follow the two lines of steepest ascent until they reach a pixel in the domain map that is marked with the identifier of a peak. (During this upward walk, it is possible to encounter other flat areas, and care must be taken to cross the flat area and continue climbing at a higher point in the flat area's boundary.) There are three possible outcomes of this walk:

Locations of basin saddles, showing how they tend to occur along rivers

4: Merge divide trees

Although the next step chronologically is simplifying the divide tree, it's easier to understand simplification if we first talk about how to glue two divide trees together. Since it's common for a peak to have its key saddle in a tile other than its own, we must be able to glue divide trees together before we can compute correct prominence values.

If two divide trees are from tiles that do not touch, then merging them is trivial. If the tiles do touch, they will have one or more edges in common. The key to merging the trees is to introduce the concept of a runoff. A runoff is an area of edge pixels higher than its neighbors, considering only the tile's edge pixels (i.e. the non-edge pixels are not examined at all). It's like half a saddle: the runoff has lower border pixels in this tile, and when the neighboring tile is examined, we may find that there are at least two independent higher areas of pixels, one from each tile. If so, we will introduce a saddle at the point of the runoff and attempt to add it to the divide tree. This can cause a divide tree edge to cross the tile boundary, splicing the two trees together.

The actual mechanism for converting runoffs to potential saddles is somewhat more complicated than this. But the general idea is to find pairs of runoffs from neighboring tiles that are at exactly the same point. This is the reason that neighboring tiles must have exactly the same pixels along their shared border. Without this, the runoffs will not align with each other, and we will fail to find connections between the two trees.
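To make the role of identical border pixels concrete, here is a simplified sketch that finds runoffs along a single row of edge pixels and pairs them across two tiles. It treats a runoff as a flat run of edge pixels higher than the edge pixels on either side, and it counts runs at the ends of the edge as runoffs; both choices, along with the function name `find_runoffs` and the sample elevations, are assumptions. The real mechanism is more involved, as noted above.

```python
def find_runoffs(edge):
    """Return (start, end, height) for each flat run of edge pixels that is
    higher than the edge pixels immediately before and after it."""
    runoffs = []
    i, n = 0, len(edge)
    while i < n:
        j = i
        while j + 1 < n and edge[j + 1] == edge[i]:
            j += 1  # extend the flat run
        left_lower = i == 0 or edge[i - 1] < edge[i]
        right_lower = j == n - 1 or edge[j + 1] < edge[i]
        if left_lower and right_lower:
            runoffs.append((i, j, edge[i]))
        i = j + 1
    return runoffs


north_edge = [3, 7, 7, 4, 2, 6, 5, 9]       # south edge of the northern tile
seamless_south = list(north_edge)            # neighbor with identical pixels
mismatched_south = [3, 7, 7, 4, 2, 5, 6, 9]  # neighbor that differs slightly

matched_seamless = sorted(
    set(find_runoffs(north_edge)) & set(find_runoffs(seamless_south)))
matched_mismatched = sorted(
    set(find_runoffs(north_edge)) & set(find_runoffs(mismatched_south)))
```

With identical edges all three runoffs pair up. With the slightly different edge, the runoff at index 5 has no partner, so a connection between the two trees could be missed.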

Runoffs (question marks) occur along the edge of a tile. Each is connected to the peak (orange) that is reached by going uphill from the runoff (blue lines). These runoffs will be resolved when the tile to the south is merged in.

5: Calculate prominence values

We make a copy of the divide tree and perform a peak sorting operation to convert the copy into a prominence island tree. In this operation, we take each peak P and move it up the tree (towards its parent) until we find a higher peak. At each step, if the old (lower) parent has a lower saddle, we move it underneath P in the tree, otherwise we leave it where it was. After we've performed this operation on every peak in the divide tree, the peaks form a data structure called a heap, where the highest peak is at the root. We can then find the prominence of a peak by computing the difference between its elevation, and the elevation of the saddle between it and its (higher) parent in the prominence island tree. The root of the tree has no parent, and its prominence is equal to its elevation. Root peaks are the highest points of land masses.
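For readers who want to experiment, the sketch below computes prominence on a tiny divide tree. Note that it does not implement the peak sorting operation described above; it swaps in a different, commonly used technique that should give the same values on a tree: process saddles from highest to lowest while merging "islands" with a union-find structure, so that each merge reveals the key saddle of the lower island's high point. The peak names, elevations, and function name are invented for the example.

```python
def prominences(peaks, saddles):
    """peaks: {name: elevation}; saddles: list of (elevation, peak_a, peak_b).

    Returns {name: prominence}. Island high points get their own elevation.
    """
    parent = {p: p for p in peaks}  # union-find forest over peaks
    summit = {p: p for p in peaks}  # highest peak of each island, keyed by root

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    result = {}
    for saddle_elev, a, b in sorted(saddles, reverse=True):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # both sides already joined; cannot happen in a true tree
        hi, lo = ra, rb
        if peaks[summit[lo]] > peaks[summit[hi]]:
            hi, lo = lo, hi
        # This saddle is the key saddle of the lower island's high point.
        result[summit[lo]] = peaks[summit[lo]] - saddle_elev
        parent[lo] = hi
    for p in peaks:
        result.setdefault(p, peaks[p])  # roots: prominence equals elevation
    return result


# A made-up chain A - B - C - D of peaks joined by three saddles.
example = prominences(
    {"A": 1000, "B": 800, "C": 900, "D": 400},
    [(700, "A", "B"), (500, "B", "C"), (300, "C", "D")],
)
```

Here C has 400 units of prominence, because the lowest saddle on its walk to the higher peak A is at 500, and A, as the island high point, keeps its full elevation.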

Now that we know how to compute prominence values, we can return to the problem of simplifying the divide tree by removing low prominence peaks.

3: Simplify the divide tree

In the first two steps, we identified every peak and saddle in a one degree tile. Even a 1-foot-high peak will show up in the divide tree at this point. We don't yet know the prominence of anything, and we can't discard a peak until we're sure that its prominence is below our threshold (100 feet in the case of this analysis). A single tile can have well over 1 million "peaks". Clearly if we tried to build a global divide tree with this many peaks, our algorithm would slow to a crawl. We need to remove peaks that are definitely low prominence before we merge with neighboring tiles.

This simplification (or pruning) step is the most complicated part of the algorithm, and I'll only sketch it here. One thing to notice is that the divide tree always connects peaks and saddles, never peaks to peaks directly, or saddles to saddles. Therefore, if we want to remove a peak from the tree to make it smaller, we must also remove one of its neighboring saddles. So before we remove anything, we need to identify those peaks that definitely have under 100 feet of prominence, and those saddles that cannot possibly be the key saddles of a peak with over 100 feet of prominence. Where such a peak and saddle are neighbors, it is safe to remove the pair of them. We repeat the process until there are no more peaks eligible to be removed.

We can calculate the prominence of each peak by building a prominence island tree as in step 5. To compute a similar "prominence" value for each saddle, we walk from each peak to a higher peak along the divide tree, identifying the lowest saddle we encounter. The "prominence" of this saddle is the difference between its elevation and that of the original peak where the walk started. In addition to this, we trace paths through the tree between pairs of runoffs. The lowest saddle along such a path could be the key saddle of a peak outside the tile. Such a peak could be arbitrarily high, and thus it is never safe to remove such a saddle.

The merging process can leave behind low prominence peaks whose prominence couldn't be determined before the merge. Thus, after merging multiple divide trees into one tree, we run the simplification again on the merged tree.

The divide tree around Mt. Washington, New Hampshire. Left is unpruned, right is pruned to 100 feet of prominence.

6: Filter out peaks

In practice, we want to compute prominence inside large, connected land regions surrounded by water, like Australia. All of the peaks and saddles in such a region are self-contained and cannot be influenced by terrain outside the region. However, because the terrain data comes in one degree square tiles, we wind up with some extra peaks that are not in the region of interest. A good example is the Strait of Gibraltar: while we may want to compute prominence only for peaks in Africa, because of where the tile boundaries lie, we wind up pulling in some terrain in Spain. The computed prominence of those Spanish peaks is unreliable because their key cols could lie far away in Europe. To restrict our interest to only Africa, I wrote a tool that takes a KML polygon as input and discards any peaks outside the polygon.
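A filter of this kind can be sketched with a standard ray-casting point-in-polygon test. This is only an illustration under simplifying assumptions: the polygon is given directly as a list of (latitude, longitude) vertices rather than parsed from KML, coordinates are treated as planar, and polygons that cross the antimeridian are not handled. The function names and sample data are invented.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test; polygon is a list of (lat, lon) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this polygon edge straddle the point's latitude?
        if (lat1 > lat) != (lat2 > lat):
            crossing_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < crossing_lon:  # the edge lies to the east of the point
                inside = not inside
    return inside


def filter_peaks(peaks, polygon):
    """Keep only (lat, lon, prominence) tuples that fall inside the polygon."""
    return [p for p in peaks if point_in_polygon(p[0], p[1], polygon)]


# A made-up square region and three made-up peaks.
region = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]
all_peaks = [(5.0, 5.0, 1200), (12.0, 5.0, 900), (9.5, 9.5, 300)]
kept = filter_peaks(all_peaks, region)
```

The peak at latitude 12 lies outside the square and is discarded; the other two are kept.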

As a compromise between keeping divide trees small, and generating 100% accurate prominence values, I wound up splitting land regions at the Panama and Suez Canals, and running Africa separately from Eurasia, and South America separately from North America. This had small impacts on the prominence values for the high points of the lower region (Kilimanjaro for Africa, and Denali for North America). However, their prominence values were already well known, and it wasn't worth greatly increasing the runtime in order to get those two values exactly right.

7: Compare results with peak databases

At this point I had prominence values for peaks expressed as latitude/longitude pairs. What I wanted was prominence values on well-known mountain names. To do this, I loaded the databases for Peakbagger.com and ListsOfJohn.com, and merged peaks that were within 200 meters of each other. Of course, both the terrain data and the peak databases contain errors, so this process did not always succeed. Manual review is necessary before the prominence values can make it into these databases.
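The matching step can be pictured as a nearest-neighbor search with a 200 meter cutoff. The sketch below uses the haversine great-circle distance and a brute-force loop, which is fine for a demonstration but would need a spatial index for millions of peaks. The spherical Earth radius, the function names, and the sample peaks are assumptions, and the actual matching tool may have worked differently.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_peaks(computed, named, max_distance_m=200.0):
    """Attach each named peak to the closest computed peak within the cutoff.

    computed: list of (lat, lon, prominence); named: {name: (lat, lon)}.
    """
    matches = {}
    for name, (nlat, nlon) in named.items():
        best = None
        for i, (lat, lon, _prom) in enumerate(computed):
            d = haversine_m(nlat, nlon, lat, lon)
            if d <= max_distance_m and (best is None or d < best[0]):
                best = (d, i)
        if best is not None:
            matches[name] = computed[best[1]]
    return matches


# Made-up data: one database peak about 78 m from a computed peak, one far away.
computed = [(39.1178, -106.4454, 1500), (39.2000, -106.5000, 350)]
named = {"Peak X": (39.1178, -106.4445), "Peak Y": (40.0, -105.0)}
matches = match_peaks(computed, named)
```

"Peak Y" stays unmatched, which corresponds to the cases above that need manual review.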

I split up my divide trees roughly by continent. However, things were trickier at the border of higher and lower resolution data sets. Because there was no way to force shared pixels to be identical along the edges between different resolutions---which the algorithm requires---I had to use KML polygons to split runs along a convenient border. This could potentially introduce problems where a peak in one data set would have its key saddle in another, which wouldn't be found.

To minimize these problems, I extended the region of the lower resolution data set's computation well into the region of the higher resolution area. Only at the very end, after the peaks' prominence was calculated, did I filter the peaks to the lower resolution area. Unfortunately the same trick can't be done with the higher resolution area, because the high resolution data just doesn't exist in the lower resolution area. There are thus some peaks near the borders of high resolution areas whose computed prominence is too high. I've noted this below in the discussion for each region. A possible future enhancement might be to run these areas with lower resolution terrain data, and substitute those prominence values for the artificially high ones.

Although the purpose of my computation was to greatly increase the coverage of known prominence values, it was also interesting to check its accuracy on high-prominence peaks. In order to determine which "ultra prominent" peaks (>= 1500m, known as ultras) were known, I looked at peaklist.org and peakbagger.com. I also checked with Jonathan, who originally found many of these ultras, because in some cases peaklist.org has not been updated with ultras that have been found in the last few years. Because the terrain data tends to underestimate peak elevations more than saddle elevations, if anything my prominence values tend to be a little low. Thus there were many ultras on peaklist.org that I found to have under 1500m of prominence. I did not investigate them further, because in most cases they have been carefully investigated by hand using topographic maps.

Finally, a note on the 100 foot prominence threshold. In most of the world I used Jonathan's terrain data, which is a surface model. That is, it includes the tops of trees and buildings, rather than measuring the height of the ground. This can lead to lots of small peaks that aren't really there. I went down to 100 feet globally because I was going to do it in the US anyway, where the terrain data does measure the ground level, and to prove the power of the topological approach. As data improves over time, it's clear that we can quickly recompute prominence values down to any desired level. For now, prominence values under 300 feet outside the US should be viewed with extreme suspicion, especially in forested or urban areas.

Distribution of peaks with at least 2000 feet of prominence

Africa

I cut Africa along the Suez Canal, and included Socotra and Madagascar. I found 3 ultras not in peaklist.org, but Jonathan had previously found them.

Antarctica

Because the terrain data source for Antarctica has only 200m horizontal resolution, I went down to only 300 feet of prominence here. The huge ice sheet is remarkably smooth, and there are few prominent points outside the gigantic mountain ranges. I may have found a new ultra on Alexander Island. Maps and terrain data are poor here, and there is some confusion over the placement even of named peaks.

Australia

I hand-checked the peaks with 2000 feet of prominence in Australia, because Peakbagger has good coverage of them in its database. I found several new ones.

Central America

I had to treat Central America as a separate entity because there is higher resolution data available for Mexico. I did not find any new ultras here. In retrospect, I probably should have included Central America in the run with South America. I extended the computation far enough north to include the key saddle of Tajumulco, the highest mountain in Central America.

Eurasia

Europe and Asia together are an enormous land mass. Merging the thousands of individual divide trees went smoothly, however. I believe I found 7 new ultras here, 4 in China, and 1 each in Russia, Iran, and Turkey. One of the reasons these weren't found earlier is that their key cols are very far from the peaks, and previous methods would have had to load a tremendous amount of terrain at once to find them. This illustrates a major advantage of the current method, where tiles are analyzed completely independently.

Greenland

I believe there are 3 new ultras here, all along the coast.

Islands

I tried hard to reduce the size of the continental analyses, because I didn't know in advance what the performance would be like, especially for Eurasia. This left me with a huge swath of islands, from the Indian Ocean over to Easter Island, including Indonesia and New Zealand. I had to develop some special techniques to correctly handle the antimeridian (the line between -180 and +180 degrees of longitude). I did not find any new ultras here.

South America

Published terrain databases for Patagonia are surprisingly poor. Jonathan's void-fill terrain database allowed me to find 3 new ultras here, all in remote areas. I cut South America along the Panama Canal.

Mexico

Mexico has 1 arcsecond data for the entire country. Those tiles are 9 times larger than the 3 arcsecond data I had been using up to this point, so naturally the analysis ran slower here. There are likely overly large prominence values near the border with Guatemala, because that's where the higher resolution data ends. Along the northern border I included a substantial section of the southern U.S., enough to include the key saddle of Picacho del Diablo. I did not find any new ultras in Mexico.

Canada

One arcsecond data is available for most of Canada, but annoyingly, not for some very mountainous areas in the west near Alaska. I thus had to cut that section out and run it with lower-resolution data along with Alaska. I drew a fairly detailed border along the southern edge, but it's not guaranteed that every peak along the edge is actually in Canada; a few may be in the U.S. I did not find any new ultras here, although I was surprised to find a large number of peaks where the high point is far away from the one in peaklist.org. In several cases I think it is likely that peaklist has the wrong mountain as the ultra.

Alaska

Unfortunately I had to use the lower resolution 3 arcsecond data for Alaska, because none of the higher resolution data sets has complete coverage. I didn't find any new ultras here.

USA (Lower 48 + Hawaii)

I ran the USA last because I knew it would be my toughest test. The terrain data here is 1/3 arcsecond, or 81 times larger than the data I used almost everywhere else. Runs would take much longer. In very flat areas near the Mississippi River, I encountered huge flat saddle areas that took forever to analyze. I had to make some adjustments to speed up the algorithm. It was ironic that the analysis was getting so bogged down in areas under 10 feet of elevation where there were no peaks of interest at all!

When I ran my isolation calculation, I had copied peak elevations from Peakbagger into the terrain data so that I would get good agreement in peak locations during post-processing. By this point, after I had examined the results of most of the world, I realized that had been a mistake. In some areas, especially in Indonesia and Africa, Peakbagger's peak locations were very far off, enough to generate incorrect values even for ultra prominent peaks. Thus, I did not include Peakbagger elevations up to this point. However, in the US, I had pretty good confidence in Peakbagger's peak locations. The big advantage of including them is that their elevations are often taken from surveyed spot elevations on topographic maps. So I did wind up including Peakbagger elevations for the US outside Alaska.

Not only did I not find any new ultras in the US, but Peakbagger appears to have a complete list of 2000 foot prominence mountains, too (though there are some peaks on which my analysis shows 2000 feet of prominence and it's slightly less in Peakbagger). I also compared my list of 300 foot prominence peaks to the ListsOfJohn.com database, which is meant to be complete down to this level for the US. I found some differences, but they will all have to be checked by hand to weed out terrain errors and elevation differences with topo maps.

Near the Canadian and Mexican borders, prominence values may be too high. More than a mile or two from the border, only the largest peaks are affected. In particular, my analysis misidentified Mount Bonaparte in Washington, the Boundary County High Point in Idaho, and Northwest Peak in Montana as ultras, when in reality their prominences are much lower.

Here's a summary of the number of peaks I found, by region and by minimum prominence:

* Includes 3 peaks near Canadian border that actually have much less prominence

I analyzed peakbagger.com's database from April 17, 2017, which contained 68,806 peaks. After matching, the peaks were classified as follows:

New ultras

The following table describes the new ultra-prominent peaks found by the analysis.

Raw data

A large zip file with the roughly 7.8 million peaks of at least 100 feet of prominence is available here. The unzipped file has one peak per line, with fields separated by commas, in the following format:

latitude,longitude,elevation in feet,key saddle latitude,key saddle longitude,prominence in feet

A key saddle at (0, 0) means that the peak is the high point of a land mass. Some island high points have their key saddles listed at degree intersections in the water (such as 27, -15) instead. (These are the coordinates of runoffs at the corners of tiles that didn't get removed.)

Elevation is in feet to avoid losing precision when converting to meters. The peaks are sorted by decreasing prominence.
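A file in this format can be read with a few lines of Python. The sketch below is an unofficial helper: the `PeakRecord` type, its field names, and the two sample rows are invented for illustration, and it assumes the file has no header line.

```python
import csv
import io
from typing import NamedTuple


class PeakRecord(NamedTuple):
    lat: float
    lon: float
    elevation_ft: float
    saddle_lat: float
    saddle_lon: float
    prominence_ft: float

    @property
    def is_landmass_high_point(self):
        # A key saddle at (0, 0) marks the high point of a land mass.
        return self.saddle_lat == 0.0 and self.saddle_lon == 0.0


def read_peaks(lines):
    """Yield one PeakRecord per non-empty comma-separated line."""
    for row in csv.reader(lines):
        if row:
            yield PeakRecord(*(float(field) for field in row))


# Two made-up rows in the documented field order.
sample = io.StringIO(
    "10.5,20.25,12000,0,0,12000\n"
    "11.0,21.0,9000,10.75,20.5,4000\n"
)
records = list(read_peaks(sample))
```

For the real file, passing an open file object to `read_peaks` instead of the `StringIO` sample should work the same way, since the function streams one line at a time.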

Visualizations

A heat map of the 1000 most prominent mountains in the world

All of the P300s in northern South America. Click on the image to go to a live map.

All the P600m peaks in Scandinavia.

All the P300s in the world, color-coded by prominence range (300, 1000, 2000, 5000 feet). Click to go to a live map.

Hundreds of P300s among the sand dunes of Saudi Arabia (left) and Algeria (right).

I created some of these maps with Google Fusion Tables, which Google shut down in 2019. I have a replacement map here.

Just as prominence measures the height of mountains above their surroundings, anti-prominence (or subsidence) measures the depth of basins below their surroundings. Subsidence can be computed with the same algorithm as prominence by simply flipping the sign of the DEM elevations at the beginning and ending steps. Using Jonathan de Ferranti's 3-arcsecond data exclusively, and omitting Antarctica where the data is too coarse, I found a little over 10,000 basins with at least 300 feet of subsidence. This is likely an overcount, as it is difficult to distinguish true basins from elevation errors in densely forested areas where topographic maps aren't available.
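The sign-flipping trick is easy to see on a one-dimensional elevation profile. The sketch below is a toy, not the global algorithm: it uses the simple signal-processing convention for prominence (scan outward from each local maximum until a higher sample or the end of the profile, and take the higher of the two side minimums as the saddle), and then obtains subsidence by running the same function on the negated profile. The function names and the profile values are assumptions.

```python
def profile_prominences(heights):
    """Prominence of each strict interior local maximum of a 1-D profile.

    Returns {index: prominence}. If no higher sample exists on a side,
    the scan simply runs to the end of the profile on that side.
    """
    result = {}
    n = len(heights)
    for i in range(1, n - 1):
        h = heights[i]
        if not (heights[i - 1] < h and heights[i + 1] < h):
            continue  # not a local maximum
        left_min = h
        j = i - 1
        while j >= 0 and heights[j] <= h:
            left_min = min(left_min, heights[j])
            j -= 1
        right_min = h
        j = i + 1
        while j < n and heights[j] <= h:
            right_min = min(right_min, heights[j])
            j += 1
        result[i] = h - max(left_min, right_min)
    return result


def profile_subsidences(heights):
    """Same algorithm on the sign-flipped profile: basins become peaks."""
    return profile_prominences([-h for h in heights])


profile = [10, 4, 8, 2, 9, 7, 12]
peak_prom = profile_prominences(profile)
basin_sub = profile_subsidences(profile)
```

In this made-up profile the basin at elevation 2 has a subsidence of 8, because water would have to rise to 10 before spilling out of the left end, while the shallower basin at 4 has a subsidence of only 4. In a full pipeline, the elevations reported for each basin would be flipped back to positive values at the end.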

The results are available in KMZ and CSV formats.

I was surprised by the interesting variety of low points, and how they are clustered in certain regions. They come in several types:

I found the most prominent basin in the world to be Aydingkol, the low point of China, at 4290 feet of anti-prominence. Saline Valley in California is #2 at 3888 feet. This demonstrates that the lowest points are not always the most prominent, which takes some adjustment if you're used to thinking only about the prominence of high points. The Dead Sea and Death Valley are actually less prominent than these other two. These are all the points with at least 2000 feet of subsidence: