Resilient Overlay Networks
- overview - what RON's all about
- papers - papers and publications
- talks - selected presentation slides
- data - RON experimental data (RON1 and RON2, live and archived BGP feed, DNS)
- sites - RON wide-area testbed sites
- resources - links for packet traces, routing data, etc.
- related research - past and current projects that share some of RON's goals
- people - who are we?
- funding - who sponsors RON?
Overview
"Because, sometimes, the Internet doesn't quite work..."
The MIT RON (Resilient Overlay Networks) project is a DARPA-funded effort motivated by the desire to improve the robustness and availability of Internet paths between hosts by an order of magnitude over today's wide-area Internet routing infrastructure. The key design goal in RON is to develop techniques that allow end-hosts and applications to cooperatively gain improved reliability and performance from the Internet. At a glance, RON nodes examine the condition of the Internet between themselves and the other nodes and, based on how the network looks, decide whether to send packets directly to other nodes or indirectly via other RON nodes. For instance, a group of cooperating RON nodes can mutually provide a more available and better-performing routing service than what vanilla Internet routing can provide.
RON is an architecture that allows a small group of distributed Internet applications to detect and recover from path outages and periods of degraded performance within several seconds, improving over today's wide-area routing protocols that take at least several minutes to recover. A RON is an application-layer overlay on top of the existing Internet routing substrate. The RON nodes monitor the functioning and quality of the Internet paths among themselves, and use this information to decide whether to route packets directly over the Internet or by way of other RON nodes, optimizing application-specific routing metrics.
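The paragraph above describes each node choosing between the direct Internet path and an indirect path through other RON nodes, based on probe results. The sketch below illustrates that decision under assumptions that are not taken from this page: indirect paths use exactly one intermediate RON node, the routing metric is additive latency, and a virtual link is treated as down after a hypothetical `OUTAGE_THRESHOLD` of consecutive lost probes. The names `LinkStats`, `best_path`, and `OUTAGE_THRESHOLD` are illustrative only and are not the RON library API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical: declare a virtual link down after this many consecutive
# lost probes. The real RON detection parameters may differ.
OUTAGE_THRESHOLD = 4


@dataclass
class LinkStats:
    """Probe-derived state for one directed virtual link between two RON nodes."""
    latency_ms: float = float("inf")  # RTT of the most recent successful probe
    consecutive_losses: int = 0

    def record_probe(self, rtt_ms: Optional[float]) -> None:
        """Record one probe result; None means the probe was lost."""
        if rtt_ms is None:
            self.consecutive_losses += 1
        else:
            self.consecutive_losses = 0
            self.latency_ms = rtt_ms

    @property
    def is_down(self) -> bool:
        return self.consecutive_losses >= OUTAGE_THRESHOLD


Links = Dict[Tuple[str, str], LinkStats]


def best_path(links: Links, nodes: List[str], src: str, dst: str) -> Optional[List[str]]:
    """Return the lowest-latency usable path from src to dst, considering the
    direct Internet path and every path through one intermediate RON node.
    On a tie the direct path wins, because it is considered first."""
    def usable(a: str, b: str) -> bool:
        stats = links.get((a, b))
        return stats is not None and not stats.is_down

    candidates = []
    if usable(src, dst):
        candidates.append((links[(src, dst)].latency_ms, [src, dst]))
    for hop in nodes:
        if hop in (src, dst):
            continue
        if usable(src, hop) and usable(hop, dst):
            cost = links[(src, hop)].latency_ms + links[(hop, dst)].latency_ms
            candidates.append((cost, [src, hop, dst]))
    if not candidates:
        return None  # every known path to dst is currently considered down
    return min(candidates, key=lambda c: c[0])[1]


if __name__ == "__main__":
    nodes = ["A", "B", "C"]
    links = {(a, b): LinkStats() for a in nodes for b in nodes if a != b}
    links[("A", "C")].record_probe(100.0)
    links[("A", "B")].record_probe(20.0)
    links[("B", "C")].record_probe(30.0)
    print(best_path(links, nodes, "A", "C"))  # ['A', 'B', 'C']: 50 ms beats 100 ms direct
    for _ in range(OUTAGE_THRESHOLD):
        links[("A", "B")].record_probe(None)  # A->B suffers an outage
    print(best_path(links, nodes, "A", "C"))  # ['A', 'C']: falls back to the direct path
```

Limiting indirection to a single intermediate node keeps the search linear in the number of nodes per destination. An application that optimizes a different metric, such as loss rate, would need its own rule for combining per-link values, since loss does not add the way latency does.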
The RON project has several components, including:
- Overlay configuration and maintenance
- Probing and outage detection
- Routing around outages and performance failures
- Application-controlled routing
- Policy routing
- Multi-path routing; QoS routing
- Data forwarding
- API and RON libraries
- Applications (e.g., resilient VPN, resilient conferencing, etc.)
- Data analysis and understanding wide-area routing and fault-tolerance behavior; BGP interactions
- Simulations of RON behavior
RON is part of a larger research agenda on large-scale, robust, Internet-based distributed systems, which spans areas ranging from resilient routing (as in RON) to emerging peer-to-peer systems. Our work on peer-to-peer systems is based on Chord, a scalable p2p lookup service.
RON is also closely related to other current projects at LCS in the area of robust Internet infrastructures and uses some of the ideas from these projects: CM, the Internet Congestion Manager; and Click-SMP, a modular PC-based router.
RON data, Internet experiments
- RON1 and RON2 datasets - several million latency and loss samples, with thousands of throughput samples taken on the RON testbed
- Real-time BGP Monitor
- DNS analysis
- ron-all-0.2a.tar.gz (release 0.2a, November 2007) (ChangeLog)
- The initial RON release (2001). A beta, unsupported release of the RON source code.
RON deployment sites
Since early 2001, we have run a real-life RON, which now has 17 sites located around the Internet. Our deployment is international. We have also collected extensive data sets and analyzed them. They will soon be made publicly available on this page.
- Hosting a RON site? - check for FAQs and information
- Internal sites information
- Many thanks to the people and organizations that host RON nodes
Papers
- Scaling All-Pairs Overlay Routing
David Sontag, Yang Zhang, Amar Phanishayee, David G. Andersen, David Karger
CoNEXT, Rome, Italy, December 2009.
- Measuring the Effects of Internet Path Faults on Reactive Routing
Nick Feamster, David Andersen, Hari Balakrishnan, and Frans Kaashoek
ACM SIGMETRICS 2003, San Diego, CA, June 2003.
Presentation
- Mayday: Distributed Filtering for Internet Services
David G. Andersen
4th USENIX Symposium on Internet Technologies and Systems, Seattle, Washington, March 2003.
Presentation: [Postscript (390k)] [PDF (110k)]
- Topology Inference from BGP Routing Dynamics
David G. Andersen, Nick Feamster, Steve Bauer, and Hari Balakrishnan
2nd SIGCOMM Internet Measurement Workshop, Marseille, France, November 2002.
- Resilient Overlay Networks
David G. Andersen, Hari Balakrishnan, M. Frans Kaashoek, and Robert Morris
Proc. 18th ACM SOSP, Banff, Canada, October 2001.
Presentation (PDF) (292 KB)
- DNS Performance and the Effectiveness of Caching
Jaeyeon Jung, Emil Sit, Hari Balakrishnan, and Robert Morris
Proc. 1st ACM SIGCOMM Internet Measurement Workshop, San Francisco, CA, November 2001.
- Resilient Overlay Networks
David G. Andersen, SM Thesis, Massachusetts Institute of Technology, May 2001.
[Postscript (8.9 MB)] [ps.gz (1.2 MB)] [PDF (2.2 MB)] (86 pages)
- The Case for Resilient Overlay Networks
David G. Andersen, Hari Balakrishnan, M. Frans Kaashoek, and Robert Morris
Proc. HotOS VIII, Schloss Elmau, Germany, May 2001. (best student paper award)
Presentation: [Slides (ps)] [Slides (pdf)] [Notes (ps)] [Notes (pdf)]
- Fine-Grained Failover Using Connection Migration
Alex C. Snoeren, David G. Andersen, and Hari Balakrishnan
Proc. 3rd USENIX USITS, San Francisco, CA, March 2001.
(Also MIT-LCS-TR-812, September 2000.)
Talks
- Topology Inference from BGP Routing Dynamics. 2002 Internet Measurement Workshop. [Postscript (400k)] [PDF (150k)]
- RON: Choosing Resiliency. 2002 Opensig workshop, Lexington, KY. [Postscript (780k)] [PDF (240k)]
- Resilient Overlay Networks, 18th SOSP, Lake Louise, Alberta, Canada, October 2001.
- Resilient Overlay Networks, MIT LCS Annual Retreat, Cape Cod, June 2001.
- Resilient Overlay Networks, DARPA PI Meeting, Colorado Springs, CO, July 2001.
- Slides from an old presentation comparing existing link probing mechanisms.
Resources
- RIPE NCC stores data about BGP routing table updates.
People
Faculty/PIs: Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris
Graduate Students: David Andersen, Nick Feamster, Jaeyeon Jung, Todd Nightingale, Stan Rost, Alex Snoeren, Jacob Strauss
Collaborators: Ion Stoica
Projects
- The Detour Project at the University of Washington. They developed "sting", which uses TCP to determine forward and reverse path packet loss rates. Some of David Wetherall's students also ran a small follow-on project to test Detour, simulating several algorithms for forming the routing topology: [Orig ps] [Local Mirror]. The projects list is also available.
There are some important differences between RON and Detour. First, RON seeks to prevent disruptions in end-to-end communication in the face of failures. RON takes advantage of underlying Internet path redundancy on time-scales of a few seconds, reacting quickly to path outages and performance failures. Second, RON is designed as an application-controlled routing overlay; because each RON is more closely tied to the application using it, RON more readily integrates application-specific path metrics and path selection policies. Third, we present and analyze experimental results from a real-world deployment of a RON to demonstrate fast recovery from failure and improved latency and loss rates even over short time-scales.
- The Berkeley SPAND project. The Shared Passive Network Performance Discovery (SPAND) toolkit lets applications measure and share performance information with other local clients to make better guesses about which mirror site (for example) to use. The SPAND paper contains more information [ps] [local ps], as does Mark Stemm's thesis [html] [ps] [local ps].
- RAMP: Reliable Adaptive Multipath Routing, from UCSD.
Network Characterization
- Craig Labovitz's BGP and network stability information and his "Delayed Internet Routing Convergence" paper. (30-second to 3-minute outages from BGP fluctuations.)
- Inter-AS traffic patterns (Fang & Peterson, Princeton)
- Much work at ACIRI
Measurement Tools
- The IETF's Internet Protocol Performance Metrics project.
- The CAIDA Network Measurement Tools Taxonomy
- The IDMaps Project (Internet Distance Maps). Creating a "server" that can provide pairwise Internet distance information.
- Commercial tools: VisualRoute measures per-hop loss and delays. VitalSigns NetMedic uses bprobes and application-specific metrics to report network performance.
Overlay Networks
- The X-Bone Project provides a toolkit for rapid deployment of overlay networks for things like IPv6.
A follow-on project, Dynabone, plans to add dynamic overlay adaptation.
- 6bone - IPv6 overlay
- The MBone FAQ - multicast overlay
- VNS uses overlays to provide Quality of Service.
- End-system multicast from CMU
- Yallcast is an open-source content distribution architecture that uses a shared tree or mesh topology. [Architecture Description]
- UUNET's Denial Of Service tracking overlay (NANOG talk).
Funding
We gratefully acknowledge funding for RON from DARPA under the Fault-Tolerant Networking (FTN) program of the ATO; it is being supported by DARPA and the Space and Naval Warfare Systems Center (SPAWAR), San Diego, under contract N66001-00-1-8933.