Modularity Meets Batching: Towards an Experimental Platform for High-speed Software Routers

Forwarding path architectures for multicore software routers

Proceedings of the Workshop on Programmable Routers for Extensible Services of Tomorrow - PRESTO '10, 2010

Multi-core CPUs, along with recent advances in memory and buses, render commodity hardware a strong candidate for building flexible and high-performance software routers. With a forwarding plane physically composed of many packet processing components and operations, resource allocation in multi-core systems is not trivial. Indeed, packets crossing cache hierarchies degrade forwarding performance, since the bottleneck is main memory access. Therefore, forwarding path allocation and input/output processing become challenging, especially when states and data structures have to be shared among multiple cores. In this context, we investigate a set of input/output processing architectures, as well as resource allocation strategies for forwarding paths. For each packet processing operation, we uncover the gains and possible implications of either running different components concurrently or replicating the same components across different cores.

The power of batching in the Click modular router

2012

The Click modular router has been one of the most popular software router platforms for rapid prototyping and new protocol development. Unfortunately, its internal architecture has not caught up with recent hardware advancements, and the performance remains sub-optimal in high-speed networks despite its benefit of flexible module composition. In this work, we identify the performance bottlenecks of the existing Click router and extend it to scale with modern computer systems. Our improvements focus on both I/O ...

Improved Forwarding Architecture and Resource Management for Multi-Core Software Routers

2009 Sixth IFIP International Conference on Network and Parallel Computing, 2009

Recent technological advances in commodity server architectures, with multiple multi-core CPUs, integrated memory controllers, high-speed interconnects and enhanced network interface cards, provide substantial computational capacity and thus an attractive platform for packet forwarding. However, to exploit this available capacity, we need a suitable software platform that allows effective parallel packet processing and resource management. In this paper, we first introduce an improved forwarding architecture for software routers that enhances parallelism by exploiting hardware classification and multi-queue support, already available in recent commodity network interface cards. After evaluating the original scheduling algorithm of the widely used Click modular router, we propose solutions for extending this scheduler for improved fairness, throughput and more precise resource management. To illustrate the potential benefits of our proposal, we implement and evaluate a few key elements of our overall design.
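
As a rough illustration of the hardware classification and multi-queue support mentioned in this abstract, the sketch below maps each flow to one receive queue so that all packets of a flow are handled by the same core. It is a minimal sketch under stated assumptions, not the paper's design: the names `FiveTuple`, `select_queue` and `dispatch` are hypothetical, and CRC32 stands in for whatever hash function a real network interface card applies.

```python
import zlib
from collections import namedtuple

# Hypothetical flow key; real NICs typically hash a subset of header fields.
FiveTuple = namedtuple("FiveTuple", "src_ip dst_ip src_port dst_port proto")


def select_queue(flow, num_queues):
    """Pick a receive queue for a flow.

    Assumption: CRC32 is used here only as a deterministic stand-in
    for the hash function implemented in NIC hardware.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_queues


def dispatch(packets, num_queues):
    """Distribute packets over per-core queues, keeping each flow on one queue."""
    queues = [[] for _ in range(num_queues)]
    for pkt in packets:
        queues[select_queue(pkt, num_queues)].append(pkt)
    return queues


# Eight illustrative flows, three packets each, spread over four queues.
flows = [FiveTuple("10.0.0.1", "10.0.1.1", 1000 + i, 80, 6) for i in range(8)]
queues = dispatch(flows * 3, num_queues=4)
for index, queue in enumerate(queues):
    print(index, len(queue))
```

Because the queue index depends only on the flow key, packets of one flow never cross cores, which avoids sharing per-flow state between cores at the cost of possible load imbalance when a few flows dominate.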

Pipelining router design improves parallel system performance

Filtration Industry Analyst, 2000

Efficient communication when fetching remote data is critical to achieving high performance in distributed shared-memory (DSM) multiprocessors. Message passing techniques are used in many modern communication systems, and routers are essential building blocks for these systems. Hence, in this paper emphasis is placed on the design of routers for 2-ary n-cube networks. Based on a simple deadlock-free algorithm, we analyze the influence of the router structure. More precisely, the parameters considered were the clock frequency and the number of pipeline stages of the router. The performance evaluation for DSM applications shows significant gains from using segmented router designs: in our evaluations, the execution time of some applications improves by up to 12%. This improvement occurs even though the base latency of the router has increased by 40%.

Towards high performance virtual routers on commodity hardware

Proceedings of the 2008 ACM CoNEXT Conference on - CONEXT '08, 2008

Modern commodity hardware architectures, with their multiple multi-core CPUs and high-speed system interconnects, exhibit tremendous power. In this paper, we study performance limitations when building both software routers and software virtual routers on such systems. We show that the fundamental performance bottleneck is currently the memory system, and that through careful mapping of tasks to CPU cores, we can achieve forwarding rates of 7 million minimum-sized packets per second on mid-range server-class systems, thus demonstrating the viability of software routers. We also find that current virtualisation systems, when used to provide forwarding engine virtualisation, yield aggregate performance equivalent to that of a single software router, a tenfold improvement on current virtual router platform performance. Finally, we identify principles for the construction of high-performance software router systems on commodity hardware, including full router virtualisation support.

Towards Low Latency Software Routers

Journal of Networks, 2015

Network devices based on commodity hardware are capable of high-speed packet processing while maintaining the programmability and extensibility of software. Thus, software-based network devices, like software routers, software-based firewalls, or monitoring systems, constitute a cost-efficient and flexible alternative to expensive, special purpose hardware. The overall packet processing performance in resource-constrained nodes can be strongly increased through parallel processing based on off-the-shelf multi-core processors. However, synchronization and coordination of parallel processing may counteract the corresponding network node performance. We describe how multi-core software routers can be optimized for real-time traffic by utilizing the technologies available in commodity hardware. Furthermore, we propose a low latency extension for the Linux NAPI. For the analysis, we use our approach for modeling resource contention in resource-constrained nodes which is also implemented as a resource-management extension module for ns-3. Based on that, we derive a QoS-aware software router model which we use to evaluate our performance optimizations. Our case study shows that the different scheduling strategies of a software router have significant influence on the performance of handling real-time traffic.

A Scalable High-performance Router Platform Supporting Dynamic Service Extensibility On Network and Host Processors

Industrial and Commercial Power Systems Technical Conference, Annual Meeting, 2004

Emerging network services such as transcoding and encryption need application-specific handling of data streams within the network, thus requiring enormous computational capabilities on routers to process packets at link speed. Recently appeared Network Processors (NPs) are able to significantly increase the available processing capacities on a router by a chip-multi-processor architecture. Embedded within the network interface card, NPs provide several code-extensible processors with different ...

A Universal, Dynamically Adaptable and Programmable Network Router for Parallel Computers

Existing message-passing parallel computers employ routers designed for a specific interconnection network and deal with fixed data channel width. There are disadvantages to this approach, because the system design and development times are significant and these routers do not permit run time network reconfiguration. Changes in the topology of the network may be required for better performance or fault-tolerance. In this paper, we introduce a class of high-performance universal (statically and dynamically adaptable) programmable routers (UPRs) for message-passing parallel computers. The universality of these routers is based on their capability to adapt at run and/or static times according to the characteristics of the systems and/or applications. More specifically, the number of bidirectional data channels, the channel size and the I/O port mappings (for the implementation of a particular topology) can change dynamically and statically. Our research focuses on system-level specifi...

Modular router architecture for high-performance interconnection networks

Tehnicki vjesnik - Technical Gazette

Original scientific article. High-performance routers are fundamental building blocks of the system-wide interconnection networks for high-performance computing systems. Through collective interaction they provide reliable communication between the computing nodes and manage the communication data flow. Developing a specialized router architecture is highly complex and requires many factors to be considered. The architecture of high-performance routers is highly dependent on the flow control mechanism, as it dictates the way in which the packets are transferred through the network. In this paper, a novel high-performance "Step-Back-On-Blocking" router architecture is proposed.

Building a robust software-based router using network processors

Operating Systems Review, 2001

Recent efforts to add new services to the Internet have increased interest in software-based routers that are easy to extend and evolve. This paper describes our experiences using emerging network processors, in particular the Intel IXP1200, to implement a router. We show it is possible to combine an IXP1200 development board and a PC to build an inexpensive router that forwards minimum-sized packets at a rate of 3.47 Mpps. This is nearly an order of magnitude faster than existing pure PC-based routers, and sufficient to support 1.77 Gbps of aggregate link bandwidth. At lesser aggregate line speeds, our design also allows the excess resources available on the IXP1200 to be used robustly for extra packet processing. For example, with 8 x 100 Mbps links, 240 register operations and 96 bytes of state storage are available for each 64-byte packet. Using a hierarchical architecture we can guarantee line-speed forwarding rates for simple packets with the IXP1200, and still have extra capacity to process exceptional packets with the Pentium. Up to 310 Kpps of the traffic can be routed through the Pentium to receive 1510 cycles of extra per-packet processing.
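
The relation between the packet rate and the aggregate bandwidth quoted in this abstract can be checked with a short calculation. The sketch below assumes that minimum-sized packets are the 64-byte packets mentioned later in the abstract and that Ethernet framing overhead (preamble and inter-frame gap) is not counted; the helper names `pps_to_bps` and `bps_to_pps` are hypothetical.

```python
# Assumption: a minimum-sized packet is 64 bytes and framing overhead is ignored.
PACKET_BYTES = 64


def pps_to_bps(pps, packet_bytes=PACKET_BYTES):
    """Convert a packet rate (packets/s) into a bit rate (bits/s)."""
    return pps * packet_bytes * 8


def bps_to_pps(bps, packet_bytes=PACKET_BYTES):
    """Convert a bit rate (bits/s) into a packet rate (packets/s)."""
    return bps / (packet_bytes * 8)


# 3.47 Mpps of 64-byte packets, expressed in Gbps.
forwarding_gbps = pps_to_bps(3.47e6) / 1e9

# Packet rate offered by 8 x 100 Mbps links carrying 64-byte packets, in Mpps.
line_rate_mpps = bps_to_pps(8 * 100e6) / 1e6

print(f"{forwarding_gbps:.3f} Gbps")
print(f"{line_rate_mpps:.4f} Mpps")
```

Under these assumptions, 3.47 Mpps corresponds to about 1.777 Gbps, in line with the 1.77 Gbps figure in the abstract, and eight 100 Mbps links deliver roughly 1.56 Mpps, well below the 3.47 Mpps forwarding rate, which is consistent with the abstract's remark that excess resources remain at lesser aggregate line speeds.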