Marcel Rosu - Academia.edu

Papers by Marcel Rosu

On network coprocessors for scalable, predictable media services

This paper presents the embedded realization and experimental evaluation of a media stream scheduler on Network Interface (NI) CoProcessor boards. When using media frames as scheduling units, the scheduler is able to operate in real time on streams traversing the CoProcessor, resulting in its ability to stream video to remote clients at real-time rates. The contribution of this paper is its detailed evaluation of the effects of placing application- or kernel-level functionality, such as packet scheduling, on NIs rather than on the host machines to which they are attached. The main benefits of such placement are 1) that traffic is eliminated from the host bus and memory subsystem, thereby allowing increased host CPU utilization for other tasks, and 2) that NI-based scheduling is immune to host-CPU loading, unlike host-based media schedulers that are easily affected even by transient load conditions. An outcome of this work is a proposed cluster architecture for building scalable ...

CPU Reservations and Time Constraints: Efficient, Predictable Scheduling of Independent Activities

Workstations and personal computers are increasingly being used for applications with real-time characteristics such as speech understanding and synthesis, media computations and I/O, and animation, often concurrently executed with traditional non-real-time workloads. This paper presents a system that can schedule multiple independent activities so that: activities can obtain minimum guaranteed execution rates with application-specified reservation granularities via CPU Reservations; CPU Reservations, which are of the form "reserve X units of time out of every Y units", provide not just an average-case execution rate of X/Y over long periods of time, but the stronger guarantee that from any instant of time, by Y time units later, the activity will have executed for at least X time units; applications can use Time Constraints to schedule tasks by deadlines, with on-time completion guaranteed for tasks with accepted constraints; and both CPU Reservations and Time Con...
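
The sliding-window guarantee stated in this abstract can be illustrated with a small check. The sketch below is not taken from the paper: it assumes time is divided into discrete slots, and the function name `meets_reservation` is hypothetical. It tests whether a recorded schedule gives an activity at least X slots in every window of Y consecutive slots, which is stronger than an average rate of X/Y.

```python
from typing import Sequence


def meets_reservation(ran: Sequence[bool], x: int, y: int) -> bool:
    """Return True if every window of y consecutive slots contains at least
    x slots in which the activity ran.

    ran[i] is True when the activity executed during slot i. Discrete slots
    and this function name are illustrative assumptions, not the paper's API.
    """
    if not 0 <= x <= y or y <= 0:
        raise ValueError("need 0 <= x <= y and y > 0")
    if len(ran) < y:
        return True  # no complete window to check
    # Running count of executed slots inside the current window.
    count = sum(ran[:y])
    if count < x:
        return False
    for i in range(y, len(ran)):
        # Slide the window one slot: add the new slot, drop the oldest.
        count += ran[i] - ran[i - y]
        if count < x:
            return False
    return True


# Averages 2 of every 4 slots, but bunched: one window sees only 1 slot.
bunched = [True, True, False, False, False, False, True, True]
# 2 of every 4 slots, evenly spread: every window sees exactly 2.
even = [True, False, True, False, True, False, True, False]
```

Both example schedules run the activity for half of the slots overall, yet only `even` satisfies a "2 out of every 4" reservation in the sliding-window sense.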

Rich Queries on Encrypted Data: Beyond Exact Matches

We extend the searchable symmetric encryption (SSE) protocol of [Cash et al., Crypto’13], adding support for range, substring, wildcard, and phrase queries, in addition to the Boolean queries supported in the original protocol. Our techniques apply to the basic single-client scenario underlying the common SSE setting as well as to the more complex Multi-Client and Outsourced Symmetric PIR extensions of [Jarecki et al., CCS’13]. We provide performance information based on our prototype implementation, showing the practicality and scalability of our techniques to very large databases, thus extending the performance results of [Cash et al., NDSS’14] to these rich and comprehensive query types.

Efficient Message Passing Interface (MPI) for Parallel Computing on Clusters of Workstations

Parallel computing on clusters of workstations and personal computers has very high potential, since it leverages existing hardware and software. Parallel programming environments offer the user a convenient way of expressing parallel computation and communication. In fact, a Message Passing Interface (MPI) has recently been proposed as an industrial standard for writing "portable" message-passing parallel programs. The communication part of MPI consists of the usual point-to-point communication as well as collective communication. However, existing implementations of programming environments for clusters are built on top of a point-to-point communication layer (send and receive) over local area networks (LANs) and, as a result, suffer from poor performance in the collective communication part. In this paper, we present an efficient design and implementation of the collective communication part in MPI that is optimized for clusters of workstations. Our system consists of...

Supporting Parallel Applications on Clusters of Workstations: The Intelligent Network Interface Approach

This paper presents a novel networking architecture designed for communication-intensive parallel applications running on clusters of workstations (COWs) connected by high-speed networks. This architecture permits (1) the transfer of selected communication-related functionality from the host machine to the network interface coprocessor, and (2) the exposure of this functionality directly to applications as instructions of a Virtual Communication Machine (VCM) implemented by the coprocessor. The user-level code interacts directly with the network coprocessor, as the host kernel only 'connects' the application to the VCM and does not participate in the data transfers. The distinctive feature of our design is its flexibility: the integration of the network with the application can be varied to maximize performance. The resulting communication architecture is characterized by a very low overhead on the host processor, by latency and bandwidth close to the hardware limits, and by a...

A Network Co-Processor-Based Approach to Scalable Media Streaming in Servers

This paper presents the embedded construction and experimental results for a media scheduler on i960 RD equipped I2O Network Interfaces (NI) used for streaming. We utilize the Distributed Virtual Communication Machine (DVCM) infrastructure developed by us, which allows run-time extensions to provide scheduling for streams that may require it. The scheduling overhead of such a scheduler is 65 µs, with the ability to stream MPEG video to remote clients at requested rates. Moreover, placement of scheduler action 'close' to the network on the Network Interface (NI) allows tighter coupling of computation and communication, eliminating traffic from the host bus and memory subsystem and allowing increased host CPU utilization for other tasks, without being affected by host-CPU loading. This work is supported in part by the Department of Energy under the NGI program and the National Science Foundation under a grant from the Division of Advanced Networking Infrastructure and Research, by har...

Highly-Scalable Searchable Symmetric Encryption with Support for Boolean Queries

This work presents the design, analysis and implementation of the first sub-linear searchable symmetric encryption (SSE) protocol that supports conjunctive search and general Boolean queries on symmetrically-encrypted data and that scales to very large data sets and arbitrarily-structured data including free text search. To date, work in this area has focused mainly on single-keyword search. For the case of conjunctive search, prior SSE constructions required work linear in the total number of documents in the database and provided good privacy only for structured attribute-value data, rendering these solutions too slow and inflexible for large practical databases. In contrast, our solution provides a realistic and practical trade-off between performance and privacy by efficiently supporting very large databases at the cost of moderate and well-defined leakage to the outsourced server (leakage is in the form of data access patterns, never as direct exposure of plaintext data or search...

The Effects of Wide-Area Conditions on WWW Server Performance

WWW workload generators are used to evaluate web server performance, and thus have a large impact on what performance optimizations are applied to servers. However, current benchmarks ignore a crucial component: how these servers perform in the environment in which they are intended to be used, namely the wide-area Internet.

A network co-processor-based approach to scalable media streaming in servers

Proceedings 2000 International Conference on Parallel Processing

This paper presents the embedded construction and experimental results for a media scheduler on i960 RD equipped I2O Network Interfaces (NI) used for streaming. We utilize the Distributed Virtual Communication Machine (DVCM) infrastructure developed by us, which allows run-time extensions to provide scheduling for streams that may require it. The scheduling overhead of such a scheduler is 65 µs, with the ability to stream MPEG video to remote clients at requested rates. Moreover, placement of scheduler action 'close' to the network on the Network Interface (NI) allows tighter coupling of computation and communication, eliminating traffic from the host bus and memory subsystem and allowing increased host CPU utilization for other tasks, without being affected by host-CPU loading. Architectures to build scalable media scheduling servers are explored by distributing media schedulers and media stream producers among NIs within a server and clustering a number of such servers using commodity hardware and software.

Secure initialization, maintenance, update and recovery operation in an integrated system using a data access control function

This invention concerns techniques for performing secure initialization, maintenance, update and recovery operations in an integrated system (200). These techniques, which make use of a data access control function (240) in the integrated system (200), consist of authenticating, by means of a current level of software, a next level of software in an integrated system. The authentication takes place before control passes to the next level of software. An ability of the next level of software to modify an operational characteristic of the integrated system can then be selectively limited by the data access control function (240). These techniques also make it possible to perform a secure initialization of the integrated system (200), and to migrate data encrypted with a first set of keys to data encrypted with a second ...

System and Method to Optimize Web Services Communication Based on History

Proxy for Low-Power Operation of Wireless Clients

LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). Copies may be requested from IBM T. J. ...

Wireless LAN (WLAN) interfaces are responsible for a significant fraction of the total energy consumed by a large class of mobile client devices. WLAN interfaces have power saving features which enable substantial energy reductions during long idle intervals. However, during periods of activity, when data transfers occur, the nature of the network traffic is hard to predict and this prevents the ne...

Method and device for decentralized determination of client and access point settings in a wireless LAN

A device for supporting network data communication for at least one client that communicates with a wireless access point, wherein the wireless interface of the client switches between at least two energy states, the device comprising: an arrangement to determine the configuration of the client for the change between the energy states, the arrangement being adapted to perform at least one of the following (a), (b) and (c): (a) determine the time of receipt of an initial signaling by: sending an examination data packet; measuring a response delay of the client after the transmission of the examination data packet; and continuing to measure response delays of the client after the transmission of additional examination data packets until the response delay is substantially equal to a previously determined minimum round-trip time; (b) determine at least one interval at which successive signaling is received from the client by: sending an examination data packe...
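
Step (a) of the claim above describes a measurement loop, which can be sketched as follows. This is only an illustration under stated assumptions, not the patented method: the function name `probes_until_min_rtt`, the `measure_delay` callback, the relative `tolerance` used to interpret "substantially equal", and the `max_probes` cap are all hypothetical.

```python
from typing import Callable, Optional


def probes_until_min_rtt(
    measure_delay: Callable[[], float],
    min_rtt: float,
    tolerance: float = 0.1,
    max_probes: int = 100,
) -> Optional[int]:
    """Send probes until a response delay is within `tolerance` (relative)
    of `min_rtt`. Return how many probes were sent, or None if the delay
    never came close enough within `max_probes` attempts.

    `measure_delay` is assumed to send one examination packet and return
    the observed response delay in seconds.
    """
    for sent in range(1, max_probes + 1):
        delay = measure_delay()
        if delay <= min_rtt * (1.0 + tolerance):
            return sent
    return None


# Simulated client whose response delay shrinks as its interface wakes up.
_delays = iter([0.120, 0.080, 0.021, 0.020])
result = probes_until_min_rtt(lambda: next(_delays), min_rtt=0.020)
```

In this simulated run the third probe is the first whose delay falls within 10% of the 20 ms minimum round-trip time, so the loop stops there.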

Communication support for cluster computing

For many parallel-computing tasks, clusters of workstations provide performance comparable to parallel machines. At the same time, clusters are cheaper and easier to reconfigure. However, for applications to run successfully on clusters, they must be able to tolerate communication latencies larger than in parallel machines, and they should not require message rates and data transfer bandwidths higher than those available from the cluster interconnect. This thesis addresses a fundamental limit to the improvement of cluster technologies: the achievable latencies and message rates are limited by the performance of the workstation I/O buses which must be used to interface with the cluster interconnect. We propose to overcome the I/O bus limitations by using an extensible communication architecture. This architecture, called the Distributed Virtual Communication Machine (DVCM), requires the cluster workstations to be equipped with intelligent network interface (NI) cards. The architectur...

Supporting Information Transfer During Organizational Changes

Differentiated connectivity in a pay-per-use public data access system

CPU reservations: efficient predictable scheduling of independent activities

ACM Symposium on Operating Systems Principles, 1997

Method and System for Sharing a User-Medical-Record

Filesystem management and security system

Manipulating Results of a Media Archive Search
