Matthew Hertz | Canisius College

Papers by Matthew Hertz

Proceeding of the 44th ACM technical symposium on Computer science education - SIGCSE '13, 2013

Proceedings of the 4th international symposium on Memory management - ISMM '04, 2004

Heap size has a huge impact on the performance of garbage-collected applications. A heap that barely meets the application's needs causes excessive GC overhead, while a heap that exceeds physical memory induces paging. Choosing the best heap size a priori is impossible in multiprogrammed environments, where physical memory allocations to processes change constantly. We present an automatic heap-sizing algorithm applicable to different garbage collectors with only modest changes. It relies on an analytical model and on detailed information from the virtual memory manager. The model characterizes the relation between collection algorithm, heap size, and footprint. The virtual memory manager tracks recent reference behavior, reporting the current footprint and allocation to the collector. The collector uses those values as inputs to its model to compute a heap size that maximizes throughput while minimizing paging. We show that our adaptive heap-sizing algorithm can substantially reduce running time over fixed-sized heaps.
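
The sizing step can be illustrated with a minimal sketch. It assumes a hypothetical linear footprint model (`LinearFootprintModel`, with made-up `slope` and `overhead_pages` parameters); the paper's actual model, which depends on the collection algorithm, is not reproduced here. Given the allocation reported by the virtual memory manager, the sketch picks the largest heap whose predicted footprint still fits, since a larger heap means fewer collections.

```python
from dataclasses import dataclass


@dataclass
class LinearFootprintModel:
    """Hypothetical model: footprint_pages = slope * heap_pages + overhead_pages."""
    slope: float
    overhead_pages: float

    def footprint(self, heap_pages: float) -> float:
        return self.slope * heap_pages + self.overhead_pages


def choose_heap_size(model: LinearFootprintModel, allocation_pages: int,
                     min_heap_pages: int) -> int:
    """Pick the largest heap whose predicted footprint fits in the allocation.

    A larger heap means fewer collections (better throughput), so the heap
    grows until its predicted footprint would exceed the pages the virtual
    memory manager reports as available, beyond which paging would start.
    """
    if model.slope <= 0:
        raise ValueError("slope must be positive")
    best = (allocation_pages - model.overhead_pages) / model.slope
    # Never go below the minimum heap the program needs to run at all.
    return max(min_heap_pages, int(best))
```

For example, with `slope=1.0`, `overhead_pages=200`, and an allocation of 1000 pages, the sketch chooses an 800-page heap; if the allocation drops to 250 pages, it falls back to the 100-page minimum.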

Proceeding of the 44th ACM technical symposium on Computer science education - SIGCSE '13, 2013

Instructors of the introductory computer science courses, commonly called "CS1" and "CS2", face a large number of choices when designing their classes. Instructors have a multitude of ways to explain each topic, as well as course-wide choices such as objects-first or objects-late, or using a functional or procedural language. Understanding how these options affect student learning would help simplify these decisions. Unfortunately, simply comparing how well students perform can be misleading, because it ignores the many confounding factors that could also have made a difference. To get beyond that problem, this study investigates underlying factors that affect student learning.

Journal of Computing Sciences in Colleges, Jun 1, 2012

This work investigates which introductory topics are the most difficult to teach and sheds some light on factors that improve student learning. Unlike past studies [1], this work uses the results of a survey of instructors of introductory computer science courses (commonly called "CS1" and "CS2"). The survey asked about the instructional time spent on, and importance placed on, a range of commonly taught concepts. For each concept, the survey also asked instructors to rate their students' level of mastery at the course's end. Using ...

Proceeding of the 44th ACM technical symposium on Computer science education - SIGCSE '13, 2013

Students in introductory programming courses struggle with building the mental models that correctly describe concepts such as variables, subroutine calls, and dynamic memory usage. This struggle leads to lower student learning outcomes and, it has been argued, the high failure and dropout rates commonly seen in these courses. We will show that accurately modeling what is occurring in memory and requiring students to trace code using this model improves student performance and increases retention. This paper presents the results of an experiment in which introductory programming courses were organized around code tracing. We present program memory traces, a new approach for tracing code that models what occurs in memory as a program executes. We use these traces to drive our lectures and to act as key pieces of our active learning activities. We report the results of student surveys showing that instructor tracing was rated as the most valuable piece of the course and that students overwhelmingly agreed on the importance of the tracing activities for their learning. Finally, we demonstrate that trace-based teaching led to statistically significant improvements in student grades, decreased drop and failure rates, and an improvement in students' programming abilities.

… of Information Technology in the Era of …, 2002

While the design of garbage collection algorithms has come of age, the analysis of these algorithms is still in its infancy. Current analyses are limited to merely documenting costs of individual collector executions; conclusive results, measuring across entire programs, require a theoretical foundation from which proofs can be offered. A theoretical foundation also allows abstract examination of garbage collection, enabling new designs without worrying about implementation details. We propose a theoretical framework for analyzing garbage collection algorithms and show how our framework could compute the efficiency (time cost) of garbage collectors. The central novelty of our proposed framework is its capacity to analyze costs of garbage collection over an entire program execution.

Sigplan Notices, 2011

While a conventional program uses exactly as much memory as it needs, the memory use of a garbage-collected program can be adjusted by changing the size of the heap used by the garbage collector. This difference can allow applications to adapt their memory demands in response to the changing amount of available memory in a shared environment, which is increasingly important for today's multicore, multiprocessor machines.

While garbage collection's software engineering benefits are indisputable, its performance impact remains controversial. Garbage collection proponents argue that its benefits outweigh its costs, but it is widely believed that garbage collection imposes an unacceptably high runtime and space performance penalty. This paper aims to settle this debate. We present the first empirical comparison of the performance costs of automatic versus explicit memory management in a garbage-collected language. Using a tracing- and simulation-based oracular memory manager, we execute unaltered Java programs as if they used explicit memory management. We examine the runtime, space consumption, and virtual memory footprint of Java benchmarks across a range of general-purpose allocators and both copying and non-copying garbage collectors. We show that, at large heap sizes and under no memory pressure, the runtime performance of some garbage collection algorithms is competitive with the Lea memory allocator and occasionally outperforms it by up to 4%. However, our results confirm that garbage collection requires six times the physical memory to achieve this performance and suffers order-of-magnitude performance penalties when paging occurs.

Limiting the amount of memory available to a program can hamstring its performance; however, in a garbage-collected environment, allowing too large a heap can also be detrimental. Because garbage collection occasionally accesses the entire heap, a heap that is large relative to the available physical memory becomes expensive. Determining the appropriate size for a program's heap is therefore both important and difficult, given the variety of virtual machines, operating systems, and levels of multiprogramming under which the program may run.

Thirty-one years ago, the ACM Computing Curricula used the terms "CS1" and "CS2" to designate the first two courses in the introductory sequence of a computer science major. While computer science education has greatly changed since that time, we still refer to introduction to programming courses as CS1 and basic data structures courses as CS2. This common shorthand is also used to help students transfer between institutions and serves as the basis of many research studies.

ACM Transactions on Programming Languages and Systems, 2007

Pretenuring can reduce copying costs in garbage collectors by allocating long-lived objects into regions that the garbage collector will rarely, if ever, collect. We extend previous work on pretenuring as follows. We produce pretenuring advice that is neutral with respect to the garbage collector algorithm and configuration. We thus can and do combine advice from different applications. We find for our benchmarks that predictions using object lifetimes at each allocation site in Java programs are accurate, which simplifies the pretenuring implementation.
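
Per-site advice of this kind can be sketched as follows. The function name `pretenure_advice`, the threshold, and the definition of "long-lived" are illustrative assumptions, not the paper's parameters. Measuring lifetimes in bytes of subsequent allocation, rather than in collections survived, is one way to keep the advice independent of any particular collector configuration.

```python
from collections import defaultdict
from typing import Iterable


def pretenure_advice(objects: Iterable[tuple[str, int]],
                     long_lived_bytes: int,
                     threshold: float = 0.9) -> set[str]:
    """Return allocation sites whose objects should go straight into a
    rarely collected (mature) region.

    objects: (allocation_site, lifetime) pairs from a profiling run, where
    lifetime is the number of bytes allocated between the object's birth
    and death. An object counts as long-lived if its lifetime is at least
    long_lived_bytes (a hypothetical cutoff).
    """
    total = defaultdict(int)
    long_lived = defaultdict(int)
    for site, lifetime in objects:
        total[site] += 1
        if lifetime >= long_lived_bytes:
            long_lived[site] += 1
    # Advise pretenuring only where most objects from the site live long.
    return {site for site in total
            if long_lived[site] / total[site] >= threshold}
```

Because the advice is keyed only by allocation site and an allocation-based clock, advice gathered from several profiling runs can be merged by concatenating their `(site, lifetime)` records before calling the function.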

ACM Transactions on Programming Languages and Systems, 2006

Programmers are writing a rapidly growing number of programs in object-oriented languages, such as Java and C#, that require garbage collection. Garbage collection traces and simulation speed up research by enabling a deeper understanding of object lifetime behavior and quick exploration and design of new garbage collection algorithms. When generating perfect traces, the brute-force method of computing object lifetimes requires a whole-heap garbage collection at every potential collection point in the program. Because this process is prohibitively expensive, researchers often use granulated traces by collecting only periodically, for example, every 32 KB of allocation.
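
The imprecision of a granulated trace can be sketched as follows, assuming object death times are measured in bytes allocated since program start; the helper names are hypothetical. A collection every 32 KB only discovers that an object has died at the next collection point, so each recorded death time is rounded up to a multiple of the granularity and can be late by almost a full granule.

```python
def granulated_death_time(true_death: int, granularity: int) -> int:
    """Round a true death time (bytes allocated since program start) up to
    the first periodic whole-heap collection at or after it."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    # Ceiling division via negated floor division.
    return -(-true_death // granularity) * granularity


def granulate_trace(death_times: list[int],
                    granularity: int = 32 * 1024) -> list[int]:
    """Death times as a trace that collects every `granularity` bytes sees them."""
    return [granulated_death_time(t, granularity) for t in death_times]
```

An object that really dies after 100 bytes of allocation is recorded as dying at 32768, the same as one that dies at 32768, so the trace cannot distinguish their lifetimes.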

Proceedings of the 5th …, 2006

Most applications' performance depends on the amount of available memory. In a traditional application, which has a fixed working set size, increasing memory helps only until the application's working set fits in memory. In the presence of garbage collection this relationship becomes more complex. While increasing the size of the program's heap reduces the frequency of collections, collecting a heap with memory paged to the backing store is very expensive. We first demonstrate the presence of an optimal heap size for a number of applications running on a machine with a specific configuration. We then introduce a scheme that adaptively finds this heap size. In this scheme, we track the memory usage and number of page faults at a program's phase boundaries. Using this information, the system selects the soft heap size. By adapting itself dynamically, our scheme is independent of the underlying main memory size, code optimizations, and garbage collection algorithm. We present several experiments on real applications to show the effectiveness of our approach. Our results show that program-level heap control provides up to a factor of 7.8 overall speedup versus using the best possible fixed heap size controlled by the virtual machine on identical garbage collectors.

Sigplan Notices, 2003

Until recently, the best-performing copying garbage collectors used a generational policy which repeatedly collects the very youngest objects, copies any survivors to an older space, and then infrequently collects the older space. A previous study that used garbage-collection simulation pointed to potential improvements by using an Older-First copying garbage collection algorithm. The Older-First algorithm sweeps a fixed-sized window through the heap from older to younger objects, and avoids copying the very youngest objects which have not yet had sufficient time to die. We describe and examine here an implementation of the Older-First algorithm in the Jikes RVM for Java. This investigation shows that Older-First can perform as well as the simulation results suggested, and greatly improves total program performance when compared to using a fixed-size nursery generational collector. We further compare Older-First to a flexible-size nursery generational collector in which the nursery occupies all of the heap that does not contain older objects. In these comparisons, the flexible-nursery collector is occasionally the better of the two, but on average the Older-First collector performs the best.
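
The window movement can be sketched as a toy simulation. It is a simplification under stated assumptions: the heap is a Python list in allocation order (oldest first), liveness comes from a caller-supplied predicate rather than from tracing, and the class and method names are hypothetical rather than taken from the Jikes RVM implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Obj:
    name: str
    size: int


@dataclass
class OlderFirstHeap:
    """Toy Older-First heap: objects kept in allocation order, oldest first."""
    window_size: int
    objects: list[Obj] = field(default_factory=list)
    window_start: int = 0  # index of the oldest object in the next window

    def allocate(self, obj: Obj) -> None:
        # New objects join the young end of the age-ordered heap.
        self.objects.append(obj)

    def collect(self, is_live: Callable[[Obj], bool]) -> list[Obj]:
        """Collect one window of objects and return those reclaimed."""
        if self.window_start >= len(self.objects):
            # The window has passed the youngest object: reset to the oldest.
            self.window_start = 0
        end, used = self.window_start, 0
        # Fill the window up to window_size bytes (always take one object).
        while end < len(self.objects) and (
                end == self.window_start
                or used + self.objects[end].size <= self.window_size):
            used += self.objects[end].size
            end += 1
        survivors, dead = [], []
        for obj in self.objects[self.window_start:end]:
            (survivors if is_live(obj) else dead).append(obj)
        # Survivors keep their relative age order where the window was.
        self.objects[self.window_start:end] = survivors
        # Advance the window past the survivors, toward younger objects.
        self.window_start += len(survivors)
        return dead
```

With five unit-sized objects `a` to `e` and a window of 2, the first collection examines `a` and `b`, reclaims `b` if it is dead, and moves the window past the survivor `a`, so the next collection examines `c` and `d`. The youngest objects are examined last, which gives them time to die before any copying.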

Proceedings of the 4th …, 2004

Sigmetrics Performance Evaluation Review, 2002

Programmers are writing a large and rapidly growing number of programs in object-oriented languages such as Java that require garbage collection (GC). To explore the design and evaluation of GC algorithms quickly, researchers are using simulation based on traces of object allocation and lifetime behavior. The brute force method generates perfect traces using a whole-heap GC at every potential GC point in the program. Because this process is prohibitively expensive, researchers often use granulated traces by collecting only periodically, e.g., every 32K bytes of allocation.

Sigplan Notices, 2005

Garbage collection offers numerous software engineering advantages, but interacts poorly with virtual memory managers. Existing garbage collectors require far more pages than the application's working set and touch pages without regard to which ones are in memory, especially during full-heap garbage collection. The resulting paging can cause throughput to plummet and pause times to spike up to seconds or even minutes. We present a garbage collector that avoids paging. This bookmarking collector cooperates with the virtual memory manager to guide its eviction decisions. Using summary information ("bookmarks") recorded from evicted pages, the collector can perform in-memory full-heap collections. In the absence of memory pressure, the bookmarking collector matches the throughput of the best collector we tested while running in smaller heaps. In the face of memory pressure, it improves throughput by up to a factor of five and reduces pause times by up to a factor of 45 over the next best collector. Compared to a collector that consistently provides high throughput (generational mark-sweep), the bookmarking collector reduces pause times by up to 218x and improves throughput by up to 41x. Bookmarking collection thus provides greater utilization of available physical memory than other collectors while matching or exceeding their throughput.
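
The role of bookmarks can be sketched with a toy mark phase. Assumptions: each object is an integer on a single page, the heap graph is a plain dictionary, and a bookmark is set on every target of a pointer stored on a page as that page is evicted; clearing bookmarks when a page returns to memory and the cooperation protocol with a real virtual memory manager are omitted. Names such as `ToyHeap` and `sweep_candidates` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHeap:
    """Toy heap: objects are ints, each stored on exactly one page."""
    page_of: dict[int, int]                # object -> page number
    edges: dict[int, list[int]]            # object -> objects it points to
    resident: set[int] = field(default_factory=set)    # pages in memory
    bookmarked: set[int] = field(default_factory=set)  # bookmarked objects

    def evict(self, page: int) -> None:
        """Before a page leaves memory, bookmark every object it points to."""
        for obj, pg in self.page_of.items():
            if pg == page:
                self.bookmarked.update(self.edges.get(obj, []))
        self.resident.discard(page)

    def mark(self, roots: list[int]) -> set[int]:
        """Full-heap mark that never reads an evicted page."""
        # Bookmarked objects act as extra roots, standing in for the
        # pointers stored on evicted pages.
        stack = list(roots) + list(self.bookmarked)
        marked: set[int] = set()
        while stack:
            obj = stack.pop()
            if obj in marked:
                continue
            marked.add(obj)
            if self.page_of[obj] not in self.resident:
                continue  # evicted: treat as live but do not scan it
            stack.extend(self.edges.get(obj, []))
        return marked

    def sweep_candidates(self, marked: set[int]) -> set[int]:
        """Only resident, unmarked objects are reclaimed; objects on
        evicted pages are conservatively retained."""
        return {obj for obj, pg in self.page_of.items()
                if pg in self.resident and obj not in marked}
```

The trade-off is conservatism: objects on evicted pages, and the objects they bookmark, are retained even if they are garbage, in exchange for never faulting those pages back in during a full-heap collection.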

Sigplan Notices, 2005

Garbage collection yields numerous software engineering benefits, but its quantitative impact on performance remains elusive. One can compare the cost of conservative garbage collection to explicit memory management in C/C++ programs by linking in an appropriate collector. This kind of direct comparison is not possible for languages designed for garbage collection (e.g., Java), because programs in these languages naturally do not contain calls to free. Thus, the actual gap between the time and space performance of explicit memory management and precise, copying garbage collection remains unknown.
