Andreea Anghel - Academia.edu

Papers by Andreea Anghel

arXiv (Cornell University), Jun 17, 2020

Modern gradient boosting software frameworks, such as XGBoost and LightGBM, implement Newton descent in a functional space. At each boosting iteration, their goal is to find the base hypothesis, selected from some base hypothesis class, that is closest to the Newton descent direction in a Euclidean sense. Typically, the base hypothesis class is fixed to be all binary decision trees up to a given depth. In this work, we study a Heterogeneous Newton Boosting Machine (HNBM) in which the base hypothesis class may vary across boosting iterations. Specifically, at each boosting iteration, the base hypothesis class is chosen, from a fixed set of subclasses, by sampling from a probability distribution. We derive a global linear convergence rate for the HNBM under certain assumptions, and show that it agrees with existing rates for Newton's method when the Newton direction can be perfectly fitted by the base hypothesis at each boosting iteration. We then describe a particular realization of an HNBM, SnapBoost, that, at each boosting iteration, randomly selects between either a decision tree of variable depth or a linear regressor with random Fourier features. We describe how SnapBoost is implemented, with a focus on the training complexity. Finally, we present experimental results, using OpenML and Kaggle datasets, that show that SnapBoost is able to achieve better generalization loss than competing boosting frameworks, without taking significantly longer to tune.
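To make the HNBM idea concrete, here is a minimal pure-Python sketch of Newton boosting for logistic loss in which the base hypothesis class is resampled at every iteration. As simplifying assumptions that do not come from the paper, the subclasses are regression trees of depth 1, 2 or 3 drawn with made-up probabilities, and SnapBoost's linear regressor on random Fourier features is omitted; all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_tree(X, g, h, idx, depth):
    """Weighted least-squares tree fitted to the Newton direction -g/h (weights h)."""
    G = sum(g[i] for i in idx)
    H = sum(h[i] for i in idx)
    leaf = ("leaf", -G / H)                       # Newton step for this leaf
    if depth == 0 or len(idx) < 2:
        return leaf
    best = None
    for j in range(len(X[0])):
        order = sorted(idx, key=lambda i: X[i][j])
        GL = HL = 0.0
        for k in range(len(order) - 1):
            GL += g[order[k]]
            HL += h[order[k]]
            lo, hi = X[order[k]][j], X[order[k + 1]][j]
            if lo == hi:
                continue                          # cannot split between equal values
            gain = GL ** 2 / HL + (G - GL) ** 2 / (H - HL) - G ** 2 / H
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2)
    if best is None or best[0] <= 1e-12:
        return leaf
    _, j, thr = best
    left = [i for i in idx if X[i][j] <= thr]
    right = [i for i in idx if X[i][j] > thr]
    return ("split", j, thr,
            fit_tree(X, g, h, left, depth - 1),
            fit_tree(X, g, h, right, depth - 1))

def predict_tree(node, x):
    while node[0] == "split":
        _, j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node[1]

def hnbm_fit(X, y, n_rounds=20, lr=0.3, depths=(1, 2, 3), probs=(0.5, 0.3, 0.2), seed=0):
    """Labels y in {0, 1}. Each round samples the base class (here: tree depth)."""
    rng = random.Random(seed)
    F = [0.0] * len(X)
    trees = []
    for _ in range(n_rounds):
        p = [sigmoid(f) for f in F]
        g = [pi - yi for pi, yi in zip(p, y)]          # first derivative of log-loss
        h = [max(pi * (1 - pi), 1e-6) for pi in p]     # second derivative, floored
        depth = rng.choices(depths, weights=probs)[0]  # heterogeneous base class
        tree = fit_tree(X, g, h, list(range(len(X))), depth)
        trees.append(tree)
        F = [f + lr * predict_tree(tree, x) for f, x in zip(F, X)]
    return lr, trees

def hnbm_predict_proba(model, x):
    lr, trees = model
    return sigmoid(sum(lr * predict_tree(t, x) for t in trees))
```

Each tree is fitted by weighted least squares to the Newton direction -g/h with weights h; the leaf value -G/H and the split gain G_L²/H_L + G_R²/H_R - G²/H follow directly from that objective.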

arXiv (Cornell University), Mar 16, 2018

We describe a new software framework for fast training of generalized linear models. The framework, named Snap Machine Learning (Snap ML), combines recent advances in machine learning systems and algorithms in a nested manner to reflect the hierarchical architecture of modern computing systems. We prove theoretically that such a hierarchical system can accelerate training in distributed environments where intra-node communication is cheaper than inter-node communication. Additionally, we provide a review of the implementation of Snap ML in terms of GPU acceleration, pipelining, communication patterns and software architecture, highlighting aspects that were critical for achieving high performance. We evaluate the performance of Snap ML in both single-node and multi-node environments, quantifying the benefit of the hierarchical scheme and the data streaming functionality, and comparing with other widely-used machine learning software frameworks. Finally, we present a logistic regression benchmark on the Criteo Terabyte Click Logs dataset and show that Snap ML achieves the same test loss an order of magnitude faster than any of the previously reported results, including those obtained using TensorFlow and scikit-learn.

2022 IEEE 15th International Conference on Cloud Computing (CLOUD)

Multi-cloud computing has become increasingly popular with enterprises looking to avoid vendor lock-in. While most cloud providers offer similar functionality, they may differ significantly in terms of performance and/or cost. A customer looking to benefit from such differences will naturally want to solve the multi-cloud configuration problem: given a workload, which cloud provider should be chosen and how should its nodes be configured in order to minimize runtime or cost? In this work, we consider possible solutions to this multi-cloud optimization problem. We develop and evaluate possible adaptations of state-of-the-art cloud configuration solutions to the multi-cloud domain. Furthermore, we identify an analogy between multi-cloud configuration and the selection-configuration problems that are commonly studied in the automated machine learning (AutoML) field. Inspired by this connection, we utilize popular optimizers from AutoML to solve multi-cloud configuration. Finally, we propose a new algorithm for solving multi-cloud configuration, CloudBandit (CB). It treats the outer problem of cloud provider selection as a best-arm identification problem, in which each arm pull corresponds to running an arbitrary black-box optimizer on the inner problem of node configuration.
Our extensive experiments indicate that (a) many state-of-the-art cloud configuration solutions can be adapted to multi-cloud, with best results obtained for adaptations which utilize the hierarchical structure of the multi-cloud configuration domain, (b) hierarchical methods from AutoML can be used for the multi-cloud configuration task and can outperform state-of-the-art cloud configuration solutions and (c) CB achieves competitive or lower regret relative to other tested algorithms, whilst also identifying configurations that have 65% lower median cost and 20% lower median time in production, compared to choosing a random provider and configuration.
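A minimal sketch of the nested structure described above: an outer best-arm identification over providers, where each arm pull runs a black-box optimizer on the node-configuration subproblem. The successive-halving schedule, uniform random search as the inner optimizer, and every name below are assumptions for illustration, not CloudBandit's published algorithm.

```python
import random

def random_search(objective, space, n_evals, rng):
    """Inner black-box optimizer (assumed): uniform random search over a discrete space."""
    best_cfg, best_val = None, float("inf")
    for _ in range(n_evals):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        val = objective(cfg)
        if val < best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val

def cloud_bandit(providers, evals_per_pull=5, seed=0):
    """providers: dict mapping provider name -> (objective, config space).

    Outer loop: successive halving over providers (one possible best-arm scheme).
    Each arm pull runs the inner optimizer for evals_per_pull evaluations.
    """
    rng = random.Random(seed)
    best = {name: (None, float("inf")) for name in providers}
    alive = list(providers)
    while True:
        for name in alive:                              # pull every surviving arm
            objective, space = providers[name]
            cfg, val = random_search(objective, space, evals_per_pull, rng)
            if val < best[name][1]:
                best[name] = (cfg, val)
        if len(alive) == 1:
            break
        alive.sort(key=lambda name: best[name][1])      # lower cost/runtime is better
        alive = alive[: (len(alive) + 1) // 2]          # keep the better half
    winner = alive[0]
    return winner, best[winner]
```

Halving concentrates the remaining evaluation budget on the providers whose inner searches look most promising; the arm-selection rule actually used by CB may differ.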

Proceedings of the 3rd Workshop on Data Center - Converged and Virtual Ethernet Switching, Sep 9, 2011

Cross-Layer Flow and Congestion Control for Datacenter Networks (Invited Paper). Andreea S. Anghel, Robert Birke, Daniel Crisan and Mitchell Gusat, IBM Research, Zürich Research Laboratory, Säumerstrasse 4, CH ...

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017

The design of an energy-efficient memory subsystem is one of the key issues that system architects face today. To achieve this goal, architects usually rely on system simulators and trace-based DRAM power models. However, their long execution time makes this approach infeasible for the design-space exploration of next-generation exascale computing systems. Analytic models, in contrast, are orders of magnitude faster. In this paper, we propose a new analytic, memory-scheduler-agnostic power model for DRAM, henceforth referred to as MeSAP. Similar to state-of-the-art trace-based approaches, our analytic model achieves an average error of 20%, while being an order of magnitude faster. Furthermore, we integrate MeSAP into an analytic performance model of general-purpose processors and show its applicability to the design of a computing system targeting scientific image processing applications.

Ovidius University Annals of Chemistry, 2021

The benefits of human consumption of goat's milk stem from the presence in this milk of short-chain fatty acids (approximately 20%) and medium-chain fatty acids (55%), which make it easier to digest. An important qualitative indicator of goat's milk, with technological, nutritional and dietary impact, is its fat content. Our data show that the percentage of milk fat increases immediately after parturition and then decreases for most of the lactation. This is due to two factors: a dilution effect, as milk volume increases toward peak lactation, and a decreasing effect of lipid mobilization, which lowers the plasma level of unesterified fatty acids (especially C18:0 and C18:1) involved in lipid synthesis in the mammary gland. From the third month of lactation onward, the average daily milk yield varies only slightly. Also, the fat and protein percentages remain relatively constant during June-Au...

2018 19th International Scientific Conference on Electric Power Engineering (EPE), 2018

This paper describes the design of a new electronic product, designed entirely by the authors as part of a research study and a technology transfer from our university to the industrial market. It is a low-voltage class D surge arrester, based only on metal-oxide varistors (ZnO varistors), made specifically for low-voltage residential equipment and domestic or home appliances. Located inside a classic multi-plug socket, it could protect the whole LV installation inside a building against any type of overvoltage (both pulsed and long-duration). The results of the applied research were transferred for mass production to a local electronic products manufacturer, Protenergo S.A. This home module includes some original solutions that make the product cheaper, easy to use, and adaptable to any existing low-voltage installation, even in old buildings (refurbished or not). We also present a simple and original analytical method for choosing the exact type of varistor required. Starting from research and 3D sketches, the work arrives at a fully functional product, applied on a large scale since 2015.

Summary: The photoperiod is the primary environmental factor used to regulate reproduction in bucks. The objective of this research was to observe the variation of testosterone concentration in correlation with testicular volume over a 2-year period in White of Banat bucks. The studies were conducted at the experimental farm of ANCC Caprirom, and the biochemical determinations of testosterone concentration were performed by ELISA at the Ovidius University Laboratory of Cellular and Molecular Biology. Plasma testosterone was analyzed in blood samples collected once a week, always at the same time of day to avoid circadian variations. In White of Banat bucks, testosterone concentrations between January and February of both the first and second year of the experiment were above the basal level, and in March the testosterone level decreased to the basal level. At the end of spring, in both years of the experiment, we observed a slight augme...

Summary: The choice of an adequate method of breeding and exploiting small ruminants, one that increases reproductive efficiency as well as the number and quality of these animals, depends on an early diagnosis of gestation. This study was carried out to determine caprine pregnancy-associated glycoprotein (cPAG) and progesterone (P4) levels in the plasma of Carpathian goats throughout gestation. The cPAG levels were determined with a heterologous RIA, and the P4 levels were also measured by RIA. Statistical analysis of the PAG concentrations in goats with known reproductive status grouped the females into 4 groups: pregnant by RIA and clinically pregnant (correct diagnosis), non-pregnant by RIA and clinically non-pregnant (correct diagnosis), non-pregnant by RIA but clinically pregnant (incorrect diagnosis), and embryonic mortality. The results obtained demonstrated that the precision of the correct diagnosis for pregnancy and non-pregnancy on days 14-35...

2017 13th International Conference on Network and Service Management (CNSM), 2017

While the scale, frequency and impact of recent cyber- and DoS attacks have all increased, traditional security management systems are still supervised by human operators in the decision loop. To cope with the new breed of machine-driven attacks, particularly those designed to overload the humans in the loop, next-generation anomaly detection and attack mitigation schemes (i.e., network security management) must improve greatly in speed and accuracy: they must become machine-driven, too. As infrastructure, we propose an FPGA-accelerated Network Function Virtualization that could enhance current multi-Tbps switching fabrics with SDN-based security capabilities of vastly higher performance and scalability. As key novelties, we contribute (i) sub-ms detection lag (ii) of the top 9 Akamai attacks [1] with (iii) a real-time SDN feedback loop between a distributed programmable data plane and a centralized SDN controller, (iv) coupled via a global N:1 mirror. We validate the concept in an actual datacenter network with a new security application that can detect and mitigate real-world DDoS attacks, with lags from 430 µs up to 3 ms, several orders of magnitude faster than before.

Lecture Notes in Computer Science, 2019

Color normalization is one of the main tasks in the processing pipeline of computer-aided diagnosis (CAD) systems in histopathology. This task reduces the color and intensity variations that are typically present in stained whole-slide images (WSI) due to, e.g., non-standardization of staining protocols. Moreover, it increases the accuracy of machine learning (ML) based CAD systems. Given the vast amount of gigapixel-sized WSI data, and the need to reduce the time-to-insight, there is an increasing demand for efficient ML systems. In this work, we present a high-performance pipeline that enables big data analytics for WSIs in histopathology. As an exemplary ML inference pipeline, we employ a convolutional neural network (CNN), used to detect prostate cancer in WSIs, with stain normalization preprocessing. We introduce a set of optimizations across the whole pipeline: (i) we parallelize and optimize the stain normalization process, (ii) we introduce a multi-threaded I/O framework optimized for fast non-volatile memory (NVM) storage, and (iii) we integrate the stain normalization optimizations and the enhanced I/O framework in the ML pipeline to minimize the data transfer overheads and the overall prediction time. Our combined optimizations accelerate the end-to-end ML pipeline by 7.2× and 21.2×, on average, for low and high resolution levels of WSIs, respectively. Significantly, it allows for a seamless integration of the ML-assisted diagnosis with state-of-the-art whole-slide scanners, by reducing the prediction time for high-resolution histopathology images from ~30 min to under 80 s.

2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), 2017

Unsupervised anomaly detection (AD) has shown promise against frequently emerging new cyberattacks. However, as anomalies are not always malicious, such systems generate prodigious false alarm rates. The resulting manual validation workload often overwhelms IT operators: it slows down the system reaction by orders of magnitude and ultimately thwarts its applicability. Therefore, we propose a real-time network AD system that reduces the manual workload by coupling two learning stages. The first stage performs adaptive unsupervised AD using a shallow autoencoder. The second stage uses a custom nearest-neighbor classifier to filter the false positives by modeling the manual classification. We implement a prototype for 10-50 Gbps speeds and evaluate it with traffic from a national network operator: we achieve 98.5% true and 1.3% false positive rates, while reducing the human intervention rate by 5x.
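The two-stage coupling can be sketched as follows, assuming a linear single-unit autoencoder with tied weights as the shallow autoencoder and a 1-nearest-neighbor rule with a distance radius as the false-positive filter. The paper's custom classifier, features and thresholds are not given in the abstract, so every name and parameter below is hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_tied_autoencoder(data, epochs=200, lr=0.05):
    """Stage 1 stand-in: one hidden unit, tied weights, x_hat = w * (w . x)."""
    d = len(data[0])
    w = [0.1 * (i + 1) for i in range(d)]                # fixed, non-degenerate init
    for _ in range(epochs):
        for x in data:
            a = dot(w, x)                                # hidden activation
            r = [xi - wi * a for xi, wi in zip(x, w)]    # residual x - x_hat
            rw = dot(r, w)
            # gradient of ||r||^2 w.r.t. w is -2 * (a * r + (r . w) * x)
            w = [wi + 2 * lr * (a * ri + rw * xi) for wi, ri, xi in zip(w, r, x)]
    return w

def recon_error(w, x):
    a = dot(w, x)
    return sum((xi - wi * a) ** 2 for xi, wi in zip(x, w))

def triage(x, w, threshold, labeled, radius):
    """Return 'normal', 'suppressed' (likely false positive) or 'escalate'."""
    if recon_error(w, x) <= threshold:
        return "normal"
    # Stage 2: 1-nearest-neighbor lookup among operator-labeled past alerts
    nearest = min(labeled, key=lambda item: math.dist(item[0], x), default=None)
    if nearest is not None and nearest[1] == "benign" and math.dist(nearest[0], x) <= radius:
        return "suppressed"
    return "escalate"
```

Only alerts that stage 1 flags and stage 2 cannot match to a known benign case reach a human, which is the mechanism by which this kind of coupling reduces manual workload.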

Future Generation Computer Systems, 2018

To reduce the capital investment required to acquire and maintain a high performance computing cluster, many HPC users today are moving to the cloud. When deploying an application in the cloud, users may a) fail to understand the interactions of the application with the software layers implementing the cloud system, b) be unaware of some hardware details of the cloud system, and c) fail to understand how sharing part of the cloud system with other users might degrade application performance. These misunderstandings may lead users to select suboptimal cloud configurations in terms of cost or performance. In this work we propose a machine-learning methodology to support the user in selecting the best cloud configuration to run the target workload before deploying it in the cloud. This enables the user to decide if and what to buy before facing the cost of porting and analyzing the application in the cloud. We couple a cloud-performance-prediction model (CP) on the cloud-provider side with a hardware-independent profile-prediction model (PP) on the user side. PP captures the application-specific scaling behavior. The user profiles the target application while processing small datasets on small machines she (or he) owns, and applies machine learning to generate PP to predict the profiles for larger datasets to be processed in the cloud. CP is generated by the cloud provider to learn the relationships between the hardware-independent profile and cloud performance, starting from the observations gathered by executing a set of training applications on a set of training cloud configurations. Since the profile data in use is hardware-independent, the user and the provider can generate the prediction models independently, possibly on heterogeneous machines. We apply the prediction models to Fortran-MPI benchmarks.
The resulting relative error is below 12% for CP and 30% for PP. The optimal Pareto front of cloud configurations finally found when maximizing performance and minimizing execution cost on the prediction models is at most 25% away from the actual optimal solutions.
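Once predicted (execution time, cost) pairs are available for candidate cloud configurations, the Pareto front mentioned above is the set of non-dominated pairs. A minimal sketch, assuming execution time stands in for performance (lower is better) and using hypothetical names; the relative-error helper uses the common |predicted - actual| / |actual| definition, which the abstract does not spell out.

```python
def pareto_front(configs):
    """configs: list of (name, time, cost); both objectives are minimized.

    A configuration is kept unless another one is no worse in both objectives
    and strictly better in at least one.
    """
    front = []
    for name, t, c in configs:
        dominated = any(
            t2 <= t and c2 <= c and (t2 < t or c2 < c)
            for _, t2, c2 in configs
        )
        if not dominated:
            front.append((name, t, c))
    return front

def relative_error(predicted, actual):
    # assumed definition of relative prediction error
    return abs(predicted - actual) / abs(actual)
```

The quadratic scan is adequate for the small candidate sets typical of configuration search.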

Frontiers in Medicine, 2019

Parallel Computing, 2017

Exascale applications will exploit a massive amount of parallelism. The analysis of computation and communication requirements at thread-level provides important insight into the application behavior useful to optimize the design of the exascale architecture. Performing such an analysis is challenging because the exascale system is not available yet. The target applications can be profiled only on existing machines, processing a significantly smaller amount of data and exploiting significantly less parallelism. To tackle this problem we propose a methodology that couples a) unsupervised machine-learning techniques to consistently classify threads in different program runs, and b) extrapolation techniques to learn how thread classes behave at scale. The main contribution of this work is the classification methodology that assigns a class to each thread observed during a set of experimental runs carried out by varying the parallelism and the processed data size. Based on this classification we generate extrapolation models per thread class to predict the profile at a scale significantly larger than the initial experiments. The availability of per-thread-class extrapolation models simplifies the analysis of exascale systems because we manage a small number of thread classes rather than a huge number of individual threads. We apply the methodology to different computing domains including: large-scale graph analytics, fluid dynamics, and radio astronomy. The proposed approach accurately classifies threads, whereas state-of-the-art techniques fail. The resulting extrapolation models have prediction errors of less than 10% for a real-life radio-astronomy case study.
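The per-class extrapolation step can be sketched as below. Assumptions not taken from the paper: thread-class labels are already assigned (the unsupervised classification stage is not shown), scale is a single number (the paper varies both parallelism and data size), and each class metric follows a power law fitted on log-log axes; the names are hypothetical.

```python
import math
from collections import defaultdict

def fit_power_law(scales, values):
    """Least-squares fit of values ~ a * scale**b on log-log axes."""
    lx = [math.log(s) for s in scales]
    ly = [math.log(v) for v in values]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b

def per_class_models(observations):
    """observations: iterable of (thread_class, scale, metric) from small runs."""
    groups = defaultdict(lambda: ([], []))
    for cls, scale, metric in observations:
        groups[cls][0].append(scale)
        groups[cls][1].append(metric)
    # one model per thread class instead of one per individual thread
    return {cls: fit_power_law(s, m) for cls, (s, m) in groups.items()}

def extrapolate(models, cls, scale):
    a, b = models[cls]
    return a * scale ** b
```

Fitting one model per class keeps the number of models small even when the target run contains a huge number of threads.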

Concurrency and Computation: Practice and Experience, 2017

The Graph500 benchmark attempts to steer the design of High-Performance Computing systems to maximize performance under memory-constricted application workloads. A realistic simulation of such benchmarks for architectural research is challenging due to size and detail limitations. By contrast, synthetic traffic workloads constitute one of the least resource-consuming methods to evaluate performance. In this work, we provide a simulation tool for network architects who need to evaluate the suitability of their interconnect for BigData applications. Our development is a low computation- and memory-demanding synthetic traffic model that emulates the behavior of the Graph500 communications, and it is publicly available in an open-source network simulator. The characterization of network traffic is inferred from a profile of several executions of the benchmark with different input parameters. We verify the validity of the equations in our model against an execution of the benchmark with a different set of parameters. Furthermore, we identify the impact of node computation capabilities and network characteristics on the execution time of the model in a Dragonfly network.

Algorithms and Architectures for Parallel Processing, 2016

As BigData applications have gained momentum in recent years, the Graph500 benchmark has appeared in an attempt to steer the design of HPC systems to maximize performance under memory-constricted application workloads. A realistic simulation of such benchmarks for architectural research is challenging due to size and detail limitations, and synthetic traffic workloads constitute one of the least resource-consuming methods to evaluate performance. In this work, we propose a synthetic traffic model that emulates the behavior of the Graph500 communications. Our model is empirically obtained through a characterization of several executions of the benchmark with different input parameters. We verify the validity of our model against a characterization of the execution of the benchmark with different parameters. Our model is well-suited for implementation in an architectural simulator.

Hematological Oncology, 2016

Primary nodal marginal zone lymphoma (NMZL) is a rare disease. There is no current consensus on how to treat it. The bendamustine plus rituximab (BR) regimen is effective for the treatment of follicular and other indolent lymphomas, but its efficacy in NMZL is not known. We analyzed the outcome of 14 patients diagnosed with NMZL (median age 67 years) who were treated with 375 mg/m² of rituximab on day 1 and 90 mg/m² of bendamustine on days 1 and 2. The overall and complete response rates were 93% and 71%, respectively. Major toxicity (grade 3/4 neutropenia) occurred in 5% of treatment courses. After a median follow-up of 22 months (range: 18-55), the overall survival and the free survival rates were 100% and 93%, respectively. None of the patients showing a complete or partial response developed secondary myelodysplastic syndrome/acute myeloid leukemia. Bendamustine plus rituximab was found to be an active and well-tolerated regimen leading to the rapid control of disease.

Artificial insemination (AI) in livestock is used to optimize reproductive efficiency. Compared to other semen preservation methods, cryopreservation is an established technology used worldwide for performing AI. Adequate protocols for semen collection and freezing, and for subsequent use in AI, have been established for all animal species. In sheep, however, AI with frozen-thawed semen results in a low fertility rate, which limits the practical application of this technique. Progressive sperm motility, sperm viability, sperm plasma membrane integrity and NAR were significantly (P < 0.05) higher for the BIOX, MILK, and TEY extenders. Progressive motility increased significantly (p < 0.01) using licorice extract at 10, 50 and 100 g/ml. Diluter type had a significant effect (p < 0.01) on sperm motility. The percentage of progressive motility in all extender media containing LDL was also higher than with 20% EY (control) during the dilution and equilibration stages. All extenders containing LDL reduced the percentage of abnormalities after dilution compared with the 20% egg yolk control. The percentage of intact acrosomes in all other extenders containing LDL was significantly higher than in the 20% egg yolk extender. The highest percentage of post-thaw progressive motility was recorded in the extender containing 20 mM glutamine. After dilution and equilibration, supplementation with glutamine at 40 and 60 mM caused a significant increase in plasma membrane integrity compared with the control and all other concentrations tested. There was no significant difference in viability between the control and the irradiated samples. However, the semen samples irradiated with 6.12 J/cm² showed a slight increase in sperm progressive motility, viability, osmotic resistance, and acrosome and DNA integrity with respect to the samples irradiated at lower energy doses and the control samples.
Cysteine affected the ultrastructure of the ram sperm cell during freezing and thawing. The positive effect of cysteine could result from its interaction with membrane phospholipids during freezing, leading to better cryopreservation.