Dora Cai - Academia.edu

Papers by Dora Cai

Photometric Calibration of the DES

Learning in relational databases: an attribute-oriented approach

Computational Intelligence, 1991

MAIDS: Mining Alarming Incidents from Data Streams

Real-time surveillance systems, network and telecommunication systems, and other dynamic processes often generate tremendous (potentially infinite) volumes of stream data. Effective analysis of such stream data poses great challenges to database and data mining researchers because of its unique requirements, such as single-scan algorithms, multi-dimensional online analysis, and fast response time.

The Dark Energy Survey Data Management System: The Processing Framework

... and archive the data from the Dark Energy Survey (DES) over the five-year period of operation. This paper focuses on a new adaptable processing framework developed to perform highly automated, high-performance, data-parallel processing. The new processing framework has been used to process 45 nights of simulated DECam supernova imaging data and was used extensively in the DES Data Challenge 4 to process thousands of square degrees of simulated DES data.

Additional Tables: Identification of Groups in Online Environments

1. National Center for Supercomputing Applications; University of Illinois at Urbana-Champaign; Urbana, USA 2. Department of Communication; University of Illinois at Urbana-Champaign; Urbana, USA ... 1 Data Collection 1.1 Collect log data from DEATH, QUEST, EXPERIENCE, and ACHIEVEMENT tables. ... 2 Data Preparation 2.1 Suppress data for known "gold farmer" accounts. 2.2 Create a list of quest names from quest log. 2.3 Suppress solo quests. ... 3.1 Two or more characters constitute a "possible group" if: a. they all add the ...

Sterken The Photometric Calibration of the Dark Energy Survey

The Dark Energy Survey (DES) is a 5000 sq deg griz imaging survey to be conducted using a proposed 3 sq deg (2.2°-diameter) wide-field mosaic camera on the CTIO Blanco 4m telescope. The primary scientific goal of the DES is to constrain dark energy cosmological parameters via four complementary methods: galaxy cluster counting, weak lensing, galaxy angular correlations, and Type Ia supernovae, supported by precision photometric redshifts. Here we present the photometric calibration plans for the DES, including a discussion of standard stars and field-to-field calibrations.

Application of the Dark Energy Survey Data Management System to the Blanco Cosmology Survey Data

The Dark Energy Survey (DES; operations 2010-2016) will image 5000 deg² of the southern sky using a new 3 deg² imager (DECam) for the CTIO Blanco 4-m telescope. The total data volume at the end of the survey will exceed 1 petabyte, which requires our data management system (DMS) to offer a high degree of automated processing. Our DMS leverages ...

High-Performance Computing Based Big Data Analytics for Smart Manufacturing

Volume 3: Manufacturing Equipment and Systems

With the rapid development of sensing, communication, and computing technologies and infrastructure, today’s manufacturing industry is marching towards a big data era and a new generation of digitalization and intelligence. The availability of big data provides us with a golden opportunity to promote smart manufacturing. Nevertheless, the deployment and popularization of big data analytics in manufacturing are still at a nascent stage. One critical challenge results from the lack of high-performance computing (HPC) capability, which is crucial for responsive and intelligent decision-making in the modern manufacturing industry. To address this challenge, this paper proposes a framework and some general guidelines for implementing big data analytics in an HPC environment. The details of the whole workflow, from the prototype to the final application, are highlighted. A case study for intelligent 3D sensing with real-world manufacturing data is presented to demonstrate the effectiven...

Feature Selection in Massive Game Log Analysis Using K-L Divergence

NCSA, Univ. of Illinois, Urbana, IL 61801; LMU, Munich, Germany; Dept. of Comm, Univ. of Illinois, Urbana, IL 61801. This material is based in part upon work supported by National Science Foundation under the grants IIS-0729421, IIS-1247861, OCI-0838231 and OCI-0838402, Army Research Institute under the grant W91WAW-08-C-0106, Air Force Research Lab under the grant FA8650-10-C-7010, Army Research Lab under the grant W911NF-09-2-0053, and the Deutsche Forschungsgemeinschaft (DFG). This research was also supported in part by the National Science Foundation via the XSEDE project’s Extended Collaborative Support Service under the grant NSF-OCI 1053575. The data used for this research was provided by Travian Games. We would like to thank the Gordon group at SDSC for their constant support. We would also like to thank the Campus Cluster group at NCSA/UIUC for their help hosting the game log databases. Any opinions, findings, and conclusions or recommendations expressed in this material are th...

Hierarchical measurement strategy for cost-effective interpolation of spatiotemporal data in manufacturing

Journal of Manufacturing Systems

Do Men Advance Faster Than Women? Debunking the Gender Performance Gap in Two Massively Multiplayer Online Games

Journal of Computer-Mediated Communication, 2016

Prior research on digital games illustrates a perceived gender gap in participation and performance, suggesting that men play more and better than women. This article challenges the gender gap using longitudinal behavioral data of men and women in 2 MMOs in the United States and China. Results show that women advance at least as fast as men do in both games. Thus, perceived gender-based performance disparities seem to result from factors that are confounded with gender (i.e., amount of play), not player gender itself. We conclude that the stereotype of female players as inferior is not only false, but also a potential cause for unequal participation in digital gaming. Digital games are one of the most popular and fast-growing entertainment media in today's media landscape, with an estimated global market size of $93 billion (Gartner, 2013). The broad appeal of games has brought about a gradual demographic shift. According to the Entertainment Software Association,

Data-Driven Discovery of Quantitative Rules in Relational Databases

... 2) ∀x [professor(x) → ((Sex(x) = male) ∧ (Age(x) ∈ old) ∧ (Birthplace(x) ∈ Canada) ∧ (Salary(x) ∈ high) [t: 20%]) ∨ ... . The previous discussion can be summarized in the following algorithm. Algorithm 1. LQCHR: Learning quantitative characteristic rules from relational databases. ...

Attribute-Oriented Induction in Relational Databases

Knowledge Discovery in Databases: An Attribute-Oriented Approach

... For example, the information that "Vancouver is a city of British Columbia, which, in turn, is a ... be removed in generalization, which implies that general properties of a graduate student cannot be ... For example, "physics" can be replaced by "science" and "Vancouver" by "BC". ...

Grouping game players using parallelized k-means on supercomputers

Proceedings of the 2015 XSEDE Conference on Scientific Advancements Enabled by Enhanced Cyberinfrastructure - XSEDE '15, 2015

Discovering the influence of socioeconomic factors on online game behaviors

Proceedings of the 2015 XSEDE Conference on Scientific Advancements Enabled by Enhanced Cyberinfrastructure - XSEDE '15, 2015

Optimized Data Loading for a Multi-Terabyte Sky Survey Repository

ACM/IEEE SC 2005 Conference (SC'05), 2005

CoMine: efficient mining of correlated patterns

Third IEEE International Conference on Data Mining, 2003

Supporting TeraGrid Applications by Advanced Database Technologies

Multiple scientific research projects making use of TeraGrid resources have been supported by very large databases. All those projects have similar data flows. First, a tremendous amount of data is collected and transferred from external sources using TeraGrid toolkits. Second, the grid computing-enabled data processing workflow is applied to validate the data, extract the metadata information, and upload the data into scientific databases. Third, for the lifetime of the projects, the database data is constantly retrieved, manipulated, and analyzed using TeraGrid facilities. Since all those projects have accumulated multiple terabytes of data, integrating grid computing with database technology has become increasingly important to effectively support scientific databases on the TeraGrid. In contrast to other kinds of databases, a scientific database is not only a warehouse for data archiving, but also an analysis test-bed for scientific and engineering research. In scientific databases, one must manage massive volumes of data, heterogeneous data sources, high data dimensionality, and intensive query processing. These demanding characteristics have made the support of scientific databases on TeraGrid a great challenge. To meet this challenge, we have studied advanced database technologies for supporting scientific research on TeraGrid resources. In this paper, we present the results of our study.
After extensive experiments on a variety of techniques, we have developed and implemented four techniques that significantly improve the performance and management of scientific databases: (1) parallelizing database operations to fully utilize available computational resources, (2) applying the load-merge mechanism to speed up data loading, (3) constructing a primary/standby environment for data replication and load balancing, and (4) applying stored procedures to implement complex logic and simplify multi-step transactions to a single database operation. All these techniques aim to take full advantage of the TeraGrid facilities, support robust data management, and provide an efficient computational environment for the scientific and engineering research community. Although this paper reports on the study of only three scientific databases, in the domains of Astronomy, Social Science, and Materials Science, respectively, we believe our findings offer a valuable guideline for very large database support across all science domains.

Champions of Equality: Examining Gender Egalitarianism in Virtual Teams across Cultures

2015 48th Hawaii International Conference on System Sciences, 2015
