Dora Cai - Academia.edu
Papers by Dora Cai
Computational Intelligence, 1991
Real-time surveillance systems, network and telecommunication systems, and other dynamic processes often generate a tremendous (potentially infinite) volume of stream data. Effective analysis of such stream data poses great challenges to database and data mining researchers because of its unique requirements, such as single-scan algorithms, multi-dimensional online analysis, and fast response time.
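The single-scan requirement mentioned above can be illustrated with a small sketch. This is not the method from the paper; it assumes Welford's online algorithm as a stand-in, and the names `RunningStats` and `summarize` are hypothetical. Each value is seen exactly once and only constant state is kept, so the stream never needs to be stored.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Constant-size summary of a numeric stream (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Fold one new observation into the summary; the value is then discarded.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of everything seen so far.
        return self.m2 / self.count if self.count else 0.0


def summarize(stream: Iterable[float]) -> RunningStats:
    """Consume the stream in a single pass and return its summary."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats
```

Because `summarize` accepts any iterable, it works equally well on a generator that yields readings as they arrive.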
... and archive the data from the Dark Energy Survey (DES) over the five-year period of operation. This paper focuses on a new adaptable processing framework developed to perform highly automated, high-performance, data-parallel processing. The new processing framework has been used to process 45 nights of simulated DECam supernova imaging data, and was used extensively in DES Data Challenge 4 to process thousands of square degrees of simulated DES data.
1. National Center for Supercomputing Applications; University of Illinois at Urbana-Champaign; Urbana, USA
2. Department of Communication; University of Illinois at Urbana-Champaign; Urbana, USA
...
1 Data Collection
1.1 Collect log data from DEATH, QUEST, EXPERIENCE, and ACHIEVEMENT tables.
...
2 Data Preparation
2.1 Suppress data for known "gold farmer" accounts.
2.2 Create a list of quest names from quest log.
2.3 Suppress solo quests.
...
3.1 Two or more characters constitute a "possible group" if: a. they all add the ...
The Dark Energy Survey (DES) is a 5000 sq deg griz imaging survey to be conducted using a proposed 3 sq deg (2.2°-diameter) wide-field mosaic camera on the CTIO Blanco 4m telescope. The primary scientific goal of the DES is to constrain dark energy cosmological parameters via four complementary methods: galaxy cluster counting, weak lensing, galaxy angular correlations, and Type Ia supernovae, supported by precision photometric redshifts. Here we present the photometric calibration plans for the DES, including a discussion of standard stars and field-to-field calibrations.
The Dark Energy Survey (DES; operations 2010-2016) will image 5000 deg² of the southern sky using a new 3 deg² imager (DECam) for the CTIO Blanco 4-m telescope. The total data volume after the end of the survey will exceed 1 petabyte, which requires our data management system (DMS) to offer a high degree of automated processing. Our DMS leverages ...
Volume 3: Manufacturing Equipment and Systems
With the rapid development of sensing, communication, and computing technologies and infrastructure, today's manufacturing industry is marching towards a big data era and a new generation of digitalization and intelligence. The availability of big data provides a golden opportunity to promote smart manufacturing. Nevertheless, the deployment and popularization of big data analytics in manufacturing is still at a nascent stage. One critical challenge results from the lack of high-performance computing (HPC) capability, which is crucial for responsive and intelligent decision-making in the modern manufacturing industry. To address this challenge, this paper proposes a framework and some general guidelines for implementing big data analytics in an HPC environment. The details of the whole workflow, from the prototype to the final application, are highlighted. A case study for intelligent 3D sensing with real-world manufacturing data is presented to demonstrate the effectiven...
NCSA, Univ. of Illinois, Urbana, IL 61801; LMU, Munich, Germany; Dept. of Comm, Univ. of Illinois, Urbana, IL 61801. This material is based in part upon work supported by National Science Foundation under the grants IIS-0729421, IIS-1247861, OCI-0838231 and OCI-0838402, Army Research Institute under the grant W91WAW-08-C-0106, Air Force Research Lab under the grant FA8650-10-C-7010, Army Research Lab under the grant W911NF-09-2-0053, and the Deutsche Forschungsgemeinschaft (DFG). This research was also supported in part by the National Science Foundation via the XSEDE project's Extended Collaborative Support Service under the grant NSF-OCI 1053575. The data used for this research was provided by Travian Games. We would like to thank the Gordon group at SDSC for their constant support. We would also like to thank the Campus Cluster group at NCSA/UIUC for their help hosting the game log databases. Any opinions, findings, and conclusions or recommendations expressed in this material are th...
Journal of Manufacturing Systems
Journal of Computer-Mediated Communication, 2016
Prior research on digital games illustrates a perceived gender gap in participation and performance, suggesting that men play more and better than women. This article challenges the gender gap using longitudinal behavioral data of men and women in 2 MMOs in the United States and China. Results show that women advance at least as fast as men do in both games. Thus, perceived gender-based performance disparities seem to result from factors that are confounded with gender (i.e., amount of play), not player gender itself. We conclude that the stereotype of female players as inferior is not only false, but also a potential cause for unequal participation in digital gaming. Digital games are one of the most popular and fast-growing entertainment media in today's media landscape, with an estimated global market size of $93 billion (Gartner, 2013). The broad appeal of games has brought about a gradual demographic shift. According to the Entertainment Software Association,
... 2) ∀x [professor(x) → ((Sex(x) = male) ∧ (Age(x) ∈ old) ∧ (Birthplace(x) ∈ Canada) ∧ (Salary(x) ∈ high) [t: 20%]) ∨ ... . The previous discussion can be summarized in the following algorithm. Algorithm 1. LQCHR: Learning quantitative characteristic rules from relational databases. ...
... For example, the information that "Vancouver is a city of British Columbia, which, in turn, is a ... be removed in generalization, which implies that general properties of a graduate student cannot be ... For example, "physics" can be substituted for by "science" and "Vancouver" by "BC". ...
Proceedings of the 2015 XSEDE Conference on Scientific Advancements Enabled by Enhanced Cyberinfrastructure - XSEDE '15, 2015
Proceedings of the 2015 XSEDE Conference on Scientific Advancements Enabled by Enhanced Cyberinfrastructure - XSEDE '15, 2015
ACM/IEEE SC 2005 Conference (SC'05), 2005
Third IEEE International Conference on Data Mining, 2003
ABSTRACT Multiple scientific research projects making use of TeraGrid resources have been supported by very large databases. All of these projects have similar data flows. First, a tremendous amount of data is collected and transferred from external sources using TeraGrid toolkits. Second, a grid computing-enabled data processing workflow is applied to validate the data, extract the metadata, and upload the data into scientific databases. Third, for the lifetime of the projects, the database data is constantly retrieved, manipulated, and analyzed using TeraGrid facilities. Since all of these projects have accumulated multiple terabytes of data, integrating grid computing with database technology has become increasingly important to effectively support scientific databases on the TeraGrid. In contrast to other kinds of databases, a scientific database is not only a warehouse for data archiving, but also an analysis test-bed for scientific and engineering research. In scientific databases, one must manage massive volumes of data, heterogeneous data sources, high data dimensionality, and intensive query processing. These demanding characteristics have made the support of scientific databases on TeraGrid a great challenge. To meet this challenge, we have studied advanced database technologies for supporting scientific research on TeraGrid resources. In this paper, we present the results of our study.
After extensive experiments on a variety of techniques, we have developed and implemented four techniques that significantly improve the performance and management of scientific databases: (1) parallelizing database operations to fully utilize available computational resources, (2) applying the load-merge mechanism to speed up data loading, (3) constructing a primary/standby environment for data replication and load balancing, and (4) applying stored procedures to implement complex logic and reduce multi-step transactions to a single database operation. All these techniques aim to take full advantage of the TeraGrid facilities, support robust data management, and provide an efficient computational environment for the scientific and engineering research community. Although this paper only reports on the study of three scientific databases, in the domains of Astronomy, Social Science, and Materials Science, respectively, we believe our findings offer a valuable guideline for very large database support across all science domains.
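The abstract does not define the load-merge mechanism, so the following is only a sketch of one common reading of the term: bulk-load new rows into an unindexed staging table, then fold them into the main table with a single set-based statement. It uses Python's built-in `sqlite3` module purely for illustration; the table names `objects` and `staging`, the columns, and the function `load_merge` are all hypothetical and not taken from the paper.

```python
import sqlite3
from typing import Iterable, Tuple


def load_merge(conn: sqlite3.Connection, rows: Iterable[Tuple[int, float]]) -> int:
    """Load rows via a staging table, merge them into `objects`, return the row count."""
    cur = conn.cursor()
    # Hypothetical main table, created on first use.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS objects (id INTEGER PRIMARY KEY, flux REAL)"
    )
    # Load step: the staging table has no keys or indexes, so inserts are cheap.
    cur.execute("CREATE TEMP TABLE staging (id INTEGER, flux REAL)")
    cur.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    # Merge step: one statement inserts new ids and replaces existing ones.
    cur.execute(
        "INSERT OR REPLACE INTO objects (id, flux) SELECT id, flux FROM staging"
    )
    cur.execute("DROP TABLE staging")
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM objects").fetchone()[0]
```

A production system would more likely use the database's own bulk loader and a native merge statement; the point here is only the two-phase shape, which keeps expensive key checks out of the row-by-row load.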
2015 48th Hawaii International Conference on System Sciences, 2015