Maozhen Li - Profile on Academia.edu

Papers by Maozhen Li

Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark. Library of Congress Cataloging-in-Publication Data: Handbook of research on P2P and grid systems for service-oriented computing : models, methodologies and applications / Nick Antonopoulos ... [et al.]. p. cm. Includes bibliographical references and index.

Future Generation Computer Systems, 2006

This paper presents a predictable and grouped genetic algorithm (PGGA) for job scheduling. The novelty of the PGGA is twofold: (1) a job workload estimation algorithm is designed to estimate a job's workload from its historical execution records, and (2) divisible load theory (DLT) is applied to predict an optimal solution when searching a large scheduling space, so that the convergence process can be sped up. Comparison with traditional scheduling methods such as first-come-first-served (FCFS), random scheduling and a typical genetic algorithm (TGA) indicates that the PGGA is more effective and efficient in finding optimal scheduling solutions.
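The two ideas in the PGGA abstract, estimating a job's workload from its execution history and using divisible load theory to steer a genetic search, can be illustrated with a minimal Python sketch. It is not the PGGA itself: the mean-of-history estimator, the speed-proportional DLT shares (communication costs ignored), the greedy seeding, the GA operators, and every function name here are assumptions made for illustration.

```python
import random
from statistics import mean


def estimate_workload(history):
    """Assumed estimator: the mean of a job's past execution costs."""
    return mean(history)


def dlt_fractions(speeds):
    """DLT shares with communication costs ignored: all nodes finish together
    when each node's share of the total load is proportional to its speed."""
    total = sum(speeds)
    return [s / total for s in speeds]


def makespan(assign, loads, speeds):
    """Finish time of the busiest node when job j runs on node assign[j]."""
    finish = [0.0] * len(speeds)
    for job, node in enumerate(assign):
        finish[node] += loads[job] / speeds[node]
    return max(finish)


def dlt_seed(loads, speeds):
    """Greedy schedule that fills each node towards its DLT target share."""
    targets = [f * sum(loads) for f in dlt_fractions(speeds)]
    assigned = [0.0] * len(speeds)
    assign = [0] * len(loads)
    for job in sorted(range(len(loads)), key=lambda j: -loads[j]):
        node = max(range(len(speeds)), key=lambda n: targets[n] - assigned[n])
        assign[job] = node
        assigned[node] += loads[job]
    return assign


def genetic_schedule(loads, speeds, pop_size=30, generations=200, seed=0):
    """Elitist GA over job-to-node assignments, seeded with the DLT schedule."""
    rng = random.Random(seed)
    n_jobs, n_nodes = len(loads), len(speeds)

    def fitness(assign):
        return makespan(assign, loads, speeds)

    population = [dlt_seed(loads, speeds)]
    population += [[rng.randrange(n_nodes) for _ in range(n_jobs)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness)
        elite = population[: pop_size // 2]  # best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_jobs) if n_jobs > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            if rng.random() < 0.2:  # mutation: move one job to a random node
                child[rng.randrange(n_jobs)] = rng.randrange(n_nodes)
            children.append(child)
        population = elite + children
    return min(population, key=fitness)


histories = [[10, 12], [4, 6], [8, 8], [3, 5], [7, 9]]
loads = [estimate_workload(h) for h in histories]
speeds = [2.0, 1.0]
best = genetic_schedule(loads, speeds)
print(best, makespan(best, loads, speeds))
```

Because the DLT-seeded individual enters the initial population and the best half always survives, the returned schedule is never worse than the seed; the seed simply gives the search a good starting point.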

A MapReduce based distributed LSI

Latent Semantic Indexing (LSI) is a widely used text mining technology due to its effectiveness in dealing with the problems of synonymy and polysemy within a proper matrix scale. However, LSI is enormously computationally intensive, especially for processing large-scale data. An effective solution is to increase the computational power available to LSI by using multiple computing nodes. In this paper we propose a novel MapReduce-based distributed LSI that uses the Hadoop distributed computing architecture to implement the K-means algorithm to cluster the documents, and then applies LSI to the clustered results. We evaluated the performance of the proposed MapReduce-based LSI and compared it with standalone LSI. The results show a great improvement in LSI's performance in terms of speed.
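The clustering stage can be pictured as repeated MapReduce jobs. The sketch below simulates one k-means iteration per job in plain Python: the mapper assigns a document vector to its nearest centroid and the reducer averages the vectors emitted for each centroid, with the shuffle done in memory. It assumes documents are already dense term vectors (for example TF-IDF weights), does not use Hadoop's API, and omits the LSI step, which would then run a truncated SVD on each cluster's term-document matrix; all names are hypothetical.

```python
from collections import defaultdict


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_map(doc_id, vector, centroids):
    """Mapper: emit (nearest centroid id, (vector, count)) for one document.
    doc_id mirrors the mapper's input key and is not needed for the math."""
    cid = min(range(len(centroids)),
              key=lambda c: squared_distance(vector, centroids[c]))
    return cid, (vector, 1)


def kmeans_reduce(cid, values):
    """Reducer: new centroid = mean of all vectors emitted for `cid`."""
    total = [0.0] * len(values[0][0])
    count = 0
    for vector, n in values:
        total = [t + v for t, v in zip(total, vector)]
        count += n
    return cid, [t / count for t in total]


def kmeans_mapreduce(docs, centroids, max_jobs=10):
    """Driver: each loop iteration stands in for one MapReduce job."""
    assignment = {}
    for _ in range(max_jobs):
        shuffled = defaultdict(list)  # in-memory stand-in for the shuffle phase
        for doc_id, vector in docs.items():
            cid, value = kmeans_map(doc_id, vector, centroids)
            shuffled[cid].append(value)
            assignment[doc_id] = cid
        new_centroids = list(centroids)  # empty clusters keep their old centroid
        for cid, values in shuffled.items():
            _, new_centroids[cid] = kmeans_reduce(cid, values)
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, assignment


docs = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0], "d4": [0.1, 0.9]}
centroids, clusters = kmeans_mapreduce(docs, [[1.0, 0.0], [0.0, 1.0]])
print(centroids, clusters)
```

Each resulting cluster is much smaller than the full corpus, which is why running LSI per cluster afterwards is cheaper than one SVD over everything.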

Evaluating Machine Learning Techniques for Automatic Image Annotations

The past decade has seen a rapid development in content-based image retrieval (CBIR). CBIR is the retrieval of images based on their low-level features such as color, texture and shape. To improve retrieval accuracy, the research focus has shifted from designing sophisticated low-level feature extraction algorithms to reducing the 'semantic gap' between visual features and the richness of human semantics. Image annotation techniques have been proposed to facilitate CBIR. This paper evaluates 7 representative machine learning techniques for automatic image annotation using 5000 images. An image annotation prototype is implemented, and the evaluation results are presented and analyzed.

Grid-based Semantic Integration and dissemination of medical information

Healthcare data and medical information need to be seamlessly accessible and available at all times to the various healthcare stakeholders. The inability to share, integrate and access critical healthcare information is a challenge for healthcare IT. Moreover, semantic interoperability of health-related heterogeneous data sources is a challenging issue, and healthGrids are expected to address this challenge in a systematic manner. This paper proposes a new architecture, ASIDS (architecture for semantic integration of data sources), which could be a potential candidate for solving the challenge of semantic interoperability of geographically distributed heterogeneous data sources. ASIDS has three main components that are loosely coupled (through interfaces) in a distributed manner. This architecture sets the basis for future research on implementing a healthGrid application in real environments.

The past few years have seen the Grid rapidly evolving towards a service-oriented computing infrastructure. With OGSA facilitating this evolution, WSRF is expected to act as the main enabling technology driving the Grid further. Resource monitoring plays a critical role in managing a large-scale Grid system. This paper presents GREMO, a lightweight resource monitor developed with Globus Toolkit 4 (GT4) for monitoring the CPU and memory of computing nodes in Windows and Linux environments.

The grid - core technologies

Copyright © 2005 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England. Telephone (+44) 1243 779777. Email (for orders and customer service enquiries): cs-books@wiley.co.uk. Visit our Home Page on www.wiley.com. All Rights ...

Concurrency and Computation: Practice and Experience, 2000

A Problem Solving Environment (PSE) is a complete, integrated computing environment for composing, compiling and running applications in a specific problem area or domain. Parts of the PSE are domain independent, such as the Visual Programming Composition Environment (VPCE), which may be used for constructing applications in a number of different domains; other parts, such as rules to support particular types of components, are domain specific. A domain-independent VPCE is first described, which serves as a user interface for a PSE and uses Java and CORBA to provide a framework of tools to enable the construction of scientific applications from components.

A Problem Solving Environment (PSE) should aim to hide implementation and systems details from application developers, to enable a scientist or engineer to concentrate on the science. A PSE is, by definition, problem domain specific, but the infrastructure for a PSE can be problem domain independent. A domain independent infrastructure for a PSE is described, followed by two application dependent PSEs for Molecular Dynamics and Boundary Element codes that make use of our generic PSE infrastructure.

The past few years have seen the Grid evolving into a service-oriented computing infrastructure. It is envisioned that various resources in a future Grid environment will be exposed as services. Service discovery becomes an issue of vital importance for utilising Grid facilities. This paper presents RSSM, a Rough Sets based service matchmaking algorithm for service discovery that can deal with uncertainty of service properties when matching service advertisements with service requests. The evaluation results show that the RSSM algorithm is more effective in service discovery than other mechanisms such as UDDI and OWL-S.
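Rough set theory handles this kind of uncertainty through lower and upper approximations. The sketch below shows only those two approximations, not the RSSM algorithm: services are assumed to be described by a few advertised properties, and the target set is assumed to be the services known to satisfy a request (for example from past invocations). Services whose advertisements cannot be told apart from a non-matching one land in the boundary region instead of being accepted or rejected outright. The property names and service identifiers are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(services, properties):
    """Group services whose advertised values agree on every listed property
    (the indiscernibility relation of rough set theory)."""
    classes = defaultdict(set)
    for name, advert in services.items():
        classes[tuple(advert.get(p) for p in properties)].add(name)
    return list(classes.values())


def rough_approximations(services, properties, target):
    """Return (lower, upper) approximations of the service set `target`."""
    lower, upper = set(), set()
    for cls in equivalence_classes(services, properties):
        if cls <= target:  # every indiscernible service matches: certain
            lower |= cls
        if cls & target:  # at least one indiscernible service matches: possible
            upper |= cls
    return lower, upper


services = {
    "s1": {"os": "linux", "arch": "x86"},
    "s2": {"os": "linux", "arch": "x86"},
    "s3": {"os": "windows", "arch": "x86"},
    "s4": {"os": "linux", "arch": "arm"},
}
known_matches = {"s1", "s3"}
lower, upper = rough_approximations(services, ["os", "arch"], known_matches)
boundary = upper - lower
print(sorted(lower), sorted(boundary))  # ['s3'] ['s1', 's2']
```

Here `s1` and `s2` advertise identical properties but only `s1` is known to match, so a rough-set matcher would report both as possible rather than certain matches.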

Discovering complex associations, anomalies and patterns in distributed data sets is gaining popularity in a range of scientific, medical and business applications. Various algorithms, ranging from statistical to machine learning and AI-based techniques, are employed to perform data analysis within a domain. However, several issues need to be addressed to scale such approaches to large data sets, particularly when they are applied to data distributed across various sites. As new analysis techniques are identified, the core tool set must enable easy integration of such analytical components. Similarly, results from analysis engines must be shareable, to enable storage, visualisation or further analysis of results.

The architecture of a component-based environment for constructing scientific applications, generally referred to as a Problem Solving Environment (PSE), is described. Each component is a self-contained program, and may be a sequential code developed in C, Fortran or Java, or may contain internal parallelism using MPI or PVM libraries. A user visually constructs an application by combining components from a local or remote repository as a data flow graph. Components are self-documenting, with their interfaces defined in XML. This enables a user to search for components suitable for a particular application, allows a component to be configured when instantiated, lets each component register with an event listener, and facilitates the sharing of components between repositories. The data flow graph is also encoded in XML and sent to a resource manager for executing the application on a workstation cluster, or on a heterogeneous environment made up of workstations and high performance parallel machines. Components in the PSE can also wrap legacy codes. We also describe the architecture and implementation of a molecular dynamics application based on the Lennard-Jones code [18], containing MPI calls, executed on a cluster of workstations, and based on our generic component model. A user can submit simulation data to the application remotely using a Java-based user interface. Users need not download any software for the simulation and do not need to know the exact implementation.
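Because the data flow graph is encoded in XML, a resource manager has to recover a valid execution order from it before launching components. The sketch below assumes a hypothetical schema with `<component id=...>` and `<link from=... to=...>` elements (the abstract does not give the real one) and uses Python's standard `xml.etree.ElementTree` and `graphlib` modules (Python 3.9+) to produce a topological order, in which every component runs after the components that feed it.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

# Hypothetical data flow graph; element and attribute names are illustrative.
GRAPH_XML = """<dataflow>
  <component id="reader"/>
  <component id="md_solver"/>
  <component id="visualiser"/>
  <link from="reader" to="md_solver"/>
  <link from="md_solver" to="visualiser"/>
</dataflow>"""


def execution_order(xml_text):
    """Return component ids so that each runs after all of its inputs."""
    root = ET.fromstring(xml_text)
    predecessors = {c.get("id"): set() for c in root.iter("component")}
    for link in root.iter("link"):
        predecessors[link.get("to")].add(link.get("from"))
    # Raises graphlib.CycleError if the data flow graph contains a cycle.
    return list(TopologicalSorter(predecessors).static_order())


print(execution_order(GRAPH_XML))  # ['reader', 'md_solver', 'visualiser']
```

A real resource manager would also map each component to a host; independent branches of the graph could then run in parallel, which `TopologicalSorter` supports through its `get_ready()`/`done()` interface.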

Future Generation Computer Systems, 2001

Techniques for wrapping an MPI-based molecular dynamics (MD) simulation code as Java/CORBA components, for use within a distributed component-based problem solving environment (CB-PSE), are presented. A legacy code for simulating a Lennard-Jones fluid is first wrapped as a single CORBA object; the code is then divided into computational sub-units, each wrapped as a CORBA object containing MPI calls and run on a cluster of workstations, enabling different MPI implementations to inter-operate. Using a Java implementation, users can submit simulation tasks through a Web-based interface, without needing to know implementation details of the legacy code or the exact interaction between sub-units within the code. We provide performance comparisons of wrapping the entire MD code as a single object versus wrapping sub-units within it, and offer a simple performance model to explain our findings.

Peer-to-peer Networking and Applications, 2009

Information services play a crucial role in grid environments: state information can be used to facilitate the discovery of resources and services available to meet user requirements, and also to help tune the performance of a grid system. However, the large size and dynamic nature of the grid bring a number of challenges for information services. This paper presents PIndex, a grouped peer-to-peer network that can be used for scalable grid information services. PIndex builds on Globus MDS4, but introduces peer groups to dynamically split the large grid information search space into many small sections to enhance its scalability and resilience. PIndex is subsequently modeled with Colored Petri Nets for performance evaluation. The simulation results show that PIndex is scalable and resilient in dealing with a large number of peer nodes.
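The abstract does not say how PIndex assigns information to peer groups. One simple way to split a search space into small sections, shown here purely as an assumption, is to hash each resource key to one of a fixed number of groups so that a query searches only the group that can hold the key. Unlike PIndex, this partitioning is static; the class and method names are hypothetical, and the sketch ignores MDS4, peer churn and replication.

```python
import hashlib
from collections import defaultdict


def group_of(key, num_groups):
    """Map a resource key to a peer group with a stable hash."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_groups


class GroupedIndex:
    """Toy information index partitioned into peer groups."""

    def __init__(self, num_groups):
        self.num_groups = num_groups
        self.groups = defaultdict(dict)  # group id -> {resource key: state}

    def publish(self, key, state):
        self.groups[group_of(key, self.num_groups)][key] = state

    def lookup(self, key):
        # Only the single responsible group is searched, not the whole index.
        return self.groups[group_of(key, self.num_groups)].get(key)


index = GroupedIndex(num_groups=8)
index.publish("node17.cpu", {"load": 0.42})
index.publish("node03.memory", {"free_mb": 2048})
print(index.lookup("node17.cpu"), index.lookup("node99.cpu"))
```

SHA-1 is used rather than Python's built-in `hash()` because string hashing is randomized per process, and every peer must agree on which group owns a key.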

Journal of Parallel and Distributed Computing, 2003

This paper describes techniques used to leverage high performance legacy codes as CORBA components to a distributed problem solving environment. It first briefly introduces the software architecture adopted by the environment. Then it presents a CORBA oriented wrapper generator (COWG) which can be used to automatically wrap high performance legacy codes as CORBA components. Two legacy codes have been wrapped with COWG. One is an MPI-based molecular dynamics simulation (MDS) code; the other is a finite element based computational fluid dynamics (CFD) code for simulating incompressible Navier-Stokes flows. Performance comparisons between runs of the MDS CORBA component and the original MDS legacy code on a cluster of workstations and on a parallel computer are also presented. Wrapped as CORBA components, these legacy codes can be reused in a distributed computing environment. The first case shows that high performance can be maintained with the wrapped MDS component. The second case shows that a Web user can submit a task to the wrapped CFD component through a Web page without knowing the exact implementation of the component. In this way, a user's desktop computing environment can be extended to a high performance computing environment using a cluster of workstations or a parallel computer.
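At its core, wrapping a non-networked legacy code means giving it a callable interface that hides how it is launched. The sketch below covers only that local layer, with a hypothetical `wrap_legacy` helper built on Python's `subprocess.run`; generating the CORBA interface and servant code that COWG produces, and launching MPI jobs, are not shown. It also assumes the legacy program reads its input from stdin and writes its results to stdout.

```python
import subprocess
import sys


def wrap_legacy(command, timeout=60):
    """Return a callable that runs the legacy `command` on the given input text."""

    def invoke(input_text):
        result = subprocess.run(
            command,
            input=input_text,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # raise CalledProcessError on a non-zero exit status
        )
        return result.stdout

    return invoke


# Stand-in "legacy code": a tiny Python program that upper-cases its input.
legacy = wrap_legacy([sys.executable, "-c",
                      "import sys; sys.stdout.write(sys.stdin.read().upper())"])
print(legacy("lennard-jones parameters"))
```

A remote-object or Web-service layer would then expose `invoke` to clients, so callers never see the command line, which matches the abstract's point that users need not know the component's implementation.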

Software - Practice and Experience, 2004

This paper presents WSOWG, a Web-services-oriented wrapper generator for automatically wrapping non-networked legacy codes as Web services for reuse in distributed problem-solving environments. Using WSOWG, a finite element based computational fluid dynamics (CFD) legacy code has been wrapped as a Web service. A problem-solving environment for simulating incompressible Navier–Stokes flows has also been implemented. A user makes use of the CFD service through a Web page without knowing the exact implementation of the service. In this way, a user's computing environment can be extended to a heterogeneous distributed computing environment. Performance evaluation shows that the overhead to invoke the CFD Web service generated by WSOWG using Simple Object Access Protocol (SOAP) and CORBA Internet Inter-ORB Protocol (IIOP) is reasonable compared with that of invoking another CFD Web service manually wrapped from the CFD legacy code using SOAP only. Copyright © 2004 John Wiley & Sons, Ltd.

With the wide adoption of Open Grid Services Architecture (OGSA) and Web Services Resource Framework (WSRF), the Grid is emerging as a service-oriented computing infrastructure for engineers and scientists to solve data and computationally intensive problems. It is envisioned that computing resources in a future Grid environment will be exposed as services. Service discovery becomes an issue of vital importance for a wider uptake of the Grid. This paper presents RSSM, a Rough Sets based service matchmaking algorithm for service discovery with an aim to tolerate uncertainty in identifying service properties. The evaluation results show that the RSSM algorithm is more effective in service discovery compared with other mechanisms such as UDDI and OWL-S.

Future Generation Computer Systems, 2004

This paper presents SGrid, a service-oriented model for the Semantic Grid. Each Grid service in SGrid is a Web service with certain domain knowledge. A Web services oriented wrapper generator has been implemented to automatically wrap legacy codes as Grid services exposed as Web services. Each wrapped Grid service is supplemented with domain ontology and registered with a Semantic Grid Service Ontology Repository using a Semantic Services Register. Using the wrapper generator, a finite element based computational fluid dynamics (CFD) code has been wrapped as a Grid service, which can be published, discovered and reused in SGrid.

PortalLab: A Web Services Toolkit for Building Semantic Grid Portals

A Grid is a computer-based infrastructure that provides dependable, consistent, pervasive access to distributed resources. Built on top of a Grid, a Semantic Grid is a service-oriented infrastructure that provides a range of computation, information and knowledge services. One purpose of a Grid portal is to provide easy and seamless access to heterogeneous Grid resources and services through a Web-based user interface. This paper presents PortalLab, a Web Services oriented toolkit for designing, integrating and building ...