Web Pages Research Papers - Academia.edu

In this communication a design of an embedded web server based on Ethernet technology for remote monitoring of weather parameters is presented. This web server monitors parameters, viz. temperature and humidity, and transmits this information in the form of an HTML web page. An LM35 semiconductor temperature sensor and an SY-HS-220 humidity module serve as the input sensors, providing accuracy of about 1 °C and 2% relative humidity. The web server provides simultaneous access to multiple nodes on the network.
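
The abstract does not include the firmware itself; the following is a minimal sketch, in Python rather than embedded C, of how a handler might serve the two readings as an HTML page. The functions read_lm35() and read_sy_hs_220() are hypothetical stand-ins for the sensor drivers.

```python
# Minimal sketch (not the paper's firmware): an HTTP handler that serves the
# current temperature and humidity as a small HTML page. read_lm35() and
# read_sy_hs_220() are hypothetical stand-ins for the sensor drivers.
from http.server import BaseHTTPRequestHandler, HTTPServer

def read_lm35():
    return 24.7          # placeholder: degrees Celsius from the LM35

def read_sy_hs_220():
    return 48.0          # placeholder: % relative humidity from the SY-HS-220

class WeatherHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = (
            "<html><body><h1>Weather Station</h1>"
            f"<p>Temperature: {read_lm35():.1f} &deg;C</p>"
            f"<p>Humidity: {read_sy_hs_220():.1f} %RH</p>"
            "</body></html>"
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)   # multiple clients can fetch the page concurrently

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), WeatherHandler).serve_forever()
```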

This work shows that it is possible to exploit text and image content characteristics of logo and trademark images in Web pages for enhancing the performance of retrievals on the Web. Searching for important (authoritative) Web pages and images is a desirable feature of many Web search engines and is also taken into account. State-of-the-art methods for assigning higher ranking to important Web pages, over other Web pages satisfying the query selection criteria, are considered and evaluated. PicASHOW exploits this idea in the retrieval of important images on the Web using link information alone. WPicASHOW (Weighted PicASHOW) is a weighted co-citation analysis scheme that incorporates the text and image content of the queries and of the Web pages into PicASHOW's link analysis method. The experimental results demonstrate that Web search methods utilizing content information (or a combination of content and link information) perform significantly better than methods using link information alone.

This paper introduces the TactoWeb tool. TactoWeb is a Web browser that allows users with visual impairment to explore Web pages using tactile and audio feedback. It is used in conjunction with the Tactograph device or the iFeel mouse. We first present a comparative study of existing tools that give users with visual impairment access to Web pages. The aim of this study is to identify the capabilities and limitations of these tools in order to define the important features needed to improve navigation on the Web for users with visual impairment. TactoWeb is designed to make spatial navigation possible, with better audio and tactile feedback. It should be superior to sequential exploration with only audio feedback.

Providing accessible Web pages is becoming a key concern for many providers of electronic information. There are many people who find accessing Web pages difficult and among these, vision impaired users are perhaps the group with the greatest needs. The Web is a strong visual environment and most designers use this aspect of the environment as a critical element in their interface and information design. Such strategies, while providing many opportunities for mainstream Web users, provide limiting and impeding outcomes for visually impaired Web users. There are a number of accessibility standards that now exist to inform and guide the designers of Web pages but little is known about precisely how best these standards can be applied and achieved. This paper will describe a study undertaken in the Australian context that sought to explore how the goals of accessibility influenced the design process and the design outcomes of an online learning environment designed to cater for visually impaired users. It is a study of the TruVision Project, a Web-based learning setting, designed to aid visually impaired users to gain an elementary qualification in Information Technology.

The Rich News system for semantically annotating television news broadcasts and augmenting them with additional web content is described. Online news sources were mined for material reporting the same stories as those found in television broadcasts, and the text of these pages was semantically annotated using the KIM knowledge management platform. This resulted in more effective indexing than would have been possible if the programme transcript had been indexed directly, owing to the poor quality of transcripts produced by automatic speech recognition. In addition, the associations produced between web pages and television broadcasts enable the automatic creation of augmented interactive television broadcasts and multimedia websites.

Phishing websites are forged web pages created by malicious people to mimic the web pages of real websites in an attempt to defraud people of their personal information. Detecting and identifying phishing websites is a complex and dynamic problem involving many factors and criteria, and because of the subjective considerations and ambiguities involved in the detection, a fuzzy logic model can be a more effective tool for assessing and identifying phishing websites than traditional tools, since it offers a more natural way of dealing with quality factors rather than exact values. In this paper, we present a novel approach to overcome the 'fuzziness' in traditional website phishing risk assessment and propose an intelligent, resilient, and effective model for detecting phishing websites. The proposed model is based on fuzzy logic operators, which are used to characterize the website phishing factors and indicators as fuzzy variables, and produces six measures and criteria of website phishing attack dimensions organized in a layer structure. Our experimental results show the significance and importance of the phishing website criteria (URL & Domain Identity) represented by layer one, and the varying influence of the phishing characteristic layers on the final phishing website rate.
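
As a rough illustration of the kind of layered fuzzy assessment described above, here is a toy sketch; the membership functions, indicators, layer names, and weights are invented for the example and are not the paper's actual rule base.

```python
# Toy sketch of fuzzy scoring for phishing indicators; the membership
# functions, weights, and layers here are illustrative, not the paper's.
def tri(x, a, b, c):
    """Triangular membership function on [a, c] peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzify(value):
    """Map a crisp indicator in [0, 1] to low/medium/high memberships."""
    return {"low": tri(value, -0.5, 0.0, 0.5),
            "medium": tri(value, 0.0, 0.5, 1.0),
            "high": tri(value, 0.5, 1.0, 1.5)}

def layer_score(indicators):
    """Aggregate one layer (e.g. URL & Domain Identity) by averaging the
    'high' memberships of its indicators."""
    highs = [fuzzify(v)["high"] for v in indicators.values()]
    return sum(highs) / len(highs)

# Hypothetical layered criteria with crisp indicator values in [0, 1].
layers = {
    "url_domain_identity": {"ip_in_url": 0.9, "long_url": 0.7},
    "security_encryption": {"no_https": 0.8, "abnormal_cert": 0.3},
}
weights = {"url_domain_identity": 0.6, "security_encryption": 0.4}

phishing_rate = sum(weights[name] * layer_score(ind) for name, ind in layers.items())
print(f"Estimated phishing rate: {phishing_rate:.2f}")
```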

In this paper, we propose a hybrid approach to Arabic-script web page language identification based on decision tree and ARTMAP approaches. We use the decision tree approach to find the general identity of a web document, be it Arabic script-based or non-Arabic script-based. Then, we use the selected representations of identified pages from the decision tree approach as

We deal in this paper with the problem of automatically generating the style and the layout of web pages and web sites in a real-world application where many web sites are considered. One of the main difficulties is to take into account the user preferences, which are crucial in web site design. We propose the use of an interactive genetic algorithm which generates solutions (styles, layouts) and lets the user select the solutions that he favors based on their graphical representation. Two encodings have been defined for this problem: one linear, fixed-length encoding for representing the style, and one variable-length encoding based on HTML table syntax for the layout. We present the results obtained by our method and an analysis of user behavior.
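
To make the two chromosome shapes concrete, here is a small sketch; the particular genes (palette, fonts, content slots) are hypothetical examples, not the encodings actually used in the application.

```python
# Illustrative sketch of the two chromosome shapes described above; the
# concrete genes (colors, fonts, cell contents) are hypothetical examples.
import random

PALETTE = ["#ffffff", "#000000", "#336699", "#cc3333", "#eeeecc"]
FONTS = ["serif", "sans-serif", "monospace"]

def random_style():
    """Fixed-length style chromosome: one value per style gene."""
    return {
        "bg_color": random.choice(PALETTE),
        "text_color": random.choice(PALETTE),
        "font_family": random.choice(FONTS),
        "font_size": random.choice([10, 12, 14, 16]),
        "link_color": random.choice(PALETTE),
    }

def random_layout(depth=0):
    """Variable-length layout chromosome: a nested list mirrors nested
    HTML tables (a leaf is a content slot, a list is a table of cells)."""
    if depth >= 2 or random.random() < 0.5:
        return random.choice(["menu", "banner", "text", "image"])
    return [random_layout(depth + 1) for _ in range(random.randint(2, 3))]

# In an interactive GA, fitness comes from the user: candidate (style, layout)
# pairs are rendered and the user keeps the ones they prefer for breeding.
population = [(random_style(), random_layout()) for _ in range(8)]
for style, layout in population[:2]:
    print(style, layout)
```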

This paper presents an information retrieval methodology which uses Formal Concept Analysis in conjunction with semantics to provide contextual answers to users' queries. The user formulates a query on a set of heterogeneous data sources. This set is semantically unified by the proposed notion of a conceptual context. A context can be global (it defines a semantic space the user can query) or instantaneous (it defines the current position of the user in the semantic space). Our methodology consists first of a pre-treatment providing the global conceptual context and then of an online contextual processing of users' requests, associated with an instantaneous context. This methodology can be applied to heterogeneous data sources such as web pages, databases, email, personal documents, images, etc. One interest of our approach is to perform more relevant and refined information retrieval and contextual navigation, closer to the users' expectations.

As the World Wide Web continues to grow, so does the need for effective approaches to processing users' queries that retrieve the most relevant information. Most search engines provide the user with many web pages, but at varying levels of relevancy. The Semantic Web has been proposed to retrieve and use more semantic information from the web. However, the capture and processing of semantic information is a difficult task because of the well-known problems that machines have with processing semantics. This research proposes a heuristic-based methodology for building context-aware web queries. The methodology expands a user's query to identify possible word senses and then makes the query more relevant by restricting it using relevant information from the WordNet lexicon and the DARPA DAML library of domain ontologies. The methodology is implemented in a prototype. Initial testing of the prototype and comparison to results obtained from Google show that this heuristic-based approach to processing queries can provide more relevant results to users, especially when query terms are ambiguous and/or when the methodology's heuristics are invoked.
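
The sense-expansion step could look roughly like the sketch below, which uses WordNet through NLTK (the DAML ontology restriction described above is omitted, and expand_term/build_queries are names chosen for the illustration).

```python
# Minimal sketch of word-sense expansion using WordNet via NLTK
# (requires nltk.download("wordnet")); the ontology-based restriction
# step described in the abstract is not shown here.
from nltk.corpus import wordnet as wn

def expand_term(term, max_senses=3):
    """Collect synonyms grouped by candidate sense for an ambiguous term."""
    expansions = {}
    for synset in wn.synsets(term)[:max_senses]:
        lemmas = {l.name().replace("_", " ") for l in synset.lemmas()}
        expansions[synset.name()] = {"gloss": synset.definition(),
                                     "synonyms": sorted(lemmas)}
    return expansions

def build_queries(term):
    """One restricted query per sense: the original term plus its synonyms."""
    return [f'{term} ({" OR ".join(info["synonyms"])})'
            for info in expand_term(term).values()]

for q in build_queries("jaguar"):
    print(q)
```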

In order to provide the scientific community with a portable and accurate typing system for the unambiguous characterization of pathogenic Escherichia coli isolates, we have constructed an online database for MultiLocus Sequence Typing of pathogenic E. coli (EcMLST) using current internet and open-source technology. The system consists of an XML specification of the E. coli MLST system and a set of Perl modules defining the database tables and generating dynamic web pages for querying the database. It is implemented on a Sun server running the Apache web server. The underlying tier is the MySQL database system. Currently, the database contains nucleotide sequence data and annotated allelic profile data of 15 housekeeping genes for 600 representative E. coli isolates. Access to the centrally held typing and epidemiology data is supported by parametric searching and full-text searching, as well as query interface links to the reference center for Shiga toxin-producing E. coli (STEC, http://www.shigatox.net/stec). EcMLST has been used by public health laboratories and researchers for epidemiology and evolutionary studies. The system can be accessed at

Web archives preserve the history of born-digital content and offer great potential for sociologists, business analysts, and legal experts on intellectual property and compliance issues. Data quality is crucial for these purposes. Ideally, crawlers should gather sharp captures of entire Web sites, but the politeness etiquette and completeness requirement mandate very slow, long-duration crawling while Web sites undergo changes. This

The European Internet Accessibility Observatory (EIAO) project has developed an observatory for performing large-scale automatic web accessibility evaluations of public sector web sites in Europe. The architecture includes a distributed web crawler that crawls web sites for links until either a given budget of web pages has been identified or the web site has been crawled exhaustively. Subsequently, a uniform random subset of the crawled web pages is sampled and sent for accessibility evaluation. The evaluation results are stored in a Resource Description Framework (RDF) database that is later loaded into the EIAO data warehouse using an Extract-Transform-Load (ETL) tool. The aggregated indicator results in the data warehouse are finally presented in a Plone-based online reporting tool. This paper describes the final version of the EIAO architecture and outlines some of the technical and architectural challenges that the project faced, and the solutions developed towards building a system capable of regular large-scale accessibility evaluations with sufficient capacity and stability. It also outlines some possible future architectural improvements.
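
A minimal sketch of the "crawl up to a budget, then take a uniform random sample" step is shown below using reservoir sampling; crawl_links() is a hypothetical stand-in for the distributed crawler, and the budget and sample size are arbitrary.

```python
# Sketch of budget-limited crawling followed by uniform random sampling,
# implemented with reservoir sampling; crawl_links() is a hypothetical
# generator of discovered page URLs, not the EIAO crawler itself.
import random

def sample_pages(crawl_links, seed_url, budget=10000, sample_size=600):
    """Return a uniform random sample of URLs from a crawl of at most `budget` pages."""
    reservoir, seen = [], 0
    for url in crawl_links(seed_url):          # yields discovered page URLs
        seen += 1
        if seen > budget:
            break
        if len(reservoir) < sample_size:
            reservoir.append(url)
        else:
            j = random.randrange(seen)         # keep each URL with prob sample_size/seen
            if j < sample_size:
                reservoir[j] = url
    return reservoir                           # pages sent on for accessibility evaluation
```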

Web pages are designed to be read by people, not machines. Consequently, searching and reusing information on the Web is a difficult task without human participation. Adding semantics (i.e., meaning) to a Web page would help machines understand Web contents and better support the Web search process. One of the latest developments in this field is Google's Rich Snippets, a service for Web site owners to add semantics to their Web pages. In this paper we provide an approach to automatically annotate a Web page with Rich Snippets RDFa tags. Exploiting several heuristics and a named entity recognition technique, our method is capable of recognizing and annotating a subset of the Rich Snippets vocabulary, i.e., all attributes of its Review concept and the names of Person and Organization concepts. We implemented an online service and evaluated the accuracy of the approach on real e-commerce Web sites.
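
For concreteness, the sketch below emits RDFa for one recognized review using the data-vocabulary.org Review vocabulary that Rich Snippets consumed at the time; the extraction itself (the heuristics and named entity recognition) is assumed to have already produced the review fields, and the example values are invented.

```python
# Sketch of emitting RDFa markup for one extracted review (data-vocabulary.org
# Review vocabulary); the `review` dict is a placeholder for the output of the
# heuristic/NER extraction step described in the abstract.
from html import escape

def annotate_review(review):
    """Wrap extracted review fields in RDFa attributes."""
    return (
        '<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">\n'
        f'  <span property="v:itemreviewed">{escape(review["item"])}</span>\n'
        f'  <span property="v:reviewer">{escape(review["reviewer"])}</span>\n'
        f'  <span property="v:rating">{review["rating"]}</span>\n'
        f'  <span property="v:summary">{escape(review["summary"])}</span>\n'
        '</div>'
    )

print(annotate_review({"item": "Acme Phone X", "reviewer": "J. Smith",
                       "rating": 4.5, "summary": "Solid value for money."}))
```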

The internet has become one of the most important means of communication in all areas of our life. In the paper we focused on central and local government bodies and their attitude towards information and communication technology. By analysing web pages, surveying public servants, and testing the responses to citizens' questions, we tried to discover the influence of the Internet on how well citizens are informed, on their participation in decisions of public interest, and on communication between citizens and central and local government bodies.

A new multidisciplinary design and entrepreneurship course is being offered at Bilkent University. The course entitled "Innovative Product Design and Development I-II" is open to senior students of six different departments. It is a two-semester course that brings together students from Departments of Computer Engineering, Economics, Electrical Engineering, Graphic Design, Industrial Engineering, and Management. A team of six professors, one

This case study reports the investigations into the feasibility and reliability of calculating impact factors for web sites, called Web Impact Factors (Web-IF). The study analyses a selection of seven small and medium scale national and four large web domains as well as six institutional web sites over a series of snapshots taken of the web during a month. The data isolation and calculation methods are described and the tests discussed. The results thus far demonstrate that Web-IFs are calculable with high confidence for national and sector domains whilst institutional Web-IFs should be approached with caution. The data isolation method makes use of sets of inverted but logically identical Boolean set operations and their mean values in order to generate the impact factors associated with internal- (self-) link web pages and external-link web pages. Their logical sum is assumed to constitute the workable frequency of web pages linking up to the web location in question. The logical operations are necessary to overcome the variations in retrieval outcome produced by the AltaVista search engine.
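
The arithmetic behind a Web-IF can be sketched as follows; the combination of mean self-link and external-link counts is a simplified reading of the method above, and the numbers are invented.

```python
# Worked sketch of Web Impact Factor arithmetic; the counts would come from
# engine queries (in the study, logically identical Boolean queries were
# averaged to smooth the search engine's variability). Numbers are invented.
def web_if(external_link_counts, self_link_counts, site_page_count):
    """Web-IF = (mean external-link pages + mean self-link pages) / pages at the site."""
    mean = lambda xs: sum(xs) / len(xs)
    linking_pages = mean(external_link_counts) + mean(self_link_counts)
    return linking_pages / site_page_count

# Hypothetical counts for one national domain:
print(round(web_if([12400, 12650], [30100, 29800], 55000), 3))
```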

We describe Syskill & Webert, a software agent that learns to rate pages on the World Wide Web (WWW), deciding what pages might interest a user. The user rates explored pages on a three-point scale, and Syskill & Webert learns a user profile by analyzing the information on each page. The user profile can be used in two ways. First, it can be used to

Modern web search engines are expected to return the top-k results efficiently. Although many dynamic index pruning strategies have been proposed for efficient top-k computation, most of them are prone to ignoring some especially important factors in ranking functions, such as term-proximity (the distance relationship between query terms in a document). In our recent work [Zhu, M., Shi, S., Li,

In this paper we present an English grammar and style checker for non-native English speakers. The main characteristic of this checker is the use of an Internet search engine. As the number of web pages written in English is immense, the system hypothesizes that a piece of text not found on the Web is probably badly written. The system also hypothesizes that the Web will provide examples of how the content of the text segment can be expressed in a grammatical and idiomatic way. So, after the checker warns the user about the odd character of a text segment, the Internet engine searches for contexts that will help the user decide whether or not to correct the segment. By means of a search engine, the checker also suggests that the writer use expressions which are more frequent on the Web than the expression he/she actually wrote. Although the system is currently being developed for teachers of the Open University of Catalonia, the checker can also be useful for sec...
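
The core frequency heuristic can be sketched as below; a tiny in-memory corpus stands in for the search-engine hit counts the real checker would query, and the threshold is an arbitrary choice for the illustration.

```python
# Sketch of the frequency heuristic: a segment that (almost) never occurs on
# the Web is flagged as possibly ill-formed. A tiny in-memory "corpus" stands
# in for real search-engine hit counts.
TINY_CORPUS = [
    "we depend on the results of the experiment",
    "the results depend on the experiment",
]

def web_hits(phrase: str) -> int:
    """Stand-in for a search-engine hit count: occurrences in TINY_CORPUS."""
    return sum(doc.count(phrase) for doc in TINY_CORPUS)

def check_segment(segment: str, threshold: int = 1):
    hits = web_hits(segment)
    if hits < threshold:
        return {"segment": segment, "suspicious": True,
                "advice": "few or no occurrences found; review the wording"}
    return {"segment": segment, "suspicious": False, "hits": hits}

print(check_segment("depend of the results"))   # flagged: unidiomatic
print(check_segment("depend on the results"))   # passes
```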

This paper introduces the concept of a classification tool for Web pages called WebClassify, which uses a modified naive Bayes algorithm with a multinomial model to classify pages into various categories. The tool starts the classification by downloading training Web text from the Internet, preparing the hypertext for mining, and then storing the Web data in a local database. The paper also gives an
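
A minimal sketch of the multinomial naive Bayes step is given below, using scikit-learn as a stand-in for the tool's own implementation; the training texts and categories are toy placeholders for the downloaded and cleaned Web pages.

```python
# Sketch of the core classification step with a multinomial naive Bayes model;
# scikit-learn stands in for the tool's own implementation, and the training
# texts are toy placeholders for prepared Web page text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "stock market shares earnings investors",
    "football match goal league season",
    "election parliament vote policy minister",
]
train_labels = ["business", "sport", "politics"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

page_text = "the team scored twice in the final match of the season"
print(model.predict([page_text])[0])        # expected: "sport"
```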

This paper attempts to provide insight into how to guarantee a statement like: My PHP script produces WML. To expand a little, the emphasis is on ensuring that a script always produces a valid WML page. The context is one where pages in a web site are created by an embedded scripting language (like PHP, ASP, or Perl) and the resulting pages are to conform to a strict tagged markup scheme like WML or XHTML. Although there are validators for static pages, there is nothing available to check that a page containing embedded scripting will (always) generate valid documents. What is required is a validator for dynamic web pages.

We have implemented a browser companion called PadPrints that dynamically builds a graphical history-map of visited web pages. PadPrints relies on Pad++, a zooming user interface (ZUI) development substrate, to display the history-map using minimal screen space. PadPrints functions in conjunction with a traditional web browser but without requiring any browser modifications. We performed two usability studies of PadPrints. The first addressed general navigation effectiveness. The second focused on history-related aspects of navigation. In tasks requiring returns to prior pages, users of PadPrints completed tasks in 61.2% of the time required by users of the same browser without PadPrints. We also observed significant decreases in the number of pages accessed when using PadPrints. Users found browsing with PadPrints more satisfying than using Netscape alone.

User Navigation Behavior Mining (UNBM) mainly studies the problem of extracting interesting user access patterns from user access sequences (UAS), which are usually used for user access prediction and web page recommendation. Through analyzing real-world web data, we find that most user access sequences carry hybrid features of different patterns rather than a single one. Therefore, methods that categorize one access sequence into a single pattern can hardly obtain good-quality results. To address this problem, we propose a multi-task learning approach based on a multiple data domain description model (MDDD), which simultaneously captures correlations among patterns and allows categorizing one UAS into more than one pattern. The experimental results show that our method achieves high performance in both precision and recall by virtue of using the MDDD model.

Using the Internet Archive's Wayback Machine, a random sample of websites from 1997-2002 was retrospectively analyzed for the effects that technology has on accessibility for persons with disabilities, and compared to government websites. Analysis of Variance (ANOVA) and Tukey's HSD were used to determine differences among years. Random websites became progressively less accessible through the years (p<0.0001) [as shown by increasing Web Accessibility Barrier (WAB) scores], while the complexity of the websites increased through the years (p<0.0001). Pearson's correlation (r) was computed to correlate accessibility and complexity: r=0.463 (p<0.01). Government websites remained accessible while increasing in complexity: r=0.14 (p<0.041). It is concluded that increasing complexity, oftentimes caused by adding new technology to a Web page, inadvertently contributes to increasing barriers to accessibility for persons with disabilities.
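
The correlation step itself is standard; a sketch is shown below with SciPy, where the WAB scores and complexity values are made-up placeholders rather than the study's data.

```python
# Sketch of the accessibility/complexity correlation; the WAB scores and
# complexity values below are invented placeholders, not the study's data.
from scipy import stats

wab_scores = [2.1, 2.4, 2.9, 3.3, 3.8, 4.2]          # higher = more barriers
complexity = [110, 135, 160, 190, 230, 270]          # e.g. element counts per page

r, p = stats.pearsonr(wab_scores, complexity)
print(f"Pearson r = {r:.3f}, p = {p:.4f}")

# A one-way ANOVA across years would follow the same pattern, e.g.
# stats.f_oneway(scores_1997, scores_1998, scores_1999, ...)
```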

Today, many organizations maintain a variety of systems and databases in a complex ad-hoc architecture that does not seem to fulfill the needs for company-wide unstructured information management in business processes, business functions, and the extended enterprise. We describe a framework to implement Enterprise Content Management (ECM) in order to address this problem. ECM refers to the technologies, tools, and methods used to capture, manage, store, preserve, and deliver content (e.g. documents, graphics, drawings, web pages) across an enterprise. The framework helps to select content objects that can be brought under ECM to create business value and guide the IT investments needed to realize ECM. The framework was tested in a large high tech organization.

Determining similarity between web pages is a key factor for the success of many web mining applications such as recommendation systems and adaptive web sites. In this paper, we propose a new hybrid method of distributed learning automata and graph partitioning to determine similarity between web pages using web usage data. The idea of the proposed method is that if different users request a couple of pages together, then these pages are likely to correspond to the same information needs and can therefore be considered similar. In the proposed method, a learning automaton is assigned to each web page and tries to find the similarities between that page and the other pages of a web site, utilizing the results of a graph partitioning algorithm performed on the graph of the web site. Computer experiments show that the proposed method outperforms the Hebbian algorithm and the only learning automata based method reported in the literature.
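
To illustrate the learning-automaton side of the method, here is a sketch of a single linear reward-inaction automaton attached to one page; the candidate pages, reward signal, and learning rate are invented for the example, and the graph-partitioning component is not shown.

```python
# Sketch of one linear reward-inaction (L_RI) learning automaton attached to a
# web page: its action probabilities over candidate similar pages are
# reinforced when usage data (co-occurrence in sessions) confirms the choice.
import random

class PageAutomaton:
    def __init__(self, candidates, alpha=0.1):
        self.candidates = list(candidates)
        self.p = [1.0 / len(candidates)] * len(candidates)   # uniform start
        self.alpha = alpha                                    # learning rate

    def choose(self):
        return random.choices(range(len(self.candidates)), weights=self.p)[0]

    def reward(self, i):
        """L_RI update: boost the chosen action, shrink the others."""
        for j in range(len(self.p)):
            if j == i:
                self.p[j] += self.alpha * (1.0 - self.p[j])
            else:
                self.p[j] *= (1.0 - self.alpha)

auto = PageAutomaton(["pageB", "pageC", "pageD"])
for _ in range(200):
    i = auto.choose()
    if auto.candidates[i] == "pageC":      # pretend sessions co-visit A and C
        auto.reward(i)
print(dict(zip(auto.candidates, [round(x, 2) for x in auto.p])))
```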

This paper describes an attack concept termed Drive-by Pharming where an attacker sets up a web page that, when simply viewed by the victim (on a JavaScript-enabled browser), attempts to change the DNS server settings on the victim’s home broadband router. As a result, future DNS queries are resolved by a DNS server of the attacker’s choice. The attacker can direct the victim’s Internet traffic and point the victim to the attacker’s own web sites regardless of what domain the victim thinks he is actually going to, potentially leading to the compromise of the victim’s credentials. The same attack methodology can be used to make other changes to the router, like replacing its firmware. Routers could then host malicious web pages or engage in click fraud. Since the attack is mounted through viewing a web page, it does not require the attacker to have any physical proximity to the victim nor does it require the explicit download of traditional malicious software. The attack works under the reasonable assumption that the victim has not changed the default management password on their broadband router.

Large web search engines are now processing billions of queries per day. Most of these queries are interactive in nature, requiring a response in fractions of a second. However, there are also a number of important scenarios where large batches of queries are submitted for various web mining and system optimization tasks that do not require an immediate response. Given the significant cost of executing search queries over billions of web pages, it is a natural question to ask if such batches of queries can be more efficiently executed than interactive queries.

The Web is a vast data repository. By mining this data efficiently, we can gain valuable knowledge. Unfortunately, in addition to useful content there are also many Web documents considered harmful (e.g. pornography, terrorism, illegal drugs). Web mining, which includes three main areas – content, structure, and usage mining – may help us detect and eliminate these sites. In this paper, we concentrate on applications of Web content and Web structure mining. First, we introduce a system for the detection of pornographic textual Web pages. We discuss its classification methods and depict its architecture. Second, we present an analysis of relations among Czech academic computer science Web sites. We give an overview of ranking algorithms and determine the importance of the sites we analyzed.

We describe here a methodology to combine two different techniques for Semantic Relation Extraction from texts. On the one hand, generic lexico-syntactic patterns are applied to the linguistically analyzed corpus to detect a first set of pairs of co-occurring words, possibly involved in "syntagmatic" relations. On the other hand, a statistical unsupervised association system is used to obtain a second set of pairs of "distributionally similar" terms, that appear to occur in similar contexts, thus possibly involved in "paradigmatic" relations. The approach aims at learning ontological information by filtering the candidate relations obtained through generic lexico-syntactic patterns and by labelling the anonymous relations obtained through the statistical system. The resulting set of relations can be used to enrich existing ontologies and for semantic annotation of documents or web pages.
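
The pattern-based side can be illustrated with a single Hearst-style lexico-syntactic pattern over raw text; the real system works on a linguistically analyzed corpus and combines many patterns, and the statistical association step is not shown.

```python
# Sketch of one lexico-syntactic pattern ("X such as Y, Z and W") applied to
# plain text to propose candidate relation pairs; a toy stand-in for the
# pattern component described above.
import re

HYPONYM_PATTERN = re.compile(r"(\w+)\s+such as\s+([^.;]+)", re.IGNORECASE)

def extract_pairs(text):
    """Return (hypernym, hyponym) candidate pairs from 'X such as Y, Z and W'."""
    pairs = []
    for hypernym, tail in HYPONYM_PATTERN.findall(text):
        for hyponym in re.split(r",\s*|\s+and\s+", tail):
            pairs.append((hypernym.strip(), hyponym.strip()))
    return pairs

print(extract_pairs("We annotate documents such as reports, emails and web pages."))
```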

Web accessibility rules, i.e., the conditions to be met by Web sites in order to be considered accessible for all, can be (partially) checked automatically in many different ways. Many Web accessibility evaluators have been developed during the last years. When applying the W3C guidelines, their programmers have to apply subjective criteria, which leads to different interpretations of these guidelines. As a result, it is easy to obtain different evaluation results when different evaluation tools are applied to a common sample page. However, accessibility rules can be better expressed formally and declaratively as rules that assert conditions over the markup. We have found that XSLT can be used to represent templates addressing many accessibility rules involving the markup of Web pages. Moreover, we have found that some specific conditions stated only in the prose of the XHTML specification, and not previously formalized in the XHTML grammar (the official DTD or XML Schemas), could also be formalized as XSLT rules. Thus, we have developed WAEX, a Web Accessibility Evaluator in a single XSLT file. This XSLT file contains more than 70 individual accessibility and XHTML-specific rules not previously addressed by the official DTDs or Schemas from the W3C.

Researchers in the ontology-design field have developed the content for ontologies in many domain areas. Recently, ontologies have become increasingly common on the World Wide Web where they provide semantics for annotations in Web pages. This distributed nature of ontology development has led to a large number of ontologies covering overlapping domains. In order for these ontologies to be reused,

In this paper, we continue our investigations of "web spam": the injection of artificially created pages into the web in order to influence the results from search engines, to drive traffic to certain pages for fun or profit. This paper considers some previously undescribed techniques for automatically detecting spam pages and examines the effectiveness of these techniques in isolation and when aggregated using classification algorithms. When combined, our heuristics correctly identify 2,037 (86.2%) of the 2,364 spam pages (13.8%) in our judged collection of 17,168 pages, while misidentifying 526 spam and non-spam pages (3.1%).
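
As an illustration of aggregating per-page heuristics with a classifier, here is a small sketch with a decision tree; the features, example pages, and labels are stand-ins rather than the paper's actual heuristics or judged collection.

```python
# Illustrative sketch of combining simple page heuristics via a classifier;
# the features and training examples are invented stand-ins.
from sklearn.tree import DecisionTreeClassifier

def page_features(text, title):
    words = text.split()
    return [
        len(words),                                   # page length in words
        len(title.split()),                           # title length in words
        len(set(words)) / max(len(words), 1),         # vocabulary diversity
    ]

train_pages = [
    ("buy cheap pills buy cheap pills buy cheap pills", "cheap pills cheap pills"),
    ("the committee met on tuesday to discuss the budget for next year", "city council minutes"),
]
train_labels = [1, 0]                                  # 1 = spam, 0 = not spam

clf = DecisionTreeClassifier(random_state=0)
clf.fit([page_features(t, ti) for t, ti in train_pages], train_labels)

test = ("cheap pills cheap pills best price best price", "cheap pills best price")
print("spam" if clf.predict([page_features(*test)])[0] else "not spam")
```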

Abstract: ASP .NET web applications typically employ server controls to provide dynamic web pages, and data-bound server controls to display and maintain database data. Most developers use default properties of ASP .NET server controls when developing web applications, which allows for rapid development of workable applications. However, creating a high-performance, multi-user, and scalable web application requires enhancement of server controls using custom-made code. In this empirical study we ...

A mobile user may voluntarily disconnect from the Web server to save battery life and avoid high communication prices. To allow Web pages to be updated while the mobile user is disconnected from the Web server, updates can be staged in the mobile unit and propagated back to the Web server upon reconnection. We analyze algorithms for supporting disconnected write operations and develop a performance model which helps identify the optimal length of the disconnection period under which the cost of update propagation is minimized. The analysis result is particularly applicable to Web applications which allow wireless mobile users to modify Web contents while on the move. We also show how the result can be applied to real-time Web applications, so that the mobile user can determine the longest disconnection period for which it can still propagate updates to the server before the deadline with a minimum communication cost.

Nowadays, people frequently use different keyword-based web search engines to find the information they need on the Web. However, many words are polysemous and, when these words are used to query a search engine, its output usually includes links to web pages referring to their different meanings. Moreover, results with different meanings are mixed together, which makes the task of finding the relevant information difficult for users, especially if the user-intended meanings behind the input keywords are not among the most popular on the Web.

We present a detailed statistical analysis of the characteristics of partial Web graphs obtained by sub-sampling a large collection of Web pages. We show that in general the macroscopic properties of the Web are better represented by a shallow exploration of a large number of sites than by a deep exploration of a limited set of sites. We also describe and quantify the bias induced by the different sampling strategies, and show that it can be significant even if the sample covers a large fraction of the collection.

Nowadays more and more business activities are operated through the web, and the web plays a vital role in the interests of both businesses and their shareholders. However, the very features that make the web attractive, such as its popularity, accessibility, and openness, have provided more opportunities for security breaches by malicious users. That is why the rate of successful attacks on the web and web applications is increasing. Many approaches have been introduced so far to reduce the rate of successful attacks of many kinds. Any technique that can detect these vulnerabilities and mitigate the security problems of web applications is useful to organizations seeking more reliability from the security viewpoint. In this paper we first introduce the control flow tampering attack, which is one of the notable attacks against web applications, and then present our approach for countering this attack using a web application firewall.

Background: Publication records and citation indices often are used to evaluate academic performance. For this reason, obtaining or computing them accurately is important. This can be difficult, largely due to a lack of complete knowledge of an individual's publication list and/or lack of time available to manually obtain or construct the publication-citation record. While online publication search engines have somewhat addressed these problems, using raw search results can yield inaccurate estimates of publication-citation records and citation indices. Methodology: In this paper, we present a new, automated method that produces estimates of an individual's publication-citation record from an individual's name and a set of domain-specific vocabulary that may occur in the individual's publication titles. Because this vocabulary can be harvested directly from a research web page or online (partial) publication list, our method delivers an easy way to obtain estimates of a publication-c...

Web advertising, one of the major sources of income for a large number of Web sites, is aimed at suggesting products and services to the ever growing population of Internet users. A significant part of Web advertising consists of textual ads, the ubiquitous short text messages usually marked as sponsored links. There are two primary channels for distributing ads: Sponsored Search (or Paid Search Advertising) and Content Match (or Contextual Advertising). In this paper, we concentrate on the latter, which is devoted to displaying commercial ads within the content of third-party Web pages. In the literature, several approaches have estimated ad relevance based on the co-occurrence of the same words or phrases within the ad and within the page. However, targeting mechanisms based solely on phrases found within the text of the page can lead to problems. In order to solve these problems, matching mechanisms that combine a semantic phase with the traditional syntactic phase have been proposed. We are mainly interested in studying the impact of the syntactic phase on contextual advertising. In particular, we perform a comparative study on text summarization in contextual advertising. Results show that implementing effective text summarization techniques may help to improve the corresponding contextual advertising system.
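
As a rough sketch of such a pipeline, the code below summarizes a page extractively and matches candidate ads against the summary; the summarizer and the cosine-similarity matching are generic choices for the illustration, not the specific techniques compared in the paper.

```python
# Sketch: summarize the page extractively, then match ads against the summary
# rather than the full text. Both steps are generic stand-ins; the example
# page and ads are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def summarize(text, n_sentences=2):
    """Pick the n sentences with the highest average TF-IDF weight."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    tfidf = TfidfVectorizer().fit_transform(sentences)
    scores = tfidf.mean(axis=1).A.ravel()
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return ". ".join(sentences[i] for i in sorted(top))

def rank_ads(page_text, ads):
    summary = summarize(page_text)
    vec = TfidfVectorizer().fit([summary] + ads)
    sims = cosine_similarity(vec.transform([summary]), vec.transform(ads)).ravel()
    return sorted(zip(ads, sims), key=lambda x: -x[1])

page = ("New trail running shoes reviewed. The cushioning performs well on rocky terrain. "
        "Local weather was mild during testing.")
ads = ["Lightweight running shoes on sale", "Cheap flights to sunny destinations"]
print(rank_ads(page, ads))
```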

The aim of this research study was to evaluate the websites of Jordan's universities from a usability perspective. Two online automated tools, namely HTML Toolbox and Web Page Analyzer, were used along with a questionnaire directed towards users of these websites. The tools were used to measure the websites' internal attributes which cannot be perceived by users, such as HTML code errors, download time, and HTML page size. The questionnaire was developed and designed based on 23 usability criteria divided into 5 categories. Each category deals with one usability aspect. The results showed that the overall usability level of the studied websites is acceptable. However, there are some weaknesses in some aspects of the design, interface, and performance. Suggestions are provided in the study to enhance the usability of these websites.

A social network is a set of people (or organizations or other social entities) connected by a set of social relationships, such as friendship, co-working, or information exchange. Social network analysis focuses on the analysis of patterns of relationships among people, organizations, states, and other such social entities. Social network analysis provides both a visual and a mathematical analysis of human relationships. The Web can also be considered a social network: social networks are formed between Web pages by hyperlinking to other Web pages. In this paper, a state-of-the-art survey of the work done on social network analysis is given, ranging from purely mathematical analyses of graphs to the analysis of social networks in the Semantic Web. The main goal is to provide a road map for researchers working on different aspects of Social Network Analysis.

This study explored e-learning problems and solutions reported by 223 students with disabilities, 58 campus disability service providers, 28 professors, and 33 e-learning professionals from Canadian colleges and universities. All four groups indicated, via online questionnaires, problems with: accessibility of websites and course/learning management systems (CMS); accessibility of digital audio and video; inflexible time limits built into online exams; PowerPoint/data projection during lectures; course materials in PDF; and lack of needed adaptive technologies. Students also mentioned technical difficulties using e-learning and connecting to websites and CMS, problems downloading and opening files, web pages that would not load, video clips taking too long to download, poor use of e-learning by professors, and their own lack of knowledge working with e-learning. Disability service providers, too, mentioned the poor use of e-learning by professors as well as poor accessibility of course notes and materials in many formats. E-learning professionals noted difficulties with inaccessible course notes and materials. Professors identified mainly the problems raised by the other groups. Sixty-seven percent of students, 53% of service providers, 36% of e-learning professionals, and 35% of professors indicated that at least one of their three e-learning problems remained unresolved. We discuss how the different roles and perspectives of the four participant groups influence their views, and make recommendations addressing identified common e-learning problems.

– Malaysia is a country prone to land and structure deformation due to the long rainy season that is a common phenomenon in the monsoon tropical region. To keep this hazard under control, the Land Stability Monitoring System (LandSMS), a web-based deformation ...