Web Users and Web Document Classifiers
Related papers
Web Users and Web Document Classifiers: Emergent Cognitive Phenomena
… http://www.cs.ucd.ie/staff/nick/ …
Abstract: We are currently developing an approach to introduce heuristics based on facts and phenomena from cognitive science in the design of automatic Web document classifiers. The analysis of the results of the study has revealed new and emergent cognitive phenomena ...
Cognitive modelling and Web search: Some heuristics and insights
The paper presents an approach to introducing heuristics based on facts and phenomena from cognitive science into the design of web-based interface systems. The user interface employs a web-document genre classifier inspired by the "word-length and word-frequency" effect. The result is faster and more efficient document search, classification and retrieval. The paper discusses the insights encountered, such as the least-effort strategy users apply when assessing web documents, the implicit user expectation of interacting with an "intelligent" interface, and the increasing demand for document summaries. Some interface design guidelines arising from the current study are outlined.
Java-Servlet Technology for Building New Web Document Classifiers
2008
The paper presents the working architecture of an existing module for web search, classification and presentation of documents with different lexical features. Unlike existing search engines, the module discriminates between aspects of a document's style: readability, explanation, illustrations and summarization. The visualization component enhances perceptual support for these features when the searched documents are retrieved. We present the current state of research on the classifier and give examples from the educational, medical and environmental domains.
A Comparative Study of Web Document Classification Approaches
With the continuous growth of the World Wide Web, the need arises to index and classify Web documents for fast retrieval of the relevant information accessible through it. Traditionally, classification has been done manually. A recent study found that there were about 29.7 billion pages on the Web in February 2007, which makes manual classification infeasible and shows the need for automated techniques for this task. Although Web documents should follow the basic definitions of the Hypertext Markup Language, they are known to be unstructured or semi-structured, which poses new challenges for Web classification, especially in the area of feature selection. The objective of this paper is to investigate Web document classification approaches and to compare recent techniques that have proved promising in the literature in this field. Traditionally, automatic classification is performed by extracting information for representing a web document from th...
Automated subject classification of textual web pages, for browsing
2005
With the exponential growth of the World Wide Web, automated subject classification of Web pages has become a major research issue in information and computer sciences. Organizing Web pages into a hierarchical structure for subject browsing is gaining more recognition as an important tool in information-seeking processes.
ACIRD: Intelligent Internet Document Organization and Retrieval
IEEE Transactions on Knowledge and Data Engineering, 2002
Classifier for the Internet Resource Discovery (ACIRD), which uses machine learning techniques to organize and retrieve Internet documents. ACIRD consists of a knowledge acquisition process, a document classifier and a two-phase search engine. The knowledge acquisition process of ACIRD automatically learns classification knowledge from classified Internet documents. The document classifier applies the learned classification knowledge to classify newly collected Internet documents into one or more classes.
PROJECT MLEXAI: APPLYING MACHINE LEARNING TO WEB DOCUMENT CLASSIFICATION
We present work on project MLExAI, funded by the National Science Foundation with the goal of unifying the artificial intelligence (AI) course around the theme of machine learning. Our work involves the development, implementation, and testing of an adaptable framework for the presentation of core AI topics that emphasizes the relationship between AI and computer science. We have developed a suite of adaptable hands-on laboratory projects that can be closely integrated into a one-term AI course and that supplement introductory AI texts. The paper focuses on one of these projects, explains how it meets our goal, and presents our experiences using it. The project involves the development of a learning system for web document classification. Students investigate the process of classifying hypertext documents, called tagging, and apply machine learning techniques and data mining tools for automatic tagging. A summary of our experiences using the projects during four course offerings over the last two years is also presented.
A Decision Support Method for Finding Appropriate Information on the Web Documents
2015
Today, the Web has expanded dramatically, so finding desired information in a vast ocean of available data is a difficult task for users. We therefore need methods that use targeted search to help users decide which documents match the desired content. In the presented information retrieval technique, web documents are introduced to the user as search results. Semantic extraction can be used to resolve this problem. Extraction is valid only if the pages related to the subject are identified first. Semantic extraction ontology is one such method. This paper evaluates the extent of the relationship between a semi-structured HTML document and an ontology using some statistical techniques. Then, by calculating the density of the document and comparing it with the expected density of the ontology within an acceptable limit, documents related to the ontology are predicted. ...