Search Engines Research Papers - Academia.edu
www.egitim.com indexes websites with the help of an educational model and offers search results customized to the grade levels its users prefer. With its advanced technology and sophisticated algorithms, it is a groundbreaking vertical search engine.
With increasingly large numbers of non-English-language web searchers, the efficient handling of non-English Web documents and user queries is becoming a major issue for search engines. The main aim of this review paper is to make researchers aware of the existing problems in monolingual non-English Web retrieval by providing an overview of open issues. A significant number of…
Multimedia data is generally stored in compressed form in order to efficiently utilize the available storage facilities. Access to archives therefore depends on our ability to browse compressed multimedia information and to retrieve and track content in coded video databases. In this paper, a novel visual search engine for video retrieval and tracking in compressed multimedia databases is proposed. The goal of the project is the implementation of a visual browser that operates in a distributed environment where users initiate video searches and retrieve relevant video information simultaneously from multiple video archives. Presented with a query in the form of template images of objects, the system operates on the compressed video to find the images or video sequences where those objects are present, along with their positions in the image. Upon the user's request, the system decompresses and displays only the video sequences of interest.
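To make the compressed-domain idea concrete, the sketch below matches a template against a frame using only the low-frequency DCT coefficients of 8x8 blocks, the representation that intra-coded video already stores. This is a minimal illustration under my own assumptions (function names, distance measure, threshold), not the paper's system.

```python
# Minimal sketch of compressed-domain template matching: compare the
# low-frequency DCT coefficients of 8x8 blocks instead of decoding
# pixels. Names and the threshold are illustrative assumptions.
import numpy as np
from scipy.fft import dctn

def block_dct(gray_image):
    """Split a grayscale image into 8x8 blocks and DCT each block,
    mimicking the block transform used by intra-coded video."""
    h, w = gray_image.shape
    blocks = (gray_image[:h - h % 8, :w - w % 8]
              .reshape(h // 8, 8, -1, 8).swapaxes(1, 2))
    return dctn(blocks, axes=(2, 3), norm="ortho")

def find_template(frame_blocks, template_blocks, k=4, threshold=40.0):
    """Slide the template over the frame in block units and report the
    pixel positions where the top-left k x k DCT coefficients agree."""
    th, tw = template_blocks.shape[:2]
    fh, fw = frame_blocks.shape[:2]
    t = template_blocks[:, :, :k, :k]
    hits = []
    for by in range(fh - th + 1):
        for bx in range(fw - tw + 1):
            f = frame_blocks[by:by + th, bx:bx + tw, :k, :k]
            if np.sqrt(np.mean((f - t) ** 2)) < threshold:
                hits.append((by * 8, bx * 8))
    return hits
```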
- by Nikos Korfiatis and +1
- Programming Languages, Information Retrieval, Semantics, Metadata
Information retrieval technology has been central to the success of the Web. For semantic web documents or annotations to have an impact, they will have to be compatible with Web-based indexing and retrieval technology. We discuss some of the underlying problems and issues central to extending information retrieval systems to handle annotations in semantic web languages. We also describe three prototype systems that we have implemented to explore these ideas.
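As a flavor of how annotations can be made compatible with text indexing, one approach is to encode each triple, and its partially wildcarded variants, as opaque tokens that an ordinary text engine can index and query. The sketch below is my own illustration of that general idea, not the encoding these prototypes actually use.

```python
# Encode an RDF triple as text tokens: the full triple plus every
# variant with components wildcarded, so a conventional text engine
# can answer partial-match queries. Illustrative scheme, not the
# paper's exact encoding.
import hashlib
from itertools import product

def triple_tokens(s, p, o):
    tokens = []
    for mask in product([True, False], repeat=3):
        if not any(mask):
            continue  # an all-wildcard token would match everything
        parts = [c if keep else "*" for c, keep in zip((s, p, o), mask)]
        tokens.append(hashlib.md5("|".join(parts).encode()).hexdigest()[:12])
    return tokens

# Index these tokens alongside the document's words; query the same way.
print(triple_tokens("ex:Alice", "ex:worksFor", "ex:UMBC"))
```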
On May 18, 2009, British computer scientist Stephen Wolfram officially launched a new search product called Wolfram|Alpha (WA). This launch was preceded by months of speculation and hype online about exactly what WA would be and how it would compare to Google and other search engines. This article will explore the basic features of WA, show some example queries and results, and discuss the usefulness and limitations of this new tool.
KEYWORDS: Computation, Internet, search engines, Wolfram|Alpha
Purpose – To explore the use of LexiURL as a Web intelligence tool for collecting and analysing links to digital libraries, focusing specifically on the National electronic Library for Health (NeLH). Design/methodology/approach – The Web intelligence techniques in this study are a combination of link analysis (web structure mining), web server log file analysis (web usage mining), and text analysis (web content mining), utilizing the power of commercial search engines and drawing upon the information science fields of bibliometrics and webometrics. LexiURL is a computer program designed to calculate summary statistics for lists of links or URLs. Its output is a series of standard reports, for example listing and counting all of the different domain names in the data. Findings – Link data, when analysed together with user transaction log files (i.e., Web referring domains) can provide insights into who is using a digital library and when, and who could be using the digital library if...
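The core of such link analysis, counting the distinct domains that link to a site, is easy to illustrate. The sketch below reproduces the flavor of LexiURL's domain report; the input file name and output format are my assumptions, not LexiURL's actual interface.

```python
# Minimal sketch of a LexiURL-style summary report: count the distinct
# domain names in a list of link URLs. File name and report format are
# illustrative assumptions.
from collections import Counter
from urllib.parse import urlparse

def domain_report(urls):
    """Count how often each domain appears in a list of link URLs."""
    domains = Counter(urlparse(u).hostname or "unknown" for u in urls)
    return domains.most_common()

with open("inlinks.txt") as f:  # one URL per line (assumed format)
    links = [line.strip() for line in f if line.strip()]
for domain, count in domain_report(links):
    print(f"{count:6d}  {domain}")
```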
The growing digitalization of scientific research practices is reflected in the content that academic and governmental institutions put on their websites, many of which are not optimized for their contents to achieve visibility in Google's search results. Through the mapping of search engine results, this article analyzes the visibility of Ibero-American governmental, educational and research institutions in Google's results for a group of keywords related to the areas of Science, Research and Innovation. By analyzing these pages' placement in the search results over a specific period, we can determine that, with few exceptions, the algorithms used by Google increase the visibility of educational and research institutions in Ibero-America (IA), along with those of each country, as a function of the national search option offered by the search engine. The indicators obtained for both web presence and web visibility show that the pages appearing most frequently in the first positions in IA countries are not owned by national institutions but by institutions from other countries. Moreover, we have observed that governmental and educational institutions are more visible than research institutions. While social networks have so far not been popular with this type of institution, they have recently been gaining ground. However, this study is exploratory, and longitudinal research would smooth out fluctuations in the web data.
- by Simone Belli and +1
- Scientometrics, Website Visibility, Search Engines, Google
Result presentation in search engines is considered in the context of measuring search engine quality. Search engines are changing their result presentation by integrating results that go beyond HTML pages from the Web. These can be multimedia contents, but also further text collections that require special treatment. Two approaches to this integration stand opposed: merging the results from all sources into a single result list, and a separated but jointly presented display of the results. These new forms of result presentation have consequences for the evaluation of results. New forms of result presentation are also suitable for (thematic) portals and can be a significant help for users.
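The two integration approaches just contrasted can be shown in a few lines. The collections, scores, and the score-based interleaving rule below are my own toy assumptions.

```python
# Toy illustration of the two result-presentation approaches: (a) one
# merged list across collections vs. (b) separate, jointly shown blocks.
# Collections and scores are invented for the example.
web = [("page A", 0.9), ("page B", 0.4)]
news = [("article C", 0.7)]
video = [("clip D", 0.6), ("clip E", 0.2)]

# (a) Universal search: interleave everything into one ranked list.
merged = sorted(web + news + video, key=lambda hit: hit[1], reverse=True)
print("merged:", [title for title, _ in merged])

# (b) Separated presentation: one labelled block per collection.
for name, results in [("web", web), ("news", news), ("video", video)]:
    print(f"{name}:", [title for title, _ in results])
```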
Google Scholar is one of the major academic search engines, but its ranking algorithm for academic articles is unknown. We performed the first steps to reverse-engineer Google Scholar's ranking algorithm and present the results in this research-in-progress paper. The results are: citation count is the most heavily weighted factor in Google Scholar's ranking algorithm. Therefore, highly cited articles are found significantly more often in higher positions than articles that have been cited less often. As a consequence, Google Scholar seems to be more suitable for finding standard literature than gems or articles by authors advancing a view different from the mainstream. However, interesting exceptions occurred for some search queries. Moreover, the occurrence of a search term in an article's title seems to have a strong impact on the article's ranking, whereas the impact of search-term frequency in an article's full text is weak: it makes no difference to an article's ranking whether the article contains the query terms once or multiple times. We further investigated whether the name of an author or journal has an impact on the ranking, and whether differences exist between the ranking algorithms of the different search modes that Google Scholar offers; the answer in both cases was "yes". The results of our research may help authors to optimize their articles for Google Scholar and enable researchers to estimate the usefulness of Google Scholar with respect to their search intention, and hence the need to consult further academic search engines or databases.
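The relative weights the authors report can be caricatured with a toy scoring function. The sketch below is emphatically not Google Scholar's algorithm; the weights and functional form are assumptions chosen only to mirror the qualitative findings (citation count dominant, title match strong, full-text frequency nearly irrelevant).

```python
import math

def toy_scholar_score(query_terms, article):
    """Toy ranking score mirroring the paper's qualitative findings.
    `article` is a dict with 'title', 'fulltext', and 'citations'.
    All weights are illustrative assumptions, not measured values."""
    title_words = article["title"].lower().split()
    text = article["fulltext"].lower()
    # Citation count: dominant factor (log-damped so one mega-paper
    # does not swamp everything).
    score = 10.0 * math.log1p(article["citations"])
    # Query term in the title: strong impact.
    score += 5.0 * sum(t.lower() in title_words for t in query_terms)
    # Term anywhere in the full text: weak, binary impact; repeating
    # a term does not help, matching the paper's observation.
    score += 0.5 * sum(t.lower() in text for t in query_terms)
    return score
```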
Deep web refers to the hidden portion of the WWW (World Wide Web) which cannot be accessed directly. One of the important issues in the WWW is how to search this hidden Web. Several techniques have been proposed to address this issue. In this paper, we survey the current problems of retrieving information from the hidden Web and propose a solution to these problems using probability, iterative deepening search and graph theory.
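Of the three ingredients named, iterative deepening search is the most concrete: it reruns a depth-limited depth-first search with growing limits, keeping memory use low while still finding shallow targets first. A minimal sketch over a toy link graph follows; the graph representation and names are my assumptions, not the paper's code.

```python
def depth_limited(graph, node, goal_test, limit):
    """Depth-first search that gives up below a fixed depth; the limit
    also bounds cycles, so no visited set is needed."""
    if goal_test(node):
        return [node]
    if limit == 0:
        return None
    for neighbor in graph.get(node, []):
        path = depth_limited(graph, neighbor, goal_test, limit - 1)
        if path is not None:
            return [node] + path
    return None

def iterative_deepening(graph, start, goal_test, max_depth=10):
    """Retry depth-limited search with limits 0, 1, 2, ... so the
    shallowest reachable goal is found first."""
    for limit in range(max_depth + 1):
        path = depth_limited(graph, start, goal_test, limit)
        if path is not None:
            return path
    return None

# Toy link graph; "form" stands in for a hidden-web entry point.
web = {"home": ["a", "b"], "a": ["form"], "b": ["a", "c"], "c": []}
print(iterative_deepening(web, "home", lambda page: page == "form"))
# -> ['home', 'a', 'form']
```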
PURPOSE: To assess the popularity and problems associated with pay-per-click (PPC) schemes. Is there a link between the income generated via PPC and the offering of free Internet searching? A paid placement service is also called…
In this paper we review studies of the growth of the Internet and technologies that are useful for information search and retrieval on the Web, with search engines as the principal tools for retrieving that information. We collected data on the Internet from several different sources, e.g., current as well as projected numbers of users, hosts, and Web sites. The trends cited by the sources are consistent and point to exponential growth in the past and in the coming decade. Hence it is not surprising that about 85% of Internet users surveyed claim to use search engines and search services to find specific information. At the same time, users are not satisfied with the performance of the current generation of search engines, citing slow retrieval speed, communication delays, and the poor quality of retrieved results. Web agents, programs acting autonomously on some task, are already present in the form of spiders, crawlers, and robots. Agents offer substantial benefits and hazards, and because of this, their development must invo…
Shape matching is one of the main research procedures performed by archaeologists. Real-time 3D computer graphics, 3D digitisation and content-based retrieval are technologies that can facilitate the automation of the shape-matching process. As pottery plays a significant role in understanding ancient societies, we focus on the development of compact shape descriptors that can be used for content-based retrieval of complete or nearly complete 3D pottery replicas. In this work, we present shape descriptors that exploit the axial-symmetry feature and attempt to enhance the archaeological study of pottery by using 3D graphics technologies. We evaluated the performance of the descriptors using a 3D pottery ground-truth repository as a test bed. We created an experimental 3D pottery search engine and attempted to integrate a content-based retrieval mechanism into a 3D virtual reality environment.
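One natural way to exploit axial symmetry is to reduce a vessel to its profile curve, the radius of its surface as a function of height, and compare those curves. The sketch below is an illustrative simplification under that assumption, not the authors' actual descriptor.

```python
import numpy as np

def profile_descriptor(vertices, bins=32):
    """Describe an axially symmetric 3D model (vertices as an (N, 3)
    array, symmetry axis = z) by its mean radius per height slice,
    scale-normalized. Illustrative simplification, not the paper's
    actual descriptor."""
    z = vertices[:, 2]
    r = np.hypot(vertices[:, 0], vertices[:, 1])
    edges = np.linspace(z.min(), z.max(), bins + 1)
    which = np.clip(np.digitize(z, edges) - 1, 0, bins - 1)
    profile = np.array([r[which == i].mean() if np.any(which == i) else 0.0
                        for i in range(bins)])
    return profile / (profile.max() or 1.0)  # scale invariance

def retrieve(query_descriptor, repository, k=5):
    """Rank repository entries (name, descriptor) by L2 distance."""
    ranked = sorted(repository,
                    key=lambda e: np.linalg.norm(e[1] - query_descriptor))
    return ranked[:k]
```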
During the 1980s the internet was introduced to the world. The internet as it was then consisted of a web of linked computers, which enabled people to access information within the boundaries of the internet's infrastructure. During the early years of the internet, many university computers around the world joined the network, focusing on the innovation of the internet and thereby exploring the possibilities of a global information and communication structure (Castells, 2001). Over the course of the last twenty years the internet has grown, developed and innovated beyond anyone's wildest imagination. Billions of internet pages and applications have become available to us, making the internet the largest accumulation of data in the whole history of the human race. Coping with the immense number of pages and the incomprehensible amount of data are search engines: they are our guides to this vast cyberspace.
Mathematical modeling is an important step in developing many advanced technologies in various domains such as network security and data mining. This lecture introduces a process that the speaker has distilled from his past practice of mathematical modeling and algorithmic solution-building in the IT industry, as an applied mathematician, algorithm specialist, software engineer, and even entrepreneur. A practical problem from a DLP system will be used as an example of creating mathematical models and providing algorithmic solutions.
Building an effective Information Retrieval (IR) system for a sector as complex as a library is indeed a challenging task. When creating this kind of application, one must be aware of basic IR methodologies and structures as well as the user experience requirements. Combining different technologies to create different kinds of applications has been one of the major problems in software reuse. However, in recent years, many frameworks that offer a complete suite for developing cross-platform applications have emerged. In this paper, we present a combination of breakthrough technologies and frameworks for developing a Python-based REST API, an administrator-side desktop application, and a cross-platform application powered by Ionic, Angular, HTML and CSS, as the main IR tool of UCY's library.
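To make the REST-API component concrete, here is a minimal search endpoint over a toy in-memory inverted index. The framework choice (Flask), the endpoint name, the schema, and the data are my assumptions for illustration, not the actual UCY system.

```python
# Minimal sketch of a library search REST API: Flask plus a toy
# in-memory inverted index. Endpoint, schema, and data are
# illustrative assumptions.
from collections import defaultdict
from flask import Flask, jsonify, request

app = Flask(__name__)

BOOKS = {
    1: "Introduction to Information Retrieval",
    2: "Modern Information Retrieval",
    3: "Search Engines: Information Retrieval in Practice",
}

# Build an inverted index: term -> set of document ids.
INDEX = defaultdict(set)
for doc_id, title in BOOKS.items():
    for term in title.lower().split():
        INDEX[term].add(doc_id)

@app.route("/search")
def search():
    """Return books matching every term of the ?q= query."""
    terms = request.args.get("q", "").lower().split()
    if not terms:
        return jsonify([])
    hits = set.intersection(*(INDEX.get(t, set()) for t in terms))
    return jsonify([{"id": i, "title": BOOKS[i]} for i in sorted(hits)])

if __name__ == "__main__":
    app.run(debug=True)  # e.g. GET /search?q=information+retrieval
```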
Statechart diagrams have an inherent complexity that increases every time the diagrams are modified. This complexity poses problems in comprehending statechart diagrams. The study of cognitive complexity has over the years provided valuable information for the design of improved software systems. Researchers have proposed numerous metrics that have been used to measure, and therefore control, the complexity of software. However, there is inadequate literature on cognitive complexity metrics that can be applied to measure statechart diagrams. In this study, a literature survey of statechart diagrams is conducted to investigate whether there are any gaps in the literature. Initially, a description of UML and statechart diagrams is presented, followed by the complexities associated with statechart diagrams and finally an analysis of existing cognitive complexity metrics and metrics related to statechart diagrams. Findings indicate that metrics that employ cognitive weights to measure statechart diagrams are lacking.
These complex and at times contradictory judgments emerge from 1) an online survey of more than 2,000 middle and high school teachers drawn from the Advanced Placement (AP) and National Writing Project (NWP) communities; and 2) a series of online and offline focus groups with middle and high school teachers and some of their students. The study was designed to explore teachers’ views of the ways today’s digital environment is shaping the research and writing habits of middle and high school students. Building on the Pew Internet Project’s prior work about how people use the internet and, especially, the information-saturated digital lives of teens, this research looks at teachers’ experiences and observations about how the rise of digital material affects the research skills of today’s students.
About 40% of current internet traffic and bandwidth consumption is due to web crawlers that retrieve pages for indexing by the different search engines. This traffic and bandwidth consumption will increase in the future due to the exponential growth of the web. We address this problem by introducing an efficient indexing system based on mobile crawlers. Our approach employs mobile agents to crawl and re-crawl pages. These mobile-agent-based crawlers retrieve the pages, process them, compare them with previously crawled versions, and then compress them before sending them to the search engine for indexing. Our results show that our approach is more efficient than previous crawling techniques.
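The core saving the authors describe, processing pages at the remote site and shipping home only compressed changes, can be sketched as follows. The page-fingerprinting and transport details are assumptions for illustration, not the paper's implementation.

```python
# Sketch of the mobile-crawler idea: at the remote site, fetch a page,
# skip it if unchanged since the last crawl, otherwise compress it
# before sending it home for indexing. Fingerprinting and transport
# details are illustrative assumptions.
import hashlib
import zlib
from urllib.request import urlopen

last_seen = {}  # url -> content hash from the previous crawl

def crawl_page(url):
    """Return compressed page bytes if the page changed, else None."""
    html = urlopen(url).read()
    digest = hashlib.sha256(html).hexdigest()
    if last_seen.get(url) == digest:
        return None                    # unchanged: send nothing home
    last_seen[url] = digest
    return zlib.compress(html)         # changed: ship a compressed copy

def send_to_search_engine(url, payload):
    """Stand-in for shipping data back to the indexer."""
    print(f"{url}: sending {len(payload)} compressed bytes")

for url in ["https://example.com/"]:
    payload = crawl_page(url)
    if payload is not None:
        send_to_search_engine(url, payload)
```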
A spell-checking system helps detect spelling errors and suggests appropriate corrections in word documents automatically. This paper is concerned with the implementation of an interactive Bangla spell-checking tool and demonstrates how to use it effectively in a general-purpose search engine. The literature shows that Bangla has complex grammatical and orthographical spelling rules, and the study addressed different challenges in generating suggestions for Bangla phonetic errors. The study also presents a combined technique of a string-matching algorithm and a reverse dictionary lookup method for detecting typographical errors as well as cognitive phonetic errors in the Bangla language. Both algorithms have been implemented in a search engine app for locating a misspelled word and listing the suggested candidate words. Finally, the intended word is chosen from the candidate words generated by the spell checker and used for searching on the search engine platform. Two levels of databases (the lexicon and a secondary database) were used to design the proposed system. The lexicon database contains about 58,319 correctly spelled Bangla words, and the secondary database contains about 46,881 stemmed-key data entries.
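A minimal version of the string-matching half of such a tool ranks lexicon entries by edit distance to the misspelled input. The sketch below uses generic Levenshtein distance and an English toy lexicon; the paper's Bangla-specific phonetic rules and reverse dictionary lookup are not reproduced here.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (works for any
    Unicode strings, including Bangla)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def suggest(word, lexicon, k=5, max_dist=3):
    """Rank lexicon entries closest to the (possibly misspelled) word."""
    scored = ((levenshtein(word, w), w) for w in lexicon)
    return [w for d, w in sorted(scored) if d <= max_dist][:k]

lexicon = ["search", "speech", "peach", "reach", "sketch"]
print(suggest("serch", lexicon))  # ['search', ...]
```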
- by Md. Mijanur Rahman and +1
- Search Engines, Spell Checking
Search engines (e.g. Google.com, Yahoo.com, and Bing.com) have become the dominant model of online search. Large and small e-commerce sites provide built-in search capabilities so their visitors can examine the products on offer. While most large businesses are able to hire the skills necessary to build advanced search engines, small online businesses still lack the ability to evaluate the results of their search engines, which means losing the opportunity to compete with larger businesses. The purpose of this paper is to build an open-source model that can measure the relevance of search results for online businesses as well as the accuracy of their underlying algorithms. We used data from a Kaggle.com competition to show our model running on real data.
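A standard way to score a result list against human relevance judgments is normalized discounted cumulative gain (NDCG). The metric itself is textbook; using it as the core of the evaluation model here is my illustrative assumption, not necessarily the paper's exact measure.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: graded relevance, log-discounted
    by rank so early positions matter most."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(relevances, k=10):
    """Normalize DCG@k by the best achievable ordering (ideal DCG)."""
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0

# Human judgments for the top results of one query (3 = perfect match,
# 0 = irrelevant), in the order the search engine returned them.
print(round(ndcg([3, 2, 0, 1, 2]), 3))
```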
The internet has developed into the number one port of call for communication and information. With the millions of websites available, search engines play a vital role in filtering information online. Over time, both search engines and technical equipment have been changing: web-enabled devices have been getting smaller, new search engines have been launched, new search features added, and the presentation of the search results has undergone modifications. Users have been changing their search habits, too, and have been getting more experienced in searching the web. Analytical search engine studies, e.g. logfile analyses, are numerous. User-focused studies of search engine behavior and user satisfaction, on the other hand, are still rare. Yet certain scenarios, such as exploratory searches, and the users' subjective satisfaction may not be captured in those analyses, and it is the users' subjective opinions and skills that ultimately decide the achieved search performance and the success of a search engine. The last user-focused search engine studies in Germany are from more than a decade ago. This study investigates user search engine behavior and satisfaction from the perspective of internet users in Germany. A survey was conducted to learn about search engine usage frequency, preferred search engines, and priorities for search engine selection, and to assess specific search habits, such as preferred search query language, customization of the search engine language settings, and the necessity for repetitive searches and query rephrasing as perceived by the users. The survey also evaluated the users' satisfaction with the search results in general, the search results on the first page, as well as the results retrieved with German versus English search queries. Where possible, the paper compares the findings with the user-focused studies from 2003 and 2005.
The internet has become the first place people look for hotels when they travel, which makes it a great opportunity for hotels to have a web site with an integrated booking engine: it offers comfort to their clients and expands their sales to a wider market. This paper presents a customisable online booking system which facilitates the control of room bookings for administrators of small hotels, managing sales from their own web site, and which facilitates and guarantees the bookings of their potential guests. Additionally, the online system automatically adapts its display to different mobile devices, following the responsive web design pattern.
Objectives - To evaluate and compare the results produced by Summon and EBSCO Discovery Service (EDS) for the types of searches typically performed by library users at North Carolina State University. Also, to compare the performance of…
Requirements Engineering is the set of activities involved in creating, managing, documenting, and maintaining a set of requirements for a product. Engineering involves the use of systematic, repeatable techniques to ensure that the software requirements are complete, consistent, valid, and verifiable. Software Requirements Specification is an organized process oriented toward defining, documenting and maintaining requirements throughout the development life cycle. Many authors suggest that requirements should always focus on what the software product needs to address, without specifying how to implement it. However, the level of detail of software requirements is influenced by several factors, such as organizational thinking, existing specification standards, and regulatory needs. This work addresses exactly those regulatory needs: the characteristics of Software Requirements Specification in regulated environments such as aeronautics, railways and medicine are presented and explored. The four characteristics identified are: consistency (internal and external), unambiguity, verifiability, and traceability. The paper also describes the three standards used in these regulated environments (RTCA DO-178C, IEC 62279 and IEC 62304) and examines their similarities and differences from a Requirements Specification standpoint. The similarities and differences will be used to shape a future universal requirements-framework process that can be configured to address each standard through the use of Software Process Lines.
Abstract: The study aims to specify a search directory for Egyptian topographic maps, applied to the website of the Egyptian General Survey Authority. The study begins with a comprehensive methodological introduction, then offers a primer on search directories and search engines, and then discusses the criteria that must…
This book explains open-source tools that are simple to use, enabling programmers who do not possess extensive computational or linguistic experience to develop search, categorization and text analysis applications. The sample code…
Three online tools that students often use to assist them with their language needs are online translators, online dictionaries, and search engines. An experimental study was conducted with 310 participants taking Spanish or French to investigate both the amount of usage of these three resources among third- and fourth-semester university students and student attitudes toward online dictionaries and translators. The results show that nearly nine out of ten students (87.7%) say they use online dictionaries for graded work at least sometimes. Surprisingly, the exact same percentage (87.7%) report online translator use, despite the fact that online translators are prohibited at the institution where the study was conducted. Search engine use was lower, but was still reported by just over three out of four students. Similar but smaller percentages were found for all three tools for non-graded language practice. Participants held almost exclusively positive views of online dictionaries (93.9%), whereas opinions on online translators were mixed, but still mostly positive (75.6%). This study highlights the prevalence with which these online tools are used as well as the variety of student opinions. The results are discussed, suggestions for further research are given, and implications for teaching are provided.