Harry Halpin | W3C - Academia.edu

Papers by Harry Halpin

The complex dynamics of collaborative tagging

Proceedings of the 16th …, Jan 1, 2007

The debate within the Web community over the optimal means by which to organize information often pits formalized classifications against distributed collaborative tagging systems. A number of questions remain unanswered, however, regarding the nature of collaborative tagging systems, including whether coherent categorization schemes can emerge from unsupervised tagging by users. This paper uses data from the social bookmarking site del.icio.us to examine the dynamics of collaborative tagging systems. In particular, we examine whether the distribution of the frequency of use of tags for "popular" sites with a long history (many tags and many users) can be described by a power law distribution, often characteristic of what are considered complex systems. We produce a generative model of collaborative tagging in order to understand the basic dynamics behind tagging, including how a power law distribution of tags could arise. We empirically examine the tagging history of sites in order to determine how this distribution arises over time and to determine the patterns prior to a stable distribution. Lastly, by focusing on the high-frequency tags of a site where the distribution of tags is a stabilized power law, we show how tag co-occurrence networks for a sample domain of tags can be used to analyze the meaning of particular tags given their relationship to other tags.

The dynamics and semantics of collaborative tagging

… of the 1st Semantic Authoring and …, Jan 1, 2006

The debate within the Web community over the optimal means by which to organize information often pits formalized classifications against distributed collaborative tagging systems. A number of questions remain unanswered, however, regarding the nature of collaborative tagging systems, including the dynamics of such systems and whether coherent classification schemes can emerge from undirected tagging by users. Currently, millions of users are using collaborative tagging without centrally organizing principles, and many suspect this exhibits features considered to be indicative of a complex system. If this is the case, it remains to be seen whether collaborative tagging by users over time leads to emergent classification schemes that could be formalized into an ontology usable by the Semantic Web. This paper uses data from "popular" tagged sites on the social bookmarking site del.icio.us to examine the dynamics of such collaborative tagging systems. In particular, we are trying to determine whether the distribution of tag frequencies stabilizes, which indicates a degree of cohesion or consensus among users about the optimal tags to describe particular sites. We use tag co-occurrence networks for a sample domain of tags to analyze the meaning of particular tags given their relationship to other tags and automatically create an ontology. We also produce a generative model of collaborative tagging in order to model and understand some of the basic dynamics behind the process.

Identity, reference, and meaning on the web

Proceedings of the Workshop on Identity, Meaning and …, Jan 1, 2006

Problems of reference, identity, and meaning are becoming increasingly endemic on the Web. We focus first on the convergence between Web architecture and classical problems in philosophy, leading to the advent of "philosophical engineering." We survey how the Semantic Web initiative in particular provoked an "identity crisis" for the Web due to its use of URIs for both "things" and web pages and the W3C's proposed solution. The problem of reference is inspected in relation to both the direct object theory of reference of Russell and the causal theory of reference of Kripke, and the proposed standards of new URN spaces and Published Subjects. Then we progress onto the problem of meaning in light of the Fregean slogan of the priority of meaning over reference and the notion of logical interpretation. The popular notions of "social meaning" and the practice of tagging as a possible solution are analyzed in light of the ideas of Lewis on convention. Finally, we conclude that a full notion of meaning, identity, and reference may be possible, but that it remains an open problem how practical implementations and standards can be created.

When owl:sameAs isn't the same: An analysis of identity links on the semantic web

… of the WWW2010 workshop on Linked …, Jan 1, 2010

In Linked Data, the use of owl:sameAs is ubiquitous in 'inter-linking' data-sets. However, there is a lurking suspicion within the Linked Data community that this use of owl:sameAs may be somehow incorrect, in particular with regard to its interactions with inference. In fact, owl:sameAs can be considered just one type of 'identity link,' a link that declares two items to be identical in some fashion. After reviewing the definitions and history of the problem of identity in philosophy and knowledge representation, we outline four alternative readings of owl:sameAs, showing with examples how it is being (ab)used on the Web of data. Then we present possible solutions to this problem by introducing alternative identity links that rely on named graphs.

One document to bind them: combining XML, web services, and the semantic web

… of the 15th international conference on …, Jan 1, 2006

We present a paradigm for uniting the diverse strands of

In defense of ambiguity

International Journal on Semantic Web and …, Jan 1, 2008

• There are two distinct relationships between names and things.

Emergence of consensus and shared vocabularies in collaborative tagging systems

ACM Transactions on the Web ( …, Jan 1, 2009

This paper uses data from the social bookmarking site del.icio.us to empirically examine the dynamics of collaborative tagging systems and to study how coherent categorization schemes emerge from unsupervised tagging by individual users.

When owl:sameAs isn't the same: An analysis of identity in linked data

The Semantic Web– …, Jan 1, 2010

In Linked Data, the use of owl:sameAs is ubiquitous in interlinking data-sets. There is, however, ongoing discussion about its use, and potential misuse, particularly with regard to interactions with inference. In fact, owl:sameAs can be viewed as encoding only one point on a scale of similarity, one that is often too strong for many of its current uses. We describe how referentially opaque contexts that do not allow inference exist, and then outline some varieties of referentially opaque alternatives to owl:sameAs. Finally, we report on an empirical experiment over randomly selected owl:sameAs statements from the Web of data. This theoretical apparatus and experiment shed light upon how owl:sameAs is being used (and misused) on the Web of data.

A framework for text mining services

Proceedings of the UK …, Jan 1, 2004

The growth of online scientific literature, coupled with the growing maturity of text processing technology, has boosted the importance of text mining as a potentially crucial tool. However, there are several challenges to be addressed before sophisticated text mining services can be deployed within emerging workflow environments. Our work contributes at two levels. At the invocation level, we have developed a flexible XML-based pipeline architecture which allows non-XML processors to be readily integrated. At the description/discovery level, we have developed a broker for service composition, and an accompanying domain ontology, that leverage the OWL-S approach to service profiles.

Extracting common sense knowledge from Wikipedia

Proceedings of the Workshop on Web Content …, Jan 1, 2006

Much of the natural language text found on the web contains various kinds of generic or "common sense" knowledge, and this information has long been recognized by artificial intelligence as an important supplement to more formal approaches to building Semantic Web knowledge bases. Consequently, we are exploring the possibility of automatically identifying "common sense" statements from unrestricted natural language text and mapping them to RDF. Our hypothesis is that common sense knowledge is often expressed in the form of generic statements such as "Coffee is a popular beverage," and thus our work has focussed on the challenge of automatically identifying generic statements. We have been using the Wikipedia XML corpus as a rich source of common sense knowledge. For evaluation, we have been using the existing annotation of generic entities and relations in the ACE 2005 corpus.
