Picturing Usenet: Mapping Computer-Mediated Collective Action
Abstract
Usenet is a complex socio-technical phenomenon, containing vast quantities of information. The sheer scope and complexity make it a challenge to understand the many dimensions across which people and communication are interlinked. In this work, we present visualizations of several aspects and scales of Usenet that combine to highlight the range of variation found in newsgroups. We examine variations within hierarchies, newsgroups, authors, and social networks. We find a remarkable diversity, with clear variations that mark starting points for mapping the broad sweep of behavior found in this and other social cyberspaces. Our findings provide the basis for initial recommendations for those cultivating, managing, contributing, or consuming collectively constructed conversational content.
Introduction: Goals, Roles, and Social Structures
Conversational social cyberspaces are repositories of messages and replies to messages; these collections of messages can aggregate into rich social institutions and content collections. These environments have come to play a ubiquitous and central role in knowledge management, social and technical support, and medical and political decision-making (to name just a few consequential arenas); as such, understanding computer-mediated social interactions within these spaces is becoming increasingly important. Baseline data about the range of variation of these spaces and their participants have been mostly unavailable, making systematic management or evaluation difficult.
To explore the nature of conversational social cyberspaces and the relationships they host, we have created a set of tools that apply data mining and information visualization to collections of data drawn from social cyberspaces. To facilitate the exploration of multiple theoretical concerns related to collective action, social network theory, and the emergence of roles and leadership in computer-mediated interactions, we have applied these tools to an analysis of Usenet newsgroups. These tools provide a view of Usenet that suggests approaches for practitioners and participants seeking to manage and cultivate the output of computer-mediated collective efforts.
The Netscan project has focused on analyzing collections of messages drawn from threaded discussion environments such as Usenet, web-based forums, and bulletin boards. Netscan represents an effort to collect, process, and create visual interfaces for the data generated within Usenet over a five-year period. These data consist of 1.2 billion Usenet messages created by 48 million distinct identities and sorted into 150,000 different newsgroups. We describe a range of dimensions along which these data can be measured using “social accounting metadata”: metrics that describe the social activity of a cyberspace, such as the number of messages and authors in a newsgroup, the number of posts or replies by an author, or the number of authors who posted to a thread.
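To make the idea of social accounting metadata concrete, the following Python sketch derives the example metrics above from a list of message records. It is a minimal illustration, not Netscan's implementation: the `Message` record and its field names are hypothetical, and so is the convention that a message with no `parent_id` counts as a thread-starting “post” while any other message counts as a “reply.” Cross-posting is ignored.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    # Hypothetical record; these field names are illustrative, not Netscan's schema.
    msg_id: str
    newsgroup: str
    author: str
    thread_id: str
    parent_id: Optional[str] = None  # None marks a thread-starting post


def social_accounting(messages):
    """Compute a few social accounting metrics from message records."""
    msgs_per_group = Counter()
    authors_per_group = defaultdict(set)
    posts_per_author = Counter()    # thread-starting messages
    replies_per_author = Counter()  # messages answering another message
    authors_per_thread = defaultdict(set)
    for m in messages:
        msgs_per_group[m.newsgroup] += 1
        authors_per_group[m.newsgroup].add(m.author)
        if m.parent_id is None:
            posts_per_author[m.author] += 1
        else:
            replies_per_author[m.author] += 1
        authors_per_thread[m.thread_id].add(m.author)
    return {
        "messages_per_group": dict(msgs_per_group),
        "authors_per_group": {g: len(a) for g, a in authors_per_group.items()},
        "posts_per_author": dict(posts_per_author),
        "replies_per_author": dict(replies_per_author),
        "authors_per_thread": {t: len(a) for t, a in authors_per_thread.items()},
    }


# Toy data, invented for illustration.
sample = [
    Message("1", "rec.kites", "alice", "t1"),
    Message("2", "rec.kites", "bob", "t1", parent_id="1"),
    Message("3", "rec.kites", "alice", "t1", parent_id="2"),
    Message("4", "alt.fan.cecil-adams", "carol", "t2"),
]
stats = social_accounting(sample)
print(stats["authors_per_thread"])  # {'t1': 2, 't2': 1}
```

Each metric is a simple count or distinct-count over one grouping key, which is why such metadata can be computed in a single pass even over very large archives.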
In this article, we begin to categorize newsgroups, authors, and threads in terms of their patterns of activity and development. Newsgroups range in size from desolate places that attract no messages to a tiny minority that attract tens of thousands of messages and participants per month. Newsgroups can be distinguished in terms of their population size and growth, the stability of their core population of long-term participants, and their interconnection with other newsgroups. Authors also come in multiple forms, from the dominant type, the one-time-only poster, to various kinds of active contributors, including “Answer People,” “Questioners,” “Trolls,” “Spammers/Binary Posters,” and “Flame Warriors/Conversationalists.”
We present a series of visualizations of Usenet newsgroups and user activity, showing newsgroup hierarchies, newsgroup populations, authors, and conversation threads. These images convey the range of variation among these entities and suggest some initial designs for interfaces for end users of these spaces. We then suggest several next steps for enhancing the interfaces of conversational social cyberspaces in particular and of computer-mediated collective action systems in general.
Strategies for Studying Social Cyberspaces
The study of social cyberspaces is growing rapidly across the social, computer, and information sciences, and conventions about how such research should be accomplished are emerging (Howard, 2003; Paccagnella, 1997; Rosen, Woelfel, Krikorian, & Barnett, 2003). In this section, we briefly consider some prominent general recommendations, as well as research specific to different analysis strategies: thread and conversation structure, descriptive statistics and comparison, content analysis, ethnography, and network analysis. For brevity, we omit other relevant areas, including inferential statistics, experiments, surveys, and simulations.
Several research tools combine analysis of thread structure with other types of data (Krikorian & Ludwig, 2003; Sack, 2000, 2002; Smith, 1999). Krikorian and Ludwig (2003) provide a unique movie-like interface for watching the temporal development of thread structures. Netscan (Smith, 1999, 2003; Smith & Fiore, 1999) combines thread structure with meta-statistics on actors and newsgroups, greatly facilitating comparison across newsgroups. Krikorian and Kiyomiya (2002) offer an alternative framework for studying meta-statistics about participation in Usenet newsgroups, which they use to predict the persistence and death of newsgroups. Additionally, Warren Sack's Conversation Map marries analysis of thread structure with social and semantic relationships, facilitating content- and structure-based triangulation (Sack, 2000, 2002). All of these strategies allow researchers to link message content to the broader social context of the conversation.
Because many online settings generate qualitative data in a quantified setting, those data are especially appropriate for systematic content analysis (Howard, 2002; Neuendorf, 2002). More generally, several automated systems for content analysis of online data are being developed (Sack, 2000, 2002; for chat data, see Rosen et al., 2003).
Work on cyber-ethnography, and on strategies and tools for the ethnographic study of cyberspace, includes Hine (2000), Howard (2002), Mann and Stewart (2000), Markham (1998), Miller and Slater (2000), and Morrill and Fine (1997). Many researchers further draw attention to issues of identity, roles, and social identity in cyberspace (Golder, 2003; Hogg, Abrams, Otten, & Hinkle, 2004; Turkle, 1995). The integration of ethnographic analysis of identity and roles with other methods (network analysis, descriptive statistics) provides promising directions for research.
The generation of social accounting metadata in Usenet and other venues for computer-mediated interaction suggests a fourth level of analysis: comparison of discussion behavior through descriptive statistics. The major methodological contribution of this article is to describe and demonstrate strategies for combining descriptive statistics with visualizations of those data. Thus we follow up on the methodological challenge articulated by Smith and Fiore (2001):
The diverse facets of online discussions—the messages themselves, their temporal and logical sequence, the relationships of their authors—do not integrate easily with each other. Thus, grasping the nature and extent of interaction in a complex conversation from just one kind of interface is difficult or impossible.
Data and Methods
The Usenet
Interaction in Usenet consists of posting new messages and replying to existing messages. These conversations are organized at three nested levels: hierarchies, newsgroups, and threads. Usenet is a distribution system for the exchange of text-based messages, providing services similar to a set of publicly archived email lists. Each newsgroup's name places it, together with others, into a general area called a hierarchy, which is indicated by the prefix of the newsgroup's name. For example, “rec.” indicates discussion topics about recreation; we refer to this as the “rec” hierarchy. At the core of newsgroup activity is the generation and exchange of messages, which are publicly accessible. Like email, the messages are sent asynchronously; unlike most mailing lists, they are publicly archived.
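Because a newsgroup's hierarchy is encoded in its dot-separated name, rolling message counts up to every enclosing hierarchy takes only a few lines. The sketch below assumes that each dot-separated prefix (for example `rec` and `rec.arts` for `rec.arts.movies`) counts as one level of the hierarchy; the function names and the sample counts are hypothetical. A roll-up of this kind is the input that a nested view such as a Treemap needs.

```python
from collections import Counter


def hierarchy_path(newsgroup: str) -> list:
    """Every enclosing level of a newsgroup name, outermost first.

    "rec.arts.movies" -> ["rec", "rec.arts", "rec.arts.movies"]
    """
    parts = newsgroup.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def roll_up(counts: dict) -> Counter:
    """Add each newsgroup's count to the newsgroup itself and to every hierarchy above it."""
    totals = Counter()
    for group, n in counts.items():
        for level in hierarchy_path(group):
            totals[level] += n
    return totals


# Hypothetical monthly message counts.
totals = roll_up({"rec.kites": 10, "rec.arts.movies": 5, "alt.binaries.music.mp3": 7})
print(totals["rec"], totals["alt.binaries"])  # 15 7
```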
Usenet is certainly not the only repository of threaded conversations. Indeed, the last five years have been a period during which a vast number of people have come online. Numerous communication tools are available, from new forms of interaction like instant messaging, weblogs, and wikis, to the widespread availability of private message boards and webforum systems. The trends and patterns presented here should be considered against this backdrop of social and technical change (see Herring, 2004).
Usenet's general use has changed, too. In the last several years, as peer-to-peer file sharing has exploded across the Internet, file transfer has been an important part of Usenet traffic. Many of the text messages on Usenet are in fact portions of music files, encoded for transfer and sent to newsgroups such as “alt.binaries.music.mp3.”
Still, conversation and social interaction on Usenet continue to be important motivators, and the bulk of people sending messages to Usenet seem to be doing so as part of conversations. In this study, our analyses focus on hierarchy-, newsgroup-, and actor-level metrics of posts and replies.
Research Approach
Fundamentally, our research studies the nature of social interaction through the behaviors recorded in the Usenet. We identify three general research objectives: 1) to characterize and measure interaction in the Usenet; 2) to develop typologies of interaction that take place in the Usenet; and 3) to describe how these types of interaction are distributed across actors, newsgroups, threads, and hierarchies.
Our analysis addresses a series of more concrete research questions, each paired with methods of analysis made possible by the extensive data we have collected from Usenet.
How do Newsgroup Hierarchies Vary?
Starting at a very broad level, we investigate the distribution of types of newsgroups across the various hierarchies of Usenet. This research vein asks: Which newsgroups are growing? Is social interaction in Usenet growing or diminishing? How does the nature of interaction in the group affect the organization of hierarchies?
In addition, we compare sub-hierarchies to each other in order to establish whether there are real differences among “neighborhoods” of Usenet: Are the popular culture groups structured differently from the technical newsgroups?
How Does Interaction Within Newsgroups Vary?
Newsgroups vary not just in their content, but in the nature of the social interaction that takes place in them. We attempt to apply social accounting metadata to distinguish newsgroups from each other. Are newsgroups visibly different based on their function or utility? Are the discussions or activities within a newsgroup reflected by the group's structure?
How do Participants' Contributions to Usenet Vary?
Participants vary in what they post, how often, how much, and to which newsgroups. We examine the structure of posters' frequency of posting, and their relationships to other posters, in order to understand the differences among posters. Can we find visible structural differences among different roles of users?
Samples and Analysis Strategies
We sampled from the Netscan dataset using two general strategies: overview and selective analysis based on empirical trends. The data collected, and the exact sampling strategy, varied according to each of the methods of analysis employed, because in each, the unit of analysis differed. We briefly describe the sample and analysis methods for each of the three inquiries described above.
To address our first research question, how newsgroup hierarchies vary, we explore the overall organization of Usenet and its change over time using the Treemap visualization strategy (Shneiderman, 2004; Smith, 2001). We begin by sampling all hierarchies from Usenet during the period 2000-2004. We then compare annual counts of posts and replies for all of Usenet and for three subsidiary hierarchies over the same period, examining these metrics to see whether social interaction has changed over time.
For our second research question, how interaction within newsgroups varies, we studied the 1,000 most active newsgroups during the years 2003 and 2004, concentrating on the number of posts and replies. We generated Newsgroup Crowd visualizations for each newsgroup, compared the resulting visualizations, and noted systematic differences in the distribution of individual posting behavior across newsgroups on different topics. We isolated general “types” of newsgroups, choosing several newsgroups from each type as examples to analyze in greater detail.
For our third research question, how participants' contributions to Usenet vary, we investigated different types of actors by inspecting the 1,000 most active participants across all of Usenet, in terms of both the number of posts and the number of messages to which they had replied. We generated visualizations of these actors' annual posting behaviors and compared these to identify major actor types. We then compared the types we observed to theoretical articulations of social roles in Usenet (Golder, 2003), and followed up in greater depth on several of these types, focusing especially on “Answer People.” For this analysis we combined the analysis of role through patterned behavior (see Burke & Reitzes, 1991) with network visualization (Scott, 2000; White, Boorman, & Breiger, 1976) to investigate the overlap between role as behavior and role as structural position. Here we focused on the structure of behavior within a single technical help newsgroup in order to better investigate the “Answer Person” role and the related “Questioner” role.
Throughout this work, we look in detail at the newsgroup “microsoft.public.windows.server.general.” This group is interesting as a site in which a fair amount of technical conversation happens: It is a fairly active newsgroup, but certainly not the largest or most active within its hierarchy. The group is not heavily plagued by spam; its members largely ask and answer questions and engage in discussions. We examined several of the most frequent posters and several other participants. Throughout this article, we will revisit those individuals, and the newsgroup, in order to connect their roles and positions as exposed through the variety of visualizations and displays. As such, the newsgroup provides a good baseline for illustrating the sorts of activities in which we are interested.
Analysis
We pursue our three research objectives through three separate analyses. All three analyses are based upon results collected from the Netscan database of Usenet messages. All three analyses present a qualitative perspective that is based on quantitative data: That is, we use visualizations to compare different portions of the online social environments. The three analyses we present here focus first on the hierarchy of newsgroups, second on the collective social behavior within newsgroups, and third on individual differences among people.
Analysis 1: How do Newsgroup Hierarchies Vary?
Taken as a whole, Usenet is a complex, multifaceted environment. In order to understand the social activity of particular newsgroups, it is helpful to view patterns at the macro level, across time, and across different metrics (Smith & Fiore, 2001). The Treemap (Shneiderman, 2004) allows us to depict hierarchies of newsgroups as boxes nested inside boxes, each with an area proportional to a selected metric and colored by the change in another metric. Using this visualization, we can measure and reveal important patterns among collections of newsgroups within hierarchies, and within newsgroups across time. Figures 1a and 1b are Treemaps of all of Usenet for January 2000 and January 2004, depicting the volume of messages posted to each newsgroup during that month. Category labels are centered on their areas, with font size scaled to the size of the area. Newsgroups are colored by their change in volume: A green newsgroup has more messages than it had the previous year; a red newsgroup has fewer. The intensity of the color indicates the magnitude of the relative change.
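The layout principle can be illustrated with the original “slice-and-dice” Treemap algorithm, in which each level of the hierarchy divides its box among its children in proportion to their totals, alternating between horizontal and vertical cuts. This is a sketch only: we do not claim it is the layout used for the figures here (many Treemap implementations use other variants, such as squarified layouts), and the green/red coloring rule, including its cap on intensity, is our own illustrative assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    size: float = 0.0       # leaf metric this period, e.g. January messages
    prev_size: float = 0.0  # same metric a year earlier (leaves only)
    children: list[Node] = field(default_factory=list)

    def total(self) -> float:
        return self.size if not self.children else sum(c.total() for c in self.children)


@dataclass
class Rect:
    name: str
    x: float
    y: float
    w: float
    h: float
    color: tuple | None = None  # None for container boxes


def change_color(cur: float, prev: float, full_scale: float = 1.0) -> tuple:
    """Green for growth, red for decline; intensity grows with |relative change|,
    capped at full_scale (unchanged groups come out black in this sketch)."""
    rel = (cur - prev) / prev if prev > 0 else (1.0 if cur > 0 else 0.0)
    shade = int(255 * min(abs(rel) / full_scale, 1.0))
    return (0, shade, 0) if rel >= 0 else (shade, 0, 0)


def slice_and_dice(node: Node, x: float, y: float, w: float, h: float,
                   depth: int = 0, out: list | None = None) -> list:
    """Return one Rect per node; each box's area is proportional to its subtree total."""
    if out is None:
        out = []
    if not node.children:
        out.append(Rect(node.name, x, y, w, h, change_color(node.size, node.prev_size)))
        return out
    out.append(Rect(node.name, x, y, w, h))
    total = node.total()
    offset = 0.0
    for child in node.children:
        frac = child.total() / total if total else 0.0
        if depth % 2 == 0:  # even depth: cut left to right
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:               # odd depth: cut top to bottom
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


# Hypothetical counts: alt.binaries doubled, alt.fan halved, rec unchanged.
usenet = Node("usenet", children=[
    Node("alt", children=[Node("alt.binaries", 60, 30), Node("alt.fan", 20, 40)]),
    Node("rec", 20, 20),
])
boxes = {r.name: r for r in slice_and_dice(usenet, 0, 0, 100, 100)}
print(boxes["alt.binaries"].w * boxes["alt.binaries"].h, boxes["alt.binaries"].color)
```

Because each split is proportional, a leaf's final area is its share of the grand total, so alt.binaries (60 of 100 messages) fills 60% of the canvas.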
Figure 1a
Posts to all of Usenet: Treemap for January 2000. The hierarchies alt.binaries and rec are highlighted.
Which Newsgroups Are Growing?
In the January 2000 Treemap (Figure 1a), several top-level newsgroup hierarchies are immediately visible. “alt” (for “alternative”) is the largest top-level hierarchy. In the “alt” newsgroups, people discuss almost everything conceivable, from music to politics to astrology. Near the bottom center is “rec” (for “recreation”), which includes topics like rec.kites, where people post about flying kites. Usenet is an international system, and regional and foreign-language hierarchies like “tw” (Taiwan), “de” (German), and “uk” (British) are visible immediately above the “rec” hierarchy.
The bottom left quadrant of the 2000 Treemap is filled by the “alt.binaries” hierarchy, which contains newsgroups dedicated to exchanging files: pictures, music, video, and software. For comparison, the hierarchy “rec,” dedicated to discussions of recreation, is highlighted in yellow.
The most striking difference between January 2000 (Figure 1a) and January 2004 (Figure 1b) is the growth in the proportional size of the alt.binaries hierarchy compared with other areas of alt and with the rest of Usenet in general. From this point of view, the more social activities of Usenet (discussion, support, and technical help) are being overtaken by file exchange.
Is Social Interaction in Usenet Growing or Diminishing?
While the volume of posts may be greatest in these binary groups, this by no means suggests that the remainder of Usenet activity is becoming less important. We can look within newsgroups for more social activity and compare 2000 to 2004 in terms of interaction. The Treemaps in Figure 1 count all posts in each newsgroup, yet initial posts that receive no replies can be indicative of spam and file transfer rather than interaction and conversation. Figure 2 therefore redraws the Treemaps of Figure 1, this time sizing each newsgroup by the number of replies that occurred in it rather than by the number of posts.
Figure 1b
Posts to all of Usenet: Treemap for January 2004. The hierarchies alt.binaries and rec are highlighted.
These data suggest that the more social aspects of Usenet are still thriving in volume. Figures 1a and 1b suggest that the “alt” hierarchy is growing relative to other areas of Usenet. In Figures 2a and 2b, however, we see the opposite result: the share of replies posted in those newsgroups has dropped compared to the rest of Usenet. In terms of conversation, “alt” is shrinking. Measuring activity by the number of posts rather than the number of replies exaggerates the apparent importance of binaries.
Treemaps can show change in size relative to other groups, but they do not show change in absolute volume. Figure 3 plots the total number of posts and replies by year for all of Usenet and for three major sub-areas. Figure 3a shows that, in absolute terms, stand-alone posts are growing rapidly across all of Usenet, while the number of replies per year declines only slightly. This suggests that the relative growth in binary posts did not occur at the expense of conversation. Figure 3b bolsters this: The number of replies in “alt” (which contains “alt.binaries”) has declined slightly, while the number of posts has grown substantially. In contrast, Figure 3d shows clear growth in both posts and replies in a technical help area (the “microsoft.public” hierarchy, dedicated to discussing Microsoft products and technology).
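The distinction between relative share (what a Treemap shows) and absolute volume (what Figure 3 plots) can be checked with simple arithmetic. The numbers below are invented for illustration and are not Netscan data: they show how a hierarchy whose replies fall only slightly in absolute terms can still lose a noticeable share of the Treemap when the rest of Usenet grows.

```python
def share_and_change(counts_by_year, group, y0, y1):
    """Return (share in y0, share in y1, absolute change) for one hierarchy.

    counts_by_year maps year -> {hierarchy: count}; the hierarchies listed for
    each year are assumed to partition that year's total.
    """
    def share(year):
        return counts_by_year[year][group] / sum(counts_by_year[year].values())

    return share(y0), share(y1), counts_by_year[y1][group] - counts_by_year[y0][group]


# Invented reply counts, in millions, for illustration only.
replies = {
    2000: {"alt": 40, "rest": 60},
    2004: {"alt": 38, "rest": 70},
}
s0, s1, delta = share_and_change(replies, "alt", 2000, 2004)
print(f"alt share {s0:.0%} -> {s1:.0%}, absolute change {delta:+d}M")
# alt share 40% -> 35%, absolute change -2M
```

A five-point drop in share here comes mostly from growth elsewhere, not from a collapse in the hierarchy's own activity, which is why the two views need to be read together.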
Figure 2a
Replies to all of Usenet: Treemap for January 2000. The hierarchies alt.binaries and rec are highlighted in yellow.
These different trajectories may reflect the changes in the larger online environment. The period of our study captures a legal crackdown on peer-to-peer file trading, which may have driven file traders to Usenet; it also covers a period of expanding opportunities for political and social discourse in online discussion forums, including blogs, discussion boards, and mailing lists. Overall, Usenet is characterized by a relatively constant and high level of social interaction (98 million replies per year), but activity levels vary by subject area, with social and political discussion areas reflecting slight declines.
How Does the Nature of Interaction in the Group Affect the Organization of Hierarchies?
In the next set of figures, we look at small subsections of the Usenet in order to better understand the activity in those newsgroups. Here we draw a comparison between newsgroups where the primary focal activity is the provision of technical help, and newsgroups where the primary activity is discussion and debate.
Figure 4 depicts all of the newsgroups in the microsoft.public hierarchy, which host discussions of Microsoft software and hardware products. The primary focus of these newsgroups is the seeking and provision of help. What is striking about this Treemap is the strongly nested hierarchical system for dividing newsgroups, in sharp contrast to areas where activity is primarily focused on discussion and debate.
Compare the deeply nested structure of the technical help newsgroups in Figure 4 to the relative absence of nesting in “alt.fan” (Figure 5), a hierarchy of newsgroups focused on appreciation of some topic or prominent social figure. Alt.fan.rush-limbaugh and alt.fan.cecil-adams are two popular examples.
Figure 3
Number of messages (top) and number of replies (below) for all Usenet and three sub-hierarchies between 2000 and 2004
What accounts for the striking absence of nested hierarchy in fan groups, and its extensive presence in technical help groups? Part of the answer may stem from the nature of the content areas: Logical subsections may be more salient in software discussions. Another part may stem from the different social histories of the hierarchies: There is more collaboration and control over the creation of newsgroups in the “microsoft.public” hierarchy, while it is fairly easy for users to casually create an “alt” group. This may lead to a broader, flatter hierarchy. These different hierarchies reflect different group behavior.
Analysis 2: How Does the Interaction Within Newsgroups Vary?
The collection of newsgroups that make up Usenet is tremendously varied, dedicated to a range of activities including discussion, answering questions, argument, and file sharing; newsgroups can be repositories of flame wars (Golder, 2003; Donath, 1999; Kayany, 1998) or besieged by spam. A new user trying to fill a particular need may be stymied merely by having trouble finding a newsgroup with an appropriate tone or purpose.
The Newsgroup Crowd for microsoft.public.windows.server.general in Figure 6 (Viégas & Smith, 2004) is intended to help resolve this difficulty by summarizing the aggregate behavior of the different people in a newsgroup. Individual behavior can be summarized, in part, by how often a person shows up in a particular newsgroup, the degree to which they linger in a particular thread, and the extent to which they contribute to the newsgroup as a whole. A person who contributes to a newsgroup fairly often is more likely to be an active participant, while a person who shows up for a very short period of time and posts a great many messages is more likely to be posting parts of a binary file. A person who puts many messages in a thread is likely to be involved in a conversation.
Figure 4
Treemap of newsgroups by number of replies in microsoft.public during all of 2004. The newsgroup microsoft.public.windows.server.general is highlighted in yellow.
The Newsgroup Crowd plots four dimensions for each person on a two-dimensional scatter plot. The vertical axis shows the number of days on which the person posted to the newsgroup; the horizontal axis shows (on a log scale) the number of posts per thread that the person contributed. The log scale separates small differences in posts per thread, which are important at low values, while allowing larger values to blend together. The size of each dot shows how many messages the person posted, and its color shows the most recent time the person posted in the group: Those who have posted recently are shown in red, while those who have not posted for an extended period of time are shown in blue. We see from the crowd in Figure 6, for example, that most people are along the left side (few messages per thread), with a fair number of people showing up on 100 or more days.
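The four dimensions of the Newsgroup Crowd can be derived from per-message records for a single newsgroup. The sketch below is a hypothetical reconstruction, not the Netscan code. It assumes each message is available as an `(author, thread_id, posted_on)` tuple, takes “posts per thread” to be an author's total messages divided by the number of distinct threads they posted in, and expresses recency as a fraction of the period (0 at the start, drawn blue, up to 1 at the end, drawn red).

```python
import math
from collections import defaultdict
from datetime import date


def crowd_points(messages, period_start: date, period_end: date) -> dict:
    """Per-author coordinates for a Newsgroup Crowd-style scatter plot.

    messages: iterable of (author, thread_id, posted_on) for one newsgroup.
    """
    days = defaultdict(set)
    threads = defaultdict(set)
    count = defaultdict(int)
    last = {}
    for author, thread_id, posted_on in messages:
        days[author].add(posted_on)
        threads[author].add(thread_id)
        count[author] += 1
        last[author] = max(last.get(author, posted_on), posted_on)

    span = max((period_end - period_start).days, 1)
    points = {}
    for author, n in count.items():
        per_thread = n / len(threads[author])
        points[author] = {
            "days_active": len(days[author]),                      # vertical axis
            "x_log10": math.log10(per_thread),                      # horizontal axis (log scale)
            "dot_size": n,                                          # total messages
            "recency": (last[author] - period_start).days / span,  # 0 = blue ... 1 = red
        }
    return points


# Toy example: an answerer posting once in each of three threads on three
# different days, and a talker posting 20 times in one thread on one day.
msgs = [("answerer", f"t{i}", date(2003, 3, i + 1)) for i in range(3)]
msgs += [("talker", "t9", date(2003, 12, 31))] * 20
pts = crowd_points(msgs, date(2003, 1, 1), date(2003, 12, 31))
```

In this toy example the answerer lands at the far left (one message per thread), while the talker sits further right at log10(20), mirroring the contrast the diagram is designed to expose.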
This visualization gives us tools for distinguishing individual behavior within and across groups. We will focus on several individuals who are involved in the newsgroup microsoft.public.windows.server.general. For convenience, we refer to them by aliases: “Local,” “Cynic,” and so on. These people are discussed in more detail in the next section; however, it is worth noting how some of their different roles are reflected in this diagram. Answer People, who are present on a high proportion of days and post few messages per thread, can be found near the top left of the diagram. “Local,” an occasional participant in the group, is in the crowd at bottom left; “Cynic” is in the cluster a little below the 100-day line.
In Figure 6, the Answer People showed up on most of the 365 days possible in 2003. Virtually no one posts more than 10 messages to any thread. The “plume” on the left side shows that most people take a small number of turns in most conversations, a pattern characteristic of technical support newsgroups.
The schematic in Figure 7, which reflects our current understanding, shows some of the relevant clusters of characteristic behavior that we have observed within various newsgroups. These clusters are not, however, based on a statistical analysis of the newsgroups. We can use them to get a rough idea of the breakdown of a newsgroup and to learn something about its population.
Figure 5
Replies in alt.fan during all of 2004. Two prominent examples are alt.fan.rush-limbaugh and alt.fan.cecil-adams.
Below, compare the four thumbnails of different newsgroup crowds in Figure 8. Several features of these crowd diagrams make them meaningful and interpretable. Contrast Figure 8a with Figure 8b. Figure 8a is a conversation space (the “adobe photoshop for mac lounge”); note that no users post more than 100 times or on more than 10 days. The largest circles, at top right, are relatively frequent discussers and frequent contributors who get into extended discussions. Figure 8b, a binaries newsgroup, shows a tall, thin plume along the left side. This pattern may indicate automatic posting tools: Users may “be seen” on many days, but each message is in its own thread. Compare the much more active conversations in Figures 8c and 8d. Both are discussion newsgroups: Few users contribute only one message to a thread (sparse left sides), and many users show up frequently to talk a lot. Yet the newsgroups have differing norms for the degree of conversation: In 8c, Conversationalists seem to contribute around four messages to a thread. In contrast, the discussion newsgroup in 8d is less orderly: Many people contribute only one message to a thread, while others contribute hundreds. However, the newsgroup regulars, those who post on the most days, seem to converge on a middle value closer to 10.

These visualizations have two implications. First, of course, one can use them for an overview of a newsgroup: “This newsgroup seems dedicated to questions and answers; I would rather see a group with more discussion.” These interpretations are potentially valuable for the reader trying to find information, for the analyst trying to choose a field site or to characterize behavior, or for a newsgroup maintainer assessing the health of the newsgroup. We imagine integrating this sort of diagram into user interfaces that list newsgroups: A user could very quickly distinguish the sort of conversation and perhaps choose to look elsewhere for the information they need.
Second, these patterns have statistical implications. The insights from the schematic suggest that a newsgroup's character can be determined from the relative ratios of different sorts of participants. A newsgroup with a strong left plume acts more like a binaries distribution newsgroup (8b); a newsgroup with a tight bottom-left corner and an “Answer-Person plume” looks like a question-and-answer newsgroup (8a); a newsgroup with a broad peak (8c, 8d) looks like a discussion. Statistically, these types should be separable from each other: It should be possible to determine heuristically what sort of newsgroup a user is entering and to present additional information about it based on these statistics. (For example, a user searching for a particular keyword might be asked, “Are you looking for a newsgroup that is predominantly populated by discussion, Q&A, or flame war?”)
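One way to make this heuristic concrete is to assign each participant to a region of the crowd diagram and then compare the shares of those regions. The rules and every threshold below are hypothetical placeholders chosen for illustration, not values estimated in this study; a usable classifier would need to be fit and validated against labeled newsgroups.

```python
def classify_newsgroup(points, plume_days=30, talk_ratio=3.0,
                       plume_share=0.2, talk_share=0.3):
    """Guess a newsgroup's character from its crowd.

    points: list of (days_active, msgs_per_thread), one per participant.
    All thresholds are hypothetical placeholders, not values from the study.
    """
    if not points:
        return "empty"
    n = len(points)
    # Left plume: about one message per thread, but present on many days.
    plume = sum(1 for days, ratio in points if ratio < 1.5 and days >= plume_days) / n
    # Broad peak: participants who take several turns per thread.
    talkers = sum(1 for _, ratio in points if ratio >= talk_ratio) / n
    if talkers >= talk_share:
        return "discussion"
    if plume >= plume_share:
        return "binaries-like"
    # Otherwise: a crowded bottom-left corner with at most a thin Answer-Person plume.
    return "question-and-answer"


qa = [(1, 1.0)] * 95 + [(300, 1.1)] * 5  # many one-off askers, a few regulars
binaries = [(200, 1.0)] * 40 + [(2, 1.0)] * 60
talk = [(50, 4.0)] * 40 + [(1, 1.0)] * 60
print([classify_newsgroup(g) for g in (qa, binaries, talk)])
# ['question-and-answer', 'binaries-like', 'discussion']
```

Ordering the checks so that discussion is tested first reflects the schematic's observation that a broad peak is the most distinctive signature; a Q&A group with an unusually large Answer-Person plume would be misread as binaries-like by these placeholder thresholds.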
Figure 6
Newsgroup Crowd Visualization for Microsoft.public.windows.server.general for all of 2003 on log/linear axes
Analysis 3: How do Participants' Contributions to Usenet Vary?
Golder (2003) and other authors (Kayany, 1998; Kim, 2000) have qualitatively observed a subset of newsgroups, and a subset of the people within them, in attempts to generalize about social roles across Usenet. In this approach, the authors who participate (i.e., “post”) most in the newsgroup under observation are usually the ones studied, because they are the easiest to discover and their posting patterns and behavior are the most visible.
To build upon this customary method, we have taken a quasi-qualitative approach to spotting distinctive patterns of Usenet use among authors, using social accounting metadata to guide our choice of which authors to study qualitatively. Using the visualization tools we have built on the Netscan system, we can view an individual's posting behavior across all of Usenet for a selected time period.
In this section, we discuss the contributions and social networks of individuals who are representative of very active authors (i.e., posters) in Usenet. While we acknowledge that every thriving newsgroup has an important population of “lurkers” (Nonnecke & Preece, 2003) who do not post to the group, they are not a focus of this research. In the remainder of this section, we discuss a visualization tool we use to study authors, the types of authors we saw in our data set, and the social networks of different types of authors.
The AuthorLines visualization (Viégas & Smith, 2004) and a social network view (Wasserman & Faust, 1994; White, Fisher, & O'Madadhain, 2004) provide a bird's-eye view of the posting behavior of an individual for one year across all of Usenet. AuthorLines generates a temporal series of double histograms displaying a user's posting behavior over an entire year (see Figures 10–12), revealing detailed patterns in that behavior. Threads initiated by the author appear above the horizontal line as red circles, while threads to which the author posted but did not initiate appear as blue circles below the line. Each vertical line represents one week of the year, showing the threads to which the author contributed during that week. The size of each circle corresponds to the number of messages the author contributed to that thread; as the number of messages increases, the circle grows larger and becomes more translucent. This makes it easy to see when an individual has contributed many messages to a single thread, as might occur in a debate or in the posting of the many messages that make up a large binary file. When using the visualization interactively, users can select thread circles to get more information about the selected thread, which provides a good sense of the subject matter to which the author contributes and how much he or she contributes to each over time.
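The double histogram behind AuthorLines amounts to grouping one author's messages by week and by thread. The sketch below is hypothetical: it assumes each message is given as `(thread_id, thread_starter, posted_on)`, numbers weeks from January 1 (so week 52 holds the last day or two of the year), and returns, for each week, per-thread message counts above the line (threads the author started, drawn red) and below it (threads the author joined, drawn blue). A renderer would then scale circle sizes from these counts.

```python
from collections import defaultdict
from datetime import date


def author_lines(author: str, messages, year: int) -> dict:
    """Weekly per-thread message counts for one author, split by who started the thread.

    messages: iterable of (thread_id, thread_starter, posted_on) for messages
    written by `author`. Returns {week: {"above": {thread: n}, "below": {thread: n}}}.
    """
    jan1 = date(year, 1, 1)
    weeks = defaultdict(lambda: {"above": defaultdict(int), "below": defaultdict(int)})
    for thread_id, starter, posted_on in messages:
        if posted_on.year != year:
            continue
        week = (posted_on - jan1).days // 7               # 0..52
        side = "above" if starter == author else "below"  # red: started; blue: joined
        weeks[week][side][thread_id] += 1                 # circle size ~ messages in thread
    return {w: {side: dict(t) for side, t in s.items()} for w, s in sorted(weeks.items())}


lines = author_lines("local", [
    ("t1", "local", date(2004, 1, 2)),
    ("t1", "local", date(2004, 1, 3)),
    ("t2", "cynic", date(2004, 1, 9)),
], 2004)
print(lines)
```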
Figure 8
Newsgroup Crowds for four different newsgroups for all of 2003. All are on log/linear axes.
a) adobe.photoshop.mac.lounge
Fiore, Lee Tiernan, and Smith (2002) report that metrics such as longevity, frequency of participation, and the number of messages an author contributes to each thread correlate highly with readers' subjective evaluations of the author. In the absence of explicit reputation systems, newsgroup users seem to apply these metrics informally and implicitly when interacting with others online, in order to weigh and contextualize messages from different authors.
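The three metrics named above (longevity, frequency of participation, and messages per thread) can be computed directly from a message archive. The sketch below rests on assumptions: posts are hypothetical `(author, thread_id, day)` tuples, longevity is the span in days from an author's first to last post, and frequency is the number of distinct days with at least one post.

```python
from collections import defaultdict
from datetime import date


def author_metrics(posts):
    """posts: iterable of (author, thread_id, day) tuples.

    Returns {author: {"longevity_days", "active_days", "msgs_per_thread"}}.
    """
    days = defaultdict(set)
    threads = defaultdict(lambda: defaultdict(int))
    for author, thread_id, day in posts:
        days[author].add(day)             # distinct days with a post
        threads[author][thread_id] += 1   # messages per thread touched
    metrics = {}
    for author, active in days.items():
        per_thread = threads[author]
        metrics[author] = {
            # Inclusive span from first to last post.
            "longevity_days": (max(active) - min(active)).days + 1,
            "active_days": len(active),
            "msgs_per_thread": sum(per_thread.values()) / len(per_thread),
        }
    return metrics
```

For example, an author who posts twice to one thread on January 1 and once to another thread on March 1, 2003 has a longevity of 60 days, 2 active days, and 1.5 messages per thread.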
Social network analysis and visualization provide a different perspective on how actors' roles differ within a newsgroup. While AuthorLines emphasizes a single actor's temporal sequence of messages and replies, the exploratory visualization of network relationships places a single author in context with the people around him/her. Figure 9 is a full network diagram of the microsoft.public.windows.server.general newsgroup, highlighting key authors and the distinct identities of authors in Usenet. Each node represents a person, and each link represents a directed tie in which one actor replied to another; thick ties represent repeated responses. This view shows the complex interrelations among participants in the newsgroup, as represented by their replies to one another. It is quickly evident that the group is far from uniform: In addition to members at the core and at the periphery, there are other distinguishing features.
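A reply network like the one in Figure 9 can be derived from message headers. This sketch assumes each message carries an ID, an author, and the ID of the message it replies to (information a Usenet References header would supply); the tuple layout and the decision to drop self-replies are assumptions, and the edge count stands in for tie thickness.

```python
from collections import Counter


def reply_graph(messages):
    """messages: iterable of (msg_id, author, parent_id) tuples, with
    parent_id None for thread starters.

    Returns Counter {(replier, replied_to): n_replies}; n drives tie thickness.
    """
    messages = list(messages)
    author_of = {msg_id: author for msg_id, author, _ in messages}
    edges = Counter()
    for _msg_id, author, parent in messages:
        target = author_of.get(parent)
        # Skip thread starters, parents missing from the archive, and self-replies.
        if target is None or target == author:
            continue
        edges[(author, target)] += 1
    return edges
```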
Figure 7
A schematic diagram of posting patterns within newsgroups. The vertical axis is the number of days active; the horizontal axis is the number of messages per thread.
Types of Authors in Usenet
We have identified some characteristic patterns of several different types of authors in the newsgroups studied. These types have been observed elsewhere, but unlike prior work, we provide steps towards identifying these actor types from patterns in their posting behavior rather than the content of their posts. We have alluded to several of these types earlier; here, we attempt to articulate some of the important features of their interaction. They are the Answer Person, the Questioner, the Troll, the Spammer, the Binary Poster, the Flame Warrior, and the Conversationalist.
A Questioner is a person who asks a question within a Q&A newsgroup, looking for an answer. For example, an anonymous contributor to microsoft.public.windows.server.general wrote:
I tried using GHOST to move the contents on my WIN2000 Servers hard drive to a new bigger drive. It said it copied it ok but I get errors on boot up with the new drive. Any suggestions? Thanks.
An Answer Person, such as Answer1, contributes answers. Here, he offers a suggestion:
Hi,
After you ghost it, boot into AD Recovery Mode (or Safe Mode if it isn't a DC) and let it find the new hardware.
Might want to try an in-place upgrade as well (you'll need to re-apply all patches afterwards, be sure to have a backup)
In this case, a second Answer Person gives a slightly less polite, but equally helpful, answer:
“I get errors on boot up.”
Usually, the error messages are not random, and actually tells you what is wrong. If you do not know how to read them, include them in your post and someone else might know how to read them.
You maybe also want to look at the Ghost KB here: http://www.symantec.com/techsupp/enterprise/select_product_kb.html
(Click Symantec Ghost)
We can contrast the previous example with the Troll. A Troll attempts to disrupt a newsgroup by asking (and often successfully dragging out) a provocative question. In this conversation, for example, a Troll-like author, Cynic, tries to draw the Answer People into a discussion of game playing at work with an innocuous-seeming question:
Hi Experts,
…
Since I am an MCSE working at he helpdesk and I have antivirus up-to-date on my home PC and use a firewall. I want to know how to adjust my static routes when connected via the VPN so I can do split-tunneling again. Can any expert out there tell me how I can do this?
An Answer Person responds:
You're trying to connect to the corporate network. You need to abide by the corporate security standards, MCSE or no. The admins did the right thing. (Answer1)
As does another:
I agree, your admins disabled your ability to do split tunneling because it is a security risk. Don't attempt to bypass your company's security policy.
The Troll responds:
OK. I agree with you. Especially since I am only a “paper” MCSE! The hardest part is I won't be able to play my on-line multi-player game (see below) while VPN'd into work anymore! :(
And is chastised:
I imagine that if you told your network admins that you needed to enable split tunnelling so that you could play multi-player games over the Internet, they would be less than willing to help you out. Play games when you aren't connected to the VPN and presumably working. Problem solved!
To illustrate several of these types, we draw both an AuthorLines diagram and a network view. The network views show replies: An edge from A to B means that A wrote a message responding to a message from B. In addition, each network is constrained to the immediate neighborhood of the focal author A: It shows only the people whom A has replied to or who have replied to A.
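Restricting a network to one author's immediate neighborhood amounts to taking an ego network. In this sketch, edges follow the convention just described (a key `(A, B)` means A replied to B); keeping the replies among A's neighbors, as a standard ego network does, is an assumption, since the text specifies only which people appear.

```python
def ego_network(edges, focal):
    """edges: {(replier, replied_to): weight}.

    Returns the subgraph induced on `focal` plus everyone who replied to
    `focal` or was replied to by `focal`.
    """
    people = {focal}
    for src, dst in edges:
        if src == focal:
            people.add(dst)   # someone focal replied to
        elif dst == focal:
            people.add(src)   # someone who replied to focal
    return {(s, d): w for (s, d), w in edges.items()
            if s in people and d in people}
```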
Our tools do not readily distinguish all of these types: For example, it is still difficult to distinguish a Flame Warrior from a Conversationalist. We nonetheless discuss these roles separately here. We should also note that we have deliberately chosen particularly distinctive images to represent these types, drawn from people who post a great deal or are very active. In practice, of course, many users display less dramatic behavior.
Answer Person
Answer People provide advice to strangers without the promise of a return on their investment: They find questions and provide answers. Some Answer People spend more than 300 days per year in newsgroups helping their peers. We can identify these valuable participants through their posting behavior. An Answer Person primarily replies to threads initiated by others, is primarily involved in short threads, and tends to contribute only a few posts to each thread he or she touches. Answer People also tend to be surprisingly consistent in their posting behavior, contributing to a fairly high number of threads every week. In AuthorLines, an Answer Person shows consistent patterns of numerous small blue circles descending toward the bottom of the page. Answer1's network view in Figure 10 is dense where he has responded to many Questioners' messages.
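The Answer Person signature described above can be expressed as a simple rule over an author's threads. The thresholds below are hypothetical placeholders chosen for illustration, not values reported in this study, and the per-thread tuple layout is likewise an assumption.

```python
from statistics import mean


def looks_like_answer_person(threads, n_weeks=52, min_reply_share=0.8,
                             max_thread_len=10, max_posts=3, min_week_share=0.5):
    """Rule-of-thumb check for the Answer Person signature.

    threads: one (week, initiated_by_author, author_posts, thread_total_posts)
    tuple per thread the author touched. All thresholds are hypothetical.
    """
    if not threads:
        return False
    # Mostly replies to threads started by others.
    reply_share = sum(1 for _, init, _, _ in threads if not init) / len(threads)
    # Short threads, with only a few of the author's own posts in each.
    short_threads = mean(total for _, _, _, total in threads) <= max_thread_len
    few_posts = mean(posts for _, _, posts, _ in threads) <= max_posts
    # Consistency: active in a large share of the year's weeks.
    week_share = len({week for week, _, _, _ in threads}) / n_weeks
    return (reply_share >= min_reply_share and short_threads and few_posts
            and week_share >= min_week_share)
```

Expressing the signature as separate conditions makes it easy to see which part of the pattern an author fails; a statistical treatment would replace the fixed thresholds.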
Questioner
A Questioner is an individual who mostly posts new threads that seek help, information, or clarification from other members. Many legitimate Questioners post a single request, never to return. Others return consistently to particular newsgroups and post a couple of questions per week. Many of their replies are clarifying statements or some other follow-up to their original question. They therefore have only occasional ties to others. In AuthorLines, they are marked by a few small red circles, ranging from one to about five posts, accompanied by occasional blue circles, which are often continuations of threads they initiated in previous weeks. Questioners are visible in the network diagram around Answer1 (Figure 10): They have not replied to anyone and only ask questions.
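In a reply network, the Questioners around an Answer Person show up as people who receive replies but send none. A minimal sketch, assuming the edge convention used for these network views (a key `(A, B)` means A replied to B); the function name is hypothetical.

```python
def pure_receivers(edges):
    """edges: {(replier, replied_to): weight}.

    Returns, sorted, the people who received replies but never replied to
    anyone in this network: candidate Questioners.
    """
    repliers = {src for src, _ in edges}
    receivers = {dst for _, dst in edges}
    return sorted(receivers - repliers)
```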
Troll
A Troll is someone who mostly initiates threads with seemingly legitimate questions or conversation starters. However, the ultimate goal of a Troll is to draw unwitting others into useless discussions. Because of this, Trolls risk being detected as cynical or manipulative Questioners. If recognized, they are quickly labeled by communities and ostracized through verbal sanctioning followed by filtering (in which members of the group choose to ignore all messages from the Troll). As a result, a Troll will look like a legitimate Questioner but will post more often and be visible in more newsgroups; that is, the Troll will post actively in different newsgroups, starting provocative conversations. In Figure 11, Cynic's AuthorLines view suggests question-asking behavior, but his social network reveals that he has successfully engaged multiple people in his web of trolling. Figure 10 gives another view of Cynic, within the social network of two prominent Answer People in this newsgroup. Note that Cynic has no one-way connections: He is neither a Questioner nor an Answer Person. Cynic has successfully dragged these Answer People into several useless conversations.
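The contrast between Cynic and the Answer People rests on the direction of ties: One-way ties mark question-and-answer exchanges, while two-way ties mark back-and-forth conversation. A sketch of that check, using the same reply-edge convention; the labels "out", "in", and "mutual" are hypothetical names.

```python
def tie_profile(edges, focal):
    """Classify each neighbor of `focal` by reply direction.

    edges: {(replier, replied_to): weight}. Returns {neighbor: label}:
    'out' (focal replied to them only), 'in' (they replied to focal only),
    or 'mutual' (replies in both directions).
    """
    out = {dst for src, dst in edges if src == focal}
    inc = {src for src, dst in edges if dst == focal}
    profile = {}
    for person in (out | inc) - {focal}:
        if person in out and person in inc:
            profile[person] = "mutual"
        else:
            profile[person] = "out" if person in out else "in"
    return profile
```

An author whose neighbors are all "mutual" matches the pattern described for Cynic, whereas an Answer Person's neighborhood is dominated by "out" ties to Questioners.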
Figure 8 (continued)
b) alt.binaries.multimedia.elvispresley
Spammer/Binary Poster
Spammers post irrelevant messages to newsgroups, just as they send them by email. Spammers can be easily identified from the pattern of their posts: high volumes of initiated threads, to each of which they contribute a single message, and highly consistent posting behavior. This is expressed visually in AuthorLines as a wall of narrow red columns of circles, as shown in Figure 12a. Each red circle is a new thread started by that actor; the absence of blue circles shows that the Spammer never responds to anyone else's threads. Binary Posters use automated tools to post hundreds of parts of binary files (such as music tracks and movies) to newsgroups; they use Usenet as a file-sharing space. These mass posts presumably make them valuable members of their communities. However, it is hard to tell a Spammer from a Binary Poster in AuthorLines, because AuthorLines elides distinctions among newsgroups. We can tell, however, that the author in Figure 12a is a unique identity that initiates many messages in Usenet and never replies to messages. Figure 12b appears to show a poster who both posts binary files and takes part in conversations.
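The Spammer and Binary Poster pattern (only initiated threads, one message each, steady weekly volume) can be summarized with three numbers. This is a sketch under assumed inputs; the coefficient of variation of weekly thread counts is one possible consistency measure, not one specified in this study. As noted above, none of these numbers separates a Spammer from a Binary Poster, because both leave the same trace across all of Usenet.

```python
from collections import Counter
from statistics import mean, pstdev


def initiator_only_signature(threads, n_weeks=52):
    """Summarize an author's threads for the Spammer/Binary Poster pattern.

    threads: one (week, initiated_by_author, author_posts) tuple per thread,
    with week in range(n_weeks).
    Returns (never_replies, single_message_share, weekly_cv); a low weekly_cv
    means the number of threads per week is steady.
    """
    if not threads:
        raise ValueError("author has no threads")
    never_replies = all(initiated for _, initiated, _ in threads)
    single_share = sum(1 for _, _, posts in threads if posts == 1) / len(threads)
    per_week = Counter(week for week, _, _ in threads)
    weekly = [per_week.get(w, 0) for w in range(n_weeks)]
    weekly_cv = pstdev(weekly) / mean(weekly)  # coefficient of variation
    return never_replies, single_share, weekly_cv
```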
Figure 8 (continued)
c) alt.pl.tvn.bigbrother
Flame Warrior/Conversationalist
Substantively, Flame Warriors and Conversationalists are very different. A Conversationalist comes to discussion venues to discuss, carry on conversations with others, enjoy communion, and evaluate ideas. Conversationalists generate valuable social interaction, a sense of belonging for other members, and a sense of community. In contrast, Flame Warriors violate the open spirit of conversation and the acceptance of communion through harsh, negative debate. The primary goal of a Flame Warrior is to “win” an argument and thereby appear superior to others involved in the conversation, especially those who oppose him or her. In AuthorLines, Conversationalists tend to have widely fluctuating rates of posting, both in the number of threads per week and in the number of posts per thread. They initiate new threads, and they reply to others' threads. Thus they show widely fluctuating patterns of large and small circles, both red and blue. In our current analysis, we could not clearly distinguish Flame Warriors from Conversationalists except in degree: The most exaggerated Conversationalists often participated in flame wars, or highly antagonistic debates. Various high-volume Conversationalists and Flame Warriors appear in Figure 12c–d. The author in Figure 12c is involved in many short conversations, while the author in Figure 12d is involved in fewer conversations but posts far more often to each of them.
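The fluctuating Conversationalist pattern can be quantified in a similar way. The sketch below, with hypothetical inputs and feature names, reports whether an author both initiates and replies and how much the weekly thread count and per-thread posting vary. Consistent with the text, these features separate Conversationalists from Flame Warriors only in degree, so no threshold is proposed.

```python
from collections import Counter
from statistics import mean, pstdev


def fluctuation_profile(threads, n_weeks=52):
    """Features for the fluctuating Conversationalist/Flame Warrior pattern.

    threads: one (week, initiated_by_author, author_posts) tuple per thread,
    with week in range(n_weeks).
    """
    if not threads:
        raise ValueError("author has no threads")
    per_week = Counter(week for week, _, _ in threads)
    weekly = [per_week.get(w, 0) for w in range(n_weeks)]
    posts = [p for _, _, p in threads]
    return {
        # Both red (initiated) and blue (replied) circles in AuthorLines.
        "initiates_and_replies": len({init for _, init, _ in threads}) == 2,
        # Spread of threads per week and of posts per thread.
        "weekly_threads_cv": pstdev(weekly) / mean(weekly),
        "posts_per_thread_cv": pstdev(posts) / mean(posts),
        # Largest circle: most messages the author sent to any one thread.
        "max_posts_in_thread": max(posts),
    }
```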
We chose the Answer Person, the Questioner, and the Troll from the “microsoft.public.windows.server.general” newsgroup because a fair amount of technical conversation takes place there and because the group has a number of active participants. This group, however, is missing some types of participants: There were no examples of Spammers, Binary Posters, or Flame Warriors that caught our attention. Because the group revolves largely around questions and answers, it tends not to generate the long, passionate discussions that attract Flame Warriors. In Figure 12 we show extreme cases of behavior of the user roles described in this section.
Discussion
Understanding Social Spaces
Managers of physical social spaces learn to read their crowds and audiences, changing strategies in reaction to dynamic conditions. This work presents visualization techniques in an effort to support the understanding of patterns of social interaction found in computer-mediated conversation systems. Our goal is to provide images of computer-mediated social spaces that capture some of the rich information that is present in a physical social space. These visualizations have been applied to very large spaces in Usenet and may be relevant to a variety of other online conversation spaces, such as mailing lists and discussion boards. It is possible to reconstruct “Crowds,” “AuthorLines,” and network diagrams for any conversational space. These tools, in turn, can help stakeholders in these spaces (hosts, managers, leaders, casual participants, and passive consumers) better understand their newsgroup or other conversational repository.
We have demonstrated that these techniques can be used to identify conversation spaces and authors that contribute in particular patterned ways. By recognizing the diversity of these spaces, stakeholders can begin to monitor the spaces and contributors reflexively, essentially gaining a mirror that reflects their collective activity in the conversational social cyberspace.
In their most basic form, these tools and approaches encourage practitioners to focus on cultivating certain kinds of roles in their communities and discussion spaces. These tools could also provide stakeholders with objective measures for assessing the impact of changing policies and practices. With refinement, we imagine a “dashboard” of real-time images of collections of conversation environments. Its components might include images such as those presented here, as well as others that capture alternative dimensions of the data. Each view might be linked to the others so that activity within one component triggers activity in the linked displays. Components might be selected from a library, allowing different collections of components to serve different people with different interests in the social cyberspace, including researchers.
Our intent has been to make visible the latent patterns present in conversational data sets. We suggest that researchers and other stakeholders around conversational social cyberspaces can benefit from an awareness of the patterns of individual activity and the structure of social interaction within their spaces of interest.
Traditional analyses of social cyberspaces have depended on micro-level, ethnographic examination built out of the details of individual messages. This work suggests applying an ethnographic and qualitative approach at a level above the details of individual messages, through the comparative study of visualizations of social cyberspaces. The creation of images of social cyberspace introduces opportunities for applying interpretive studies to the resulting patterns while supporting those findings with statistical scope and analysis. This has an additional methodological implication: Pulling back from the details of interaction and looking beyond the content of messages leads to a focus on the structural context around individual interactions. Applied over time, this approach might document, for example, the emergence and diffusion of developing norms within a set of discussion spaces.
Visualizations and Limitations
Each of the visualizations presented here has limitations: By bringing out some aspects of interaction, they necessarily lose other detail. The Treemap visualization, for example, was originally designed to represent the relative usage of computer storage space (see Shneiderman, 2004). While the size of a hard drive is fixed, the total number of posts to Usenet is not. Because a Treemap divides a fixed display area among its items, it shows each item's share of the whole rather than its absolute size; as such, our Treemaps are weak at showing the total volume of messages.
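Why a Treemap loses total volume can be seen in a one-level slice-and-dice layout, used here as a simplified stand-in for the Treemaps in this study (their actual layout algorithm is not assumed): Each item receives a strip of a fixed-width canvas in proportion to its share of the total.

```python
def slice_layout(counts, width=100.0):
    """One level of a slice-and-dice treemap.

    Splits a fixed-width canvas into vertical strips whose widths are
    proportional to each item's count. Returns {name: (x_offset, strip_width)}.
    """
    total = sum(counts.values())
    x = 0.0
    strips = {}
    for name, n in sorted(counts.items()):
        w = width * n / total  # only the share of the total matters
        strips[name] = (x, w)
        x += w
    return strips
```

Doubling every count, as when overall traffic grows, leaves every strip unchanged: `{"alt": 300, "comp": 100}` and `{"alt": 600, "comp": 200}` both yield strips of width 75 and 25, so the picture carries no information about the total.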
Newsgroup Crowds conveys a great deal of information in a single visualization: It represents precise values on three metrics for every actor. However, this detail at the individual level may obscure important differences at the group level. In dense regions of the visualization, overlapping and stacked observations can make the distribution of observations across the space hard to interpret. While the visualization is useful for establishing which sorts of participants are absent, it is sometimes difficult to establish presence, because overlapping points hide how many participants occupy a region.
AuthorLines represents the posts from a unique email address to all of Usenet for a given year. This biographical snapshot can provide a good summary of an actor's role in a newsgroup, provided that the actor does not play different roles in different groups. The current implementation leaves room to better break apart participation per group, in order to better understand mixed behavior (as exhibited, for example, in Figure 12b).
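Breaking apart participation per group could start by tallying an author's initiated and replied messages per newsgroup and week, yielding one AuthorLines-style series per group instead of one for all of Usenet. A minimal sketch with hypothetical `(newsgroup, week, initiated)` tuples; treating a cross-posted message as one entry per newsgroup is an assumption.

```python
from collections import defaultdict


def split_by_group(posts):
    """Tally one author's messages per newsgroup and week.

    posts: hypothetical (newsgroup, week, initiated_by_author) tuples.
    Returns {newsgroup: {week: {"initiated": n, "replied": n}}}.
    """
    series = defaultdict(lambda: defaultdict(lambda: {"initiated": 0, "replied": 0}))
    for group, week, initiated in posts:
        series[group][week]["initiated" if initiated else "replied"] += 1
    return {group: dict(weeks) for group, weeks in series.items()}
```

For a mixed author, initiated messages might then concentrate in binary groups and replies in discussion groups, a split that a single all-of-Usenet view hides.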
Conclusion
This article has followed a path of analysis from a macro overview of all of Usenet through the level of individual newsgroups and finally down to the details of specific authors. In the process of doing so, it has cut across the many dimensions and relations among newsgroups, threads, authors, messages, and time. The three sections of this article have explored the range of variation among different newsgroup hierarchies, among the collective behavior of authors within individual newsgroups, and the behavior of individual authors over time.
Growing a Comprehensive Taxonomy of Usenet
In addition to simply noting the degree of variation, this approach points to a possible taxonomy of Usenet behavior. While this article does not attempt to place its results on a statistical footing, many of the sections have statistical implications. Answer People, for example, with their distinctive network positions and reply patterns, may be found with straightforward queries. This article has also outlined other visible roles that seem to recur. It would be both productive and desirable to extend this taxonomy.
Our future work centers on finding and statistically defining these roles. Using techniques like cluster analysis and social network analysis, and by linking a careful study of the messages sent by users to the structural response and conversational properties of those users, we hope to separate out different roles for different types of users.
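As one possible form of the cluster analysis mentioned above, plain k-means can group authors by structural features. The features used here (share of an author's posts that are replies, mean posts per thread), the toy values, and k = 3 are all hypothetical; in practice features would need scaling, and other clustering methods may be better suited.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm) on equal-length feature tuples.

    Seeds centroids with the first k points so results are reproducible.
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each author joins the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

With toy authors `[(0.95, 1.2), (0.9, 1.1), (0.05, 1.0), (0.0, 1.0), (0.5, 8.0), (0.6, 9.0)]`, the three clusters separate reply-heavy, initiation-heavy, and high-volume conversational profiles. Linking such clusters to a study of message content would be needed before calling them roles.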
Another major direction of our future work is a continuing process of examination and evaluation of the various interfaces that Netscan presents. While each visualization is valuable, none of them is well integrated into the newsreading experience. We are beginning to study ways of connecting this meta-information to Usenet reading, so that users entering Usenet have access to information about the social context they are in.
References
Burke, P. J., & Reitzes, D. C. (1991). An identity theory approach to commitment. Social Psychology Quarterly, 54(3), 239–251.
Fiore, A. T., Lee Tiernan, S., & Smith, M. A. (2002). Observed Behavior and Perceived Value of Authors in Usenet Newsgroups. Paper presented at the Conference on Human Factors in Computing Systems, Minneapolis, MN.
Golder, S. A. (2003). A typology of social roles in Usenet. Unpublished senior honors thesis, Harvard University, Cambridge, MA.
Herring, S. C. (2004). Slouching toward the ordinary: Current trends in computer-mediated communication. New Media & Society, 6(1), 26–36.
Hine, C. (2000). Virtual Ethnography. London: Sage.
Hogg, M. A., Abrams, D., Otten, S., & Hinkle, S. (2003). The social identity perspective: Intergroup relations, self-conception, and small groups. Small Group Research, 35(3), 246–276.
Howard, P. N. (2002). Network ethnography and the hypermedia organization: New media, new organizations, new methods. New Media & Society, 4(4), 550–574.
Kayany, J. M. (1998). Contexts of uninhibited online behavior: Flaming in social newsgroups on Usenet. Journal of the American Society for Information Science, 49(12), 1135–1141.
Krikorian, D., & Kiyomiya, T. (2002). Bona fide groups as self organizing systems: Applications to electronic newsgroups. In L. R. Frey (Ed.), Group Communication in Context (pp. 335–365). New York: Lawrence Erlbaum.
Krikorian, D., & Ludwig, G. (2003, February). Advances in Network Analysis: Over-Time Online Communication Networks. Paper presented at the 23rd Annual Sunbelt Social Network Conference, Cancun, Mexico.
Mandel, M. J. (1983). Local roles and social networks. American Sociological Review, 48(3), 376–386.
Mann, C., & Stewart, F. (2000). Internet Communication and Qualitative Research: A Handbook for Researching Online. London; Thousand Oaks, CA: Sage Publications.
Markham, A. N. (1998). Life Online: Researching Real Experience in Virtual Space. Walnut Creek, CA: Altamira Press.
Miller, D., & Slater, D. (2000). The Internet: An Ethnographic Approach. Oxford; New York: Berg.
Morrill, C., & Fine, G. A. (1997). Ethnographic contributions to organizational sociology. Sociological Methods & Research, 25(4), 424–451.
Neuendorf, K. (2002). The Content Analysis Guidebook. Thousand Oaks, CA: Sage Publications.
Nonnecke, B., & Preece, J. (2003). Silent participants: Getting to know lurkers better. In C. Lueg & D. Fisher (Eds.), From Usenet to CoWebs: Interacting with Social Information Spaces (pp. 110–132). London: Springer Verlag.
Rosen, D., Woelfel, J., Krikorian, D., & Barnett, G. (2003). Procedures for analyses of online communities. Journal of Computer-Mediated Communication, 8(4). Retrieved July 12, 2005 from http://jcmc.indiana.edu/vol8/issue4/rosen.html.
Sack, W. (2000). Conversation map: An interface for very large-scale conversations. Journal of Management Information Systems, 17(3), 73–92.
Sack, W. (2002). What does a very large-scale conversation look like? Artificial dialectics and the graphical summarization of large volumes of e-mail. Leonardo, 35(4), 417–426.
Scott, J. (2000). Social Network Analysis: A Handbook (2nd ed.). London; Thousand Oaks, CA: Sage Publications.
Smith, M. A. (1999). Invisible crowds in cyberspace: Measuring and mapping the social structure of USENET. In M. A. Smith & P. Kollock (Eds.), Communities in Cyberspace: Perspective on New Forms of Social Organization (pp. 195–219). London: Routledge Press.
Smith, M. A. (2003). Measures and maps of Usenet. In C. Lueg & D. Fisher (Eds.), From Usenet to CoWebs (pp. 47–78). London: Springer Verlag.
Smith, M. A., & Fiore, A. T. (2001). Visualization components for persistent conversations. ACM SIG CHI 2001.
Tufte, E. R. (1995). Envisioning Information (5th printing, August 1995 ed.). Cheshire, Conn.: Graphics Press.
Tufte, E. R. (1997). Visual Explanations: Images and Quantities, Evidence and Narrative. Cheshire, Conn.: Graphics Press.
Turkle, S. (1995). Life on the Screen: Identity in the Age of the Internet. New York: Simon & Schuster.
Viégas, F. B., & Smith, M. A. (2004). Newsgroup Crowds and AuthorLines: Visualizing the activity of individuals in conversational cyberspaces. Proceedings of the 37th Hawai'i International Conference on System Sciences. Los Alamitos: IEEE Press.
Wasserman, S., & Faust, K. (1994). Social Network Analysis. Cambridge: Cambridge University Press.
White, H. C., Boorman, S. A., & Breiger, R. L. (1976). Social structure from multiple networks. I. Blockmodels of roles and positions. American Journal of Sociology, 81(4), 730–780.
© 2005 International Communication Association