This document is a thesis submitted by Manuel Lima in partial fulfillment of the requirements for a Master of Fine Arts degree in Design and Technology at Parsons School of Design in May 2005. The thesis describes Blogviz, a visualization model created by Lima to map the transmission and structure of popular links across the blogosphere. Blogviz aims to understand the life cycle and diffusion of topics through the blogosphere by analyzing the most cited URLs in daily blog posts. The thesis provides background on relevant concepts such as memetics, information diffusion, and epidemiology to contextualize the goals and methodology of Blogviz.
Mapping the Dynamics of Information Diffusion in Blogspace
blogviz
Mapping the dynamics of
Information Diffusion in
Blogspace
by Manuel Lima
A thesis document submitted in partial
fulfillment of the requirements for the
degree of Master of Fine Arts in
Design and Technology.
Parsons School of Design
May 2005
Thesis Instructor: Christopher Kirwan
Writing Instructor: Mark Stafford
Manuel Lima
lima@parsons.edu
www.blogviz.com
Abstract
Blogviz is a visualization model for mapping the
transmission and internal structure of top links
across the blogosphere. It explores the idea of
meme propagation by assuming a parallel with
the spreading of most cited URLs in daily weblog
entries.
The main goal of Blogviz is to unravel hidden
patterns in the topics diffusion process. What’s
the life cycle of a topic? How does it start and
how does it evolve through time? Are topics
constrained to a specific community of users?
Who are the most influential and innovative blogs
in any topic? Are there any relationships amongst
topic proliferators?
Keywords
Information Diffusion, Memetics, Weblogs, Online
Social Communities, Complex Networks,
Information Architecture, Information
Visualization, Diffusion of Innovations,
Epidemiology, Small Worlds
Acknowledgements
−
Scott Patterson
Jared Schiffman
David Kearford
Fura Johannesdottir
Thank you for your feedback
−
Christopher Kirwan
Mark Stafford
Thank you for your guidance,
openness and continuous motivation
−
My dearest Parents
Thank you for your eternal support
and dedication
Table of Contents
1 Introduction
1.1 Concept
1.2 Memetics
1.3 Diffusion of Innovations
1.4 Epidemiology
2 Impetus
2.1 Subject of Analysis
3 Context
3.1 Online Social Communities
3.2 Weblogs
3.3 Blogosphere
4 Audience
5 Precedents
6 Methodology
6.1 Summer Research
6.2 Visual Explorations
6.3 Prototype #1
6.4 Prototype #2
6.5 Prototype #3
6.6 Prototype #4
6.7 Final Application
7 Technical Sources
7.1 Blog Engines
7.2 Blogviz Data
8 Conclusion
9 Bibliography
Appendix A
Summer Research Presentation
Appendix B
Complex Networks: Visual Explorations
1 Introduction
Blogging presents one of the most interesting social phenomena of our time. This
shift in the flow of online information might radically change the way we look at
news providers and large media conglomerates. It also provides an extraordinary online
laboratory to analyze how trends, ideas and information travel through social
communities.
1.1 Concept
Blogviz is a non-commercial research project developed with the intent of disentangling
this highly complex network for further study, research and analysis. The main goal of
Blogviz is to improve our understanding of the dynamics of information propagation
among weblogs.
An underlying question for Blogviz is: “How can we measure a meme as a unit of cultural
evolution?” The answer is not easy. Because memes are so widespread and their
evolutionary tracks are frequently untraceable, they are extremely hard to measure accurately. In
contrast to this commonly undetectable meme pool, the blogosphere offers a
discernible and documented map of thousands of memes, with clear trails of
progression, structured by date and time.
There are many possible ways of looking at information diffusion in blogspace. It can be
based on conversation threads, comment threads, key sentences, themes, tags, or top
links. Blogviz analyzes top links, occasionally called topics, which represent the most
cited URLs appearing in blog entries in any given day. These popular links represent
particular memes that provide an idea of sources, stories and themes that have
occupied the attention of bloggers over a certain period of time.
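The notion of a top link can be made concrete with a small sketch. The record format (blog, day, URL), the function name and the sample data below are assumptions made purely for illustration; they are not the format of the Blogviz database or of any blog search engine. The sketch counts, for each day, how many distinct blogs cited each URL and keeps the most cited ones.

```python
from collections import Counter, defaultdict
from datetime import date


def top_links_by_day(posts, n=3):
    """Return the n most cited URLs for each day.

    posts: iterable of (blog, day, url) tuples (hypothetical record format).
    A blog citing the same URL several times on one day is counted once.
    """
    citing_blogs = defaultdict(set)  # (day, url) -> set of blogs citing it
    for blog, day, url in posts:
        citing_blogs[(day, url)].add(blog)

    counts = defaultdict(Counter)  # day -> Counter mapping url -> citations
    for (day, url), blogs in citing_blogs.items():
        counts[day][url] = len(blogs)

    return {day: counter.most_common(n) for day, counter in counts.items()}


# Invented sample data, for illustration only.
posts = [
    ("blog-a", date(2005, 1, 10), "http://example.com/story"),
    ("blog-b", date(2005, 1, 10), "http://example.com/story"),
    ("blog-b", date(2005, 1, 10), "http://example.com/story"),  # repeat citation
    ("blog-c", date(2005, 1, 10), "http://example.org/other"),
    ("blog-a", date(2005, 1, 11), "http://example.org/other"),
]

top = top_links_by_day(posts, n=1)
for day in sorted(top):
    print(day, top[day])
```

Counting distinct blogs rather than raw citations is a design choice of this sketch: it keeps a single very active blog from turning a link into a topic on its own.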
By exploring the evolution of these topics through time, Blogviz will not only be able to track
their popular dispatchers and key innovators, but also follow their dissemination pattern from
the beginning to an eventual tipping point, where a topic might leap the blog community and
reach the mainstream.
Blogviz is a Flash-driven interactive visualization model that makes extensive use of
information visualization and information architecture. Why is Information
Visualization central to Blogviz? Information Visualization can be defined as “the use of
computer-supported, interactive, visual representations of abstract data to amplify
cognition” (Card, Mackinlay & Shneiderman, 1999). Information Visualization not
only makes data easier for humans to interpret, it also reveals and highlights
relationships among data elements, usually reducing the effort of searching by gathering
information in a small, rich space.
Therefore, Blogviz employs Information Visualization with the key intent of uncovering
hidden patterns in the data and deriving plausible conclusions that promote a
deeper knowledge of information dynamics in blogspace. By unraveling the modus
operandi behind the blogosphere we might be able to improve our knowledge of the
mechanics of online social communities and, to some extent, the mechanics of
complex social networks.
Blogviz is currently a portrait of the blogosphere’s topic activity during the months of January
and February 2005. The selection of this time period was purely arbitrary. To make
this project a reality within the time limitations of thesis development, a decision was made
to constrain the project to a specific time span. Nevertheless, the model
was developed to easily incorporate different timeframes. Blogviz will continue to expand
in the future, possibly to the point of including real-time data.
Blogviz uses existing data from three different blog search engines, organized in a
database that will soon be available for public access (see Technical Sources for
additional information).
1.2 Memetics
From a conversation with my Thesis Writing instructor, Mark Stafford, I was able to
understand how my thesis had become closely related to the concepts of memetics or
meme behavior. We came to the conclusion that I was developing a “topological model
of meme activity”, even if until then I had been somewhat oblivious to it. That title actually
remained for a while as a characterization of Blogviz. Later on I decided to change it,
since the word meme would limit the audience slightly and the expression topological could
lead to inadequate interpretations. I still question why the notion of Memetics didn’t
come up earlier in my research, but what is particularly interesting is that it was there
from the beginning, immersed in every iteration of my work. I think I was too
focused on the idea of word-of-mouth behavior, an expression used by Malcolm
Gladwell in “The Tipping Point” and by Duncan Watts in “Six Degrees: The Science of a
Connected Age”.
The vital point is that Memetics is the principal theory for contextualizing Blogviz, and
because of that, understanding the theory of Memetics is a crucial step towards
comprehending the underlying concept of Blogviz.
1.2.1 What’s a Meme?
The term was coined by Richard Dawkins in 1976, in his renowned book “The
Selfish Gene”. In the words of Dawkins, the word “meme” refers to “a unit of cultural
transmission, or a unit of imitation”. More specifically, a meme can be defined as a
self-propagating unit of cultural evolution, a unit of information, held in an individual's
memory or in an outside artifact (e.g. book, record or tool), which is likely to be
communicated or copied to another individual's memory or retention system. Examples
of memes are ideas, catch-phrases, melodies, technologies, icons, theories, inventions,
languages, designs, fashions, and traditions. This covers all forms of beliefs, values and
behaviors that are normally taken over from others rather than discovered
independently.
A meme is basically a pattern of information that induces people to repeat it. People try
to “infect” each other with the memes they find most appealing, regardless of the memes'
objective value or truth.
1.2.2 What is Memetics?
Memetics is the study of evolutionary models of information transmission based on the
concept of the meme. In spite of its roots in evolutionary biology and computer
simulation, memetics has become more of a social science, focusing primarily on the
spread of information within human society. Rather than debate the inherent “truth” or
lack of “truth” of an idea, memetics is largely concerned with how that idea itself gets
replicated.
Another definition of Memetics declares it is the theoretical and empirical science that
studies the replication, spread and evolution of memes. As portrayed in the Journal of
Memetics*: “Its core idea is that memes differ in their degree of ‘fitness’, i.e. adaptation
to the socio-cultural environment in which they propagate. Because of natural selection,
fitter memes will be more successful in being communicated, ‘infecting’ a larger number
of individuals and/or surviving for a longer time within the population. Memetics tries to
understand what characterizes fit memes, and how they affect individuals, organizations,
cultures and society at large”.
Since the premise of Memetics is to investigate the evolutionary mechanisms that
determine the propagation of information within a population of human, animal or
artificial agents, we can easily perceive why this science is vital to the understanding of
cults, ideologies, or marketing campaigns of all kinds.
A meme is acknowledged as a self-propagating unit of cultural evolution, analogous to
the gene (the unit of genetics). Because memes behave similarly to life forms,
Memetics embraces the analytical techniques of diverse sciences, such as
epidemiology, evolutionary science, immunology, diffusion of innovations, linguistics,
and semiotics.
* Journal of Memetics (http://jom-emit.cfpm.org)
1.3 Diffusion of Innovations
I believe any type of Information Diffusion Model (IDM) in Social Networks must derive
extensive practical knowledge from the sciences of epidemiology and diffusion of
innovations. These two domains help us understand many of the factors that
characterize the spreading of information and adoption process in social communities.
Epidemiology and Diffusion of innovations also share many similarities and are
surprisingly linked together. For these reasons I decided to include in this thesis a short
description of these areas, since in addition to the concept of Memetics, they create an
extraordinary context to the understanding of Blogviz.
Rather than explaining each domain at length, I compare them in terms of
how they relate to this thesis’s assertion. In order to delineate a common ground for the
following definitions, this paper assumes that an innovation can be characterized as a
new meme, given that it is also described as a new idea. In the context of information
diffusion in the blogosphere, it assumes the process of adoption to be the process by
which a blogger, aware of the existence of a new meme (or innovation), decides to
mention it on his/her own personal blog, in the form of a post or part of a post. This
action can be understood as an “adoption” by the blogger of this particular unit of
information, therefore contributing to its replication.
The study of innovation adoption and diffusion has its origins in the Midwestern United
States. In an Iowa State University study, Ryan and Gross (1943) showed that the
pattern of adoption and diffusion of a maize hybrid was systematic, hence opening the
door for further research.
Diffusion is the process by which an innovation is communicated through certain
channels over time among the members of a social system (Everett M. Rogers, 1995).
The innovation includes “any thought, behavior, or thing that is new because it is
qualitatively different from existing forms” (Jones, 1967). The characteristics of an
innovation, as perceived by members of a social system, determine its rate of adoption.
Just by analyzing these last statements one can easily grasp a series of similarities with
the notion of Memetics, to the point that the theory of Diffusion of Innovations also
considers the unit of adoption not exclusive to an individual person, but extending to
other types of retention systems.
The four main elements in the diffusion of new ideas are:
(1) The innovation
(2) Communication channels
(3) Time
(4) The social system (context)
1.3.1 The Innovation
These are the characteristics that determine an innovation’s rate of adoption:
– Relative advantage
– Compatibility
– Complexity
– Trialability
– Observability to those people within the social system.
1.3.2 Communication Channels
A communication channel is the means by which messages get from one individual to
another. Mass media channels are more effective in creating knowledge of innovations,
whereas interpersonal channels are more effective in forming and changing attitudes
toward a new idea, and thus in influencing the decision to adopt or reject a new idea.
Most individuals evaluate an innovation, not on the basis of scientific research by
experts, but through the subjective evaluations of near-peers who have adopted the
innovation. (Everett M. Rogers)
In a broad sense, the communication channel in the context of Blogviz is indubitably the
Internet. Without it there wouldn’t even be any kind of communication between bloggers.
However, without blogrolls and posting citations within each blog, the restricted channels
among them would be very difficult to perceive. Blogrolls are the backbone of blog
communities, the edges that keep all the nodes interconnected, and therefore are the
key factors in understanding how information develops across the blogosphere. In fact, a
major characteristic of online social communities is that they are based on
communication channels, not on physical co-location. A blogroll is a listing of websites
that often appears as links on a weblog, usually in a left or right frame of the page. This
list of links is used to convey the site owner's interests or affiliation with other webloggers.
1.3.3 Time
The Diffusion of Innovations theory divides the element of Time into three main
dimensions, of which only two can be fully applied to the context of information diffusion
in the blogosphere.
> Innovation-decision – The innovation-decision process is the mental course of action
in which an individual passes from first knowledge of an innovation to forming an attitude
toward the innovation, to a decision to adopt or reject it, and if adopting it, to implement
this new idea and confirm the decision.
In the case of a blogger deciding whether or not to post a specific meme on his/her weblog, this
decision process is so fast that it’s almost impossible to measure. It applies to other
memes, and definitely to other innovations, but it’s not relevant as a measurement in top
links replication.
> Innovativeness – Innovativeness is the degree to which an individual is relatively earlier in
adopting new ideas than other members of a social system. Innovativeness, in
contrast to the innovation-decision process, is an extremely significant measurement
in top links replication, as in most information diffusion models.
There are five adopter categories, or member classifications of a social system, based
on their level of innovativeness:
– Innovators
– Early adopters
– Early majority
– Late majority
– Laggards
[Figure: Bell-shaped curve showing categories of individual innovativeness and percentages within each category]
Innovativeness within a social system is characterized by a bell-shaped curve in which
time and incidence of adoption are the two main axes. This concept, in the context of
Blogviz, is further explored in the Methodology chapter of this thesis.
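The five adopter categories can be illustrated with a minimal sketch. In Rogers' standard scheme the category boundaries sit at one and two standard deviations from the mean adoption time, which yields roughly 2.5% innovators, 13.5% early adopters, 34% early majority, 34% late majority and 16% laggards. Applying that scheme to the hour at which each blog first cites a topic is an assumption of this example, as are the function names and the sample data; it is not a description of how Blogviz itself classifies blogs.

```python
from statistics import mean, pstdev


def adopter_category(t, mu, sigma):
    """Classify an adoption time t given the mean and standard deviation."""
    if t < mu - 2 * sigma:
        return "innovator"
    if t < mu - sigma:
        return "early adopter"
    if t < mu:
        return "early majority"
    if t < mu + sigma:
        return "late majority"
    return "laggard"


def classify(adoption_hours):
    """adoption_hours: {blog: hours between a topic's first appearance and
    the blog's first citation of it} (hypothetical input format)."""
    times = list(adoption_hours.values())
    mu, sigma = mean(times), pstdev(times)
    return {blog: adopter_category(t, mu, sigma)
            for blog, t in adoption_hours.items()}


# Invented sample data, for illustration only.
sample = {"blog-a": 10, "blog-b": 20, "blog-c": 30}
print(classify(sample))
```

With very few adopters the extreme categories are rarely populated, since a small sample seldom contains values more than two standard deviations from its own mean.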
Many search engines and community tools that analyze the blogosphere assume a direct
correlation between a blog’s popularity and its innovativeness. I believe this assumption is
incorrect. The reasoning is very simple: if a specific blog has a high number of inbound
links and therefore a sizeable readership, it must be on the front line in finding
and publishing original information. The HP Information Dynamics Lab study on the
“Implicit Structure and the Dynamics of Blogspace” (Eytan Adar et al) showed exactly
the opposite. The study demonstrated that popular blogs are rarely among the first ones
to start a specific trend. Many popular blogs claim most of their “discoveries” by not
citing their original sources, which are usually smaller, unfamiliar blogs. The level of
popularity of each blog might be directly related to its scale of influence, but not
necessarily to its level of innovativeness. So who are these unknown bloggers that bring
fresh ideas to the blogspace? Who are these innovators or trendsetters? Blogviz will
expose these anonymous sources, which are crucial in the dynamics of topic
diffusion.
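The difference between popularity and innovativeness can be made tangible with a small sketch. It scores each blog by its average relative position in the order in which blogs first cited each topic (0.0 means always first, 1.0 means always last) and prints that score next to an inbound-link count. The scoring rule, the input format and all sample values are assumptions made for illustration; this is not the method used in the HP study or in Blogviz.

```python
from collections import defaultdict


def innovativeness_scores(citations):
    """citations: {url: [(blog, timestamp), ...]} (hypothetical format).

    Returns {blog: mean relative position}, where 0.0 means the blog was
    always the first to cite a topic and 1.0 means it was always the last.
    """
    positions = defaultdict(list)
    for url, events in citations.items():
        first_seen = {}
        for blog, ts in sorted(events, key=lambda e: e[1]):
            first_seen.setdefault(blog, ts)  # keep only the first citation
        ordered = sorted(first_seen, key=first_seen.get)
        if len(ordered) < 2:
            continue  # a single adopter carries no ordering information
        for rank, blog in enumerate(ordered):
            positions[blog].append(rank / (len(ordered) - 1))
    return {blog: sum(p) / len(p) for blog, p in positions.items()}


# Invented sample data, for illustration only.
citations = {
    "http://example.com/a": [("small", 1), ("mid", 5), ("big", 9)],
    "http://example.com/b": [("small", 2), ("big", 3)],
}
inbound_links = {"small": 3, "mid": 40, "big": 900}

scores = innovativeness_scores(citations)
for blog in sorted(scores, key=scores.get):
    print(f"{blog}: score={scores[blog]:.2f}, inbound links={inbound_links[blog]}")
```

In this invented data set the blog with the fewest inbound links is always first and the most linked blog is always last, which is the kind of pattern the paragraph above describes.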
> Rate of adoption – The rate of adoption describes how fast an innovation is adopted
by members of a social system in a given time period. When mapping the cumulative
adoption over time, or temporal pattern, of a diffusion process, the resulting distribution
can generally be described as taking the form of an S-shaped (sigmoid) curve. Time and
cumulative adoption (or infected population) are the plot’s two main axes.
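The relationship between the bell-shaped incidence curve and the S-shaped cumulative curve can be shown in a few lines: summing a bell-shaped series of new adopters per day produces a running total that rises slowly, then steeply, then levels off. The daily counts below are invented, and the logistic function is included only as one common idealisation of an S-curve, with parameter names chosen for this example.

```python
import math
from itertools import accumulate


def cumulative_adoption(new_adopters_per_day):
    """Running total of adopters; the input is the daily incidence."""
    return list(accumulate(new_adopters_per_day))


def logistic(t, ceiling, rate, midpoint):
    """Idealised S-curve: adopters at time t in a population of size ceiling."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


# Invented bell-shaped incidence of new adopters per day.
daily = [1, 2, 5, 9, 12, 9, 5, 2, 1]
totals = cumulative_adoption(daily)

# Crude text plot: the bar lengths trace the S-shape.
for day, total in enumerate(totals):
    print(f"day {day}: {'#' * total} {total}")
```

The steepest part of the cumulative curve coincides with the peak of the daily series, which is why the two plots are two views of the same diffusion process.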
1.3.4 The Social System
The fourth main element in the diffusion of new ideas is the social system, which
basically creates a boundary for the diffusion and adoption of an innovation to occur. A
social system is defined as a set of interrelated units that are engaged in joint problem-
solving to accomplish a common goal (Everett M. Rogers). The members or units of a
social system may be individuals, informal groups, organizations, and/or subsystems.
With regard to the replication of top links among weblogs, the social system is
undoubtedly the blogosphere, depicted as a fertile network of endless social
communities. This vast communication network consists of interconnected individuals
(bloggers) who are linked by shared interests and patterned flows of information.
At first glance, considering the highly interconnected web of links, connections and
shared interests among bloggers, it might seem easy to understand the adoption
process of a particular unit of information or innovation. However, another crucial
conclusion exposed by the HP Information Dynamics Lab study, mentioned before,
declared that “for URLs appearing on at least 2 blogs, 77% of blogs do not have a direct
link to another blog mentioning the URL earlier. For those URL’s present on at least 10
blogs, 70% are not attributable to direct links”.
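A measurement of this kind can be sketched as follows. For one URL, the function takes the time at which each blog first mentioned it and a map of which blogs each blog links to, and returns the share of later adopters with no direct link to any earlier adopter. The input formats, the function name, and the decision to exclude the very first blog from the denominator are assumptions of this example; the exact procedure of the HP study is not reproduced here.

```python
def unattributed_share(adoptions, links):
    """Share of later adopters with no direct link to an earlier adopter.

    adoptions: {blog: timestamp of first mention} for a single URL.
    links:     {blog: set of blogs it links to} (e.g. via its blogroll).
    Both formats are hypothetical. The first blog to mention the URL is
    excluded, since it has no earlier adopter to link to.
    """
    ordered = sorted(adoptions, key=adoptions.get)
    later = ordered[1:]
    if not later:
        return 0.0

    unattributed = 0
    for blog in later:
        earlier = {b for b in ordered if adoptions[b] < adoptions[blog]}
        if not (links.get(blog, set()) & earlier):
            unattributed += 1
    return unattributed / len(later)


# Invented sample data, for illustration only.
adoptions = {"a": 1, "b": 2, "c": 3, "d": 4}
links = {"b": {"a"}, "c": {"x"}, "d": {"c", "z"}}

share = unattributed_share(adoptions, links)
print(f"{share:.0%} of later adopters have no direct link to an earlier one")
```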
There have been several studies on how the system’s social structure, and norms or
established behavior patterns, affect the diffusion of innovations within a particular social
system. But another area of research that is closely linked to Blogviz relates to opinion
leadership. It can be described as the degree to which an individual is able to influence
informally other individuals' attitudes or explicit behavior in a desired way with relative
frequency. Blogviz allows a broad understanding of opinion leadership in blogspace by
tracking and exposing the most influential and innovative topic proliferators.
1.4 Epidemiology
Throughout this thesis I repeatedly use the terms contamination and infection when
describing the adoption process of memes. Even though this practice might lead to
unwanted interpretations, the usage is not arbitrary, and it actually facilitates the
comprehension of information diffusion dynamics.
Epidemiology in its broadest sense is the study of disease patterns in human
populations (Wikipedia). Epidemiology can also be described as the study of the
determinants, occurrence, and distribution of health and disease in a defined population.
Infection is the replication of organisms in host tissue, which may cause disease. A
carrier is an individual with no overt disease who harbors infectious organisms. And the
notion of dissemination is understood as the spread of the organism in the environment.
In the above description, regardless of the different terms, we start noticing several
similarities with the domain of diffusion of innovations. This analogy is even more explicit
when characterizing the three major elements in disease occurrence, the so-called chain
of infection:
(1) The etiologic agent (parallel to the innovation)
(2) The method of transmission (parallel to the communication channel)
(3) The host (parallel to a unit of a social system)
Further along in characterizing the evolution of a disease, the epidemiologic descriptive study
organizes data by time, place and person. It is unquestionably the closest approach to
the concept of Information Diffusion. It divides the element of Time into four main trends:
secular trends, periodic trends, seasonal trends and epidemics. What’s
interesting in this typology of Time is that it applies equally well to the evolution of top
links across the blogosphere. Because of that I assume a series of parallels between
them.
The secular trend describes the occurrence of disease over a prolonged period. This
continual development is less common than the seasonal trend in the context of blogspace.
This trend usually describes commercial or very popular websites that never entirely lose
the bloggers’ interest and as a result have a continuous existence among them.
The periodic trend basically expresses a temporary modification in the overall secular
trend. It conveys a sudden new interest in a specific meme that is part of a continual
trend.
The seasonal trend reflects seasonal changes in disease occurrence following changes
in environmental conditions that enhance the ability of the agent to replicate or be
transmitted. This short transitory trend is the most common in blogspace: a new meme
spreads quickly and then rapidly loses interest, dying in a short period of time.
The epidemic incidence of a disease generally happens when it surpasses a threshold
of 7% of the target population. An epidemic is a sudden boost in occurrence due to
prevalent factors that support transmission. An information epidemic in blogspace might
give rise to a tipping point, where a specific meme escalates and leaps the blogspace,
reaching the mainstream.
2 Impetus
The main source of motivation for my thesis development is based on a solid
cooperation between Information Diffusion, Information Architecture, Data Visualization,
and the Science of Complex Networks.
My curiosity in Information Architecture was initially fostered in Christopher Kirwan’s
MFADT class in the Spring of 2004, and since then, it became a major subject of interest
and awareness. I remember observing for the first time a diagram with four
interconnected circles representing the continuous Understanding Spectrum. Data
originates information, which leads to knowledge and ultimately to wisdom. This concept
influenced my vision and made me reflect on the responsibility I had, as a designer, to
contribute to this spectrum.
[Figure: The Understanding Spectrum, Nathan Shedroff]
We may have access to an abundance of information but I strongly believe we lack the
ability to process it effectively. In the face of contemporary technological accomplishments,
our ability to generate and acquire data has by far outpaced our ability to make sense of
it. Neither raw data nor scattered information offers any level of meaningful
understanding. This is where Information Architecture and Information Visualization
undertake an important mission. If we are truly entering a fourth phase of humankind, a
theory defended by a large number of anthropologists and sociologists, then Information
Architecture is going to be a golden key in the process. In a world increasingly driven by
information, information rapidly assumes the form of power, and divides society into those
who own it and those who don’t. Meaningful information is not a given, and
particularly now, when our cultural artifacts are being measured in gigabytes and
terabytes, organizing, sorting and displaying information in an efficient way is a crucial
step towards intelligence, knowledge and wisdom.
In the Spring 2004 semester I was involved in two projects that were decisive in
delineating my thesis domain of interest and in heightening my attention to
Information Architecture and Information Visualization. The first one was a group project
developed in the Information Architecture class, taught by Christopher Kirwan.
Self-Replicating Cloners was a project aimed at producing visualizations of viruses, their
progression through time and their worldwide dissemination. Two viruses were analyzed and
compared, SARS and MyDoom, representing human biology and computer technology
respectively.
[Figure: Self-Replicating Cloners. Visualizations of viruses (biological/computer generated), their progression through time and worldwide dissemination]
The second point of awareness was a group project developed in a collaboration studio
with Siemens Corporate Research Center. Aimed at Siemens Medical, DSS –
Disease Surveillance System was a visualization and communication tool that shared
symptomatological data between hospitals and health care professionals in order to detect
possible disease outbreaks and recognize development patterns nationwide.
[Figure: DSS – Disease Surveillance System]
After these two particular experiences, I started my summer research with some clear
interests in mind, but still scattered through distinct areas such as artificial life, virology,
cognitive science, genetics, cyber biology, epidemiology, and pattern recognition.
Emergence, by Steven Johnson, was the first book I read in my research and it was a
surprising start. The paradigm of Emergence, which can be described as a “higher-level
pattern arising out of parallel complex interactions between local agents”, was slowly
overflowing my mind with bright new discoveries. And with an augmented motivation, I
started gradually abandoning some initial ideas and, in other cases, finding common
links between them, under the sciences of complexity and self-organization. The search
for answers on how order can emerge from disorder, and organization from
chaos, guided me to begin a study of the individual parameters of emergent systems,
such as collective/macro behavior, self-organizing communities and bottom-up
hierarchy.
This research led me inevitably to complex systems. Delving into this new area was
even more thrilling. Finding, each day, a common structure in apparently distinct fields, or
similarities between natural systems and human designs, was beyond doubt
overwhelming. From that point on, I became extremely fascinated with the omnipresent
web of signals and interactions, nodes and links that shape modern complex networks,
from social networks, to corporations, cities, living organisms and the Internet.
Complexity is a challenge by itself. Complex Networks are everywhere. They embody a structural
and organizational principle that reaches almost every field we can think of, from genes
to power systems, from food webs to market shares. Paraphrasing Albert Barabasi, one
of the leading researchers in this area, “the mystery of life begins with the intricate web of
interactions, integrating the millions of molecules within each organism”. Humans, from
birth, experience the effect of networks every day, from large complex systems like
transportation routes and communication networks, to less conscious interactions,
common in social networks. The Scale-Free network, the most common topology in either
natural or human systems, is, curiously enough, a very recent breakthrough. Since its
discovery six years ago, dozens of researchers worldwide have been disentangling the
networks around us at an amazing rate. This awareness is helping us understand not
only the world around us but also the most intricate web of interactions that shape the
human body. The global effort to construct a general theory of complexity is
tremendous and may lead us not only to a structural understanding of networks, but to
major improvements in the stability, robustness and security of most complex systems
around the globe. As Barabasi writes in Linked, “Once we stumble across the right
vision of complexity, it will take little to bring it to fruition. When that will happen is one of
the mysteries that keeps many of us going”.
The feature of complex networks that has always fascinated me the most is the
dynamics of Dissemination Patterns. The visualization of the path, and inherent duration,
of a certain fad, idea, or virus in a social, biological or computer network has been, since
the beginning, a critical point of interest. How does a particular contagion travel from
point A to B, which nodes does it affect along its course, and how fast does it contaminate a large
cluster or the entire network?
2.1 Subject of Analysis
After my summer research presentation, at the beginning of the Fall 2004 semester,
where I showed all the knowledge I had collected in the domain of complex networks, I went
even further, observing and collecting dozens of network visualization examples and
trying several open-source applications. This investigation resulted in my second official
presentation. Part of this research also coincided with the work I was doing as a
design researcher at Parsons Institute of Information Mapping (PIIM). For additional
information on this study please consult section 6.2 of chapter 6 – Methodology.
After the second official presentation I was sure of two things:
1 – I wanted to continue my visual explorations exercise, by gathering problems and
inconsistencies in complex network diagrams and proposing plausible solutions.
2 – I wanted to map a dissemination pattern in a specific network. By doing that, I
intended, not only to be innovative and bring something new to the field, but also display
a ‘showcase’ of my visual thinking in terms of complex networks visualization.
The first objective was well defined and, best of all, already under development. The
major problem was finding a solution for the second point. I had to hit upon a subject that
represented all the research and knowledge I had gathered through the summer and the
beginning of the Fall 2004 semester. Finding an answer to this quest seemed an
impossible task, due to the vagueness of possible directions. At a certain point it was as
if I had come back to the start, with the fearful blankness of June assaulting my mind
once again. Time was pressing and I knew that whatever subject I chose, I was still facing an
enormous workload. The first thing I decided was to go back to my initial
interest, the main cause that had led me into this escalating exploration of complex networks. I
quickly rediscovered my early motivations: virus dissemination and relationships between
social/biological and computer/technological systems.
One thing I discovered on my summer research is that ideas, fads, trends and
innovations show similar dissemination patterns as virus in social networks. The concept
of word-of-mouth is a fascinating diffusion behavior that has always intrigued
psychologists, sociologists, anthropologists, and lately marketers. To be able to map a
word-of-mouth epidemic in a specific social network is a blue-sky scenario. And that
might be true, in relation to physical interactions in a physical world between physical
individuals. However, a flourishing movement on the Internet presents an interesting
experimental laboratory to explore this behavior. Blogging embodies an incredible case
of word-of-mouth, where news, ideas and fads travel through community clusters with
high adoption rates. Because of their inherent nature, blogs became my ultimate fixation
and the main framework for my Thesis. Their high interconnectivity and shared flow of
information represent not only an obvious case study of meme propagation, but also an
outstanding example of a dissemination pattern in an increasingly complex network,
estimated at over 8 million nodes.
As an example, I’ll mention a topic that emerged from the blog community at the
beginning of October 2004. During the first presidential debate of the 2004 US elections,
on September 30, 2004, between President George W. Bush and Senator John Kerry,
there was an episode that got the attention of a particular viewer. “You forgot Poland”
was the abrupt statement made by George W. Bush while John Kerry was enumerating
the allied forces present in the Iraq War. The presidential debate occurred on a Thursday
evening, September 30, and by the following Monday night there was already a topic
sharing 12 links among bloggers. This topic pointed to a specific URL –
http://www.youforgotpoland.com. By that time, only about four days after the debate,
someone had already created a domain (youforgotpoland.com) and was selling t-shirts
and stickers online with the same sentence. A new meme had been born and in a short
period of time had “infected” several people.
This intriguing example reveals the accelerating rate of information flow among bloggers
and how fast it spreads through, or “contaminates”, online blog communities. Another point
demonstrated by this example is the possibility of tracking such an
outbreak. Imagine this topic reaching the mainstream a week later, possibly a major
newspaper or a particular TV show. How interesting would it be, to actually go back in
time and discover where this outbreak first originated, the way it was adopted and how
fast it grew?
These last two queries have undoubtedly become a crucial motivation for the
development of my thesis. Quoting Duncan Watts, in regard to the mechanics of social
networks: “To understand the pattern, we need to delve further into the rules by which
individuals make decisions, and how, in the process, our apparently independent
choices become inextricably bound together.”
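As a rough illustration of the kind of tracking described above (where an outbreak originated and how fast it grew), the following Python sketch works on a small list of timestamped citations of a single URL. The blog names, the timestamps and the function `trace_outbreak` are hypothetical and exist only for this example; they are not Blogviz data or Blogviz code.

```python
from datetime import datetime
from itertools import accumulate

# Hypothetical citation records: (blog name, time the blog first linked to the URL).
citations = [
    ("blog-c", datetime(2004, 10, 2, 9, 30)),
    ("blog-a", datetime(2004, 10, 1, 8, 15)),
    ("blog-b", datetime(2004, 10, 1, 22, 40)),
    ("blog-d", datetime(2004, 10, 3, 14, 5)),
    ("blog-e", datetime(2004, 10, 3, 19, 50)),
]


def trace_outbreak(records):
    """Return the earliest citer and the cumulative number of citers per day."""
    ordered = sorted(records, key=lambda record: record[1])
    origin = ordered[0][0]

    # Count new citations on each calendar day.
    new_per_day = {}
    for _, when in ordered:
        new_per_day[when.date()] = new_per_day.get(when.date(), 0) + 1

    # Turn daily counts into a running total, i.e. the adoption curve.
    days = sorted(new_per_day)
    totals = list(accumulate(new_per_day[day] for day in days))
    return origin, list(zip(days, totals))


origin, growth = trace_outbreak(citations)
print("first citer:", origin)
for day, total in growth:
    print(day.isoformat(), total)
```

The running total per day is the simplest possible adoption curve; a real analysis would also need to decide how to treat blogs that cite the same URL more than once.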
3 Context
The contextual narrowing of my thesis proposal starts in the broad area of Complex
Networks, tightens its limits to Social Networks and ends at its ultimate contextual
boundary, Online Social Communities.
Even though this Thesis proposition places itself at the center of a broad group of
domains, I decided to deeply explore its closest and most direct domain – Online Social
Communities, and the main subject of analysis – Blogs. Nevertheless, besides the
omnipresent field of complex networks, the context of this thesis incorporates the
domains of Information Diffusion, Memetics, Information Architecture, Data
Visualization, Information Theory, Diffusion of Innovations, Epidemiology and
Small Worlds.
3.1 Online Social Communities
Online Social Communities, although much narrower than the Science of Complex
Networks, is still a wide-ranging field that can include almost every type of online
interpersonal communication medium, from e-mail lists/threads, to Usenet groups, MUDs,
chat environments, instant messaging, community forums, weblogs, online gaming, and
interest groups, among others.
Online Communities offer an interesting change on the parameters that until now have
defined social interaction. Several years after Milgram’s notorious small-world test,
Russell Bernard and Peter Killworth did what they called a “reverse small-world
experiment”. They interviewed hundreds of individuals, explaining Milgram’s experiment
and asking them what personal criteria they would use to get a specific package to
someone they didn’t know. Bernard and Killworth’s study found that most of the subjects
used only a couple of dimensions to get their message sent to the next recipient. The most
predominant dimensions were geography and occupation.
Jon Kleinberg, a computer scientist who attended Cornell and MIT, was also motivated
by Milgram’s small-world study, and asked how individuals actually found
the paths within the network. Kleinberg concluded that people generally have a strong
sense of distance, which they use to distinguish themselves from others. A notion of
distance can involve several factors, of which geographical distance is just one.
Profession, race, religion, income, class and education are other elements added to the
equation that describe how distant a specific person is from us.
From the beginning of human existence, communities were created for the benefits of
their own members. Usually by means of expediency, either in relation to the exchange
of goods or improved security against enemies, these groups of people occurred as
emergent systems by means of social convenience. Geography always played an
essential role and without a common shared space most of these communities wouldn’t
even exist. With the later development of mail and, more recently, telephone, telex,
and fax, human communication was greatly enhanced and the influence of geography started
to diminish. However, these new “technologies” only improved the
way people communicated with each other, by giving them more tools and decreasing
the time span and subsequently the distance; other than that, there were no major
changes in the way social communities were formed. No matter how fast and easy it
became for someone in Europe to talk with someone in America or China, there were
never communities created on the basis of telephone calls.
If we explore the word structure of most communication tools prior to the Internet,
such as telegraph, telex, telegram, and telephone, we encounter the constant presence
of the prefix tele-. Tele is a Greek word that means “at a distance”, usually implying “to
be distant” or “over a distance”. The first use of the prefix tele- was in the word telescope,
which was actually adapted from Galileo’s Italian word telescopio, followed by the word
telegraph, meaning “writing at a distance”. Therefore, Telecommunications is the field
that embodies all the systems that intend to communicate “at a distance” or “over a
distance”. Once again we see the importance of geography as a crucial domain for
human communication, where the advancement of technology, since the beginning, has
been trying to diminish its constraints, by allowing people to communicate over an
ever-present and disturbing distance. I find this analysis particularly interesting because
the Internet, and all features associated with it, has completely abandoned the prefix
tele-, drastically assuming the medium, and replaced it with the prefix e-. From e-mail, to
e-commerce, and e-business, the prefix e- is usually associated with the latest wave of
technological revolution, an abbreviation of the word electronic and an obvious
association with the word cyber.
The advent of the Internet and the World Wide Web changed these secular communal
constraints, possibly forever. The Internet became not just a medium for social gathering
and communication, but it absorbed it, and the medium became truly the message. The
transmission of information on the Internet is regularly measured in milliseconds, and the
time it usually takes for a message to leave a computer in Tokyo and arrive at a
computer in New York is more or less the same as a message sent to you, from your
next-door neighbor. The difference is merely a few milliseconds, which is by itself a
measurement difficult to perceive. Geography, as a crucial criterion for the birth of social
communities, has been utterly disregarded by online social communities. Without the
limitations of geography and physical interaction and identification, online communities
had to rely on a more abstract, but equally distinguishing criterion: interests. By analyzing
most current online communities, from online games to chat rooms, blogs and
newsgroups, we find out that in the absence of physical recognition, social values like
trust, confidence, respect and even friendship are ultimately based on a set of shared
interests. And of course, this “virtual” interaction would not be possible without specific
communication channels, portrayed as technological sub-systems of the larger medium,
the Internet.
Personal interests are a central element of our social identity, and subsequently, a highly
considered factor in relationships. Paraphrasing Duncan Watts in regard to peer-to-peer
networks, “social identity is what leads networks to be searchable”. The fabulous
aspect of online communities is the possibility of not only searching these clusters of
shared interests, but also tracking the exchange of conversations, ideas and messages
between them. By analyzing this data, it’s possible to understand, to some extent, how
information travels through these virtual environments. Weblogs, in this context,
represent units of a remarkable social laboratory. Not only is it relatively easy to track their
connectivity, but, due to their highly clustered nature, it’s also possible to examine, within
specific communities, how news and trends travel through individual bloggers.
3.2 Weblogs
Weblogs (alternate: blogs) are not just a new fad among Internet users and they are
much more than a collection of online digital diaries of scattered interest groups. Blogs
represent a change in online information flow and they are becoming a rising news
source for many people. We might not even be aware of how influential blogs will be in
the future, but one thing is certain: there are currently blogs with close to half a million
visitors a day, more than many large newspapers, magazines and news broadcasters.
Jorn Barger coined the term “weblog” in 1997 and in 1999 Peter Merholz coined its
abbreviation “blog”. As Jorn Barger stated:
“Weblogs are often-updated sites that point to articles elsewhere on the web, often with
comments, and to on-site articles. A weblog is kind of a continual tour, with a human guide
[whom] you get to know. There are many guides to choose from and each develops an
audience. There's camaraderie and politics between the people who run weblogs. They
point to each other in all kinds of structures, graphs, loops, etc.”
The most common definition of a blog is that of an online diary of thoughts, links, events,
or actions posted on a web page with a dated log format. These posts are often, but not
necessarily, in reverse chronological order, and are updated on a daily or very frequent
basis with new information about a particular subject or range of subjects. Despite this
dry classification, the usefulness of a weblog is incredibly rich.
Blogs are the vital elements of the personal publishing revolution. If we go back a few
years, before the rise of online publishing, the only way someone could write something
for the general public would be through a letter to the editor, hoping for it to be
published in the magazine’s next issue. For the first time in the history of human
communication, any single person has the opportunity to reach millions with their
message, as the cliché proclaims, with “the touch of a button”. Instead of being passive
consumers of information, Internet users are becoming active participants. Whether this power to
the people is a positive trend is debatable, since many people consider that it
adds to the existing “junk” flowing on the Web. Since most blogs are not subject to
any kind of editorial process or peer review and sometimes “play” with anonymity, their
public posts also raise legal concerns about intellectual property, defamation, and the like.
Controversies aside, blogs, like the World Wide Web, are free democratic resources that
embody the concept of free speech, which is unquestionably a right for all.
Blogs also exemplify the true concept of diversity. Besides being oblivious to who might
use this personal tool, blog content is as varied as the Web itself. The authors of
Essential Blogging explain this diversity by pointing out that “creating a taxonomy of the
blogiverse is a fruitless task”, since “there’s no good, central directory of blogs that puts
each one in its own pigeonhole, because even the most topical blogger will stray from
the subject from time to time to celebrate some personal victory or warn his readers off a
terrible movie”.
One might also argue that, in fact, this personal publishing revolution started with the first
website, and consequently with the birth of the Web. This is obviously true; however,
until the first blog publishing tools became available, anyone who wanted to circulate
their own ideas online had to be fluent in HTML and web hosting, and aware of most
web design applications available. Even after the launch of GeoCities in 1996, offering free web
hosting to non-commercial personal pages, web pioneers had to be HTML-savvy people
who would spend their evenings working on their websites. Also, the few personal
webpages that started populating the Web in the mid-90s were just a scattered collection
of isolated opinions, with no regular updates and unconnected from each other. The big
blog phenomenon started escalating in the summer of 1999, when a small web company
called Pyra Labs released a product called Blogger. From that point on the blog
community exploded, and the more bloggers came onto the scene, the more online blog tools
became available. This was the beginning of the personal publishing revolution.
The inclination towards personalization is reaching every industry, from clothing to cars,
from software to medicine. News and Information are just new elements added to the
equation. In my opinion, many blogs are so successful because of two
major factors: personalization and comforting lassitude. Blogs are usually maintained by
a single person who filters the huge amount of available information according to his/her
own preferences. For people who share common interests with the blogger, it’s not only
exciting to get information from that source, since it’s going to match their inclination to
some degree, but it also saves them a lot of time by avoiding the large, more abstract,
and sometimes incongruent, news sources. In countries such as the US, where large
media sources are becoming increasingly dry and biased, blogs might also represent an
oasis of independent information.
3.3 Blogosphere
Blogosphere (alternate: blogsphere), or blogspace, is the collective term encircling all
weblogs (alternate: blogs). It’s almost impossible to determine with precision the existing
number of weblogs, or even the ones currently active. Technorati is a leading search
engine for the blogosphere, similar to Google or Yahoo, but exclusive to blogs.
Technorati, as of February 2005, was tracking 7,245,866 blogs, and this number is far
from stagnating. Out of curiosity, when reviewing this paper on April 6, 2005, I checked
Technorati to see how the number had changed. To my not-so-surprised
amazement, Technorati reported tracking 8,469,023 weblogs. This translates into an
increase of more than 1.2 million blogs in less than two months.
The latest Pew Internet study estimates that about 27%, or about 32 million, of American
Internet users are regular blog readers. They say a new weblog is created every 2.2
seconds, which means there are about 38,000 new weblogs a day. Bloggers update
their blogs regularly; there are about 500,000 posts daily, or about 5.8 posts per second.
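The rates quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below is only a sanity check on the figures as stated in the text; the helper names are hypothetical. One blog every 2.2 seconds works out to a little over 39,000 per day, so the quoted figure of about 38,000 suggests that the 2.2-second interval was rounded.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_day(seconds_between_events):
    """Convert 'one event every N seconds' into events per day."""
    return SECONDS_PER_DAY / seconds_between_events


def events_per_second(events_in_a_day):
    """Convert a daily total into an average rate per second."""
    return events_in_a_day / SECONDS_PER_DAY


# Figures taken from the text above.
new_blogs_per_day = events_per_day(2.2)
posts_per_second = events_per_second(500_000)
technorati_growth = 8_469_023 - 7_245_866

print(round(new_blogs_per_day))    # new weblogs per day at one every 2.2 s
print(round(posts_per_second, 1))  # average posts per second
print(technorati_growth)           # blogs added between the two Technorati counts
```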
When we’re faced with a number of blogs higher than eight million (at least), it becomes
hard to consider the whole as a single community. The blogosphere, in analogy to its
medium, the Internet, does not represent a single community but a vast collection of
endless communities. These communities shape a complex web of more than 8 million
nodes and are key factors in the outburst and further development of trends, fads and
innovations. Also, due to its inherent diversity, any kind of classification regarding the
blogosphere is a mere exercise of oversimplification.
4 Audience
Scientists/Researchers on Complex Networks
Hopefully, Blogviz will offer a significant step in this long scientific journey towards the
understanding of the dynamics of complex networks. To all researchers, academics, and
scientists that have been persistently and bravely disentangling the networks around us,
I truly hope this model can produce one important footprint in this expedition. It doesn’t
have to be gigantic, just one step forward. By bringing my visual expertise and interest in
Information Architecture, Data Visualization and Interface Design, I expect to make a
small corner of the vast Science of Complex Networks clearer and more understandable.
This corner embodies the domain of Online Social Communities and the phenomenon of
blogging.
Sociologists
Professionals, Researchers, Faculty and Students. Blogviz will offer an interesting case
study for analyzing a dynamic, ever-changing and complex online social network – the
Blogosphere. Mapping the spread of word-of-mouth in social communication has been,
until now, an almost fruitless task. Blogs, on the other hand, offer an engaging
experimental laboratory to better study and understand this occurrence. Memetics is an
expanding field of study in social sciences, which is being explored by a significant
number of researchers. Blogviz, by making a parallel between meme propagation and
topics diffusion in blogspace, makes an important contribution to the understanding of
Memetics.
Information Architects and Data Visualization enthusiasts
Professionals, Researchers, Faculty and Students. I hope that my passion and
fascination for the field of Information Architecture and Data Visualization will be
reflected in my thesis project. I sincerely hope that Blogviz can be a relevant precedent in
some of your projects, deserve a mention in your research, or inspire or influence you at
some level.
Cultural Critics
Blogging presents one of the most intriguing and captivating phenomena of our time.
We might be in for a long ride in the disruption of most publishing media
conglomerates. We cannot really predict the ultimate result of this major shift in the flow
of online information, but one thing is certain: it has already started. Blogviz will offer an
enhanced insight into the mechanics of this contemporary revolution.
Marketers
Possibly, the only open door to an eventual commercial viability for the application is
based on its relevance for the Marketing industry. Even if Blogviz is a non-commercial
research project, it is reassuring to know that it’s potentially useful outside the research
and academic realms. Like sociologists, marketers have become more and more
interested in word-of-mouth behavior, even though the more traditional marketing
strategists have barely explored this concept. In the blog community, most
bloggers are incorporating the idea of syndication in their blogs, in the form of an
XML data file, called RSS, which is basically a list of post summaries and links to them.
These files can then be interpreted by a desktop application called an RSS aggregator,
and read by the user without the need to access the specific website. Some consider
RSS to be the future of news distribution, and that might well be the case, which
explains why, as in any communication medium, advertising is now starting to
infiltrate RSS feeds. The potential use of Blogviz in this context is huge. Marketers
interested in investing in the best RSS blog sources for advertising could easily track
the most-read blogs, locate the innovators, the followers, the major dispatchers of
information, and then act on the conclusions accordingly.
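To make the description of RSS above more concrete, here is a minimal Python sketch of what an aggregator does at its core: it reads the XML file and extracts each post's title, link and summary. The sample feed, its URLs and the function `read_items` are made up for illustration, and the sketch assumes the common RSS 2.0 layout of `item` elements with `title`, `link` and `description` children; real feeds vary and often need a more tolerant parser.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed with two posts (illustrative only).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://blog.example.org/</link>
    <item>
      <title>First post</title>
      <link>http://blog.example.org/first</link>
      <description>Summary of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example.org/second</link>
      <description>Summary of the second post.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return one dict per post, holding its title, link and summary."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


items = read_items(SAMPLE_FEED)
for post in items:
    print(post["title"], "->", post["link"])
```

Collecting the `link` values from many feeds over time is one plausible way to see which blogs point to a given URL first, although the text above does not state that Blogviz gathers its data this way.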
Bloggers
Blogviz is a visualization model built to better understand the information dynamics
within the blog community. Accordingly, any interested blogger who feels the need to
comprehend the underlying network that he’s part of is a potential user of my research
project.
5 Precedents
The chain of influences and inspiration for my thesis project is, as expected, extremely
widespread and ranges across new media art, information architecture, data visualization,
complex networks and interface design, among so many other fields, and life in general.
Even if I started enumerating the key thinkers whose work I admire and respect, and
subsequently absorbed for myself, I expect many names would still go unmentioned
from the extensive list of people. In enunciating the key precedents for my thesis, I
concentrated exclusively on projects developed in the area of Online Social
Communities, my closest encircling thesis domain. Since the major goal of my thesis is
to visually map a specific diffusion pattern and the connectivity among blog communities,
I decided to establish as precedents projects that make extensive use of a visual
structure to portray their field of research.
5.1 Blog Epidemic Analyzer
Authors: Eytan Adar, Li Zhang, Lada Adamic, Rajan Lukose
Institution: HP Information Dynamics Lab
URL: http://www.hpl.hp.com/research/idl/papers/blogs/index.html
Description:
HP Information Dynamics Lab created the Blog Epidemic Analyzer as part of their
research on information propagation. They released their paper “Implicit Structure and
the Dynamics of Blogspace” as a result of this research. Eytan Adar, Li Zhang, Lada
Adamic, and Rajan Lukose, used the search engine BlogPulse to map the behavior of
the blog community from May 11 to May 21, 2003.
Relevance:
This project is the closest to my thesis ambition and it obtained exciting results that
became pertinent in selecting specific parameters for my work. Although highly useful as
a research project, their few attempts at visualization were extremely poor. Their
major breakthrough was showing that the most popular blogs are not the most
innovative, since they commonly “steal” news and information from smaller, less-known blog
sources. I believe it’s a very significant finding that decisively influences the way we
understand the mechanics of blog communities.
5.2 Loom2
Authors: Danah Boyd, Hyun-Yeul Lee, Ethan Perry
Institution: Sociable Media Group - MIT Media Lab
URL: http://smg.media.mit.edu/projects/loom2/
Author’s Description:
“The goal of our research is to use the salient features of social interaction to build a
‘legible’ interactive visual representation of Usenet. We started by exploring the Usenet
environment, constructing a series of relevant questions. From the questions, we have
started to explore how this information can be derived from the textual data available
online. Simultaneously, we have started designing segments of visualization, under the
assumption that the desired characteristics were ascertainable.”
Relevance:
This project is a major aesthetic inspiration. I believe the use they make of a radial
structure fits the purpose of the project quite well, where specific degrees relate to a time
dimension and nodes’ colors to specific theme categories. Usenet represents a subject
of analysis closely related to blogging, since message/post threads in newsgroups have
a pattern of contamination similar to that of topics in the blogosphere. Given
their appealing visual models, the amount of work they had to
undertake is not surprising: “To build our designs, we drew on a wide variety of theoretical and practical
concepts from a range of fields, including graphic and interactive design, architecture,
sociology, and computer animation.”
5.3 Social Network Fragments
Authors: Danah Boyd, Jeff Potter
Institution: Sociable Media Group - MIT Media Lab
URL: http://smg.media.mit.edu/projects/SocialNetworkFragments/index.html
Description:
“Social Network Fragments was developed as a self-awareness tool for individuals to
explore the social networks that they create without structural consideration”. Its goal
was to “help users examine their structure so as to unveil the structural holes that are
built in such complex networks. These structural holes exist when users choose to
fragment portions of their network, often revealing facets of their own identity. As an
individual interacts with a diverse range of people, they are motivated to reveal different
aspects of their identity, thereby creating a multi-faceted social identity, whereby
different people know different things about the individual. In engaging in this behavior,
individuals start to segment their social network into a variety of different clusters, or
types of people.”
Relevance:
The visualization of social networks takes a major leap in many of the projects
produced by the Sociable Media Group (SMG) at MIT Media Lab. With some amazing
visual displays the SMG “investigates issues concerning society and identity in the
networked world”, addressing questions such as “How do we perceive other people
on-line? What does a virtual crowd look like? How do social conventions develop in the
networked world?”. Social Network Fragments aims at something as extraordinary as
mapping someone’s unnoticed social network. Although it may seem simple and intuitive
to track an individual’s connections to others, this project tries to reach further than
the immediate first-degree acquaintances, by reaching a friend-of-a-friend network.
This approach to small world theory has been pursued by some companies, which sell
products focusing on social networking management. The idea is simple: don’t just get to
the people you know, get to the people they know. Manage your friend-of-a-friend
network in order to find the shortest path to whatever you’re looking for. Among the leading
companies incorporating this concept are: Spoke Software, Visible Path, SRD and
In-Q-Tel. Social Network Fragments offers a reasonable visual solution, where I believe some
improvements could be implemented. By basing the visual criteria solely on text,
color and depth (simulated 3rd dimension), the interface becomes somewhat limited in
fully exploring its content.
5.4 PostHistory
Author: Fernanda Viégas
Institution: Sociable Media Group - MIT Media Lab
URL: http://web.media.mit.edu/~fviegas/posthistory/
Author’s Description:
“Most of us deal with email on an everyday basis and some of us have been doing so
for several years. Nevertheless, it is hard to perceive the accumulation of this frantic
activity, it is hard to get a sense of the number of messages sent and received, not to
mention how difficult it is keeping track of how many people have written to you or
received messages from you. The aim is to provide users with a novel and hopefully
richer experience of their email activities. PostHistory represents an opportunity for
reflection and insightful monitoring of fundamental patterns of interactivity. The
visualization aims at impressing on the user a sense of daily accumulation, of growth
and scale – dimensions not normally conveyed on current email applications.”
Relevance:
Fernanda Viégas, a Brazilian graduate student at MIT Media Lab, is a prolific new media
designer who has been involved in many relevant projects. PostHistory is one of her
best. What I find most interesting in this project is the series of new structures and
features she proposes in order to better understand the pattern created by e-mail
activity. This project is visually innovative and it’s quite an impressive contribution to the
field of Information Visualization. Another project conceptually related to PostHistory is
Thread Arcs, a fresh interactive visualization technique designed to help people use
threads found in email. Thread Arcs, which resulted in a published paper, is a truly
interesting visual approach to e-mail threads and even to small graphs. This
concept is part of a major e-mail application developed by the Collaborative User
Experience team at IBM Research. ReMail has been in development for almost a decade and
it aims at improving the knowledge of how people use e-mail, and also at making that
experience more functional and straightforward. Some of its features are very
encouraging.
Thread Arcs
ReMail (IBM Research)
5.5 Social Circles
Author: Marcos Weskamp
URL: http://marumushi.com/apps/socialcircles/
Author’s Description:
“Social Circles intends to partially reveal the social networks that emerge in mailing lists.
The idea was to visualize in near real-time the social hierarchies and the main subjects
they address. When subscribing to a mailing you never know who the principals are,
how many people are listening or what subjects they are talking about. It's like entering a
meeting room with plenty of people in the darkness and then having to learn who is who
by just listening to their voices. Social Circles does not pretend to be a statistical
application, but rather aims to raise the lights in that room just enough to let you
enhance your perception of what’s happening.”
Relevance:
Marcos Weskamp is a key thinker in digital information design and a major personal
influence. Newsmap, Weskamp’s most famous project, and one of the best online
examples of data visualization, gathers Google News headlines and displays them in an innovative
treemap structure in several languages (http://www.marumushi.com/apps/newsmap). In
Social Circles, even though Marcos Weskamp doesn’t push the project far from the
most common network visualization schemas, its concept is very strong, particularly in a
recent version of it, where users can map their own inbox of e-mail messages.
5.6 WebFan
Author: Rebecca Xiong
Institution: Sociable Media Group - MIT Media Lab
URL: http://www.sbox.tugraz.at/home/k/koebi/WebFan%20Description.htm
Author’s Description:
“WebFan visualizes user activities at WebBoards, or Web-based message boards,
which contain messages posted by users. It uses the reply structure of the messages to
lay them out using a fan-like hierarchical structure. This abstract structure allows a large
set of Web pages with multiple levels to be represented at the same time for overview
and comparison. Users can also interactively explore the fan structure to find out more
about individual pages. Dynamic user activity is overlaid on top of this display.”
Relevance:
“Currently, Web users have little knowledge about the activities of fellow users. They
cannot see the flow of on-line crowds or identify centers of on-line activity.” WebFan
seeks to enrich this experience by visualizing the activity of other people in the
message boards. I believe this is a very relevant project, particularly for the
unconventional medium of WebBoards, that Rebecca Xiong chose to map. WebFan
relates to my thesis project by visualizing overall patterns of usage and answering
questions such as: What are people looking at? What is hot? Where do clusters of
similar interests form?
5.7 Visual Who
Author: Judith S. Donath
Institution: Sociable Media Group - MIT Media Lab
URL: http://smg.media.mit.edu/people/Judith/VisualWho/VisualWho.html
Author’s Description:
“The population of a real-world community creates many visual patterns. Some are
patterns of activity: the ebb and flow of rush hour traffic or the swift appearance of
umbrellas at the onset of a rain-shower. Others are patterns of affiliation, such as the
sea of business suits streaming from a commuter train, or the bright t-shirts and sun-
glasses of tourists circling a historic site. Visual Who makes these patterns visible. It
creates an interactive visualization of the members’ affiliations and animates their
arrivals and departures. The visualization uses a spring model. The user chooses
groups (for example, subscribers to a mailing-list) to place on the screen as anchor
points. The names of the community members are pulled to each anchor by a spring,
the strength of which is determined by the individual’s degree of affiliation with the
group represented by the anchor”.
Relevance:
Visual Who, besides offering a motivating contextual precedent in relation to social
networks, portrays an appealing method of mapping social connectivity among a set of
individuals. It offers an interesting approach to pattern recognition and visualization,
although I think it suffers from the same inconsistencies pointed out in the Social
Network Fragments project.
5.8 Avatars 2002
Authors: Katy Börner, William Hazlewood, Sy-Miaw Lin
Institution: School of Library and Information Science, Indiana University
URL: http://ella.slis.indiana.edu/%7Ekaty/gallery/
Description:
This project gave rise to a research paper: “Visualizing the Spatial and Temporal
Distribution of User Interaction Data Collected in Three-Dimensional Virtual Worlds”. The
project is a visualization of the social patterns in the Culture virtual environment, part of
the Quest Atlantis universe. The map shows user trails over time. It was produced using
a visualization tool developed by Katy Börner and colleagues at the School of Library
and Information Science, Indiana University.
Relevance:
The particular relevance of this project lies in its visual pattern analysis. I think the
underlying concept of being able to visually recognize different user trails in a 3D online
game is extremely captivating. In a virtual game, often played with unknown
faces, the notions of time and space alter considerably, which makes this project
particularly challenging, since it tries to recreate a defined user trail pattern throughout a
physically undefined space.
5.9 PeopleGarden
Author: Rebecca Xiong
Institution: Sociable Media Group - MIT Media Lab
URL: http://www.infovis.net/E-zine/num_46.htm
Description:
PeopleGarden: Creating Data Portraits for Users proposes the “Data Portrait” as a
graphical medium for the visualization of information related to individual users of
interactive media. The visual metaphor that PeopleGarden uses is of flowers in a
garden. Each data portrait is the trace of the user’s activities and takes the shape of a
flower.
Relevance:
“On-line interaction environments such as Web-based message boards, chat rooms, and
Usenet newsgroups have become widely popular. As the number of participants rises, it
is increasingly difficult to distinguish individual users and to comprehend the overall
interaction context.” In PeopleGarden the representation of a vague virtual space
reaches its extreme by allowing it to be portrayed as a digital garden. The concept is that
flowers represent individuals in a chat room, and the more time a user stays active in a
conversation the more its flower can grow and expand. I think this project is conceptually
very strong as it presents an innovative visual method for representing a vague
unspecified space.
5.10 History Flow
Authors: Martin Wattenberg, Fernanda Viégas
Institution: IBM Watson Research Center
URL: http://researchweb.watson.ibm.com/history/index.htm
Author’s Description:
“The history flow application charts the evolution of a document as it is edited by many
people using a very simple visualization technique. History flow provides answers at a
glance to questions like, Has a community contributed to the text or has it been mostly
written by a single author? How much has a particular contributor influenced the current
version of the document? Is the text's evolution marked by spurts of intense revision
activity or does it reflect a smooth transition from its beginning to the present? The
current version of history flow visualizes the evolution of pages from Wikipedia”.
Relevance:
History Flow is truly one of the most significant projects in revealing hidden patterns in a
set of data that would otherwise go unnoticed by the user. This ability is undoubtedly one
of the key strengths of Information Visualization. Using available data from the Wikipedia
website, the authors built an inventive visualization model for analyzing the evolutionary
pattern of individual contributions to Wikipedia articles through time. This visualization
method bears some resemblance to Theme River™, developed by the Pacific Northwest National
Laboratory (PNNL), but the number of conclusions history flow was able to facilitate is
quite impressive. In a lecture given at Parsons D+T Lab on February 23, 2005, Martin
Wattenberg, speaking on this project, mentioned that it takes an average of 2 minutes for
any kind of article vandalism to be noticed and repaired.
5.11 Listening Post
Authors: Mark Hansen, Ben Rubin
URL: http://www.earstudio.com/projects/listeningPost.html
Author’s Description:
“Listening Post is an art installation that culls text fragments in real time from thousands
of unrestricted Internet chat rooms, bulletin boards and other public forums. The texts
are read (or sung) by a voice synthesizer, and simultaneously displayed across a
suspended grid of more than two hundred small electronic screens.”
Relevance:
Although the toolset and the medium of this project are quite different from the
screen-based interactive application intended for my thesis, I believe this project is an
amazing precedent and one of the best installations I have ever seen. Exhibited at the List
Visual Arts Center, Cambridge, Mass, and the Whitney Museum of American Art, New York,
Listening Post was recently awarded a prize at the Ars Electronica 2004 Festival.
Co-author Ben Rubin emphasizes the motivation for the project: “My starting place was
simple curiosity: What do 100,000 people chatting on the Internet sound like?” The
significance of Listening Post is remarkable. It displays short messages, randomly
picked from chat rooms according to a specific set of keywords, and then not only
gives them life by placing the messages in a specific spatial configuration, a
“suspended grid of more than two hundred small electronic screens”, but also gives
them a sound dimension, which makes the experience truly memorable. This large
display of small screens resembles a “window” overlooking the activity in cyberspace.
6 Methodology
6.1 Summer Research
My first presentation at the beginning of the Fall 2004 semester covered some of the
wide-ranging research done over the summer. It was entitled “Discovering Complex
Networks”. My approach to this first assignment was to treat the presentation as a
lecture, educating my audience about the engaging science of complex networks and
narrating all the discoveries and knowledge gathered in this initial phase.
The presentation contained explanations and diagrams about the specific properties of
scale-free networks and took a holistic view by showing diverse examples of complex
networks in different domains, as diverse as Gene Networks and Airline Routes. All the
images shown at this presentation can be seen in Appendix A – Summer Research
Presentation, at the end of this paper.
In order to better understand the successive steps that led me to the study of complex
networks, one should consult the Impetus chapter of this thesis. There I describe in
detail the evolution of my research interests and motivation.
I ended my Summer Research Presentation with a slide where I stated that my main
interest was to “Visually map a dissemination/propagation pattern in a scale-free
network”. I also made a short list of additional enquiries, where one could read:
> How does an idea, innovation, fad, trend, disease or virus travel from A to B in a
specific scale-free network?
> How long does it take?
> How many nodes are affected?
> How do the hubs react?
I finally concluded the presentation by stating my future goals: “To choose an
area and subject to analyze, where I can bring something new to the field and contribute
to its development.”
6.2 Visual Explorations
After extensive research on complex networks, I started to delve into different ways
of visualizing them. The main premise was that complex networks are difficult to
visualize, but we don't need to make them more complex in the process of trying.
On September 27, 2004, I wrote the following in my thesis diary blog: “My thesis
assertion has always been the visualization of dissemination patterns in a particular
scale-free network. (…) However, I quickly found out that this premise is based on the
assumption that the target network displays a visual structure suitable for analysis.
Naturally, most of the time, this assumption is incorrect. Since a visual representation of
a dissemination pattern cannot exist without a functional visual representation of the
underlying network, I decided to dedicate my time, for now, to the visualization of
complex networks. I've been delving into a set of visual explorations, collecting problems
and proposing solutions.”
“Functional visualizations are more than innovative statistical analyses and computational algorithms. They
must make sense to the user and require a visual language system that uses colour, shape, line, hierarchy
and composition to communicate clearly and appropriately, much like the alphabetic and character-based
languages used worldwide between humans.”
Matt Woolman
Digital Information Graphics
As acknowledged in another blog entry, also from September 2004: “I've tried several
open-source network visualization tools and seen hundreds of visualization examples. I
think I found a critical problem. In most tools I've seen, the user starts building its
network from an initial node. The user places the first node in the center of the drawing
board and then, node after node, link after link, the network starts expanding. Since
there's no preceding method of organizing the nodes and links in the designated area,
new nodes start naturally occupying any free space available. Unsurprisingly, after a
certain threshold, the lattice of lines and nodes becomes unbearable. This problem
happens so many times.”
The difference between this method and Mark Lombardi's drawings, for example, is a
question of organization. Instead of the bottom-up approach described above, Lombardi
used to plan his overall design with a holistic view of the entire network, knowing
beforehand the amount of space he had and the exact number of nodes and links he
needed to draw. Because of this, the cleanness of his drawings, where edges rarely
overlap, is an excellent example of network visualization. What I cannot
understand is why Lombardi's method, and others like it, aren't taken into consideration
whenever someone decides to build a visual representation of a network. A macro
approach to the problem is definitely more appropriate: a top-down hierarchy instead of
a bottom-up one. And to say Lombardi's networks were not complex enough would merely
oversimplify his work.
The beautiful and eloquent global networks
of Mark Lombardi
Besides the problem mentioned above, I encountered two others in my research, which
contribute drastically to the huge number of bad visualization examples of complex
networks. First, most visual applications are based on constructive algorithms that obey
one rule: display the input data. The notion of how the data is displayed is rarely
considered. For that reason, often stunning visual forms demonstrate a low level of clarity
and function. Second, the programmers who build open-source applications and the
scientists/researchers who use them usually have no visual sensibility or graph drawing
knowledge. Many researchers produce a visual model of the analyzed network as a mere
additional element for showing their research. Sometimes it adds nothing to it.
In my second thesis presentation of the Fall 2004 semester, I applied many of my
reflections and sketches to practical examples, proposing possible solutions to improve
the visualization of complex networks. I divided my solutions into five major steps:
The main slides of this presentation can be seen in Appendix B – Complex Networks:
Visual Explorations, at the end of this paper.
6.3 Prototype #1
This was my first visual prototype, shown at the Fall 2004 mid-term review. This review
also marked the birth of the thesis title: Blogviz. The mid-term presentation was entitled
Blogviz: An experimental social laboratory. The underlying concept was based on a
major aspiration: the nodes' local stability and the links' global connectivity. The goal
was to map the connectivity among blogs. I tried to position the nodes in a structured
way, so they would remain fixed and, to some degree, under control. The links, however,
would be in constant change and the outcome would be highly random and
unpredictable. The reason why I chose to sort all the nodes in a precise manner was to
be able to isolate the major hubs and have some control over the lattice resulting from
the agglomeration of links. Looking at it now, it seems the result was too rigid and strict.
The radial diagram, with its implosive structure, reinforces this rigidity by
resembling a closed system that probably doesn’t describe so well the blogs’
fundamental openness.
Blogviz Visual Studies – Prototype #1
I realized I had to take a different path. I was trying too hard to control the outcome and I
believe the result showed exactly that. I had to lose some of my constant need for
control and let the system be more self-sufficient, self-organizing and adaptive.
As stated in my thesis blog on October 24, 2004: “Another criticism I received during the
presentation was that I was being too concerned with the visual aspect of it, and that I
was thinking too much as a visual designer. Well, although I agree in part with the criticism,
my thesis assertion has always been the visualization of a specific dissemination
pattern, and from my extensive research in complex networks, I truly believe that the
only way I can positively contribute to this field is by employing my visual and interface
design knowledge. In my first prototype presentation I dissected several problems on the
visualization of complex networks and proposed distinct solutions that might solve some
of its inconsistencies. I believe there has to be a balance between highly complex
network visualizations that offer a poor functionality and highly aesthetic/innovative
visual representations that might suffer from the same dilemma. I just have to pursue
that balance.”
In this same presentation I also illustrated some of my initial studies regarding the
linkage among blogs. Connectivity in the blogosphere is a very binary process; we only
need to ask two questions. Is blog A connected to blog B? If so, who is linking to whom?
If neither of them links to the other, they become momentarily isolated islands. For that
presentation I showed a few visual studies where I mainly explored the concept of
directional linkage, by visualizing inbound or outbound links, or put simply, who is
linking to whom. The images below portray some of these explorations.
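The two questions above can be sketched in a few lines of Python. This is only an illustration, not the code used for the visual studies: the function name `link_relation`, the blog names and the representation of links as (source, target) pairs are all assumptions made for the example.

```python
def link_relation(links, a, b):
    """Classify how blogs a and b are connected.

    `links` is a set of (source, target) pairs, each meaning
    "source links to target" (a hypothetical representation).
    """
    a_to_b = (a, b) in links
    b_to_a = (b, a) in links
    if a_to_b and b_to_a:
        return "mutual"      # both blogs link to each other
    if a_to_b:
        return "a_links_b"   # outbound link from a, inbound for b
    if b_to_a:
        return "b_links_a"   # inbound link for a, outbound from b
    return "isolated"        # neither links to the other


# Hypothetical sample data: three blogs and their directed links.
links = {
    ("blog_a", "blog_b"),
    ("blog_a", "blog_c"),
    ("blog_c", "blog_a"),
}

for pair in [("blog_a", "blog_b"), ("blog_a", "blog_c"), ("blog_b", "blog_c")]:
    print(pair, link_relation(links, *pair))
```

Keeping the direction of each link explicit is what allows inbound and outbound links to be drawn differently.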
6.4 Prototype #2
While in my first prototype I was looking for a structured way to map connectivity
among blogs, by isolating the hubs and sorting the nodes according to popularity, in my
second prototype I explored possible ways of visualizing diffusion patterns
over time. I tried several models based on a radial structure where time became the
major imposing element. In most of these experiments I faced a common problem in
representing a continuous flow of infected blogs. The underlying radial structure seemed
to impose its rigidity by forcing fractures in the pattern, particularly whenever there
was a day transition.
Blogviz Visual Studies – Prototype #2
I quickly found out I had to make a change in my visualization thinking, since a radial
structure didn’t quite apply to my subject of analysis. Perhaps I was too influenced
or distracted by the Radial Form of Organization Chart from the Alexander Hamilton
Institute or Loom 2, by Danah Boyd (et al.).
Radial Form of Organization Chart (1924) – Alexander Hamilton Institute
Loom2 – Danah Boyd, Hyun-Yeul Lee, Ethan Perry, Sociable Media Group - MIT Media Lab
As I wrote in my thesis blog on November 16, 2004: “At the moment I’m becoming
convinced that a horizontal array is truly the best way of representing the quantitative
and temporal qualities of a pattern. Time is a crucial domain in a dissemination pattern,
particularly in a word-of-mouth social behavior. The amazing potentialities of a horizontal
assortment is the uninterrupted continuous flow of data and the possibility of collapsing
time frames and still maintain a sense of scale and understanding of the pattern
dynamics.”
Blogviz Visual Studies – Horizontal array of adopting units
Blogviz Visual Studies
Different tryouts where adopting units (blogs) are structured
in a vertical and horizontal array
After this critical change in my visualization studies I started doing a lot of sketching and
writing. I built a few diagrams to get a full understanding of my system, built several
taxonomies and dissected the mechanics of blogging. This examination helped me
put my ideas in order and get a sense of what I was dealing with.
6.5 Prototype #3
In my third prototype I introduced Blogviz as a “topological model of meme behavior”.
From the conclusions of my previous tryouts, I decided to explore in depth the notion of a
horizontal array of adopting units (weblogs) to portray the propagation pattern of a
specific topic. By doing that I would be constraining the Time element to the X axis. The
following images represent a series of tryouts in this context.
In this phase of the project I also introduced the first visual taxonomy of Blogviz, by
dissecting the system and its intrinsic elements. The following image portrays a critical
understanding of the inherent structure of Blogviz at that stage.
At the same time, a list of goals was created
(left image) in order to better understand the
intent of Blogviz.
6.6 Prototype #4
After the series of independent and scattered visual studies that characterized the initial
trials, this fourth prototype was the first solid attempt at establishing Blogviz as an
interactive visualization model. At the time I was pushing the concept of an application or
tool of analysis, which according to some critics implied a need for commercial
viability. Even though I’m convinced this thesis has several elements that could be
successfully applied in commercial applications, my goal with this project is to elevate
the understanding of Memetics in a specific social network and conduct a serious
research experiment, which I believe fits more adequately within the academic realm.
Another point worth considering is that, when developing this prototype, Blogviz was
intended to work with real-time data, in the form of hourly updated XML RSS feeds. This
idea changed afterwards; however, it was a crucial consideration in the development of
this prototype.
Prototype #4 – Default First Page
A quick explanation of the previous image’s visual schema: circles represent
topics; the diameter corresponds to the total number of adopting blogs; and the colors,
pink and green, denote, respectively, a decreasing or increasing course. Time is again
incorporated in the X-axis, where the closer a circle is to the right edge of the window,
the more recent its last dispatch was. The Y-axis position of each circle helps reinforce
its level of adoption.
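The schema just described can be sketched as a small mapping from topic data to circle attributes. This is a hedged illustration rather than the prototype's actual code. The names (`TopicCircle`, `topic_circle`, `adoption_change`), the window size, the 24-hour frame and the scaling constants are all assumptions. So is the choice to place more widely adopted topics higher in the window, and to color a topic with no change in adoption pink.

```python
from dataclasses import dataclass

PINK = "pink"    # decreasing course
GREEN = "green"  # increasing course


@dataclass
class TopicCircle:
    x: float         # horizontal position: further right = more recent
    y: float         # vertical position: reinforces level of adoption
    diameter: float  # proportional to total number of adopting blogs
    color: str       # PINK or GREEN


def topic_circle(adopters, adoption_change, hours_since_last_post,
                 window_width=800.0, window_height=400.0,
                 frame_hours=24.0, max_adopters=200, max_diameter=80.0):
    """Map one topic to a circle (all default values are hypothetical)."""
    # Share of the largest expected adoption, clamped to [0, 1].
    share = min(max(adopters, 0), max_adopters) / max_adopters
    # Age of the last post as a fraction of the time frame, clamped to [0, 1].
    age = min(max(hours_since_last_post, 0.0), frame_hours) / frame_hours
    return TopicCircle(
        x=window_width * (1.0 - age),     # recent posts sit near the right edge
        y=window_height * (1.0 - share),  # assumed: more adopters = higher up
        diameter=max_diameter * share,
        color=GREEN if adoption_change > 0 else PINK,
    )


print(topic_circle(adopters=100, adoption_change=5, hours_since_last_post=0.0))
print(topic_circle(adopters=50, adoption_change=-3, hours_since_last_post=12.0))
```

Because both the diameter and the Y position derive from the same adoption figure, the two encodings reinforce each other, as the description above suggests.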
The main interaction in this fourth prototype was based on a simple flow. The default
first page would allow a swift view of the general pattern by showing the overall
state of current topics’ popularity. If one decided to investigate the structure and
evolution of a particular topic more deeply, one would be taken to a sequence of
examination methods. The following images illustrate some of the techniques proposed.
Prototype #4 – Blogs’ evolutionary paths through time
Prototype #4 – Plotting blogs according to time/popularity
Prototype #4 – Detailed View
Prototype #4 – Detailed View
Prototype #4 – Blogs’ adoption represented by a Tree Map
Prototype #4 – Blogs’ analysis by Theme and Generator
Prototype #4 – Blogs’ relationship analysis
6.7 Final Application
A major shift in the development of Blogviz was the decision not to incorporate
real-time data in the backend of the application. As previously stated, in my fourth
prototype I was mostly concentrated on developing a visualization schema that would expose
current trends in the topics diffusion process, by reading data from hourly updated XML
feeds. It would basically display the most adopted topics spreading in the blogosphere at
any given time. Even if the application allowed an extended breakdown of each topic,
rather than just a quick view of present information tendencies, it was only considering
a restricted number of topics. I believe Blogviz’s concept, at that phase, was trying to
incorporate too many features, or levels of analysis, without being able to develop one
efficiently. It was also becoming a trend analysis tool rather than a comprehensive model
of topics distribution. I wanted Blogviz to become a serious visualization study on
information diffusion in blogspace, and not so much a marketing application. I still
believe there’s enormous potential in visualizing popular topics with real-time data
integration, and that might be something Blogviz will incorporate in the future. However, I
first wanted to better understand the topics’ inner structure and evolution through time.
This change in Blogviz’s progress also coincided with a parallel immersion in the
domains of Epidemiology and Diffusion of Innovations Theory.
I never imagined that an apparently minor adjustment would require such a drastic
turnaround in the project’s conceptualization. Until then, Blogviz had been dealing
with a very restricted and manageable time span. Real-time data visualization was
merely constrained to one day, or at the most, one week. In contrast, by aiming at
an adaptive model, the critical goal was to come up with a visualization method that
could easily include time variations and still be consistent. Another crucial problem
was to visualize, in a very tight space, a high number of topics.
I had to come up with a visualization model that would address these two
problems. First, it had to be flexible enough to embrace distinct time
spans, but at the same time maintain uniformity throughout the process. Second, it
had to be able to include a high number of topics, and also allow an immediate
understanding of the overall pattern and the individual life cycle of each topic.
In the process of looking for inspiration in diverse sources, I came across an
illuminating diagram by E. J. Marey, in Edward Tufte’s The Visual Display of Quantitative
Information, that resolved particularly well many of the challenges I was facing.
Original Image: E. J. Marey, La Méthode Graphique (Paris, 1885), p.20.
Source: Tufte, Edward R., The Visual Display of Quantitative Information
The preceding image illustrates Marey’s graphical train schedule for Paris and Lyon in
the 1880s. The X-axis incorporates Time, measured in hours, and maintains the same
scale on both the top edge (corresponding to departures and arrivals at Paris) and the
bottom edge (for departures and arrivals at Lyon). The remaining horizontal lines
represent other train stations between Paris and Lyon. The diagonal lines represent
different trains, leaving from and arriving at the two main stations, and the horizontal
breaks in those lines represent waiting time at secondary stations.
This chart influenced me greatly in the following steps of my project. I believe it is an
extraordinary example of information visualization, where time and pattern become one
intrinsic entity, allowing a substantial understanding of the data dynamics in one brief
look.
I applied a modified version of this concept to Blogviz, where the lines became
representative of topics, and the time scale was measured in days. Blogviz’s model
doesn’t incorporate any type of constraint on the Y-axis, as Marey’s graph does;
therefore, the overall height of the main window is rather arbitrary. The following image
represents the main visualization window for topics’ evolution within the Blogviz
environment.
Blogviz’s topics visualization – Topic Lines and Time Scale
The interesting characteristic of this model is that, as in the Paris/Lyon train schedule
example, the angle of each line has a specific meaning. This happens because both the top
and bottom edges of the window maintain the same time scale. Therefore, the wider the
angle, the shorter the duration, in this case, the topic’s duration. In the image above,
for example, one may see a line, close to the center of the window, which seems to be
almost vertical; this means that the life cycle of that particular topic was very short.
This feature is even more relevant for topic lines that have either the starting or ending
point outside the present timeframe. I conducted a small experiment within the same
model, where the lines, instead of being placed diagonally, were drawn horizontally.
This method was probably even more successful when the lines had their starting and
ending points inside the selected time span. However, when topic lines had a first day or
last day of spreading outside this frame, it was impossible to tell how many days
they extended beyond it. What the diagonal alignment facilitates is a full understanding
of the topic’s life cycle, even when it spreads outside the present time span.
To better understand the intricacies of this visualization model, the following images
illustrate the four possible life cycles for every topic line, within each timeframe, and the
way they are represented.
Topic with first and last day of spreading within the current time span
Topic with first day of spreading outside the current time span
Topic with last day of spreading outside the current time span
Topic with first and last day of spreading outside the current time span
The angle of a line whose dates fall outside the timeframe is predicted by an equation that
multiplies the number of days (topic duration) by the number of pixels of each day parcel.
So if a specific topic line has its starting point (first day of spreading) within the
present timeframe, its last day outside of it, and a total duration of 64 days, the system
multiplies 64 by 12 (the number of pixels of a day parcel), giving an offset of 768 pixels
from the starting point, and as a result, a line is drawn dynamically to the resulting end
point.
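The calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's actual code: the function names, the sample dates and the 400-pixel window height are hypothetical. The sketch also assumes that each line runs from the top edge of the window (first day) to the bottom edge (last day), as the trains do in Marey's chart. Only the 12-pixel day parcel and the 64-day duration come from the example above.

```python
import math
from datetime import date

PIXELS_PER_DAY = 12  # width of one day parcel, as in the example above


def topic_line(first_day, duration_days, frame_start, window_height,
               pixels_per_day=PIXELS_PER_DAY):
    """Return the (x, y) start and end points of a topic line in pixels.

    Assumes the line starts on the top edge and ends on the bottom edge.
    Points may fall outside the visible window when the topic starts
    before, or ends after, the current timeframe.
    """
    x_start = (first_day - frame_start).days * pixels_per_day
    x_end = x_start + duration_days * pixels_per_day
    return (x_start, 0), (x_end, window_height)


def steepness_degrees(duration_days, window_height,
                      pixels_per_day=PIXELS_PER_DAY):
    """Angle between the line and the horizontal edge; 90 means vertical."""
    horizontal_run = duration_days * pixels_per_day
    return math.degrees(math.atan2(window_height, horizontal_run))


# A 64-day topic starting 9 days into a frame that begins on 1 March 2005.
start, end = topic_line(date(2005, 3, 10), 64, date(2005, 3, 1), 400)
print(start, end)  # the end point lies 64 * 12 = 768 pixels to the right

# Shorter topics produce steeper, closer to vertical, lines.
print(steepness_degrees(1, 400), steepness_degrees(64, 400))
```

Because the end point is derived from the duration alone, the visible part of the line still carries the correct slope even when that point lies far beyond the window.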
Another feature of this visualization method, further explained in the following Blogviz
Interface section, refers to the brightness or color saturation of each line. In Blogviz, the
default setting for the lines’ brightness is a depiction of the total number of adopting
blogs. This allows for a comprehensible insight when evaluating the overall pattern. At a
glance, one is able to identify the life cycle of each topic, and also, the number of
blogs that adopted it.
I like to consider the visual representation of this model as a metaphor of a window,
overlooking cyberspace, where lines of information flow continuously cross it.