blogviz
Mapping the dynamics of
 Information Diffusion in
              Blogspace


           by Manuel Lima




                            A thesis document submitted in partial
                            fulfillment of the requirements for the
                            degree of Master of Fine Arts in
                            Design and Technology.

                            Parsons School of Design

                            May 2005


                            Thesis Instructor: Christopher Kirwan
                            Writing Instructor: Mark Stafford




                            Manuel Lima
                            lima@parsons.edu


                            www.blogviz.com




                            Abstract
                            Blogviz is a visualization model for mapping the
                            transmission and internal structure of top links
                            across the blogosphere. It explores the idea of
                            meme propagation by assuming a parallel with
                            the spreading of most cited URLs in daily weblog
                            entries.

                            The main goal of Blogviz is to unravel hidden
                            patterns in the topic diffusion process. What’s
                            the life cycle of a topic? How does it start and
                            how does it evolve through time? Are topics
                            constrained to a specific community of users?
                            Who are the most influential and innovative blogs
                            in any topic? Are there any relationships amongst
                            topic proliferators?



                            Keywords
                            Information Diffusion, Memetics, Weblogs, Online
                            Social Communities, Complex Networks,
                            Information Architecture, Information
                            Visualization, Diffusion of Innovations,
                            Epidemiology, Small Worlds
Acknowledgements

−
Scott Patterson
Jared Schiffman
David Kearford
Fura Johannesdottir

Thank you for your feedback

−
Christopher Kirwan
Mark Stafford

Thank you for your guidance,
openness and continuous motivation

−
My dearest Parents

Thank you for your eternal support
and dedication
Table of Contents


1 Introduction
  1.1 Concept
  1.2 Memetics
  1.3 Diffusion of Innovations
  1.4 Epidemiology

2 Impetus
  2.1 Subject of Analysis

3 Context
  3.1 Online Social Communities
  3.2 Weblogs
  3.3 Blogosphere

4 Audience

5 Precedents

6 Methodology
  6.1 Summer Research
  6.2 Visual Explorations
  6.3 Prototype #1
  6.4 Prototype #2
  6.5 Prototype #3
  6.6 Prototype #4
  6.7 Final Application

7 Technical Sources
  7.1 Blog Engines
  7.2 Blogviz Data

8 Conclusion

9 Bibliography


  Appendix A
  Summer Research Presentation

  Appendix B
  Complex Networks: Visual Explorations
1 Introduction

Blogging presents one of the most interesting social phenomena of our time. This
change in the flow of online information might radically change the way we look at
news providers and large media conglomerates. It also provides an extraordinary online
laboratory to analyze how trends, ideas and information travel through social
communities.


1.1 Concept
Blogviz is a non-commercial research project developed with the intent of disentangling
this highly complex network for further study, research and analysis. The main goal of
Blogviz is to improve our understanding of the dynamics of information propagation
among weblogs.


An underlying question to Blogviz is: “How can we measure a meme as a unit of cultural
evolution?” The answer is not easy. Because memes are so widespread and their evolutionary
tracks are frequently untraceable, they are extremely hard to measure accurately. In
contrast to this commonly undetectable meme pool, the blogosphere offers a
discernible and documented map of thousands of memes, with clear trails of
progression, structured by date and time.


There are many possible ways of looking at information diffusion in blogspace. It can be
based on conversation threads, comment threads, key sentences, themes, tags, or top
links. Blogviz analyzes top links, occasionally called topics, which represent the most
cited URLs appearing in blog entries in any given day. These popular links represent
particular memes that provide an idea of sources, stories and themes that have
occupied the attention of bloggers over a certain period of time.
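
As an illustration, the notion of top links described above could be sketched in Python as follows. This is a minimal sketch, not the Blogviz implementation: the `posts` tuples, the function name `top_links_by_day`, and the rule that each blog counts at most once per URL per day are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import date


def top_links_by_day(posts, n=3):
    """Return {day: [(url, citing_blogs), ...]} with the n most cited URLs per day.

    `posts` is an iterable of (day, blog, url) tuples (a hypothetical format).
    A blog citing the same URL several times on one day is counted once.
    """
    seen = set()
    counts = defaultdict(Counter)
    for day, blog, url in posts:
        key = (day, blog, url)
        if key in seen:
            continue  # ignore repeated citations by the same blog on the same day
        seen.add(key)
        counts[day][url] += 1
    return {day: counter.most_common(n) for day, counter in counts.items()}


# Hypothetical sample data, for illustration only
sample_posts = [
    (date(2005, 1, 10), "blog-a", "http://example.com/story"),
    (date(2005, 1, 10), "blog-b", "http://example.com/story"),
    (date(2005, 1, 10), "blog-b", "http://example.com/story"),
    (date(2005, 1, 10), "blog-c", "http://example.org/other"),
    (date(2005, 1, 11), "blog-a", "http://example.org/other"),
]
print(top_links_by_day(sample_posts, n=1))
```

Counting distinct citing blogs, rather than raw citations, keeps a single prolific blogger from turning a link into a "top link" alone.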


By exploring the evolution of these topics through time, Blogviz will not only be able to track
their popular dispatchers and key innovators, but also follow their dissemination patterns from
the beginning to an eventual tipping point, where a topic might leap beyond the blog community and
reach the mainstream.




Blogviz is a Flash-driven interactive visualization model that makes extensive use of
information visualization and information architecture. Why is Information
Visualization central to Blogviz? Information Visualization can be defined as “the use of
computer-supported, interactive, visual representations of abstract data to amplify
cognition” (Card, Mackinlay & Shneiderman, 1999). Information Visualization not
only makes data easier for humans to interpret, but also reveals and highlights
relationships among data elements, usually reducing the effort of searching by gathering
information in a small, rich space.


Therefore, Blogviz employs Information Visualization with the key intent of uncovering
hidden patterns in the data and deriving plausible conclusions, which promote an
advanced knowledge of information dynamics in blogspace. By unraveling the modus
operandi behind the blogosphere we might be able to improve our knowledge on the
mechanics of online social communities and, to some extent, the mechanics of
complex social networks.


Blogviz is currently a portrait of the blogosphere’s topic activity during the months of January
and February 2005. The selection of this time period was purely arbitrary. To make
this project feasible within the time limits of thesis development, the project was
constrained to a specific time span. Nevertheless, the model
was developed to easily incorporate different timeframes. Blogviz will continue to expand
in the future, possibly to the point of including real-time data.


Blogviz uses existing data from three different blog search engines, organized in a
database that will soon be available for public access (see Technical Sources for
additional information).




1.2 Memetics
From a conversation with my Thesis Writing instructor, Mark Stafford, I was able to
understand how my thesis had become closely related to the concepts of memetics or
meme behavior. We came to the conclusion that I was developing a “topological model
of meme activity”, even if until then I was somehow oblivious to it. That title actually
remained for a while when characterizing Blogviz. But later on I decided to change it,
since the word meme was slightly audience limiting and the expression topological could
result in inadequate interpretations. I still question why the notion of Memetics didn’t
come up in my research earlier, but what is particularly interesting is that it was there
from the beginning, immersed in every iteration of my work. I think I was too
focused on the idea of word-of-mouth behavior, an expression used by Malcolm
Gladwell in “The Tipping Point” and by Duncan Watts in “Six Degrees: The Science of a
Connected Age”.


The vital point is that Memetics is the principal theory when contextualizing Blogviz, and
because of that, understanding the theory of Memetics is crucial to
comprehending the underlying concept of Blogviz.


1.2.1 What’s a Meme?
The term was coined by Richard Dawkins in 1976, in his renowned book “The
Selfish Gene”. In the words of Dawkins, the word “meme” refers to “a unit of cultural
transmission, or a unit of imitation”. More specifically, a meme can be defined as a
self-propagating unit of cultural evolution, a unit of information, held in an individual's
memory or in an outside artifact (e.g. book, record or tool), which is likely to be
communicated or copied to another individual's memory or retention system. Examples
of memes are ideas, catch-phrases, melodies, technologies, icons, theories, inventions,
languages, designs, fashions, and traditions. This covers all forms of beliefs, values and
behaviors that are normally taken over from others rather than discovered
independently.


A meme is basically a pattern of information that induces people to repeat it. People try
to “infect” each other with the memes they find most appealing, regardless of the memes'
objective value or truth.




1.2.2 What is Memetics?
Memetics is the study of evolutionary models of information transmission based on the
concept of the meme. In spite of its roots in evolutionary biology and computer
simulation, memetics has become more of a social science, focusing primarily on the
spread of information within human society. Rather than debate the inherent “truth” or
lack of “truth” of an idea, memetics is largely concerned with how that idea itself gets
replicated.


Another definition of Memetics declares it is the theoretical and empirical science that
studies the replication, spread and evolution of memes. As portrayed in the Journal of
Memetics*: “Its core idea is that memes differ in their degree of ‘fitness’, i.e. adaptation
to the socio-cultural environment in which they propagate. Because of natural selection,
fitter memes will be more successful in being communicated, ‘infecting’ a larger number
of individuals and/or surviving for a longer time within the population. Memetics tries to
understand what characterizes fit memes, and how they affect individuals, organizations,
cultures and society at large”.


Since the premise of Memetics is to investigate the evolutionary mechanisms that
determine the propagation of information within a population of human, animal or
artificial agents, we can easily perceive why this science is vital to the understanding of
cults, ideologies, or marketing campaigns of all kinds.


A meme is acknowledged as a self-propagating unit of cultural evolution, analogous to
the gene (the unit of genetics). And because of memes’ similar behavior to life forms,
Memetics embraces the analytical techniques of diverse sciences such as
epidemiology, evolutionary science, immunology, diffusion of innovations, linguistics,
and semiotics.




* Journal of Memetics (http://jom-emit.cfpm.org)

1.3 Diffusion of Innovations
I believe any type of Information Diffusion Model (IDM) in Social Networks must derive
extensive practical knowledge from the sciences of epidemiology and diffusion of
innovations. These two domains help us understand many of the factors that
characterize the spreading of information and adoption process in social communities.
Epidemiology and Diffusion of innovations also share many similarities and are
surprisingly linked together. For these reasons I decided to include in this thesis a short
description of these areas, since in addition to the concept of Memetics, they create an
extraordinary context to the understanding of Blogviz.


I do not explain each domain at length, but rather compare them in terms of
how they relate to this thesis. In order to delineate a common ground for the
following definitions, this paper assumes that an innovation can be characterized as a
new meme, given that it is also described as a new idea. In the context of information
diffusion in the blogosphere, it assumes the process of adoption to be the process by
which a blogger, aware of the existence of a new meme (or innovation), decides to
mention it on his/her own personal blog, in the form of a post or part of a post. This
action can be understood as an “adoption” by the blogger of this particular unit of
information, therefore contributing to its replication.


The study of innovation adoption and diffusion has its origins in the Midwestern United
States. In an Iowa State University study, Ryan and Gross (1943) showed that the
pattern of adoption and diffusion of a maize hybrid was systematic, hence opening the
door for further research.


Diffusion is the process by which an innovation is communicated through certain
channels over time among the members of a social system (Everett M. Rogers, 1995).
The innovation includes “any thought, behavior, or thing that is new because it is
qualitatively different from existing forms” (Jones, 1967). The characteristics of an
innovation, as perceived by members of a social system, determine its rate of adoption.


Just by analyzing these last statements one can easily grasp a series of similarities with
the notion of Memetics. The theory of Diffusion of Innovations even
considers the unit of adoption not to be exclusive to an individual person, but to extend to
other types of retention systems.
The four main elements in the diffusion of new ideas are:

(1)   The innovation
(2)   Communication channels
(3)   Time
(4)   The social system (context)

1.3.1 The Innovation
These are the characteristics that determine an innovation’s rate of adoption:

– Relative advantage
– Compatibility
– Complexity
– Trialability
– Observability to those people within the social system.

1.3.2 Communication Channels
A communication channel is the means by which messages get from one individual to
another. Mass media channels are more effective in creating knowledge of innovations,
whereas interpersonal channels are more effective in forming and changing attitudes
toward a new idea, and thus in influencing the decision to adopt or reject a new idea.
Most individuals evaluate an innovation, not on the basis of scientific research by
experts, but through the subjective evaluations of near-peers who have adopted the
innovation. (Everett M. Rogers)


In a broad sense, the communication channel in the context of Blogviz is indubitably the
Internet. Without it there wouldn’t even be any kind of communication between bloggers.
However, without blogrolls and posting citations within each blog, the specific channels
among them would be very difficult to perceive. Blogrolls are the backbone of blog
communities, the edges that keep all the nodes interconnected, and therefore, are the
key factors in understanding how information develops across the blogosphere. In fact, a
major characteristic of online social communities is that they are based on
communication channels, not on physical co-location. A blogroll is a listing of websites
that often appear as links on weblogs, usually on a left or right frame of the page. This
list of links is used to relate the site owner's interest or affiliation with other webloggers.




1.3.3 Time
The Diffusion of Innovations theory divides the element of Time into three main
dimensions, of which only two can be fully applied to the context of information diffusion
in the blogosphere.


> Innovation-decision – The innovation-decision process is the mental course of action
in which an individual passes from first knowledge of an innovation to forming an attitude
toward the innovation, to a decision to adopt or reject it, and if adopting it, to implement
this new idea and confirm the decision.


In the case of a blogger deciding whether or not to post a specific meme on his/her weblog, this
decision process is so fast that it’s almost impossible to measure. It applies to other
memes, and definitely to other innovations, but it’s not relevant as a measurement in top
link replication.


> Innovativeness – Innovativeness is the degree to which an individual is relatively earlier in
adopting new ideas than other members of a social system. Innovativeness, in
contrast to the innovation-decision process, is an extremely significant measurement
in top link replication, as in most information diffusion models.


There are five adopter categories, or member classifications of a social system, based
on their level of innovativeness:

– Innovators
– Early adopters
– Early majority
– Late majority
– Laggards




                                                      Bell-shaped curve showing categories of
                                                      individual innovativeness and
                                                      percentages within each category




Innovativeness among social systems is characterized by a bell-shaped curve where
time and incidence of adoption are the two main axes. This concept, in the context of
Blogviz, is further explored in the Methodology chapter of this thesis.
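
The five adopter categories listed above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the cumulative bounds use the commonly cited shares from Rogers' categorization (2.5%, 13.5%, 34%, 34%, 16%), which are not given in this text, and blogs are classified by the rank of their first mention rather than by standard deviations from the mean adoption time. The names `CATEGORY_BOUNDS` and `classify_adopters` are hypothetical.

```python
# Cumulative upper bounds per category (assumed shares: 2.5%, 13.5%, 34%, 34%, 16%)
CATEGORY_BOUNDS = [
    (0.025, "innovator"),
    (0.16, "early adopter"),
    (0.50, "early majority"),
    (0.84, "late majority"),
    (1.0, "laggard"),
]


def classify_adopters(first_mentions):
    """Map each blog to an adopter category by the rank of its first mention.

    `first_mentions` maps a blog name to the time it first cited a top link
    (any sortable value, e.g. a timestamp or a day number).
    """
    ordered = sorted(first_mentions, key=first_mentions.get)
    total = len(ordered)
    categories = {}
    for rank, blog in enumerate(ordered, start=1):
        share = rank / total  # fraction of adopters up to and including this blog
        for bound, label in CATEGORY_BOUNDS:
            if share <= bound:
                categories[blog] = label
                break
    return categories


# Hypothetical data: 40 blogs adopting one after another
mentions = {f"blog-{i:02d}": i for i in range(1, 41)}
labels = classify_adopters(mentions)
print(labels["blog-01"], labels["blog-21"], labels["blog-40"])
```

A rank-based split is the simplest way to apply the categories to a single top link; it says nothing about whether a blog is consistently innovative across topics.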


Many search engines and community tools that analyze the blogosphere assume a direct
correlation between a blog's popularity and its innovativeness. I believe this assumption is
incorrect. Their reasoning is simple: if a specific blog has a high number of inbound
links, and therefore a sizeable readership, it must be at the forefront in finding
and publishing original information. The HP Information Dynamics Lab study on the
“Implicit Structure and the Dynamics of Blogspace” (Eytan Adar et al) showed exactly
the opposite. The study demonstrated that popular blogs are rarely among the first ones
to start a specific trend. Many popular blogs claim most of their “discoveries” by not
citing their original sources, which are usually smaller, unfamiliar blogs. The level of
popularity of each blog might be directly related to its scale of influence, but not
necessarily to its level of innovativeness. So who are these unknown bloggers that bring
fresh ideas to the blogspace? Who are these innovators or trendsetters? Blogviz will
allow an exposure of these anonymous sources, crucial in the dynamics of topics
diffusion.


> Rate of adoption – The rate of adoption describes how fast an innovation is adopted
by members of a social system in a given time period. When mapping the cumulative
adoption time path or temporal pattern of a diffusion process, the resulting distribution
can generally be described as taking the form of an S-shaped (sigmoid) curve. Time and
cumulative adoption (or infected population) are the plot's two main axes.
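
The relation between the bell-shaped incidence curve and the S-shaped cumulative curve can be shown with a small Python sketch. It assumes adoption times are available as integer day numbers, one per adopting blog; the function name and the sample values are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def cumulative_adoption(adoption_days):
    """Return [(day, cumulative_adopters)] for every day in the observed range.

    `adoption_days` is a non-empty list with one integer day number per blog.
    """
    per_day = Counter(adoption_days)
    days = range(min(adoption_days), max(adoption_days) + 1)
    new_adopters = [per_day[day] for day in days]  # incidence: bell-shaped
    return list(zip(days, accumulate(new_adopters)))  # running total: S-shaped


# Hypothetical example: new adopters per day are 1, 2, 4, 2, 1
print(cumulative_adoption([1, 2, 2, 3, 3, 3, 3, 4, 4, 5]))
```

The running total rises slowly, then steeply, then levels off, which is the sigmoid pattern described above.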




1.3.4 The Social System
The fourth main element in the diffusion of new ideas is the social system, which
basically creates a boundary for the diffusion and adoption of an innovation to occur. A
social system is defined as a set of interrelated units that are engaged in joint problem-
solving to accomplish a common goal (Everett M. Rogers). The members or units of a
social system may be individuals, informal groups, organizations, and/or subsystems.


With regard to the replication of top links among weblogs, the social system is
undoubtedly the blogosphere, depicted as a fertile network of endless social
communities. This vast communication network consists of interconnected individuals
(bloggers) who are linked by shared interests and patterned flows of information.


At first glance, considering the highly interconnected web of links, connections and
shared interests among bloggers, it might seem easy to understand the adoption
process of a particular unit of information or innovation. However, another crucial
conclusion exposed by the HP Information Dynamics Lab study, mentioned before,
declared that “for URLs appearing on at least 2 blogs, 77% of blogs do not have a direct
link to another blog mentioning the URL earlier. For those URL’s present on at least 10
blogs, 70% are not attributable to direct links”.
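
A measurement of this kind could be sketched in Python as follows. This is only an illustration and not the method of the HP study: the data structures, the function name `unattributed_share`, and the decision to exclude the first blog that mentions the URL are assumptions.

```python
def unattributed_share(mentions, links):
    """Share of later adopters with no direct link to any earlier mentioner.

    `mentions` maps a blog to the time it first mentioned one URL.
    `links` maps a blog to the set of blogs it links to (e.g. its blogroll).
    The first blog to mention the URL is excluded (an assumption), since it
    has no earlier source by definition.
    """
    ordered = sorted(mentions, key=mentions.get)
    later = ordered[1:]
    if not later:
        return 0.0
    unattributed = 0
    for blog in later:
        earlier = {b for b in ordered if mentions[b] < mentions[blog]}
        if not (links.get(blog, set()) & earlier):
            unattributed += 1  # no direct link to anyone who had the URL first
    return unattributed / len(later)


# Hypothetical example: "c" adopts the URL without linking to "a" or "b"
example_mentions = {"a": 1, "b": 2, "c": 3, "d": 4}
example_links = {"b": {"a"}, "c": {"x"}, "d": {"b", "c"}}
print(unattributed_share(example_mentions, example_links))
```

A high share suggests that much of the diffusion travels through channels that the explicit link structure does not capture.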


There have been several studies on how the system’s social structure, and norms or
established behavior patterns, affect the diffusion of innovations within a particular social
system. But another area of research that is closely linked to Blogviz relates to opinion
leadership. It can be described as the degree to which an individual is able to informally
influence other individuals' attitudes or explicit behavior in a desired way with relative
frequency. Blogviz allows a broad understanding of opinion leadership in blogspace by
tracking and exposing the most influential and innovative topic proliferators.




1.4 Epidemiology
Throughout this thesis I repeatedly use the terms contamination and infection when
describing the adoption process of memes. Even though this practice might lead to
unwanted interpretations, it is not arbitrary, and it actually facilitates the
comprehension of information diffusion dynamics.


Epidemiology in its broadest sense is the study of disease patterns in human
populations (Wikipedia). Epidemiology can also be described as the study of the
determinants, occurrence, and distribution of health and disease in a defined population.
Infection is the replication of organisms in host tissue, which may cause disease. A
carrier is an individual with no overt disease who harbors infectious organisms. And the
notion of dissemination is understood as the spread of the organism in the environment.


In the above description, regardless of the different terms, we start noticing several
similarities with the domain of diffusion of innovations. This analogy is even more explicit
when characterizing the three major elements in disease occurrence, the so-called chain
of infection:

(1) The etiologic agent (parallel to the innovation)
(2) The method of transmission (parallel to the communication channel)
(3) The host (parallel to a unit of a social system)



Further along in characterizing the disease evolution, the epidemiologic descriptive study
organizes data by time, place and person. It is unquestionably the closest approach to
the concept of Information Diffusion. It divides the element of Time into four main trends;
respectively, secular trends, periodic trends, seasonal trends and epidemics. What’s
interesting in this typology of Time is that it applies equally well to the evolution of top
links across the blogosphere. Because of that I assume a series of parallelisms between
them.


The secular trend describes the occurrence of disease over a prolonged period. This
continual development is less common than the seasonal trend in the context of blogspace.
This trend usually describes commercial or very popular websites that never entirely lose
the bloggers’ interest and as a result have a continuous existence among them.



The periodic trend basically expresses a temporary modification in the overall secular
trend. It conveys a sudden new interest in a specific meme that is part of a continual
trend.


The seasonal trend reflects seasonal changes in disease occurrence following changes
in environmental conditions that enhance the ability of the agent to replicate or be
transmitted. This short transitory trend is the most common in blogspace: a new meme
spreads quickly and rapidly loses interest, dying out within a short period of time.


The epidemic incidence of a disease generally occurs when it surpasses a threshold
of 7% of the target population. An epidemic is a sudden boost in occurrence due to
prevalent factors that support transmission. An information epidemic in blogspace might
give rise to a tipping point, where a specific meme escalates and leaps beyond the blogspace,
reaching the mainstream.
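
The threshold idea above can be expressed as a small Python check. It is a sketch under assumptions: the 7% value is the one quoted in this section, while the size of the "target population" of blogs, the function name `epidemic_onset` and the sample figures are hypothetical.

```python
def epidemic_onset(cumulative, population, threshold=0.07):
    """Return the first day on which adopters exceed `threshold` of the population.

    `cumulative` is a list of (day, cumulative_adopters) pairs in day order.
    Returns None if the threshold is never surpassed.
    """
    for day, adopters in cumulative:
        if adopters / population > threshold:
            return day
    return None


# Hypothetical example: 100 blogs in the target population
curve = [(1, 1), (2, 3), (3, 6), (4, 9)]
print(epidemic_onset(curve, population=100))
```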




2 Impetus

The main source of motivation for my thesis development is based on a solid
cooperation between Information Diffusion, Information Architecture, Data Visualization,
and the Science of Complex Networks.


My curiosity in Information Architecture was initially fostered in Christopher Kirwan’s
MFADT class in the Spring of 2004, and since then, it became a major subject of interest
and awareness. I remember observing for the first time a diagram with four
interconnected circles representing the continuous Understanding Spectrum. Data
originates information, which leads to knowledge and ultimately to wisdom. This concept
influenced my vision and made me reflect on the responsibility I had, as a designer, to
contribute to this spectrum.


                                                                       The Understanding
                                                                       Spectrum

                                                                       Nathan Shedroff




We may have access to an abundance of information but I strongly believe we lack the
ability to process it effectively. In the face of contemporary technological accomplishments,
our ability to generate and acquire data has by far outpaced our ability to make sense of
it. Neither raw data nor scattered information offers any level of meaningful
understanding. This is where Information Architecture and Information Visualization
undertake an important mission. If we are truly entering a fourth phase of humankind, a
theory defended by a large number of anthropologists and sociologists, then Information


Architecture is going to be a golden key in the process. In a world increasingly driven by
information, information rapidly assumes the form of power and divides society into those
who own it and those who don’t. Meaningful information is not a given, and
particularly now, when our cultural artifacts are being measured in gigabytes and
terabytes, organizing, sorting and displaying information efficiently is crucial
to intelligence, knowledge and wisdom.


In the Spring 2004 semester I was involved in two projects that were decisive in
defining my thesis domain and in sharpening my interest in
Information Architecture and Information Visualization. The first one was a group project
developed in the Information Architecture class, taught by Christopher Kirwan.
Self-Replicating Cloners was a project aimed at producing visualizations of viruses, their
progression through time and their worldwide dissemination. Two viruses were compared,
SARS and MyDoom, each representing its underlying field: human
biology and computer technology, respectively.




Self-Replicating Cloners
Visualizations of viruses (biological/computer generated),
their progression through time and worldwide dissemination




The second point of awareness was a group project developed in a collaboration studio
with Siemens Corporate Research Center. Aimed at Siemens Medical, DSS –
Disease Surveillance System was a visualization and communication tool that shared
symptomatological data between hospitals and health care professionals for detecting
possible disease outbreaks and recognizing development patterns nationwide.




DSS – Disease Surveillance System



After these two particular experiences, I started my summer research with some clear
interests in mind, but still scattered through distinct areas such as artificial life, virology,
cognitive science, genetics, cyber biology, epidemiology, and pattern recognition.
Emergence, by Steven Johnson, was the first book I read in my research and it was a
surprising start. The paradigm of Emergence, which can be described as a “higher-level
pattern arising out of parallel complex interactions between local agents”, was slowly
overflowing my mind with bright new discoveries. And with an augmented motivation, I
started gradually abandoning some initial ideas and, in other cases, finding common
links between them, under the sciences of complexity and self-organization. The search
for answers on how order can emerge from disorder, and organization emerge from
chaos, guided me to initiate a study on the individual parameters of emergent systems,
such as collective/macro behavior, self-organizing communities and bottom-up
hierarchy.


This research led me inevitably to complex systems. Delving into this new area was
even more thrilling. Finding, each day, a common structure in apparently distinct fields, or
similarities between natural systems and human designs, was beyond doubt
overwhelming. From that point on, I became extremely fascinated with the omnipresent


web of signals and interactions, nodes and links that shape modern complex networks,
from social networks, to corporations, cities, living organisms and the Internet.


Complexity is a challenge by itself. Complex Networks are everywhere. They embody a structural
and organizational principle that reaches almost every field we can think of, from genes
to power systems, from food webs to market shares. Paraphrasing Albert Barabasi, one
of the leading researchers in this area, “the mystery of life begins with the intricate web of
interactions, integrating the millions of molecules within each organism”. Humans, since
their birth, experience the effect of networks every day, from large complex systems like
transportation routes and communication networks, to less conscious interactions,
common in social networks. The Scale-Free network, the most common topology in both
natural and human systems, is, curiously enough, a very recent breakthrough. Since its
discovery six years ago, dozens of researchers worldwide have been disentangling the
networks around us at an amazing rate. This awareness is helping us understand not
only the world around us but also the most intricate web of interactions that shape the
human body. The global effort of constructing a general theory of complexity is
tremendous and may lead us, not only to a structural understanding of networks, but to
major improvements in stability, robustness and security of most complex systems
around the globe. Like Barabasi refers in Linked, “Once we stumble across the right
vision of complexity, it will take little to bring it to fruition. When that will happen is one of
the mysteries that keeps many of us going”.


The feature that has always fascinated me the most in complex networks is the
dynamics of Dissemination Patterns. The visualization of the path, and inherent duration,
of a certain fad, idea, or virus in a social, biological or computer network has been, since
the beginning, a central point of interest. How does a particular contagion travel from
point A to B, which nodes does it affect along the way, and how fast does it contaminate a large
cluster or the entire network?




2.1 Subject of Analysis
After my summer research presentation, at the beginning of the Fall 2004 semester,
where I showed all the knowledge I had collected in the domain of complex networks, I went
even further, observing and collecting dozens of network visualization examples and
trying several open-source applications. This investigation resulted in my second official
presentation. Part of this research also coincided with the work I was developing as a
design researcher at Parsons Institute of Information Mapping (PIIM). For additional
information on this study please consult section 6.2 of chapter 6 – Methodology.


After the second official presentation I was sure of two things:


1 – I wanted to continue my visual exploration exercise, by gathering problems and
inconsistencies in complex network diagrams and proposing plausible solutions.
2 – I wanted to map a dissemination pattern in a specific network. By doing that, I
intended not only to be innovative and bring something new to the field, but also to present
a ‘showcase’ of my visual thinking on complex network visualization.


The first objective was well defined and, best of all, already under development. The
major problem was finding a solution for the second point. I had to hit upon a subject that
represented all the research and knowledge I had gathered through the summer and the
beginning of the Fall 2004 semester. Finding an answer to this quest seemed an
impossible task, due to the vagueness of the possible directions. At a certain point it was as
if I had come back to the start, with the fearful blankness of June assaulting my mind
once again. Time was pressing and I knew that whatever subject I chose, I was still facing an
enormous workload ahead of me. The first thing I decided was to go back to my initial
interest, the main cause that had led me into this escalating exploration of complex networks. I
quickly rediscovered my early motivations: virus dissemination and relationships between
social/biological and computer/technological systems.


One thing I discovered during my summer research is that ideas, fads, trends and
innovations show dissemination patterns similar to those of viruses in social networks. The concept
of word-of-mouth is a fascinating diffusion behavior that has always intrigued
psychologists, sociologists, anthropologists, and lately marketers. To be able to map a
word-of-mouth epidemic in a specific social network is a blue-sky scenario. And that
might be true, in relation to physical interactions in a physical world between physical
individuals. However, a flourishing movement on the Internet presents an interesting
experimental laboratory to explore this behavior. Blogging embodies an incredible case
of word-of-mouth, where news, ideas and fads travel through community clusters with
high adoption rates. Because of their inherent nature, blogs became my ultimate fixation
and the main framework for my Thesis. Their high interconnectivity and shared flow of
information represent not only an obvious case study of meme propagation, but an
outstanding example of a dissemination pattern in an increasingly complex network,
estimated to be over 8 million nodes.


As an example, I’ll mention a topic that emerged from the blog community at the
beginning of October, 2004. During the first presidential debate of the 2004 US Elections,
on September 30, 2004, between President George W. Bush and Senator John Kerry,
there was an episode that got the attention of a particular viewer. “You forgot Poland”
was the abrupt statement made by George W. Bush while John Kerry was enumerating
the allied forces present at the Iraq War. The presidential debate occurred on a Thursday
evening, September 30, and by the following Monday night there was already a topic
sharing 12 links among bloggers. This topic pointed to a specific URL –
http://www.youforgotpoland.com. By that time, only a few days after the debate,
someone had already created a domain (youforgotpoland.com) and was selling t-shirts
and stickers online with the same sentence. A new meme had been born and in a short
period of time had “infected” several people.


This intriguing example reveals the accelerating rate of information flow among bloggers
and how fast it spreads through, or “contaminates”, online blog communities. Another point
demonstrated by this example is the possibility of tracking such an
outbreak. Imagine this topic reaching the mainstream a week later, possibly in a major
newspaper or a particular TV show. How interesting would it be to actually go back in
time and discover where this outbreak first originated, the way it was adopted and how
fast it grew?


These last two queries have undoubtedly become a crucial motivation for the
development of my thesis. Quoting Duncan Watts, in regard to the mechanics of social
networks: “To understand the pattern, we need to delve further into the rules by which
individuals make decisions, and how, in the process, our apparently independent
choices become inextricably bound together.”

3 Context

The contextual narrowing of my thesis proposal starts in the broad area of Complex
Networks, tightens its limits to Social Networks and ends at its ultimate contextual
boundary, Online Social Communities.


Even though this Thesis proposition places itself at the center of a broad group of
domains, I decided to deeply explore its closest and most direct domain – Online Social
Communities, and the main subject of analysis – Blogs. Nevertheless, besides the
omnipresent field of complex networks, the context of this thesis incorporates the
domains of Information Diffusion, Memetics, Information Architecture, Data
Visualization, Information Theory, Diffusion of Innovations, Epidemiology and
Small Worlds.



3.1 Online Social Communities
Online Social Communities, although a much narrower field than the Science of Complex
Networks, is still a wide-ranging one that can include almost every type of online
interpersonal communication medium, from e-mail lists/threads, to Usenet groups, MUDs,
chat environments, instant messaging, community forums, weblogs, online gaming,
and interest groups, among others.


Online Communities offer an interesting change in the parameters that until now have
defined social interaction. Several years after Milgram’s famous small-world test,
Russell Bernard and Peter Killworth did what they called a “reverse small-world
experiment”. They interviewed hundreds of individuals, explaining Milgram’s experiment
and asking them what personal criteria they would use to get a specific package to
someone they didn’t know. Bernard and Killworth’s study found that most of the subjects
used only a couple of dimensions to get their message sent to the next recipient. The most
predominant dimensions were geography and occupation.


Jon Kleinberg, a computer scientist who attended Cornell and MIT, was also motivated
by Milgram’s small-world study, and asked how the individuals actually found
the paths within the network. Kleinberg concluded that people generally have a strong
sense of distance, which they use to distinguish themselves from others. A notion of
distance can involve several factors, of which geographical distance is just one.
Profession, race, religion, income, class and education are other elements added to the
equation that describe how distant a specific person is from us.


From the beginning of human existence, communities were created for the benefit of
their own members. Usually formed out of expediency, either for the exchange
of goods or for improved security against enemies, these groups of people arose as
emergent systems by means of social convenience. Geography always played an
essential role, and without a common shared space most of these communities wouldn’t
even have existed. With the later development of mail and, more recently, telephone, telex,
and fax, human communication was greatly enhanced and geography started
to lose its major influence. However, these new “technologies” only improved the
way people communicated with each other, by giving them more tools and decreasing
the time span and subsequently the distance; other than that, there were no major
changes in the way social communities were formed. No matter how fast and easy it
became for someone in Europe to talk with someone in America or China,
communities were never created on the basis of telephone calls.


If we explore the word structure of most communication tools prior to the Internet,
such as telegraph, telex, telegram, and telephone, we encounter the constant presence
of the prefix tele-. Tele is a Greek word that means “at a distance”, usually implying “to
be distant” or “over a distance”. The first use of the prefix tele- was in the word telescope,
which was actually adapted from Galileo’s Italian word telescopio, followed by the word
telegraph, meaning “writing at a distance”. Therefore, Telecommunications is the field
that embodies all the systems that intend to communicate “at a distance” or “over a
distance”. Once again we see the importance of geography as a crucial domain for
human communication, where the advancement of technology, since the beginning, has
been trying to diminish its constraints, by allowing people to communicate over an
ever-present and disturbing distance. I find this analysis particularly interesting in that
the Internet, and all features associated with it, has completely abandoned the prefix
tele-, drastically assuming the medium, and replaced it with the prefix e-. From e-mail to
e-commerce and e-business, the prefix e- is usually associated with the latest wave of the
technological revolution, an abbreviation of the word electronic and an obvious
association with the word cyber.


The advent of the Internet and the World Wide Web changed these age-old communal
constraints, possibly forever. The Internet became not just a medium for social gathering
and communication; it absorbed them, and the medium truly became the message. The
transmission of information on the Internet is regularly measured in milliseconds, and the
time it usually takes for a message to leave a computer in Tokyo and arrive at a
computer in New York is more or less the same as for a message sent to you from your
next-door neighbor. The difference is merely a few milliseconds, which is by itself a
measurement difficult to perceive. Geography, as a crucial criterion for the birth of social
communities, has been utterly disregarded by online social communities. Without the
limitations of geography and physical interaction and identification, online communities
had to rely on a more abstract, but equally distinguishing, criterion: interests. By analyzing
most current online communities, from online games to chat rooms, blogs and
newsgroups, we find that in the absence of physical recognition, social values like
trust, confidence, respect and even friendship are ultimately based on a set of shared
interests. And of course, this “virtual” interaction would not be possible without specific
communication channels, portrayed as technological sub-systems of the larger medium,
the Internet.


Personal interests are a central element of our social identity and, consequently, a highly
considered factor in relationships. Paraphrasing Duncan Watts in regard to peer-to-peer
networks, “social identity is what leads networks to be searchable”. The fabulous
aspect of online communities is the possibility of not only searching these clusters of
shared interests, but also tracking the exchange of conversations, ideas and messages
between them. By analyzing this data, it’s possible to understand, to some extent, how
information travels through these virtual environments. Weblogs, in this context,
represent units of a remarkable social laboratory. Not only is it relatively easy to track their
connectivity but, due to their highly clustered nature, it’s also possible to examine,
within specific communities, how news and trends travel through individual bloggers.




3.2 Weblogs
Weblogs (alternate: blogs) are not just a new fad among Internet users and they are
much more than a collection of online digital diaries of scattered interest groups. Blogs
represent a change in online information flow and they are becoming a rising news
source for many people. We might not even be aware of how influential blogs will be in
the future but one thing is sure: there are currently blogs with close to half a million
visitors a day, more than many large newspapers, magazines and news broadcasters.


Jorn Barger coined the term “weblog” in 1997 and in 1999 Peter Merholz coined its
abbreviation “blog”. As Jorn Barger stated:


    “Weblogs are often-updated sites that point to articles elsewhere on the web, often with
    comments, and to on-site articles. A weblog is kind of a continual tour, with a human guide
    [whom] you get to know. There are many guides to choose from and each develops an
    audience. There's camaraderie and politics between the people who run weblogs. They
    point to each other in all kinds of structures, graphs, loops, etc.”



The most common definition of a blog is that of an online diary of thoughts, links, events,
or actions posted on a web page with a dated log format. These posts are often, but not
necessarily, in reverse chronological order, and are updated on a daily or very frequent
basis with new information about a particular subject or range of subjects. Despite this
dry classification, the usefulness of a weblog is incredibly rich.


Blogs are the vital elements of the personal publishing revolution. If we go back a few
years, before the rise of online publishing, the only way someone could write something
for the general public would be through a letter to the editor, hoping for their message to be
published in the magazine’s next issue. For the first time in the history of human
communication, any single person has the opportunity to reach millions with their
message, as the cliché proclaims, with “the touch of a button”. Instead of being passive
consumers of information, Internet users are becoming active participants. Whether this power to
the people is a positive trend is debatable, since many people consider that it only
adds to the existing “junk” flowing on the Web. Since most blogs aren’t subject to
any kind of editorial process or peer review and sometimes “play” with anonymity, their
public posts also raise legal concerns about intellectual property, defamation, and the like.
Controversies apart, blogs, like the World Wide Web, are free democratic resources that
embody the concept of free speech, which is unquestionably a right for all.
Blogs also exemplify the true concept of diversity. Besides being open to anyone who might
use them, blogs carry content as varied as the Web itself. The authors of
Essential Blogging explain this diversity by pointing out that “creating a taxonomy of the
blogiverse is a fruitless task”, since “there’s no good, central directory of blogs that puts
each one in its own pigeonhole, because even the most topical blogger will stray from
the subject from time to time to celebrate some personal victory or warn his readers off a
terrible movie”.


One might also argue that, in fact, this personal publishing revolution started with the first
website, and consequently with the birth of the World Wide Web. This is obviously true; however,
until the first blog publishing tools became available, anyone who wanted to circulate
their own ideas online had to be fluent in HTML and web hosting, and aware of most
web design applications available. Even after GeoCities’ launch in 1996, offering free web
hosting to non-commercial personal pages, web pioneers had to be HTML-savvy people
who would spend their evenings working on their websites. Also, the few personal
webpages that started populating the Web in the mid-90s were just a scattered collection
of isolated opinions, with no regular updates and unconnected from each other. The big
blog phenomenon started escalating in the summer of 1999, when a small web company
called Pyra Labs released a product called Blogger. From that point on the blog
community exploded, and the more bloggers came onto the scene, the more online blog tools
became available. This was the beginning of the personal publishing revolution.


The inclination towards personalization is reaching every industry, from clothing to cars,
from software to medicine. News and Information are just new elements added to the
equation. In my opinion, many blogs are so successful because of two
major factors: personalization and comforting lassitude. Blogs are usually maintained by
a single person who filters the huge amount of available information according to his/her
own preferences. For people who share common interests with the blogger, it’s not only
exciting to get information from that source, since it’s going to match their inclination to
some degree, but it also saves them a lot of time by avoiding the large, more abstract,
and sometimes incongruent, news sources. In countries such as the US, where large
media sources are becoming increasingly dry and biased, blogs might also represent an
oasis of independent information.

3.3 Blogosphere
Blogosphere (alternate: blogsphere), or blogspace, is the collective term encompassing all
weblogs (alternate: blogs). It’s almost impossible to determine with precision the existing
number of weblogs, or even the ones currently active. Technorati is a leading search
engine for the blogosphere, similar to Google or Yahoo, but exclusive to blogs.
Technorati, as of February 2005, was tracking 7,245,866 blogs, and this number is far
from stagnating. Out of curiosity, when reviewing this paper on April 6, 2005, I checked
Technorati to see how the latest number had changed. To my not-so-surprised
amazement, Technorati reported that it was tracking 8,469,023 weblogs. That translates into an
increase of more than 1 million blogs in less than two months.


The latest Pew Internet study estimates that about 27%, or about 32 million, of American
Internet users are regular blog readers. They say a new weblog is created every 2.2
seconds, which means there are about 38,000 new weblogs a day. Bloggers update
their blogs regularly; there are about 500,000 posts daily, or about 5.8 posts per second.
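
As a rough consistency check on the figures quoted in this section, the rates can be converted between per-second and per-day values. The short Python sketch below is only illustrative: the function and variable names are my own, and the inputs are simply the numbers cited above. One new weblog every 2.2 seconds works out to slightly more than 39,000 a day, in the same range as the rounded figure quoted, and the two Technorati counts differ by a little over 1.2 million.

```python
# Illustrative arithmetic only: the inputs are the figures quoted in the text,
# and the helper names below are hypothetical.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_day(seconds_between_events):
    """Convert 'one event every N seconds' into events per day."""
    return SECONDS_PER_DAY / seconds_between_events


def events_per_second(events_in_a_day):
    """Convert a daily total into an average per-second rate."""
    return events_in_a_day / SECONDS_PER_DAY


# One new weblog every 2.2 seconds.
new_blogs_per_day = events_per_day(2.2)

# About 500,000 posts a day.
posts_per_second = events_per_second(500_000)

# Technorati counts from February 2005 and April 6, 2005.
technorati_growth = 8_469_023 - 7_245_866

print(f"New weblogs per day: {new_blogs_per_day:,.0f}")
print(f"Posts per second: {posts_per_second:.1f}")
print(f"Blogs added between the two Technorati counts: {technorati_growth:,}")
```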


When we’re faced with a number of blogs higher than eight million (at least), it becomes
hard to consider the whole as a single community. The blogosphere, like its
medium, the Internet, does not represent a single community but a vast collection of
endless communities. These communities shape a complex web of more than 8 million
nodes and are key factors in the outburst and further development of trends, fads and
innovations. Also, due to its inherent diversity, any kind of classification regarding the
blogosphere is a mere exercise of oversimplification.




4 Audience

Scientists/Researchers on Complex Networks

Hopefully, Blogviz will offer a significant step in this long scientific journey towards the
understanding of the dynamics of complex networks. To all researchers, academics, and
scientists that have been persistently and bravely disentangling the networks around us,
I truly hope this model can produce one important footprint in this expedition. It doesn’t
have to be gigantic, just one step forward. By bringing my visual expertise and interest in
Information Architecture, Data Visualization and Interface Design, I expect to make a
small corner of the vast Science of Complex Networks clearer and more understandable.
This corner embodies the domain of Online Social Communities and the phenomenon of
blogging.


Sociologists

Professionals, Researchers, Faculty and Students. Blogviz will offer an interesting case
study for analyzing a dynamic, ever-changing and complex online social network – the
Blogosphere. Mapping the spread of word-of-mouth in social communication has been,
until now, an almost fruitless task. Blogs, on the other hand, offer an engaging
experimental laboratory to better study and understand this occurrence. Memetics is an
expanding field of study in social sciences, which is being explored by a significant
number of researchers. Blogviz, by drawing a parallel between meme propagation and
topic diffusion in blogspace, makes an important contribution to the understanding of
Memetics.


Information Architects and Data Visualization enthusiasts

Professionals, Researchers, Faculty and Students. I hope that my passion and
fascination for the field of Information Architecture and Data Visualization will be
reflected in my thesis project. I sincerely hope that Blogviz can be a relevant precedent in
some of your projects, deserve a mention in your research, or inspire or influence you at
some level.




Cultural Critics

Blogging presents one of the most intriguing and captivating phenomena of our time.
We might be in for a long ride in the transformation of most publishing media
conglomerates. We cannot really predict the ultimate result of this major shift in the flow
of online information, but one thing is sure: it has already started. Blogviz will offer an
enhanced insight on the mechanics of this contemporary revolution.


Marketers

Possibly the only open door to an eventual commercial viability for the application is
its relevance for the Marketing industry. Even if Blogviz is a non-commercial
research project, it is reassuring to know that it’s potentially useful outside the research
and academic realms. Like sociologists, marketers have become more and more
interested in word-of-mouth behavior, even though the more traditional marketing
strategists have barely explored this concept. In the blog community, most
bloggers are incorporating the idea of syndication in their blogs, in the form of an
XML data file, called RSS, which is basically a list of post summaries and links to them.
These files can then be interpreted by a desktop application called an RSS Aggregator,
and read by the user without the need to access the specific website. Some consider
RSS to be the future of news distribution, and that might well be the case, which
explains why, as in any communication medium, advertisement is now starting to
infiltrate RSS Feeds. The potential use of Blogviz in this context is huge. Marketers
interested in investing in the best RSS blog sources for advertisement could easily track
the most-read blogs, locate the innovators, the followers and the major dispatchers of
information, and then act on the conclusions accordingly.
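
To make the idea of syndication more concrete, the sketch below reads a small, made-up RSS 2.0 document with Python’s standard xml.etree.ElementTree module and lists each post’s title, link and summary, which is essentially what an aggregator does before displaying a feed. The feed content, the example.com addresses and the function name read_items are hypothetical and serve only as illustration; this is not part of the Blogviz implementation.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed with two posts (hypothetical content, for illustration only).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://blog.example.com/</link>
    <description>A hypothetical weblog</description>
    <item>
      <title>First post</title>
      <link>http://blog.example.com/1</link>
      <description>Summary of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example.com/2</link>
      <description>Summary of the second post.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return the title, link and summary of every <item> in an RSS document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return items


items = read_items(SAMPLE_FEED)
for post in items:
    print(post["title"], "->", post["link"])
```

Because every post carries its own link, a list like this is also the natural starting point for counting which URLs are cited across many feeds.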


Bloggers

Blogviz is a visualization model built to better understand the information dynamics
within the blog community. Accordingly, any interested blogger who feels the need to
comprehend the underlying network that they are part of is a potential user of my research
project.




5 Precedents

The chain of influences and inspiration for my thesis project is, as expected, extremely
widespread and ranges across new media art, information architecture, data visualization,
complex networks and interface design, among so many other fields, and life in general.
Even if I started enumerating the key thinkers whose work I admire and respect, and
subsequently absorbed for myself, I expect many names would still go unmentioned
in the extensive list of people. In enunciating the key precedents for my thesis, I
concentrated exclusively on projects developed in the area of Online Social
Communities, my closest encircling thesis domain. Since the major goal of my thesis is
to visually map a specific diffusion pattern and the connectivity among blog communities,
I decided to establish as precedents projects that make extensive use of a visual
structure to portray their field of research.


5.1 Blog Epidemic Analyzer
Authors: Eytan Adar, Li Zhang, Lada Adamic, Rajan Lukose
Institution: HP Information Dynamics Lab
URL: http://www.hpl.hp.com/research/idl/papers/blogs/index.html


Description:

HP Information Dynamics Lab created the Blog Epidemic Analyzer as part of their
research on information propagation. They released their paper “Implicit Structure and
the Dynamics of Blogspace” as a result of this research. Eytan Adar, Li Zhang, Lada
Adamic, and Rajan Lukose, used the search engine BlogPulse to map the behavior of
the blog community from May 11 to May 21, 2003.


Relevance:
This project is the closest to my thesis ambition and it obtained exciting results that
became pertinent in selecting specific parameters for my work. Although the project is highly
useful as research, its few attempts at visualization were extremely poor. Its
major breakthrough was announcing that the most popular blogs are not the most
innovative, since they commonly “steal” news and information from smaller, less-known blog
sources. I believe it’s a very significant finding that decisively influences the way we
understand the mechanics of blog communities.




5.2 Loom2
Authors: Danah Boyd, Hyun-Yeul Lee, Ethan Perry
Institution: Sociable Media Group - MIT Media Lab
URL: http://smg.media.mit.edu/projects/loom2/




Author’s Description:
“The goal of our research is to use the salient features of social interaction to build a
‘legible’ interactive visual representation of Usenet. We started by exploring the Usenet
environment, constructing a series of relevant questions. From the questions, we have
started to explore how this information can be derived from the textual data available
online. Simultaneously, we have started designing segments of visualization, under the
assumption that the desired characteristics were ascertainable.”


Relevance:
This project is a major aesthetic inspiration. I believe the use they make of a radial
structure fits the purpose of the project quite well, where specific degrees relate to a time
dimension and nodes’ colors to specific theme categories. Usenet represents a subject
of analysis closely related to blogging, since message/post threads in newsgroups have
a pattern of contamination similar to that of topics in the blogosphere. Given
their appealing visual models, the amount of work they had to
undertake is not surprising: “To build our designs, we drew on a wide variety of theoretical and practical
concepts from a range of fields, including graphic and interactive design, architecture,
sociology, and computer animation.”




5.3 Social Network Fragments
Authors: Danah Boyd, Jeff Potter
Institution: Sociable Media Group - MIT Media Lab
URL: http://smg.media.mit.edu/projects/SocialNetworkFragments/index.html




Description:
“Social Network Fragments was developed as a self-awareness tool for individuals to
explore the social networks that they create without structural consideration”. Its goal
was to “help users examine their structure so as to unveil the structural holes that are
built in such complex networks. These structural holes exist when users choose to
fragment portions of their network, often revealing facets of their own identity. As an
individual interacts with a diverse range of people, they are motivated to reveal different
aspects of their identity, thereby creating a multi-faceted social identity, whereby
different people know different things about the individual. In engaging in this behavior,
individuals start to segment their social network into a variety of different clusters, or
types of people.”


Relevance:

The visualization of social networks takes a major leap in many of the projects
produced by the Sociable Media Group (SMG) at MIT Media Lab. With some amazing
visual displays the SMG “investigates issues concerning society and identity in the
networked world”, addressing questions such as “How do we perceive other people
on-line? What does a virtual crowd look like? How do social conventions develop in the
networked world?”. Social Network Fragments aims at something as extraordinary as
mapping someone’s unnoticed social network. Although it may seem simple and intuitive
to track an individual’s connections to others, this project tries to reach further than
the immediate first-degree acquaintances, by reaching a friend-of-a-friend network.

This approach to small world theory has been pursued by some companies, which sell
products focusing on social networking management. The idea is simple: don’t just get to
the people you know, get to the people they know. Manage your friend-of-a-friend
network in order to find the shortest path to whatever you’re looking for. Among the leading
companies incorporating this concept are Spoke Software, Visible Path, SRD and
In-Q-Tel. Social Network Fragments offers a reasonable visual solution, where I believe some
improvements could be implemented. By basing its visual criteria solely on text,
color and depth (simulated 3rd dimension), the interface becomes somewhat limited in its
ability to fully explore its content.



5.4 PostHistory
Author: Fernanda Viégas
Institution: Sociable Media Group - MIT Media Lab
URL: http://web.media.mit.edu/~fviegas/posthistory/




Author’s Description:

“Most of us deal with email on an everyday basis and some of us have been doing so
for several years. Nevertheless, it is hard to perceive the accumulation of this frantic
activity, it is hard to get a sense of the number of messages sent and received, not to
mention how difficult it is keeping track of how many people have written to you or
received messages from you. The aim is to provide users with a novel and hopefully
richer experience of their email activities. PostHistory represents an opportunity for
reflection and insightful monitoring of fundamental patterns of interactivity. The
visualization aims at impressing on the user a sense of daily accumulation, of growth
and scale – dimensions not normally conveyed on current email applications.”




Relevance:

Fernanda Viégas, a Brazilian graduate student at the MIT Media Lab, is a prolific new media
designer who has been involved in many relevant projects. PostHistory is one of her
best. What I find most interesting in this project is the series of new structures and
features she proposes in order to better understand the patterns created by e-mail
activity. This project is visually innovative and is quite an impressive contribution to the
field of Information Visualization. Another project conceptually related to PostHistory is
Thread Arcs, a fresh interactive visualization technique designed to help people use
threads found in e-mail. Thread Arcs, which resulted in a published paper, is a truly
interesting visual approach to e-mail threads and even to small graphs. This
concept is part of ReMail, a major e-mail application developed by the Collaborative User
Experience team at IBM Research. ReMail has been in development for almost a decade and
aims at improving the knowledge of how people use e-mail, as well as making that
experience more functional and straightforward. Some of its features are very
encouraging.




                                                    Thread Arcs
                                                    ReMail (IBM Research)




5.5 Social Circles
Author: Marcos Weskamp
URL: http://marumushi.com/apps/socialcircles/




Author’s Description:
“Social Circles intends to partially reveal the social networks that emerge in mailing lists.
The idea was to visualize in near real-time the social hierarchies and the main subjects
they address. When subscribing to a mailing you never know who the principals are,
how many people are listening or what subjects they are talking about. It's like entering a
meeting room with plenty of people in the darkness and then having to learn who is who
by just listening to their voices. Social Circles does not pretend to be a statistical
application, but rather aims to raise the lights in that room just enough to let you
enhance your perception of what’s happening.”


Relevance:
Marcos Weskamp is a key thinker in digital information design and a major personal
influence. Newsmap, Weskamp’s most famous project and one of the best online
examples of data visualization, gathers Google News and displays it in an innovative
treemap structure in several languages (http://www.marumushi.com/apps/newsmap). In
Social Circles, even though Marcos Weskamp doesn’t push the project far from the
most common network visualization schemas, the concept is very strong, particularly in a
recent version, where users can map their own e-mail inbox.




5.6 WebFan
Author: Rebecca Xiong
Institution: Sociable Media Group - MIT Media Lab
URL: http://www.sbox.tugraz.at/home/k/koebi/WebFan%20Description.htm




Author’s Description:
“WebFan visualizes user activities at WebBoards, or Web-based message boards,
which contain messages posted by users. It uses the reply structure of the messages to
lay them out using a fan-like hierarchical structure. This abstract structure allows a large
set of Web pages with multiple levels to be represented at the same time for overview
and comparison. Users can also interactively explore the fan structure to find out more
about individual pages. Dynamic user activity is overlaid on top of this display.”


Relevance:

“Currently, Web users have little knowledge about the activities of fellow users. They
cannot see the flow of on-line crowds or identify centers of on-line activity.” WebFan
seeks to enrich this experience by visualizing the activity of other people in the
message boards. I believe this is a very relevant project, particularly for the
unconventional medium of WebBoards that Rebecca Xiong chose to map. WebFan
relates to my thesis project by visualizing overall patterns of usage and answering
questions such as: What are people looking at? What is hot? Where do clusters of
similar interests form?




5.7 Visual Who
Author: Judith S. Donath
Institution: Sociable Media Group - MIT Media Lab
URL: http://smg.media.mit.edu/people/Judith/VisualWho/VisualWho.html




Author’s Description:

“The population of a real-world community creates many visual patterns. Some are
patterns of activity: the ebb and flow of rush hour traffic or the swift appearance of
umbrellas at the onset of a rain-shower. Others are patterns of affiliation, such as the
sea of business suits streaming from a commuter train, or the bright t-shirts and sun-
glasses of tourists circling a historic site. Visual Who makes these patterns visible. It
creates an interactive visualization of the members’ affiliations and animates their
arrivals and departures. The visualization uses a spring model. The user chooses
groups (for example, subscribers to a mailing-list) to place on the screen as anchor
points. The names of the community members are pulled to each anchor by a spring,
the strength of which is determined by the individual’s degree of affiliation with the
group represented by the anchor”.


Relevance:
Visual Who, besides offering a motivating contextual precedent in relation to social
networks, portrays an appealing method of mapping social connectivity among a set of
individuals. It offers an interesting approach to pattern recognition and visualization,
although I think it suffers from the same inconsistencies pointed out in the Social
Network Fragments project.




5.8 Avatars 2002
Authors: Katy Börner, William Hazlewood, Sy-Miaw Lin
Institution: School of Library and Information Science, Indiana University
URL: http://ella.slis.indiana.edu/%7Ekaty/gallery/




Description:
This project gave rise to a research paper: “Visualizing the Spatial and Temporal
Distribution of User Interaction Data Collected in Three-Dimensional Virtual Worlds”. The
project is a visualization of the social patterns in the Culture virtual environment, part of
the Quest Atlantis universe. The map shows user trails over time. It was produced using
a visualization tool developed by Katy Börner and colleagues at the School of Library
and Information Science, Indiana University.


Relevance:

The particular relevance of this project lies in its visual pattern analysis. I think the
underlying concept of being able to visually recognize different user trails in a 3D online
game is extremely captivating. In a virtual game, often played with strangers,
the notions of time and space alter considerably, which makes this project
particularly challenging, as it tries to recreate a defined user trail pattern throughout a
physically undefined space.




5.9 PeopleGarden
Author: Rebecca Xiong
Institution: Sociable Media Group - MIT Media Lab
URL: http://www.infovis.net/E-zine/num_46.htm




Description:
PeopleGarden: Creating Data Portraits for Users proposes the “Data Portrait” as a
graphical medium for the visualization of information related to individual users of
interactive media. The visual metaphor that PeopleGarden uses is of flowers in a
garden. Each data portrait is the trace of the user’s activities and takes the shape of a
flower.


Relevance:
“On-line interaction environments such as Web-based message boards, chat rooms, and
Usenet newsgroups have become widely popular. As the number of participants rises, it
is increasingly difficult to distinguish individual users and to comprehend the overall
interaction context.” In PeopleGarden the representation of a vague virtual space
reaches its extreme by allowing it to be portrayed as a digital garden. The concept is that
flowers represent individuals in a chat room, and the more time a user stays active in a
conversation, the more their flower can grow and expand. I think this project is conceptually
very strong as it presents an innovative visual method for representing a vague
unspecified space.




5.10 History Flow
Authors: Martin Wattenberg, Fernanda Viégas
Institution: IBM Watson Research Center
URL: http://researchweb.watson.ibm.com/history/index.htm




Author’s Description:

“The history flow application charts the evolution of a document as it is edited by many
people using a very simple visualization technique. History flow provides answers at a
glance to questions like, Has a community contributed to the text or has it been mostly
written by a single author? How much has a particular contributor influenced the current
version of the document? Is the text's evolution marked by spurts of intense revision
activity or does it reflect a smooth transition from its beginning to the present? The
current version of history flow visualizes the evolution of pages from Wikipedia”.


Relevance:
History Flow is truly one of the most significant projects in revealing hidden patterns in a
set of data that would otherwise go unnoticed by the user. This ability is undoubtedly one of
the key strengths of Information Visualization. Using available data from the Wikipedia website,
the authors built an inventive visualization model for analyzing the evolutionary pattern
of individual contributions to Wikipedia articles through time. This visualization method
has some resemblance to ThemeRiver™, developed by the Pacific Northwest National
Laboratory (PNNL), but the number of conclusions History Flow was able to facilitate is
quite impressive. In a lecture given at the Parsons D+T Lab on February 23, 2005, Martin
Wattenberg, speaking about this project, mentioned that it takes an average of 2 minutes for
any kind of article vandalism to be noticed and repaired.




5.11 Listening Post
Authors: Mark Hansen, Ben Rubin
URL: http://www.earstudio.com/projects/listeningPost.html




Author’s Description:

“Listening Post is an art installation that culls text fragments in real time from thousands
of unrestricted Internet chat rooms, bulletin boards and other public forums. The texts
are read (or sung) by a voice synthesizer, and simultaneously displayed across a
suspended grid of more than two hundred small electronic screens.”


Relevance:

Although the toolset and the medium of this project are quite different from the screen-
based interactive application intended for my thesis, I believe this project is an amazing
precedent and one of the best installations I have ever seen. Exhibited at the List Visual
Arts Center, Cambridge, Mass, and the Whitney Museum of American Art, New York,
Listening Post has recently been awarded a prize at the Ars Electronica 2004 Festival.
Co-author Ben Rubin emphasizes the motivation for the project: “My starting place was
simple curiosity: What do 100,000 people chatting on the Internet sound like?”. The
significance of Listening Post is remarkable. It displays short messages, randomly
picked from chat rooms according to a specific set of keywords, and then not only
gives life to them by placing the messages in a specific spatial configuration, a
“suspended grid of more than two hundred small electronic screens”, but also gives
them a sound dimension, which makes the experience truly memorable. This large
display of small screens resembles a “window” overlooking the activity in cyberspace.




6 Methodology

6.1 Summer Research
My first presentation at the beginning of the Fall 2004 semester covered some of the
wide-ranging research done over the summer. It was entitled “Discovering Complex
Networks”. My approach to this first assignment was to treat the presentation as a
lecture, educating my audience about the engaging science of complex networks and
narrating all the discoveries and knowledge gathered in this initial phase.


The presentation contained explanations and diagrams about the specific properties of
scale-free networks and took a holistic view by showing diverse examples of complex
networks in different domains, as diverse as Gene Networks and Airline Routes. All the
images shown at this presentation can be seen in Appendix A – Summer Research
Presentation, at the end of this paper.


In order to better understand the successive steps that led me to the study of complex
networks, one should consult the Impetus chapter of this thesis. There I describe in
detail the evolution of my research interests and motivation.


I ended my Summer Research Presentation with a slide where I stated that my main
interest was to “Visually map a dissemination/propagation pattern in a scale-free
network”. I also made a short list of additional enquiries, where one could read:


> How does an idea, innovation, fad, trend, disease or virus travel from A to B in a
specific scale-free network?
> How long does it take?
> How many nodes are affected?
> How do the hubs react?




I finally concluded the presentation by stating my future goals: “To choose an
area and subject to analyze, where I can bring something new to the field and contribute
to its development.”



6.2 Visual Explorations
After extensive research on complex networks, I started to delve into different ways
of visualizing them. The main premise was that complex networks are difficult to
visualize, but we don't need to make them more complex in the process of trying.


On September 27, 2004, I wrote the following in my thesis diary blog: “My thesis
assertion has always been the visualization of dissemination patterns in a particular
scale-free network. (…) However, I quickly found out that this premise is based on the
assumption that the target network displays a visual structure suitable for analysis.
Naturally, most of the time, this assumption is incorrect. Since a visual representation of
a dissemination pattern cannot exist without a functional visual representation of the
underlying network, I decided to dedicate my time, for now, to the visualization of
complex networks. I've been delving into a set of visual explorations, collecting problems
and proposing solutions.”

“Functional visualizations are more than innovative statistical analyses and computational algorithms. They
must make sense to the user and require a visual language system that uses colour, shape, line, hierarchy
and composition to communicate clearly and appropriately, much like the alphabetic and character-based
languages used worldwide between humans.”


Matt Woolman
Digital Information Graphics



As acknowledged in another blog entry, also in September 2004: “I've tried several
open-source network visualization tools and seen hundreds of visualization examples. I
think I found a critical problem. In most tools I've seen, the user starts building its
network from an initial node. The user places the first node in the center of the drawing
board and then, node after node, link after link, the network starts expanding. Since
there's no preceding method of organizing the nodes and links in the designated area,
new nodes start naturally occupying any free space available. Unsurprisingly, after a
certain threshold, the lattice of lines and nodes becomes unbearable. This problem
happens so many times.”


The difference between this method and Mark Lombardi's drawings, for example, is a
question of organization. Instead of the bottom-up approach described above, Lombardi
used to plan his overall design with a holistic view of the entire network, knowing
beforehand the amount of space he had and the exact number of nodes and links he
needed to draw. Because of this, the cleanness of his drawings, where edges rarely
overlap, is an excellent example of network visualization. What I cannot
understand is why Lombardi's method, and others like it, aren't taken into consideration
whenever someone decides to build a visual representation of a network. A macro
approach to the problem is definitely more appropriate: a top-down hierarchy instead of
a bottom-up one. And to say Lombardi's networks were not complex enough is merely
to oversimplify his work.




The beautiful and eloquent global networks
of Mark Lombardi




Besides the problem mentioned above, I encountered two others in my research, which
contribute drastically to the huge number of bad visualization examples of complex
networks. First, most visual applications are based on constructive algorithms that obey
one rule: display the input data. The question of how the data is displayed is rarely
considered. For that reason, often stunning visual forms demonstrate a low level of clarity
and function. Second, the programmers who build open-source applications, and the
scientists/researchers who use them, usually have no visual sensibility or graph drawing
knowledge. Many researchers produce a visual model of the analyzed network as a mere
additional element for showing their research. Sometimes it adds nothing to it.


In my second thesis presentation of the Fall 2004 semester, I applied many of my
reflections and sketches to practical examples, proposing possible solutions to improve
the visualization of complex networks. I divided my solutions into five major steps:




The main slides of this presentation can be seen in Appendix B – Complex Networks:
Visual Explorations, at the end of this paper.




6.3 Prototype #1
This was my first visual prototype, shown at the Fall 2004 mid-term review. This review
also marked the birth of the thesis title: Blogviz. The mid-term presentation was entitled
Blogviz: An experimental social laboratory. The underlying concept was based on a
major aspiration: local stability for the nodes and global connectivity for the links. The
goal was to map the connectivity among blogs. I tried to position the nodes in a structured
way, so they would remain fixed and, to some extent, under control. The links, however,
would be in constant change and the outcome would be highly random and
unpredictable. The reason I chose to sort all the nodes in a precise manner was to
be able to isolate the major hubs and have some control over the lattice resulting from
the agglomeration of links. Looking at it now, it seems the result was too rigid and strict.
The radial diagram, with its implosive structure, reinforces this rigidity by
resembling a closed system, which probably doesn’t describe the fundamental openness
of blogs very well.




Blogviz Visual Studies – Prototype #1


I realized I had to take a different path. I was trying too hard to control the outcome and I
believe the result showed exactly that. I had to let go of some of my constant need for
control and let the system be more self-sufficient, self-organizing and adaptive.


As stated in my thesis blog on October 24, 2004: “Another criticism I received during the
presentation was that I was being too concerned with the visual aspect of it, and that I
was thinking too much as a visual designer. Well, although I agree in part with the criticism,
my thesis assertion has always been the visualization of a specific dissemination
pattern, and from my extensive research in complex networks, I truly believe that the
only way I can positively contribute to this field is by employing my visual and interface
design knowledge. In my first prototype presentation I dissected several problems on the
visualization of complex networks and proposed distinct solutions that might solve some
of its inconsistencies. I believe there has to be a balance between highly complex
network visualizations that offer a poor functionality and highly aesthetic/innovative
visual representations that might suffer from the same dilemma. I just have to pursue
that balance.”


In this same presentation I also illustrated some of my initial studies regarding the
linkage among blogs. Connectivity in the blogosphere is a very binary process; we only
need to ask two questions. Is blog A connected to blog B? If so, who is linking whom?
If neither of them links to the other, they become momentarily isolated islands. For that
presentation I showed a few visual studies where I mainly explored the concept of
directional linkage, by visualizing inbound or outbound links or, put simply, who is
linking whom. The images below portray some of these explorations.
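
The two questions about connectivity can be sketched in a few lines of code. This is only a minimal illustration, not the implementation behind the visual studies: the blog names, the `links` set and the `linkage` function are hypothetical, and a link is assumed to be stored as a (source, target) pair.

```python
# Hypothetical data: each pair (source, target) means "source links to target".
links = {
    ("blog_a", "blog_b"),
    ("blog_b", "blog_a"),
    ("blog_c", "blog_a"),
}


def linkage(a, b, links):
    """Answer the two questions: are a and b connected, and who links whom?"""
    a_to_b = (a, b) in links
    b_to_a = (b, a) in links
    if a_to_b and b_to_a:
        return "mutual"
    if a_to_b:
        return f"{a} links to {b}"   # outbound link from a, inbound for b
    if b_to_a:
        return f"{b} links to {a}"   # inbound link for a, outbound from b
    return "isolated"                # neither blog links to the other
```

Storing the direction explicitly, rather than an undirected connection, is what allows inbound and outbound links to be drawn differently.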




6.4 Prototype #2
While in my first prototype I was trying to find a structured way to map connectivity
among blogs, by isolating the hubs and sorting the nodes according to popularity, in my
second prototype I explored possible ways of visualizing diffusion patterns
over time. I tried several models based on a radial structure where time became the
major imposing element. In most of these experiments I faced a common problem in
representing a continuous flow of infected blogs. The underlying radial structure seemed
to impose its rigidity by forcing fractures in the pattern, particularly whenever there
was a day transition.




Blogviz Visual Studies – Prototype #2




Blogviz Visual Studies – Prototype #2




Blogviz Visual Studies – Prototype #2




I quickly found out I had to change my visualization thinking, since a radial
structure didn’t quite apply to my subject of analysis. Perhaps I was too influenced
or distracted by the Radial Form of Organization Chart from the Alexander Hamilton
Institute or Loom2, by Danah Boyd et al.




Radial Form of Organization Chart (1924), Alexander Hamilton Institute
Loom2 - Danah Boyd, Hyun-Yeul Lee, Ethan Perry, Sociable Media Group - MIT Media Lab



As I wrote in my thesis blog on November 16, 2004: “At the moment I’m becoming
convinced that a horizontal array is truly the best way of representing the quantitative
and temporal qualities of a pattern. Time is a crucial domain in a dissemination pattern,
particularly in a word-of-mouth social behavior. The amazing potentialities of a horizontal
assortment is the uninterrupted continuous flow of data and the possibility of collapsing
time frames and still maintain a sense of scale and understanding of the pattern
dynamics.”




Blogviz Visual Studies – Horizontal array of adopting units




Blogviz Visual Studies
Different tryouts where adopting units (blogs) are structured
in a vertical and horizontal array


After this critical change in my visualization studies I started doing a lot of sketching and
writing. I built a few diagrams to get a full understanding of my system, built several
taxonomies and dissected the mechanics of blogging. This examination helped me
put my ideas in order and get a sense of what I was dealing with.



6.5 Prototype #3
In my third prototype I introduced Blogviz as a “topological model of meme behavior”.
From the conclusions of my previous tryouts, I decided to explore in depth the notion of a
horizontal array of adopting units (weblogs) to portray the propagation pattern of a
specific topic. By doing that I would be constraining the Time element to the X-axis. The
following images represent a series of tryouts in this context.




In this phase of the project I also introduced the first visual taxonomy of Blogviz, by
dissecting the system and its intrinsic elements. The following image portrays a critical
understanding of the inherent structure of Blogviz at that stage.




At the same time, a list of goals was created (left image) in order to better understand the
intent of Blogviz.




6.6 Prototype #4
After the series of independent and scattered visual studies that characterized the initial
trials, this fourth prototype was the first solid attempt at establishing Blogviz as an
interactive visualization model. At the time I was pushing the concept of an application or
tool of analysis, which according to some critics implied a need for commercial
viability. Even though I’m convinced this thesis has several elements that could be
successfully applied in commercial applications, my goal with this project is to elevate
the understanding of Memetics in a specific social network and conduct a serious
research experiment, which I believe fits more adequately within the academic realm.


Another point worth considering is that, when developing this prototype, Blogviz was
intended to work with real-time data, in the form of hourly updated XML RSS feeds. This
idea changed afterwards; however, it was a crucial consideration in the development of
this prototype.




Prototype #4 – Default First Page


A quick explanation of the previous image’s visual schema: circles represent
topics; the diameter corresponds to the total number of adopting blogs; and the colors,
pink and green, denote, respectively, a decreasing or increasing course. Time is again
incorporated in the X-axis, where the closer a circle is to the right edge of the window,
the more recent its last dispatch. The Y-axis position of each circle helps reinforce
its level of adoption.
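
The visual schema of this first page can be summarized as a mapping from topic data to circle attributes. The sketch below is only an illustration under stated assumptions, not the prototype's actual code: the `Topic` fields, the `circle_for` function and every scale value (window size, 24-hour range, maximum diameter, maximum number of blogs) are hypothetical, and I assume here that more widely adopted topics are drawn higher in the window.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    name: str
    adopting_blogs: int      # total number of blogs that adopted the topic
    increasing: bool         # True if adoption is on an increasing course
    hours_since_last: float  # time elapsed since the topic's last dispatch


def circle_for(topic, width=800.0, height=600.0,
               max_blogs=500, window_hours=24.0, max_diameter=120.0):
    """Map a topic to circle attributes (all scales are assumed values)."""
    share = min(topic.adopting_blogs, max_blogs) / max_blogs
    age = min(topic.hours_since_last, window_hours) / window_hours
    return {
        "x": width * (1.0 - age),         # recent dispatch -> closer to the right edge
        "y": height * (1.0 - share),      # assumption: higher adoption is drawn higher
        "diameter": max_diameter * share,  # diameter follows the number of adopting blogs
        "color": "green" if topic.increasing else "pink",
    }
```

Because the diameter and the Y position are both derived from the same adoption share, the two encodings reinforce each other, as described above.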


The main interaction in this fourth prototype was based on a simple flow. The default
first page would allow a swift view of the general pattern by showing the overall
condition of the current topics’ popularity. If one decided to investigate more deeply the
structure and evolution of a particular topic, one would be taken to a sequence of
examination methods. The following images illustrate some of the techniques proposed.




Prototype #4 – Blogs’ evolutionary paths through time




Prototype #4 – Plotting blogs according to time/popularity




Prototype #4 – Detailed View                               Prototype #4 – Detailed View




Prototype #4 – Blogs’ adoption represented by a Tree Map
Prototype #4 – Blogs’ analysis by Theme and Generator




Prototype #4 – Blogs’ relationship analysis




6.7 Final Application
A major shift in the development of Blogviz was the decision not to incorporate real-time
data in the backend of the application. As previously stated, in my fourth prototype
I was mostly concentrated on developing a visualization schema that would expose
current trends in the topics diffusion process, by reading data from hourly updated XML
feeds. It would basically display the most adopted topics spreading in the blogosphere at
any given time. Even if the application allowed an extended breakdown of each topic,
beyond a quick view of the present information tendencies, it was only considering
a restricted number of topics. I believe Blogviz’s concept, at that phase, was trying to
incorporate too many features, or levels of analysis, without being able to develop any one
efficiently. It was also becoming a trend analysis tool rather than a comprehensive model
of topics distribution. I wanted Blogviz to become a serious visualization study on
information diffusion in blogspace, and not so much a marketing application. I still
believe there’s enormous potential in visualizing popular topics with real-time data
integration, and that might be something Blogviz will incorporate in the future. However, I
first wanted to better understand the topics’ inner structure and evolution through time.
This change in Blogviz’s progress also coincided with a parallel immersion in the
domains of Epidemiology and Diffusion of Innovations Theory.


I never imagined that an apparently minor adjustment would require such a drastic
turnaround in the project’s conceptualization. Until then, Blogviz had been dealing
with a very restricted and manageable time span. Real-time data visualization was
constrained to one day or, at most, one week. In contrast, by aiming at
an adaptive model, the critical goal was to come up with a visualization method that
could easily include time variations and still be consistent. Another crucial problem
was to visualize, in a very tight space, a high number of topics.


I had to come up with a visualization model that would answer these last two
problems accordingly. First, it had to be flexible enough to embrace distinct time
spans, but at the same time maintain uniformity throughout the process. Second, it
had to be able to include a high number of topics, and also, allow an immediate
understanding of the overall pattern and the individual life cycle of each topic.




In the process of looking for inspiration in diverse sources, I came across an
illuminating diagram by E. J. Marey, in Edward Tufte’s The Visual Display of Quantitative
Information, that resolved particularly well many of the challenges I was facing.




Original Image: E. J. Marey, La Méthode Graphique (Paris, 1885), p.20.
Source: Tufte, Edward R., The Visual Display of Quantitative Information



The preceding image illustrates Marey’s graphical train schedule for Paris and Lyon in
the 1880s. The X-axis incorporates Time, measured in hours, and maintains the same
scale on both the top edge (corresponding to departures and arrivals from Paris) and the
bottom edge (for departures and arrivals from Lyon). The remaining horizontal lines
represent other train stations between Paris and Lyon. The diagonal lines represent
different trains, departing from and arriving at the two main stations, and the horizontal
line breaks represent waiting time in secondary stations.


This chart influenced me greatly in the following steps of my project. I believe it is an
extraordinary example of information visualization, where time and pattern become one
intrinsic entity, allowing a substantial understanding of the data dynamics in one brief
look.


I applied a modified version of this concept to Blogviz, where the lines became
representative of topics, and the time scale was measured in days. Blogviz’s model
doesn’t incorporate any type of constraint on the Y-axis, unlike Marey’s graph;
therefore the overall height of the main window is rather arbitrary. The following image
represents the main visualization window for topics’ evolution within the Blogviz
environment.



Blogviz’s topics visualization – Topic Lines and Time Scale



The interesting characteristic of this model is that, as in the Paris/Lyon train schedule
example, the angle of each line has a specific meaning. This happens because the top
and bottom edges of the window share the same time scale. Therefore, the wider the
angle between a line and the time axis, the shorter the duration, in this case the
topic’s duration. In the image above, for example, one can see a line close to the
center of the window that is almost vertical; this means that the life cycle of that
particular topic was very short.
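The relationship between angle and duration can be illustrated with a short sketch. It
is not the Blogviz implementation; it assumes that a topic line runs from the top edge
at its first day to the bottom edge at its last day, uses the 12-pixel day parcel
mentioned later in this section, and picks a hypothetical window height of 400 pixels,
since the height is arbitrary. The function name is illustrative only.

```python
import math

DAY_WIDTH_PX = 12        # pixels per day parcel, as stated later in this section
WINDOW_HEIGHT_PX = 400   # hypothetical value; the text says the height is arbitrary


def line_angle_degrees(duration_days,
                       day_width_px=DAY_WIDTH_PX,
                       window_height_px=WINDOW_HEIGHT_PX):
    """Angle between a topic line and the horizontal time axis, in degrees.

    Assumption: the line spans the full window height, so its horizontal
    run is the topic duration expressed in pixels.
    """
    horizontal_run = duration_days * day_width_px
    return math.degrees(math.atan2(window_height_px, horizontal_run))


# Shorter topics produce steeper (closer to vertical) lines.
for days in (1, 10, 64):
    print(days, "days:", round(line_angle_degrees(days), 1), "degrees")
```

Under these assumptions a one-day topic is drawn almost vertically, while a 64-day topic
lies much closer to the time axis.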


This feature is even more relevant for topic lines whose starting or ending point lies
outside the present timeframe. I conducted a small experiment within the same model
in which the lines were drawn horizontally instead of diagonally. This method was
probably even more successful when both the starting and ending points fell inside the
selected time span. However, when a topic’s first or last day of spreading fell outside
this frame, there was no way to tell how many days the topic extended beyond it. What
the diagonal alignment provides is a full understanding of the topic’s life cycle, even
when it spreads outside the present time span.


To better understand the intricacies of this visualization model, the following images
illustrate the four possible life cycles for every topic line, within each timeframe, and the
way they are represented.




Topic with first and last day of spreading within the current time span




Topic with first day of spreading outside the current time span




Topic with last day of spreading outside the current time span




Topic with first and last day of spreading outside the current time span
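The four cases captioned above can be told apart by comparing a topic's first and last
day of spreading with the boundaries of the current time span. The following is a
minimal sketch of that comparison, not the Blogviz code; the function name, the returned
labels and the example dates are assumptions chosen for illustration.

```python
from datetime import date


def classify_life_cycle(first_day, last_day, span_start, span_end):
    """Return which of the four life-cycle cases applies to a topic line."""
    # Topics that never overlap the time span would not be drawn at all.
    if last_day < span_start or first_day > span_end:
        return "not visible"

    starts_inside = first_day >= span_start
    ends_inside = last_day <= span_end

    if starts_inside and ends_inside:
        return "first and last day within span"
    if ends_inside:
        return "first day outside span"
    if starts_inside:
        return "last day outside span"
    return "first and last day outside span"


# Hypothetical time span and topic dates.
span = (date(2005, 1, 1), date(2005, 2, 28))
print(classify_life_cycle(date(2005, 1, 10), date(2005, 1, 20), *span))
print(classify_life_cycle(date(2004, 12, 20), date(2005, 3, 10), *span))
```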




For dates outside the time span, the angle of the line is predicted by an equation that
multiplies the number of days (the topic’s duration) by the number of pixels in each day
parcel. For example, if a topic line has its starting point (first day of spreading)
within the present timeframe, its last day outside of it, and a total duration of 64
days, the system multiplies 64 by 12 (the number of pixels in a day parcel) to obtain a
horizontal offset of 768 pixels from the starting point, and a line is drawn dynamically
to the resulting end point.
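The calculation described above can be sketched as follows. This is an illustration
under stated assumptions rather than the Blogviz code: the starting x-coordinate, the
window size and both function names are hypothetical, and the line is assumed to run
from the top edge to the bottom edge of the window. Only the 12-pixel day parcel and the
64-day example come from the text.

```python
DAY_WIDTH_PX = 12  # pixels per day parcel, from the text


def end_point_x(start_x, duration_days, day_width_px=DAY_WIDTH_PX):
    """X-coordinate of the end point: start plus duration times day width."""
    return start_x + duration_days * day_width_px


def visible_segment(start_x, duration_days, window_width_px, window_height_px,
                    day_width_px=DAY_WIDTH_PX):
    """Start and end of the part of the line that falls inside the window.

    Assumption: the line starts on the top edge (y = 0) inside the window
    and would reach the bottom edge at its predicted end point.
    """
    end_x = end_point_x(start_x, duration_days, day_width_px)
    start = (float(start_x), 0.0)
    if end_x <= window_width_px:
        return start, (float(end_x), float(window_height_px))
    # The end point lies beyond the right edge: cut the line at the edge,
    # keeping its slope so the angle still reflects the full duration.
    visible_fraction = (window_width_px - start_x) / (end_x - start_x)
    return start, (float(window_width_px), window_height_px * visible_fraction)


print(end_point_x(120, 64))                # 120 + 64 * 12 = 888
print(visible_segment(120, 64, 708, 400))  # line leaves the window at x = 708
```

Because the clipped line keeps the slope of the full line, a viewer can still judge the
total duration from its angle, which is the property the diagonal layout relies on.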


Another feature of this visualization method, explained further in the following Blogviz
Interface section, is the brightness or color saturation of each line. In Blogviz, the
default setting maps a line’s brightness to the total number of blogs that adopted the
topic. This makes the overall pattern easier to evaluate: at a glance, one can identify
both the life cycle of each topic and the number of blogs that adopted it.
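One simple way to derive brightness from the number of adopting blogs is a linear scale.
The text does not state which mapping Blogviz uses, so the sketch below is purely an
assumption: the linear formula, the minimum brightness of 0.2 and the function name are
all hypothetical.

```python
def line_brightness(adopting_blogs, max_adopting_blogs, min_brightness=0.2):
    """Map a blog count to a brightness between min_brightness and 1.0.

    Assumption: brightness grows linearly with the number of adopting
    blogs, and the most adopted topic is drawn at full brightness.
    """
    if max_adopting_blogs <= 0:
        return min_brightness
    share = min(adopting_blogs, max_adopting_blogs) / max_adopting_blogs
    return min_brightness + (1.0 - min_brightness) * share


# Hypothetical counts: topics adopted by more blogs are drawn brighter.
for count in (5, 50, 200):
    print(count, "blogs:", round(line_brightness(count, 200), 2))
```

The minimum brightness keeps rarely adopted topics visible against the background
instead of letting them fade out completely.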


I like to think of the visual representation of this model as a window overlooking
cyberspace, continuously crossed by lines of information flow.






Mapping the Dynamics of Information Diffusion in Blogspace

  • 1. blogviz Mapping the dynamics of Information Diffusion in Blogspace by Manuel Lima A thesis document submitted in partial fulfillment of the requirements for the degree of Master of Fine Arts in Design and Technology. Parsons School of Design May 2005 Thesis Instructor: Christopher Kirwan Writing Instructor: Mark Stafford Manuel Lima lima@parsons.edu www.blogviz.com
  • 2. blogviz Mapping the dynamics of Information Diffusion in Blogspace by Manuel Lima Abstract Blogviz is a visualization model for mapping the transmission and internal structure of top links across the blogosphere. It explores the idea of meme propagation by assuming a parallel with the spreading of most cited URLs in daily weblog entries. The main goal of Blogviz is to unravel hidden patterns in the topics diffusion process. What’s the life cycle of a topic? How does it start and how does it evolve through time? Are topics constrained to a specific community of users? Who are the most influential and innovative blogs in any topic? Are there any relationships amongst topic proliferators? Keywords Information Diffusion, Memetics, Weblogs, Online Social Communities, Complex Networks, Information Architecture, Information Visualization, Diffusion of Innovations, Epidemiology, Small Worlds
  • 3. Acknowledgements − Scott Patterson Jared Schiffman David Kearford Fura Johannesdottir Thank you for your feedback − Christopher Kirwan Mark Stafford Thank you for your guidance, openness and continuous motivation − My dearest Parents Thank you for your eternal support and dedication
  • 4.
  • 5. Table of Contents 1 Introduction 1 1.1 Concept 1 1.2 Memetics 3 1.3 Diffusion of Innovations 5 1.4 Epidemiology 10 12 2 Impetus 16 2.1 Subject of Analysis 18 3 Context 18 3.1 Online Social Communities 21 3.2 Weblogs 23 3.3 Blogosphere 24 4 Audience 26 5 Precedents 38 6 Methodology 38 6.1 Summer Research 39 6.2 Visual Explorations 42 6.3 Prototype #1 44 6.4 Prototype #2 47 6.5 Prototype #3 50 6.6 Prototype #4 53 6.7 Final Application 63 7 Technical Sources 63 7.1 Blog Engines 64 7.2 Blogviz Data 68 8 Conclusion 73 9 Bibliography Appendix A Summer Research Presentation Appendix B Complex Networks: Visual Explorations
  • 6.
  • 7. 1 Introduction Blogging presents one of the most interesting social phenomenons of our time. This change in the flow of online information might radically change the way we look at news providers and large media conglomerates. It also provides an extraordinary online laboratory to analyze how trends, ideas and information travel through social communities. 1.1 C0ncept Blogviz is a non-commercial research project developed with the intent of disentangling this highly complex network for further study, research and analysis. The main goal of Blogviz is to improve our understanding of the dynamics of information propagation among weblogs. An underlying question to Blogviz is: “How can we measure meme as a unit of cultural evolution?”. The answer is not easy. Memes, due to their widespread trait and frequent untraceable evolutionary track, become extremely hard to measure accurately. In opposition to this commonly undetectable meme pool, the blogosphere offers a discernible and documented map of thousands of memes, with clear trails of progression, structured by date and time. There are many possible ways of looking at information diffusion in blogspace. It can be based on conversation threads, comment threads, key sentences, themes, tags, or top links. Blogviz analyzes top links, occasionally called topics, which represent the most cited URLs appearing in blog entries in any given day. These popular links represent particular memes that provide an idea of sources, stories and themes that have occupied the attention of bloggers over a certain period of time. By exploring the evolution of these topics through time, Blogviz will not only able to track its popular dispatchers and key innovators, but also, follow its dissemination pattern from the beginning to an eventual tipping point, where it might leap the blog community and reach the mainstream. 1
  • 8. Blogviz embodies a flash driven interactive visualization model with extensive use of information visualization and information architecture. Why is Information Visualization central to Blogviz? Information Visualization can be defined as quot;the use of computer-supported, interactive, visual representations of abstract data to amplify cognitionquot; (Card, Mackinlay & Shneiderman, 1999). Information Visualization does not only makes data easier for human interpretation but it also discovers and highlights relationships in data elements, usually reducing the processes of searching by gathering information in a small rich space. Therefore, Blogviz employs Information Visualization with the key intent of uncovering hidden patterns in the data and deriving plausible conclusions, which promote an advanced knowledge of information dynamics in blogspace. By unraveling the modus operandi behind the blogosphere we might be able to improve our knowledge on the mechanics of online social communities and, to some extent, the mechanics of complex social networks. Blogviz is currently a portrait of blogosphere’s topic activity during the months of January and February 2005. The selection of a time period was purely arbitrary. In order to make this project a reality within the thesis development time limitations, a decision was made in order to constrain the project to a more specific time span. Nevertheless, the model was developed to easily incorporate different timeframes. Blogviz will continue to expand in the future, to the possible point of including real-time data. Blogviz uses existing data from three different blog search engines organized in a database that will soon be available for public access. (see Technical Sources for additional information) 2
  • 9. 1.2 Memetics From a conversation with my Thesis Writing instructor, Mark Stafford, I came to understand how closely my thesis had become related to the concepts of memetics, or meme behavior. We concluded that I was developing a “topological model of meme activity”, even if until then I had been somewhat oblivious to it. That title remained for a while as a characterization of Blogviz, but I later decided to change it, since the word meme limited the audience and the expression topological could be misinterpreted. I still question why the notion of Memetics didn’t come up earlier in my research, but what is particularly interesting is that it was there from the beginning, immersed in every iteration of my work. I think I was too focused on the idea of word-of-mouth behavior, an expression used by Malcolm Gladwell in “The Tipping Point” and by Duncan Watts in “Six Degrees: The Science of a Connected Age”. The vital point is that Memetics is the principal theory for contextualizing Blogviz, and understanding it is therefore crucial to comprehending the underlying concept of Blogviz. 1.2.1 What’s a Meme? The term was coined by Richard Dawkins in 1976, in his well-known book “The Selfish Gene”. In Dawkins’s words, “meme” refers to “a unit of cultural transmission, or a unit of imitation”. More specifically, a meme can be defined as a self-propagating unit of cultural evolution, a unit of information, held in an individual’s memory or in an outside artifact (e.g. book, record or tool), which is likely to be communicated or copied to another individual’s memory or retention system. Examples of memes are ideas, catch-phrases, melodies, technologies, icons, theories, inventions, languages, designs, fashions, and traditions.
This covers all forms of beliefs, values and behaviors that are normally taken over from others rather than discovered independently. A meme is basically a pattern of information that induces people to repeat it. People try to “infect” each other with the memes they find most appealing, regardless of the memes’ objective value or truth.
  • 10. 1.2.2 What is Memetics? Memetics is the study of evolutionary models of information transmission based on the concept of the meme. In spite of its roots in evolutionary biology and computer simulation, memetics has become more of a social science, focusing primarily on the spread of information within human society. Rather than debating the inherent “truth” or lack of “truth” of an idea, memetics is largely concerned with how that idea gets replicated. Another definition describes Memetics as the theoretical and empirical science that studies the replication, spread and evolution of memes. As portrayed in the Journal of Memetics*: “Its core idea is that memes differ in their degree of ‘fitness’, i.e. adaptation to the socio-cultural environment in which they propagate. Because of natural selection, fitter memes will be more successful in being communicated, ‘infecting’ a larger number of individuals and/or surviving for a longer time within the population. Memetics tries to understand what characterizes fit memes, and how they affect individuals, organizations, cultures and society at large”. Since the premise of Memetics is to investigate the evolutionary mechanisms that determine the propagation of information within a population of human, animal or artificial agents, we can easily see why this science is vital to the understanding of cults, ideologies, and marketing campaigns of all kinds. A meme is acknowledged as a self-propagating unit of cultural evolution, analogous to the gene (the unit of genetics). And because memes behave similarly to life forms, Memetics embraces the analytical techniques of diverse sciences, such as epidemiology, evolutionary science, immunology, diffusion of innovations, linguistics, and semiotics. * Journal of Memetics (http://jom-emit.cfpm.org)
  • 11. 1.3 Diffusion of Innovations I believe any type of Information Diffusion Model (IDM) in Social Networks must derive extensive practical knowledge from the sciences of epidemiology and diffusion of innovations. These two domains help us understand many of the factors that characterize the spreading of information and adoption process in social communities. Epidemiology and Diffusion of innovations also share many similarities and are surprisingly linked together. For these reasons I decided to include in this thesis a short description of these areas, since in addition to the concept of Memetics, they create an extraordinary context to the understanding of Blogviz. I don’t make wide explanations of each domain but rather comparisons between them on how they relate to this thesis’s assertion. In order to delineate a common ground for the following definitions, this paper assumes that an innovation can be characterized as a new meme, given that it is also described as a new idea. In the context of information diffusion in the blogosphere, it assumes the process of adoption to be the process by which a blogger, aware of the existence of a new meme (or innovation), decides to mention it on his/her own personal blog, in the form of a post or part of a post. This action can be understood as an “adoption” by the blogger of this particular unit of information, therefore contributing to its replication. The study of innovation adoption and diffusion has its origins in the Midwestern United States. In an Iowa State University study, Ryan and Gross (1943) showed that the pattern of adoption and diffusion of a maize hybrid was systematic, hence opening the door for further research. Diffusion is the process by which an innovation is communicated through certain channels over time among the members of a social system (Everett M. Rogers, 1995). 
The innovation includes “any thought, behavior, or thing that is new because it is qualitatively different from existing forms” (Jones, 1967). The characteristics of an innovation, as perceived by members of a social system, determine its rate of adoption. From these statements alone one can easily grasp a series of similarities with the notion of Memetics, even to the point that the theory of Diffusion of Innovations does not restrict the unit of adoption to an individual person, but extends it to other types of retention systems.
  • 12. The four main elements in the diffusion of new ideas are: (1) the innovation; (2) communication channels; (3) time; (4) the social system (context). 1.3.1 The Innovation These are the characteristics that determine an innovation’s rate of adoption: relative advantage; compatibility; complexity; trialability; and observability to those people within the social system. 1.3.2 Communication Channels A communication channel is the means by which messages get from one individual to another. Mass media channels are more effective in creating knowledge of innovations, whereas interpersonal channels are more effective in forming and changing attitudes toward a new idea, and thus in influencing the decision to adopt or reject it. Most individuals evaluate an innovation not on the basis of scientific research by experts, but through the subjective evaluations of near-peers who have adopted the innovation (Everett M. Rogers). In a broad sense, the communication channel in the context of Blogviz is indubitably the Internet. Without it there would be no communication between bloggers at all. However, without blogrolls and post citations within each blog, the specific channels among bloggers would be very difficult to perceive. Blogrolls are the backbone of blog communities, the edges that keep all the nodes interconnected, and are therefore key to understanding how information develops across the blogosphere. In fact, a major characteristic of online social communities is that they are based on communication channels, not on physical co-location. A blogroll is a list of websites that appears as links on a weblog, usually in a left or right frame of the page. This list of links is used to express the site owner’s interest in, or affiliation with, other webloggers.
  • 13. 1.3.3 Time The Diffusion of Innovations theory divides the element of Time into three main dimensions, of which only two fully apply to information diffusion in the blogosphere. > Innovation-decision – The innovation-decision process is the mental course of action through which an individual passes from first knowledge of an innovation to forming an attitude toward it, to a decision to adopt or reject it and, if adopting it, to implementing the new idea and confirming the decision. In the case of a blogger deciding whether to post a specific meme on his/her weblog, this decision process is so fast that it is almost impossible to measure. It applies to other memes, and definitely to other innovations, but it is not relevant as a measurement in top links replication. > Innovativeness – Innovativeness is the degree to which an individual adopts new ideas earlier than other members of a social system. Innovativeness, in contrast to the innovation-decision process, is an extremely significant measurement in top links replication, as in most information diffusion models. There are five adopter categories, or member classifications of a social system, based on their level of innovativeness: innovators; early adopters; early majority; late majority; laggards. Figure: Bell-shaped curve showing categories of individual innovativeness and percentages within each category.
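The five adopter categories can be sketched in code. The example below is a minimal illustration, not the method used by Blogviz: it assumes Rogers's usual convention of cutting the bell curve at one and two standard deviations around the mean adoption time (roughly 2.5%, 13.5%, 34%, 34% and 16% of adopters under a normal distribution). The function names and the `adoption_hours` input (hours from a link's first appearance until each blog mentions it) are hypothetical.

```python
from statistics import mean, pstdev


def adopter_category(t, mu, sigma):
    """Classify one adoption time t against the mean (mu) and standard
    deviation (sigma) of all adoption times, using cut points at
    -2, -1, 0 and +1 standard deviations (an assumed convention)."""
    z = 0.0 if sigma == 0 else (t - mu) / sigma
    if z < -2:
        return "Innovators"
    if z < -1:
        return "Early adopters"
    if z < 0:
        return "Early majority"
    if z < 1:
        return "Late majority"
    return "Laggards"


def classify_adopters(adoption_hours):
    """Map each blog to an adopter category.

    adoption_hours: dict of blog name -> hours between the link's first
    appearance and that blog's mention of it (hypothetical input format).
    """
    times = list(adoption_hours.values())
    mu, sigma = mean(times), pstdev(times)
    return {blog: adopter_category(t, mu, sigma)
            for blog, t in adoption_hours.items()}
```

With real blog data the adoption times are unlikely to be normally distributed, so the resulting shares per category would differ from the textbook percentages; the cut points are only a starting convention.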
  • 14. Innovativeness within social systems is characterized by a bell-shaped curve, in which time and incidence of adoption are the two main axes. This concept, in the context of Blogviz, is further explored in the Methodology chapter of this thesis. Many search engines and community tools that analyze the blogosphere assume a direct correlation between a blog’s popularity and its innovativeness. I believe this assumption is incorrect. Their reasoning is very simple: if a specific blog has a high number of inbound links, and therefore a sizeable readership, it must be at the forefront of finding and publishing original information. The HP Information Dynamics Lab study on the “Implicit Structure and the Dynamics of Blogspace” (Eytan Adar et al.) showed exactly the opposite. The study demonstrated that popular blogs are rarely among the first to start a specific trend. Many popular blogs take credit for their “discoveries” by not citing their original sources, which are usually smaller, unfamiliar blogs. The popularity of a blog might be directly related to its scale of influence, but not necessarily to its level of innovativeness. So who are these unknown bloggers that bring fresh ideas to the blogspace? Who are these innovators or trendsetters? Blogviz will expose these anonymous sources, which are crucial to the dynamics of topic diffusion. > Rate of adoption – The rate of adoption describes how fast an innovation is adopted by members of a social system in a given time period. When mapping the cumulative adoption time path, or temporal pattern, of a diffusion process, the resulting distribution can generally be described as an S-shaped (sigmoid) curve. Time and cumulative adoption (or infected population) are the two main axes of the plot.
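As a rough illustration of the S-shaped curve, the sketch below builds a cumulative adoption count from a list of adoption days and defines a logistic function, one common sigmoid that such a curve can be compared against. Both function names, the day-based input and the choice of the logistic form are assumptions made for this example; they are not taken from the Blogviz implementation.

```python
from collections import Counter
from math import exp


def cumulative_adoption(adoption_days):
    """Return [(day, cumulative adopters)] for every day in the observed range.

    adoption_days: list of integers, one per blog, giving the day on which
    that blog first mentioned the link (hypothetical input format).
    """
    if not adoption_days:
        return []
    per_day = Counter(adoption_days)
    total = 0
    curve = []
    for day in range(min(adoption_days), max(adoption_days) + 1):
        total += per_day[day]  # Counter returns 0 for days with no adoptions
        curve.append((day, total))
    return curve


def logistic(t, capacity, rate, midpoint):
    """Logistic (sigmoid) curve: slow start, rapid growth around the
    midpoint, then saturation at capacity."""
    return capacity / (1 + exp(-rate * (t - midpoint)))
```

For example, `cumulative_adoption([0, 1, 1, 2, 2, 2, 3, 3, 4])` yields a curve that rises slowly, accelerates and then flattens; the per-day differences of the same curve give the bell-shaped incidence of adoption.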
  • 15. 1.3.4 The Social System The fourth main element in the diffusion of new ideas is the social system, which creates the boundary within which the diffusion and adoption of an innovation occur. A social system is defined as a set of interrelated units that are engaged in joint problem-solving to accomplish a common goal (Everett M. Rogers). The members or units of a social system may be individuals, informal groups, organizations, and/or subsystems. With regard to the replication of top links among weblogs, the social system is undoubtedly the blogosphere, depicted as a fertile network of endless social communities. This vast communication network consists of interconnected individuals (bloggers) who are linked by shared interests and patterned flows of information. At first glance, considering the highly interconnected web of links, connections and shared interests among bloggers, it might seem easy to understand the adoption process of a particular unit of information or innovation. However, another crucial conclusion of the HP Information Dynamics Lab study mentioned earlier was that “for URLs appearing on at least 2 blogs, 77% of blogs do not have a direct link to another blog mentioning the URL earlier. For those URL’s present on at least 10 blogs, 70% are not attributable to direct links”. There have been several studies on how a system’s social structure, and its norms or established behavior patterns, affect the diffusion of innovations within it. But another area of research closely linked to Blogviz relates to opinion leadership, which can be described as the degree to which an individual is able to informally influence other individuals’ attitudes or overt behavior in a desired way with relative frequency. Blogviz allows a broad understanding of opinion leadership in blogspace by tracking and exposing the most influential and innovative topic proliferators.
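The kind of measurement quoted from the HP study can be approximated with a short sketch. The code below is a simplified reading of that statistic, not the study's actual procedure: for one URL, it counts the mentioning blogs that have no direct link to any blog that mentioned the same URL earlier. The function name and both input structures are hypothetical, and counting the very first blog as unattributed is an assumption of this example.

```python
def unattributed_fraction(mentions, links):
    """Fraction of mentioning blogs with no direct link to an earlier mentioner.

    mentions: dict of blog -> time of its first mention of the URL.
    links: dict of blog -> set of blogs it links to (e.g. via its blogroll).
    Both structures are hypothetical input formats for this sketch.
    """
    if not mentions:
        return 0.0
    unattributed = 0
    for blog, t in mentions.items():
        # Blogs that mentioned the URL strictly before this one.
        earlier = {b for b, tb in mentions.items() if tb < t}
        # No overlap between this blog's outgoing links and earlier mentioners.
        if not (links.get(blog, set()) & earlier):
            unattributed += 1
    return unattributed / len(mentions)
```

A high value for a given URL suggests that the link reached most bloggers through channels other than the visible blog-to-blog links, which is the difficulty the quoted figures point to.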
  • 16. 1.4 Epidemiology Throughout this thesis I use several times the terms contamination and infection when describing the adoption process of memes. Even though this practice might lead to unwanted interpretations, its use is not arbitrary, and it actually facilitates the comprehension of information diffusion dynamics. Epidemiology in its broadest sense is the study of disease patterns in human populations (Wikipedia). Epidemiology can also be described as the study of the determinants, occurrence, and distribution of health and disease in a defined population. Infection is the replication of organisms in host tissue, which may cause disease. A carrier is an individual with no overt disease who harbors infectious organisms. And the notion of dissemination is understood as the spread of the organism in the environment. In the above description, regardless of the different terms, we start noticing several similarities with the domain of diffusion of innovations. This analogy is even more explicit when characterizing the three major elements in disease occurrence, the so-called chain of infection: (1) The etiologic agent (parallel to the innovation) (2) The method of transmission (parallel to the communication channel) (3) The host (parallel to a unit of a social system) Further along in characterizing the disease evolution, the epidemiologic descriptive study organizes data by time, place and person. It is unquestionably the closest approach to the concept of Information Diffusion. It divides the element of Time into four main trends; respectively, secular trends, periodic trends, seasonal trends and epidemics. What’s interesting in this typology of Time is that it applies equally well to the evolution of top links across the blogosphere. Because of that I assume a series of parallelisms between them. The secular trend describes the occurrence of disease over a prolonged period. 
This continual development is less common than the seasonal trend in the context of blogspace. It usually describes commercial or very popular websites that never entirely lose bloggers’ interest and as a result have a continuous presence among them.
  • 17. The periodic trend expresses a temporary modification in the overall secular trend. It conveys a sudden new interest in a specific meme that is part of a continual trend. The seasonal trend reflects seasonal changes in disease occurrence following changes in environmental conditions that enhance the ability of the agent to replicate or be transmitted. This short, transitory trend is the most common in blogspace: a new meme spreads quickly and rapidly loses interest, dying out in a short period of time. The epidemic incidence of a disease generally happens when it surpasses a threshold of 7% of the target population. An epidemic is a sudden boost in occurrence due to prevalent factors that support transmission. An information epidemic in blogspace might give rise to a tipping point, where a specific meme escalates and leaps beyond the blogspace, reaching the mainstream.
  • 18. 2 Impetus The main source of motivation for my thesis lies in the close interplay between Information Diffusion, Information Architecture, Data Visualization, and the Science of Complex Networks. My curiosity about Information Architecture was initially fostered in Christopher Kirwan’s MFADT class in the Spring of 2004, and it has since become a major subject of interest and awareness. I remember observing for the first time a diagram with four interconnected circles representing the continuous Understanding Spectrum. Data originates information, which leads to knowledge and ultimately to wisdom. This concept influenced my vision and made me reflect on the responsibility I had, as a designer, to contribute to this spectrum. Figure: The Understanding Spectrum (Nathan Shedroff). We may have access to an abundance of information but I strongly believe we lack the ability to process it effectively. In the face of contemporary technological accomplishments, our ability to generate and acquire data has by far outpaced our ability to make sense of it. Neither raw data nor scattered information offers any level of meaningful understanding. This is where Information Architecture and Information Visualization undertake an important mission. If we are truly entering a fourth phase of humankind, a theory defended by a large number of anthropologists and sociologists, then Information
  • 19. Architecture is going to be a golden key in the process. In a world increasingly driven by information, information rapidly assumes the form of power, and divides society into those who own it and those who don’t. Meaningful information is not a given, and particularly now, when our cultural artifacts are being measured in gigabytes and terabytes, organizing, sorting and displaying information efficiently is crucial for intelligence, knowledge and wisdom. In the Spring 2004 semester I was involved in two projects that were decisive in delineating my thesis domain and in increasing my alertness towards Information Architecture and Information Visualization. The first was a group project developed in the Information Architecture class taught by Christopher Kirwan. Self-Replicating Cloners was a project aimed at producing visualizations of viruses, their progression through time and their world-scale dissemination. Two viruses were analyzed by comparison, SARS and MyDoom, each representing its underlying field, human biology and computer technology respectively. Figure: Self-Replicating Cloners. Visualizations of viruses (biological/computer generated), their progression through time and world-scale dissemination.
  • 20. The second point of awareness was a group project developed in a collaboration studio with Siemens Corporate Research Center. Aimed at Siemens Medical, DSS – Disease Surveillance System was a visualization and communication tool that shared symptomatological data between hospitals and health care professionals in order to detect possible disease outbreaks and recognize development patterns nationwide. Figure: DSS – Disease Surveillance System. After these two experiences, I started my summer research with some clear interests in mind, but they were still scattered across distinct areas such as artificial life, virology, cognitive science, genetics, cyber biology, epidemiology, and pattern recognition. Emergence, by Steven Johnson, was the first book I read in my research and it was a surprising start. The paradigm of Emergence, which can be described as a “higher-level pattern arising out of parallel complex interactions between local agents”, slowly filled my mind with bright new discoveries. With increased motivation, I gradually abandoned some initial ideas and, in other cases, found common links between them under the sciences of complexity and self-organization. The search for answers on how order can emerge from disorder, and organization from chaos, guided me to study the individual parameters of emergent systems, such as collective/macro behavior, self-organizing communities and bottom-up hierarchy. This research led me inevitably to complex systems. Delving into this new area was even more thrilling. Finding each day a common structure in apparently distinct fields, or similarities between natural systems and human designs, was beyond doubt overwhelming. From that point on, I became extremely fascinated with the omnipresent
  • 21. web of signals and interactions, nodes and links that shape modern complex networks, from social networks to corporations, cities, living organisms and the Internet. Complexity is a challenge by itself. Complex Networks are everywhere. The network is a structural and organizational principle that reaches almost every field we can think of, from genes to power systems, from food webs to market shares. Paraphrasing Albert-László Barabási, one of the leading researchers in this area, “the mystery of life begins with the intricate web of interactions, integrating the millions of molecules within each organism”. From birth, humans experience the effect of networks every day, from large complex systems like transportation routes and communication networks to the less conscious interactions common in social networks. The Scale-Free network, the most common topology in both natural and human systems, is, curiously enough, a very recent breakthrough. Since its discovery, 6 years ago, dozens of researchers worldwide have been disentangling the networks around us at an amazing rate. This awareness is helping us understand not only the world around us but also the most intricate web of interactions that shape the human body. The global effort to construct a general theory of complexity is tremendous and may lead us not only to a structural understanding of networks, but also to major improvements in the stability, robustness and security of most complex systems around the globe. As Barabási writes in Linked, “Once we stumble across the right vision of complexity, it will take little to bring it to fruition. When that will happen is one of the mysteries that keeps many of us going”. The feature of complex networks that has always fascinated me the most is the dynamics of Dissemination Patterns. The visualization of the path, and inherent duration, of a certain fad, idea, or virus in a social/biological or computer network has been, since the beginning, a critical point of awareness.
How does a particular contagion travel from point A to B, which nodes does it affect along the way, and how fast does it contaminate a large cluster or the entire network?
  • 22. 2.1 Subject of Analysis After my summer research presentation at the beginning of the Fall 2004 semester, where I showed all the knowledge I had collected in the domain of complex networks, I went even further, observing and collecting dozens of network visualization examples and trying several open-source applications. This investigation resulted in my second official presentation. Part of this research also coincided with the work I was developing as a design researcher at the Parsons Institute of Information Mapping (PIIM). For additional information on this study please consult section 6.2 of chapter 6 – Methodology. After the second official presentation I was sure of two things: 1 – I wanted to continue my visual exploration exercise, by gathering problems and inconsistencies in complex network diagrams and proposing plausible solutions. 2 – I wanted to map a dissemination pattern in a specific network. By doing that, I intended not only to be innovative and bring something new to the field, but also to present a ‘showcase’ of my visual thinking in terms of complex network visualization. The first objective was well defined and, best of all, already under development. The major problem was finding a solution for the second point. I had to hit upon a subject that represented all the research and knowledge I had gathered through the summer and the beginning of the Fall 2004 semester. Finding an answer to this quest seemed an impossible task, given the vagueness of the possible directions. At a certain point it was as if I had come back to the start, with the fearful blankness of June assaulting my mind once again. Time was pressing and I knew that, whatever subject I chose, I still faced an enormous workload. The first thing I decided was to go back to my initial interest, the main cause that had led me into this escalating exploration of complex networks.
I quickly rediscovered my early motivations: virus dissemination and the relationships between social/biological and computer/technological systems. One thing I discovered in my summer research is that ideas, fads, trends and innovations show dissemination patterns in social networks similar to those of viruses. Word-of-mouth is a fascinating diffusion behavior that has always intrigued psychologists, sociologists, anthropologists, and lately marketers. To be able to map a word-of-mouth epidemic in a specific social network is a blue-sky scenario. And that might be true, in relation to physical interactions in a physical world between physical
  • 23. individuals. However, a flourishing movement on the Internet presents an interesting experimental laboratory in which to explore this behavior. Blogging embodies an incredible case of word-of-mouth, where news, ideas and fads travel through community clusters with high adoption rates. Because of their inherent nature, blogs became my ultimate fixation and the main framework for my Thesis. Their high interconnectivity and shared flow of information represent not only an obvious case study of meme propagation, but also an outstanding example of a dissemination pattern in an increasingly complex network, estimated to have over 8 million nodes. As an example, I’ll mention a topic that emerged from the blog community at the beginning of October 2004. During the first presidential debate of the 2004 US Elections, on September 30, 2004, between President George W. Bush and Senator John Kerry, there was an episode that caught the attention of a particular viewer. “You forgot Poland” was the abrupt statement made by George W. Bush while John Kerry was enumerating the allied forces present in the Iraq War. The presidential debate occurred on a Friday evening, September 30, and on the following Monday night, there was a topic already sharing 12 links among bloggers. This topic pointed to a specific URL – http://www.youforgotpoland.com. By that time, less than 72 hours after the debate, someone had already created a domain (youforgotpoland.com) and was selling t-shirts and stickers online with the same sentence. A new meme had been born and, in a short period of time, had “infected” several people. This intriguing example reveals the accelerating rate of information flow among bloggers and how fast information spreads through, or “contaminates”, online blog communities. Another point of interest demonstrated by this example is the possibility of tracking a possible outburst. Imagine this topic reaching the mainstream a week later, possibly a major newspaper or a particular TV show.
How interesting would it be to actually go back in time and discover where this outbreak first originated, the way it was adopted and how fast it grew? These last two queries have undoubtedly become a crucial motivation for the development of my thesis. Quoting Duncan Watts, in regard to the mechanics of social networks: “To understand the pattern, we need to delve further into the rules by which individuals make decisions, and how, in the process, our apparently independent choices become inextricably bound together.”
  • 24. 3 Context The contextual narrowing of my thesis proposal starts with the broad area of Complex Networks, tightens its limits to Social Networks and ends at its ultimate contextual boundary, Online Social Communities. Even though this Thesis proposition places itself at the center of a broad group of domains, I decided to explore in depth its closest and most direct domain, Online Social Communities, and the main subject of analysis, Blogs. Nevertheless, besides the omnipresent field of complex networks, the context of this thesis incorporates the domains of Information Diffusion, Memetics, Information Architecture, Data Visualization, Information Theory, Diffusion of Innovations, Epidemiology and Small Worlds. 3.1 Online Social Communities Online Social Communities, although much narrower than the Science of Complex Networks, is still a wide-ranging field that can include almost every type of online interpersonal communication medium, from e-mail lists/threads to Usenet groups, MUDs, chat environments, instant messaging, community forums, weblogs, online gaming and interest groups, among others. Online Communities offer an interesting change in the parameters that have until now defined social interaction. Several years after Milgram’s famous small-world test, Russell Bernard and Peter Killworth did what they called a “reverse small-world experiment”. They interviewed hundreds of individuals, explaining Milgram’s experiment and asking them what personal criteria they would use to get a specific package to someone they didn’t know. Bernard and Killworth’s study found that most of the subjects used only a couple of dimensions to get their message to the next recipient. The most predominant dimensions were geography and occupation. Jon Kleinberg, a computer scientist who attended Cornell and MIT, was also motivated by Milgram’s small-world study, and asked how individuals actually found the paths within the network.
Kleinberg concluded that people generally have a strong sense of distance, which they use to distinguish themselves from others. A notion of
  • 25. distance can have several factors, of which geographical distance is just one. Profession, race, religion, income, class and education are other elements added to the equation that describe how distant a specific person is from us. From the beginning of human existence, communities were created for the benefit of their own members. Usually out of expediency, either for the exchange of goods or for improved security against enemies, these groups of people arose as emergent systems of social convenience. Geography always played an essential role, and without a common shared space most of these communities wouldn’t even have existed. With the later development of mail and, more recently, telephone, telex and fax, human communication became highly enhanced and geography started to lose its major influence. However, these new “technologies” only improved the way people communicated with each other, by giving them more tools and decreasing the time span and consequently the distance; other than that, there were no major changes in the way social communities were formed. No matter how fast and easy it became for someone in Europe to talk with someone in America or China, communities were never created on the basis of telephone calls. If we explore the word structure of most communication tools prior to the Internet, such as telegraph, telex, telegram, and telephone, we encounter the constant presence of the prefix tele-. Tele is a Greek word that means “at a distance”, usually implying “to be distant” or “over a distance”. The first use of the prefix tele- was in the word telescope, which was actually adapted from Galileo’s Italian word telescopio, followed by the word telegraph, meaning “writing at a distance”. Therefore, Telecommunications is the field that embodies all the systems intended to communicate “at a distance” or “over a distance”.
Once again we see the importance of geography as a crucial domain for human communication, where the advancement of technology has, since the beginning, tried to diminish its constraints by allowing people to communicate over an ever-present and disturbing distance. I find this analysis particularly interesting because the Internet, and all the features associated with it, has completely abandoned the prefix tele-, drastically assuming the medium, and replaced it with the prefix e-. From e-mail to e-commerce and e-business, the prefix e- is usually associated with the latest wave of the technological revolution; it is an abbreviation of the word electronic and has an obvious association with the word cyber.
  • 26. The advent of the Internet and the World Wide Web changed these secular communal constraints, possibly forever. The Internet became not just a medium for social gathering and communication, but it absorbed it, and the medium became truly the message. The transmission of information on the Internet is regularly measured in milliseconds, and the time it usually takes for a message to leave a computer in Tokyo and arrive at a computer in New York is more or less the same as a message sent to you, from your next-door neighbor. The difference is merely a few milliseconds, which is by itself a measurement difficult to perceive. Geography, as a crucial criterion for the birth of social communities, has been utterly disregarded by online social communities. Without the limitations of geography and physical interaction and identification, online communities had to rely on a more abstract, but equally distinguishing criteria, interests. By analyzing most current online communities, from online players to chat rooms, blogs and newsgroups, we find out that in the absence of physical recognition, social values like trust, confidence, respect and even friendship are ultimately based on a set of shared interests. And of course, this “virtual” interaction would not be possible without specific communication channels, portrayed as technological sub-systems of the larger medium, the Internet. Personal interests are a central element of our social identity, and subsequently, a highly considered factor in relationships. Paraphrasing Duncan Watts in regards to peer-to- peer networks, “social identity is what leads networks to be searchable”. The fabulous aspect of online communities is the possibility of not only searching these clusters of shared interests, but also tracking the exchange of conversations, ideas and messages between them. By analyzing this data, it’s possible to understand, to some extend, how information travels through these virtual environments. 
Weblogs, in this context, represent units of a remarkable social laboratory. Not only is it relatively easy to track their connectivity but, due to their highly clustered nature, it is also possible to examine, within specific communities, how news and trends travel through individual bloggers.
  • 27. 3.2 Weblogs
Weblogs (alternate: blogs) are not just a new fad among Internet users, and they are much more than a collection of online diaries kept by scattered interest groups. Blogs represent a change in the flow of online information, and they are becoming a growing news source for many people. We might not yet be aware of how influential blogs will be in the future, but one thing is certain: there are currently blogs with close to half a million visitors a day, more than many large newspapers, magazines and news broadcasters.
Jorn Barger coined the term in 1997, and in 1999 Peter Merholz coined its abbreviation, “blog”. As Jorn Barger stated: “Weblogs are often-updated sites that point to articles elsewhere on the web, often with comments, and to on-site articles. A weblog is kind of a continual tour, with a human guide [whom] you get to know. There are many guides to choose from and each develops an audience. There's camaraderie and politics between the people who run weblogs. They point to each other in all kinds of structures, graphs, loops, etc.”
The most common definition of a blog is that of an online diary of thoughts, links, events, or actions posted on a web page in a dated log format. These posts are often, but not necessarily, in reverse chronological order, and are updated daily or very frequently with new information about a particular subject or range of subjects.
Despite this dry classification, weblogs are incredibly useful. Blogs are the vital elements of the personal publishing revolution. If we go back a few years, before the rise of online publishing, the only way someone could write something for the general public was through a letter to the editor, in the hope that the message would be published in the magazine’s next issue.
For the first time in the history of human communication, any single person has the opportunity to reach millions with their message, as the cliché proclaims, with “the touch of a button”. Instead of being passive consumers of information, Internet users are becoming active participants. Whether this power to the people is a positive trend is debatable, since many people consider that it merely adds to the existing “junk” flowing on the Web. Since most blogs aren’t subject to any kind of editorial process or peer review, and sometimes “play” with anonymity, their public posts also raise legal concerns about intellectual property, defamation, and the like.
  • 28. Controversies aside, blogs, like the World Wide Web, are free democratic resources that embody the concept of free speech, which is unquestionably a right for all. Blogs also exemplify the true concept of diversity. Besides the tool itself being oblivious to who might use it, blog content is as varied as the Web itself. The authors of Essential Blogging explain this diversity by pointing out that “creating a taxonomy of the blogiverse is a fruitless task”, since “there’s no good, central directory of blogs that puts each one in its own pigeonhole, because even the most topical blogger will stray from the subject from time to time to celebrate some personal victory or warn his readers off a terrible movie”.
One might also argue that this personal publishing revolution in fact started with the first website, and consequently with the birth of the Internet. This is obviously true; however, until the first blog publishing tools became available, anyone who wanted to circulate their own ideas online had to be fluent in HTML, understand web hosting, and be familiar with most of the web design applications available. Even after the launch of GeoCities in 1996, offering free web hosting to non-commercial personal pages, web pioneers had to be HTML-savvy people who would spend their evenings working on their websites. Also, the few personal webpages that started populating the Web in the mid-90s were just a scattered collection of isolated opinions, with no regular updates and no connection to each other.
The big blog phenomenon started escalating in the summer of 1999, when a small web company called Pyra Labs released a product called Blogger. From that point on the blog community exploded, and the more bloggers came onto the scene, the more online blog tools became available. This was the beginning of the personal publishing revolution. The inclination towards personalization is reaching every industry, from clothing to cars, from software to medicine.
News and information are just new elements added to the equation. In my opinion, many blogs are so successful because of two major factors: personalization and comforting lassitude. Blogs are usually maintained by a single person who filters the huge amount of available information according to his or her own preferences. For people who share common interests with the blogger, it’s not only exciting to get information from that source, since it is going to match their inclinations to some degree, but it also saves them a lot of time by letting them avoid the large, more abstract, and sometimes incongruent news sources. In countries such as the US, where large media sources are becoming increasingly dry and biased, blogs might also represent an oasis of independent information.
  • 29. 3.3 Blogosphere
Blogosphere (alternate: blogsphere), or blogspace, is the collective term encompassing all weblogs (alternate: blogs). It’s almost impossible to determine precisely how many weblogs exist, or even how many are currently active. Technorati is a leading search engine for the blogosphere, similar to Google or Yahoo, but exclusive to blogs. As of February 2005, Technorati was tracking 7,245,866 blogs, and this number is far from stagnating. Out of curiosity, when reviewing this paper on April 6, 2005, I checked Technorati to see how the number had changed. To my not-so-surprised amazement, Technorati reported that it was tracking 8,469,023 weblogs. That translates into an increase of more than 1 million blogs in less than two months.
The latest Pew Internet study estimates that about 27%, or about 32 million, of American Internet users are regular blog readers. They say a new weblog is created every 2.2 seconds, which means there are about 38,000 new weblogs a day. Bloggers update their blogs regularly; there are about 500,000 posts daily, or about 5.8 posts per second.
When we’re faced with more than eight million blogs (at least), it becomes hard to consider the whole as a single community. The blogosphere, like its medium, the Internet, does not represent a single community but a vast collection of endless communities. These communities shape a complex web of more than 8 million nodes and are key factors in the outbreak and further development of trends, fads and innovations. Also, due to its inherent diversity, any kind of classification of the blogosphere is a mere exercise in oversimplification.
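The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the numbers as stated in the text; the helper names are illustrative. Note that one new weblog every 2.2 seconds works out to roughly 39,000 a day, slightly above the rounded 38,000 quoted, while 500,000 posts a day does round to 5.8 posts per second.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_day(seconds_per_event: float) -> float:
    """Events per day when one event happens every `seconds_per_event` seconds."""
    return SECONDS_PER_DAY / seconds_per_event


def per_second(events_per_day: float) -> float:
    """Average events per second for a given daily total."""
    return events_per_day / SECONDS_PER_DAY


# Figures quoted in the text (Technorati counts, February and April 2005).
blogs_february = 7_245_866
blogs_april = 8_469_023

growth = blogs_april - blogs_february
new_blogs_per_day = per_day(2.2)
posts_per_second = per_second(500_000)

print(f"Blogs added between the two checks: {growth:,}")
print(f"New weblogs per day at one every 2.2 s: {new_blogs_per_day:,.0f}")
print(f"Posts per second at 500,000 a day: {posts_per_second:.1f}")
```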
  • 30. 4 Audience
Scientists/Researchers on Complex Networks
Hopefully, Blogviz will offer a significant step in the long scientific journey towards understanding the dynamics of complex networks. To all the researchers, academics, and scientists who have been persistently and bravely disentangling the networks around us: I truly hope this model can leave one important footprint in this expedition. It doesn’t have to be gigantic, just one step forward. By bringing my visual expertise and interest in Information Architecture, Data Visualization and Interface Design, I expect to make a small corner of the vast Science of Complex Networks clearer and more understandable. This corner embodies the domain of Online Social Communities and the phenomenon of blogging.
Sociologists
Professionals, Researchers, Faculty and Students.
Blogviz will offer an interesting case study for analyzing a dynamic, ever-changing and complex online social network: the Blogosphere. Mapping the spread of word of mouth in social communication has been, until now, an almost fruitless task. Blogs, on the other hand, offer an engaging experimental laboratory in which to better study and understand this occurrence. Memetics is an expanding field of study in the social sciences, which is being explored by a significant number of researchers. Blogviz, by drawing a parallel between meme propagation and topic diffusion in blogspace, makes an important contribution to the understanding of Memetics.
Information Architects and Data Visualization enthusiasts
Professionals, Researchers, Faculty and Students.
I hope that my passion and fascination for the field of Information Architecture and Data Visualization are reflected in my thesis project. I sincerely hope that Blogviz can be a relevant precedent for some of your projects, deserve a mention in your research, and inspire or influence you at some level.
  • 31. Cultural Critics
Blogging is one of the most intriguing and captivating phenomena of our time. We might be in for a long ride in the adulteration of most publishing media conglomerates. We cannot really predict the ultimate result of this major shift in the flow of online information, but one thing is certain: it has already started. Blogviz will offer enhanced insight into the mechanics of this contemporary revolution.
Marketers
Possibly the only open door to eventual commercial viability for the application is its relevance to the marketing industry. Even though Blogviz is a non-commercial research project, it is reassuring to know that it’s potentially useful outside the research and academic realms. Like sociologists, marketers have become more and more interested in word-of-mouth behavior, even though the more traditional marketing strategists have barely explored the concept.
In the blog community, most bloggers are incorporating the idea of syndication into their blogs in the form of an XML data file, called RSS, which is basically a list of post summaries and links to them. These files can then be interpreted by a desktop application called an RSS aggregator and read by the user without the need to visit the specific website. Some consider RSS to be the future of news distribution, and that might well be the case, which explains why, as in any communication medium, advertising is now starting to infiltrate RSS feeds. The potential use of Blogviz in this context is huge. Marketers interested in investing in the best RSS blog sources for advertising could easily track the most-read blogs, locate the innovators, the followers and the major dispatchers of information, and then act on the conclusions accordingly.
Bloggers
Blogviz is a visualization model built to better understand the information dynamics within the blog community.
Accordingly, any blogger who feels the need to comprehend the underlying network that he or she is part of is a potential user of my research project.
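The RSS syndication described under Marketers, a list of post summaries with links to them, can be illustrated with a minimal sketch of what an aggregator does when it reads a feed. The feed content, the `example.com` URLs and the `parse_feed` helper are invented for illustration; only the element names (`item`, `title`, `link`, `description`) follow the RSS 2.0 format, and Python's standard `xml.etree.ElementTree` module does the parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, invented for illustration only.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Summary of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <description>Summary of the second post.</description>
    </item>
  </channel>
</rss>
"""


def parse_feed(xml_text):
    """Return one dict per <item>, holding its title, link and summary."""
    root = ET.fromstring(xml_text.strip())
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return posts


posts = parse_feed(SAMPLE_FEED)
for post in posts:
    print(f"{post['title']}: {post['link']}")
```

Because each item carries its own link, the same list can serve as raw material for tracking which URLs a blog cites.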
  • 32. 5 Precedents
The chain of influences and inspiration for my thesis project is, as expected, extremely wide and ranges across new media art, information architecture, data visualization, complex networks and interface design, among many other fields, and life in general. Even if I started enumerating the key thinkers whose work I admire and respect, and have subsequently absorbed, I expect many names would still go unmentioned. In listing the key precedents for my thesis, I concentrated exclusively on projects developed in the area of Online Social Communities, the domain that most closely encircles my thesis. Since the major goal of my thesis is to visually map a specific diffusion pattern and the connectivity among blog communities, I decided to establish as precedents projects that make extensive use of a visual structure to portray their field of research.
5.1 Blog Epidemic Analyzer
Authors: Eytan Adar, Li Zhang, Lada Adamic, Rajan Lukose
Institution: HP Information Dynamics Lab
URL: http://www.hpl.hp.com/research/idl/papers/blogs/index.html
Description: HP Information Dynamics Lab created the Blog Epidemic Analyzer as part of their research on information propagation. They released their paper “Implicit Structure and the Dynamics of Blogspace” as a result of this research. Eytan Adar, Li Zhang, Lada Adamic, and Rajan Lukose used the search engine BlogPulse to map the behavior of the blog community from May 11 to May 21, 2003.
Relevance: This project is the closest to my thesis ambition, and it obtained exciting results that proved pertinent in selecting specific parameters for my work. Although highly useful as a research project, its few attempts at visualization were extremely poor. Its major breakthrough was showing that the most popular blogs are not the most innovative, as they commonly “steal” news and information from smaller, less-known blog sources.
I believe this is a very significant claim that decisively influences the way we understand the mechanics of blog communities.
  • 33. 5.2 Loom2
Authors: Danah Boyd, Hyun-Yeul Lee, Ethan Perry
Institution: Sociable Media Group - MIT Media Lab
URL: http://smg.media.mit.edu/projects/loom2/
Author’s Description: “The goal of our research is to use the salient features of social interaction to build a ‘legible’ interactive visual representation of Usenet. We started by exploring the Usenet environment, constructing a series of relevant questions. From the questions, we have started to explore how this information can be derived from the textual data available online. Simultaneously, we have started designing segments of visualization, under the assumption that the desired characteristics were ascertainable.”
Relevance: This project is a major aesthetic inspiration. I believe its use of a radial structure fits the purpose of the project quite well, with specific degrees relating to a time dimension and the nodes’ colors to specific theme categories. Usenet represents a subject of analysis closely related to blogging, since message/post threads in newsgroups follow a pattern of contamination similar to that of topics in the blogosphere. Given the appeal of their visual models, the amount of work they had to undertake to construct them is not surprising: “To build our designs, we drew on a wide variety of theoretical and practical concepts from a range of fields, including graphic and interactive design, architecture, sociology, and computer animation.”
  • 34. 5.3 Social Network Fragments
Authors: Danah Boyd, Jeff Potter
Institution: Sociable Media Group - MIT Media Lab
URL: http://smg.media.mit.edu/projects/SocialNetworkFragments/index.html
Description: “Social Network Fragments was developed as a self-awareness tool for individuals to explore the social networks that they create without structural consideration”. Its goal was to “help users examine their structure so as to unveil the structural holes that are built in such complex networks. These structural holes exist when users choose to fragment portions of their network, often revealing facets of their own identity. As an individual interacts with a diverse range of people, they are motivated to reveal different aspects of their identity, thereby creating a multi-faceted social identity, whereby different people know different things about the individual. In engaging in this behavior, individuals start to segment their social network into a variety of different clusters, or types of people.”
Relevance: The visualization of social networks takes a major leap forward in many of the projects produced by the Sociable Media Group (SMG) at MIT Media Lab. With some amazing visual displays, the SMG “investigates issues concerning society and identity in the networked world”, addressing questions such as “How do we perceive other people on-line? What does a virtual crowd look like? How do social conventions develop in the networked world?”. Social Network Fragments aims at something as extraordinary as mapping someone’s unnoticed social network. Although it may seem simple and intuitive to track an individual’s connections to others, this project tries to reach further than the immediate first-degree acquaintances, out to a friend-of-a-friend network.
  • 35. This approach to small-world theory has been pursued by some companies, which sell products focused on social network management. The idea is simple: don’t just get to the people you know, get to the people they know. Manage your friend-of-a-friend network in order to find the shortest path to whatever you’re looking for. Among the leading companies incorporating this concept are Spoke Software, Visible Path, SRD and In-Q-Tel. Social Network Fragments offers a reasonable visual solution, though I believe some improvements could be made. By basing its visual criteria solely on text, color and depth (a simulated third dimension), the interface is somewhat limited in its ability to fully explore its content.
5.4 PostHistory
Author: Fernanda Viégas
Institution: Sociable Media Group - MIT Media Lab
URL: http://web.media.mit.edu/~fviegas/posthistory/
Author’s Description: “Most of us deal with email on an everyday basis and some of us have been doing so for several years. Nevertheless, it is hard to perceive the accumulation of this frantic activity, it is hard to get a sense of the number of messages sent and received, not to mention how difficult it is keeping track of how many people have written to you or received messages from you. The aim is to provide users with a novel and hopefully richer experience of their email activities. PostHistory represents an opportunity for reflection and insightful monitoring of fundamental patterns of interactivity. The visualization aims at impressing on the user a sense of daily accumulation, of growth and scale – dimensions not normally conveyed on current email applications.”
  • 36. Relevance: Fernanda Viégas, a Brazilian graduate student at MIT Media Lab, is a prolific new media designer who has been involved in many relevant projects. PostHistory is one of her best. What I find most interesting in this project is the series of new structures and features she proposes in order to better understand the patterns created by e-mail activity. The project is visually innovative and quite an impressive contribution to the field of Information Visualization.
Another project conceptually related to PostHistory is Thread Arcs, a fresh interactive visualization technique designed to help people use the threads found in e-mail. Thread Arcs, which resulted in a published paper, is a truly interesting visual approach to e-mail threads and even to small graphs. The concept is part of ReMail, a larger e-mail application developed by the Collaborative User Experience team at IBM Research. ReMail has been in development for almost a decade and aims at improving our knowledge of how people use e-mail, and at making that experience more functional and straightforward. Some of its features are very encouraging.
Figures: Thread Arcs; ReMail (IBM Research)
  • 37. 5.5 Social Circles
Author: Marcos Weskamp
URL: http://marumushi.com/apps/socialcircles/
Author’s Description: “Social Circles intends to partially reveal the social networks that emerge in mailing lists. The idea was to visualize in near real-time the social hierarchies and the main subjects they address. When subscribing to a mailing you never know who the principals are, how many people are listening or what subjects they are talking about. It's like entering a meeting room with plenty of people in the darkness and then having to learn who is who by just listening to their voices. Social Circles does not pretend to be a statistical application, but rather aims to raise the lights in that room just enough to let you enhance your perception of what’s happening.”
Relevance: Marcos Weskamp is a key thinker in digital information design and a major personal influence. Newsmap, Weskamp’s most famous project and one of the best online examples of data visualization, gathers Google News and displays it in an innovative treemap structure in several languages (http://www.marumushi.com/apps/newsmap). In Social Circles, even though Marcos Weskamp doesn’t push the project far from the most common network visualization schemas, the concept is very strong, particularly in a recent version in which users can map their own inbox of e-mail messages.
  • 38. 5.6 WebFan
Author: Rebecca Xiong
Institution: Sociable Media Group - MIT Media Lab
URL: http://www.sbox.tugraz.at/home/k/koebi/WebFan%20Description.htm
Author’s Description: “WebFan visualizes user activities at WebBoards, or Web-based message boards, which contain messages posted by users. It uses the reply structure of the messages to lay them out using a fan-like hierarchical structure. This abstract structure allows a large set of Web pages with multiple levels to be represented at the same time for overview and comparison. Users can also interactively explore the fan structure to find out more about individual pages. Dynamic user activity is overlaid on top of this display.”
Relevance: “Currently, Web users have little knowledge about the activities of fellow users. They cannot see the flow of on-line crowds or identify centers of on-line activity.” WebFan seeks to enrich this experience by visualizing the activity of other people on message boards. I believe this is a very relevant project, particularly for the unconventional medium of WebBoards that Rebecca Xiong chose to map. WebFan relates to my thesis project by visualizing overall patterns of usage and answering questions such as: What are people looking at? What is hot? Where do clusters of similar interests form?
  • 39. 5.7 Visual Who
Author: Judith S. Donath
Institution: Sociable Media Group - MIT Media Lab
URL: http://smg.media.mit.edu/people/Judith/VisualWho/VisualWho.html
Author’s Description: “The population of a real-world community creates many visual patterns. Some are patterns of activity: the ebb and flow of rush hour traffic or the swift appearance of umbrellas at the onset of a rain-shower. Others are patterns of affiliation, such as the sea of business suits streaming from a commuter train, or the bright t-shirts and sunglasses of tourists circling a historic site. Visual Who makes these patterns visible. It creates an interactive visualization of the members’ affiliations and animates their arrivals and departures. The visualization uses a spring model. The user chooses groups (for example, subscribers to a mailing-list) to place on the screen as anchor points. The names of the community members are pulled to each anchor by a spring, the strength of which is determined by the individual’s degree of affiliation with the group represented by the anchor”.
Relevance: Visual Who, besides offering a motivating contextual precedent in relation to social networks, portrays a tempting method of mapping social connectivity among a set of individuals. It offers an interesting approach to pattern recognition and visualization, although I think it suffers from the same inconsistencies pointed out in the Social Network Fragments project.
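The spring model quoted in the description can be sketched under some simplifying assumptions that are not taken from Visual Who itself: the springs have zero rest length, their stiffness equals the degree of affiliation, and names do not repel each other. Under those assumptions the forces on a name balance at the affiliation-weighted average of the anchor positions. The group names and coordinates below are hypothetical.

```python
def equilibrium(anchors, affiliation):
    """Resting position of one member pulled towards several anchors.

    anchors:     group name -> (x, y) position chosen by the user
    affiliation: group name -> spring strength (degree of affiliation)

    Assumes zero-rest-length springs, so the net force
    sum(k * (anchor - position)) vanishes at the weighted mean.
    """
    total = sum(affiliation.values())
    if total == 0:
        raise ValueError("member has no affiliation with any anchor")
    x = sum(k * anchors[group][0] for group, k in affiliation.items()) / total
    y = sum(k * anchors[group][1] for group, k in affiliation.items()) / total
    return (x, y)


# Hypothetical anchors placed on the screen.
anchors = {"mailing-list": (0.0, 0.0), "research-group": (10.0, 0.0)}

# Equal affiliation: the name settles midway between the two anchors.
balanced = equilibrium(anchors, {"mailing-list": 1.0, "research-group": 1.0})

# Stronger affiliation with the mailing list: the name settles closer to it.
leaning = equilibrium(anchors, {"mailing-list": 3.0, "research-group": 1.0})

print(balanced, leaning)
```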
  • 40. 5.8 Avatars 2002
Authors: Katy Börner, William Hazlewood, Sy-Miaw Lin
Institution: School of Library and Information Science, Indiana University
URL: http://ella.slis.indiana.edu/%7Ekaty/gallery/
Description: This project gave rise to a research paper: “Visualizing the Spatial and Temporal Distribution of User Interaction Data Collected in Three-Dimensional Virtual Worlds”. The project is a visualization of the social patterns in the Culture virtual environment, part of the Quest Atlantis universe. The map shows user trails over time. It was produced using a visualization tool developed by Katy Börner and colleagues at the School of Library and Information Science, Indiana University.
Relevance: The particular relevance of this project lies in its visual pattern analysis. I think the underlying concept of being able to visually recognize different user trails in a 3D online game is extremely captivating. In a virtual game, often played with unknown faces, the notions of time and space alter considerably, which makes this project particularly challenging, as it tries to recreate a defined user trail pattern throughout a physically undefined space.
  • 41. 5.9 PeopleGarden
Author: Rebecca Xiong
Institution: Sociable Media Group - MIT Media Lab
URL: http://www.infovis.net/E-zine/num_46.htm
Description: PeopleGarden: Creating Data Portraits for Users proposes the “Data Portrait” as a graphical medium for the visualization of information related to individual users of interactive media. The visual metaphor that PeopleGarden uses is that of flowers in a garden. Each data portrait is the trace of a user’s activities and takes the shape of a flower.
Relevance: “On-line interaction environments such as Web-based message boards, chat rooms, and Usenet newsgroups have become widely popular. As the number of participants rises, it is increasingly difficult to distinguish individual users and to comprehend the overall interaction context.” In PeopleGarden the representation of a vague virtual space is taken to its extreme by portraying it as a digital garden. The concept is that flowers represent individuals in a chat room, and the longer a user stays active in a conversation, the more his or her flower grows and expands. I think this project is conceptually very strong, as it presents an innovative visual method for representing a vague, unspecified space.
  • 42. 5.10 History Flow
Authors: Martin Wattenberg, Fernanda Viégas
Institution: IBM Watson Research Center
URL: http://researchweb.watson.ibm.com/history/index.htm
Author’s Description: “The history flow application charts the evolution of a document as it is edited by many people using a very simple visualization technique. History flow provides answers at a glance to questions like, Has a community contributed to the text or has it been mostly written by a single author? How much has a particular contributor influenced the current version of the document? Is the text's evolution marked by spurts of intense revision activity or does it reflect a smooth transition from its beginning to the present? The current version of history flow visualizes the evolution of pages from Wikipedia”.
Relevance: History Flow is truly one of the most significant projects in revealing hidden patterns in a set of data that would otherwise go unnoticed by the user. This ability is undoubtedly one of the key strengths of Information Visualization. Using available data from the Wikipedia website, the authors built an inventive visualization model for analyzing the evolutionary pattern of individual contributions to Wikipedia articles through time. This visualization method bears some resemblance to ThemeRiver™, developed by the Pacific Northwest National Laboratory (PNNL), but the number of conclusions History Flow was able to facilitate is quite impressive. In a lecture given at Parsons D+T Lab on February 23, 2005, Martin Wattenberg, speaking about this project, mentioned that it takes an average of 2 minutes for any kind of article vandalism to be noticed and repaired.
  • 43. 5.11 Listening Post
Authors: Mark Hansen, Ben Rubin
URL: http://www.earstudio.com/projects/listeningPost.html
Author’s Description: “Listening Post is an art installation that culls text fragments in real time from thousands of unrestricted Internet chat rooms, bulletin boards and other public forums. The texts are read (or sung) by a voice synthesizer, and simultaneously displayed across a suspended grid of more than two hundred small electronic screens.”
Relevance: Although the toolset and the medium of this project are quite different from the screen-based interactive application intended for my thesis, I believe this project is an amazing precedent and one of the best installations I have ever seen. Exhibited at the List Visual Arts Center, Cambridge, Mass., and the Whitney Museum of American Art, New York, Listening Post was recently awarded a prize at the Ars Electronica 2004 Festival. Co-author Ben Rubin emphasizes the motivation for the project: “My starting place was simple curiosity: What do 100,000 people chatting on the Internet sound like?”. The significance of Listening Post is remarkable. It displays short messages, randomly picked from chat rooms according to a specific set of keywords, and then not only gives life to them by placing the messages in a specific spatial configuration, a “suspended grid of more than two hundred small electronic screens”, but also gives them a sound dimension, which makes the experience truly memorable. This large display of small screens resembles a “window” overlooking the activity in cyberspace.
  • 44. 6 Methodology
6.1 Summer Research
My first presentation at the beginning of the Fall 2004 semester covered some of the wide-ranging research done over the summer. It was entitled “Discovering Complex Networks”. My approach to this first assignment was to treat the presentation as a lecture, educating my audience about the engaging science of complex networks and narrating all the discoveries and knowledge gathered in this initial phase. The presentation contained explanations and diagrams of the specific properties of scale-free networks and took a holistic view by showing examples of complex networks in domains as diverse as Gene Networks and Airline Routes. All the images shown at this presentation can be seen in Appendix A – Summer Research Presentation, at the end of this paper.
To better understand the successive steps that led me to the study of complex networks, one should consult the Impetus chapter of this thesis. There I describe in detail how my research inclination and motivation evolved. I ended my Summer Research Presentation with a slide stating that my main interest was to “Visually map a dissemination/propagation pattern in a scale-free network”. I also made a short list of additional enquiries, which read:
> How does an idea, innovation, fad, trend, disease or virus travel from A to B in a specific scale-free network?
> How long does it take?
> How many nodes are affected?
> How do the hubs react?
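The questions above can be made concrete with a deliberately simple spreading model. The sketch below is an assumption for illustration, not the model used in Blogviz: at every time step, each node that already carries the idea passes it to all of its neighbours, which amounts to a breadth-first search from the source. The toy network, with a single hub `H`, is hypothetical.

```python
from collections import deque


def spread(graph, source):
    """Return node -> time step at which the idea first reaches it.

    graph maps each node to the list of its neighbours. Nodes that the
    idea never reaches are absent from the result.
    """
    reached = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in reached:
                reached[neighbour] = reached[node] + 1
                queue.append(neighbour)
    return reached


# Hypothetical toy network: "H" is a hub, "Z" is disconnected.
graph = {
    "A": ["C"],
    "C": ["A", "H"],
    "H": ["C", "D", "E", "F", "B"],
    "D": ["H"],
    "E": ["H"],
    "F": ["H"],
    "B": ["H"],
    "Z": [],
}

steps = spread(graph, "A")
print("Steps from A to B:", steps["B"])
print("Nodes affected:", len(steps))
print("Step at which the hub is reached:", steps["H"])
```

In this toy case, once the hub is reached every one of its neighbours is informed on the following step, which is one way of reading the question about how hubs react.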
  • 45. I finally concluded the presentation by stating my future goal: “To choose an area and subject to analyze, where I can bring something new to the field and contribute to its development.”
6.2 Visual Explorations
After extensive research on Complex Networks, I started to delve into different ways of visualizing them. The main premise was that complex networks are difficult to visualize, but we don't need to make them more complex in the process of trying. On September 27, 2004, I wrote the following in my thesis diary blog: “My thesis assertion has always been the visualization of dissemination patterns in a particular scale-free network. (…) However, I quickly found out that this premise is based on the assumption that the target network displays a visual structure suitable for analysis. Naturally, most of the time, this assumption is incorrect. Since a visual representation of a dissemination pattern cannot exist without a functional visual representation of the underlying network, I decided to dedicate my time, for now, to the visualization of complex networks. I've been delving into a set of visual explorations, collecting problems and proposing solutions.”
“Functional visualizations are more than innovative statistical analyses and computational algorithms. They must make sense to the user and require a visual language system that uses colour, shape, line, hierarchy and composition to communicate clearly and appropriately, much like the alphabetic and character-based languages used worldwide between humans.”
Matt Woolman, Digital Information Graphics
  • 46. As acknowledged in another blog entry, also in September 2004: “I've tried several open-source network visualization tools and seen hundreds of visualization examples. I think I found a critical problem. In most tools I've seen, the user starts building the network from an initial node. The user places the first node in the center of the drawing board and then, node after node, link after link, the network starts expanding. Since there's no preceding method of organizing the nodes and links in the designated area, new nodes naturally start occupying any free space available. Unsurprisingly, after a certain threshold, the lattice of lines and nodes becomes unbearable. This problem happens so many times.” The difference between this method and Mark Lombardi's drawings, for example, is a question of organization. Instead of the bottom-up approach described above, Lombardi planned his overall design with a holistic view of the entire network, knowing beforehand the amount of space he had and the exact number of nodes and links he needed to draw. Because of this, his drawings are remarkably clean, with edges that rarely overlap, and they are an excellent example of network visualization. What I cannot understand is why Lombardi's method, and others like it, aren't taken into consideration whenever someone decides to build a visual representation of a network. A macro approach to the problem is definitely more appropriate: a top-down hierarchy instead of bottom-up. And to say Lombardi's networks were not complex enough is to oversimplify his work. The beautiful and eloquent global networks of Mark Lombardi
  • 47. Besides the problem mentioned above, I encountered two others in my research, which contribute drastically to the large number of poor visualizations of complex networks. First, most visual applications are based on constructive algorithms that obey one rule: display the input data. How the data is displayed is rarely considered. For that reason, visually stunning forms often demonstrate a low level of clarity and function. Second, the programmers who build open-source applications and the scientists and researchers who use them usually lack visual sensibility or graph drawing knowledge. Many researchers produce a visual model of the analyzed network as a mere additional element for showing their research. Sometimes it adds nothing to it. In my second thesis presentation of the Fall 2004 semester, I applied many of my reflections and sketches to practical examples, proposing possible solutions to improve the visualization of complex networks. I divided my solutions into five major steps. The main slides of this presentation can be seen in Appendix B – Complex Networks: Visual Explorations, at the end of this paper.
  • 48. 6.3 Prototype #1 This was my first visual prototype, shown at the Fall 2004 mid-term review. This review also marked the birth of the thesis title: Blogviz. The mid-term presentation was entitled Blogviz: An experimental social laboratory. The underlying concept was based on a major aspiration: local stability for the nodes and global connectivity for the links. The goal was to map the connectivity among blogs. I tried to position the nodes in a structured way, so they would remain fixed and, to some degree, under control. The links, however, would be in constant change and the outcome would be highly random and unpredictable. I chose to sort all the nodes in a precise manner in order to isolate the major hubs and have some control over the lattice resulting from the agglomeration of links. Looking at it now, the result seems too rigid and strict. The radial diagram, with its implosive structure, reinforces that rigidity by resembling a closed system, which probably doesn’t describe the fundamental openness of blogs very well. Blogviz Visual Studies – Prototype #1 I realized I had to take a different path. I was trying too hard to control the outcome and I believe the result showed exactly that. I had to lose some of my constant need for control and let the system be more self-sufficient, self-organizing and adaptive. As stated in my Thesis blog on October 24, 2004: “Another criticism I received during the presentation was that I was being too concerned with the visual aspect of it, and that I was thinking too much as a visual designer. Well, although I agree in part with the criticism,
  • 49. my thesis assertion has always been the visualization of a specific dissemination pattern, and from my extensive research in complex networks, I truly believe that the only way I can positively contribute to this field is by employing my visual and interface design knowledge. In my first prototype presentation I dissected several problems on the visualization of complex networks and proposed distinct solutions that might solve some of its inconsistencies. I believe there has to be a balance between highly complex network visualizations that offer a poor functionality and highly aesthetic/innovative visual representations that might suffer from the same dilemma. I just have to pursue that balance.” In this same presentation I also illustrated some of my initial studies regarding the linkage among blogs. Connectivity in the blogosphere is a very binary process; we only need to ask two questions. Is blog A connected to blog B? If so, who is linking to whom? If neither links to the other, they become momentarily isolated islands. For that presentation I showed a few visual studies where I mainly explored the concept of directional linkage, by visualizing inbound or outbound links or, simply put, who is linking to whom. The images below portray some of these explorations.
  • 50. 6.4 Prototype #2 While in my first prototype I looked for a structured way to map connectivity among blogs, by isolating the hubs and sorting the nodes according to popularity, in my second prototype I explored possible ways of visualizing diffusion patterns over time. I tried several models based on a radial structure where time became the major imposing element. In most of these experiments I faced a common problem in representing a continuous flow of infected blogs. The underlying radial structure seemed to impose its rigidity by forcing fractures in the pattern, particularly at every day transition. Blogviz Visual Studies – Prototype #2
  • 51. Blogviz Visual Studies – Prototype #2 Blogviz Visual Studies – Prototype #2 45
  • 52. I quickly found out I had to change my visualization thinking, since a radial structure didn’t quite apply to my subject of analysis. Perhaps I was too influenced or distracted by the Radial Form of Organization Chart from the Alexander Hamilton Institute, or by Loom2, by Danah Boyd et al. Radial Form of Organization Chart (1924), Alexander Hamilton Institute. Loom2, by Danah Boyd, Hyun-Yeul Lee and Ethan Perry, Sociable Media Group, MIT Media Lab. As I wrote in my thesis blog on November 16, 2004: “At the moment I’m becoming convinced that a horizontal array is truly the best way of representing the quantitative and temporal qualities of a pattern. Time is a crucial domain in a dissemination pattern, particularly in a word-of-mouth social behavior. The great potential of a horizontal arrangement lies in the uninterrupted, continuous flow of data and the possibility of collapsing time frames while still maintaining a sense of scale and an understanding of the pattern dynamics.” Blogviz Visual Studies – Horizontal array of adopting units
  • 53. Blogviz Visual Studies – Different tryouts where adopting units (blogs) are structured in a vertical and horizontal array. After this critical change in my visualization studies I started doing a lot of sketching and writing. I built a few diagrams to get a full understanding of my system, built several taxonomies and dissected the mechanics of blogging. This examination helped me put my ideas straight and get a sense of what I was dealing with. 6.5 Prototype #3 In my third prototype I introduced Blogviz as a “topological model of meme behavior”. From the conclusions of my previous tryouts, I decided to explore in depth the notion of a horizontal array of adopting units (weblogs) to portray the propagation pattern of a specific topic. By doing that I would be constraining the Time element to the X-axis. The following images represent a series of tryouts in this context.
  • 54. 48
  • 55. In this phase of the project I also introduced the first visual taxonomy of Blogviz, by dissecting the system and its intrinsic elements. The following image portrays the inherent structure of Blogviz as I understood it at that stage. At the same time, a list of goals was created (left image) in order to better understand the intent of Blogviz.
  • 56. 6.6 Prototype #4 After the series of independent and scattered visual studies that characterized the initial trials, this fourth prototype was the first solid attempt to establish Blogviz as an interactive visualization model. At the time I was pushing the concept of an application or tool of analysis, which according to some critics implied a need for commercial viability. Even though I’m convinced this thesis has several elements that could be successfully applied in commercial applications, my goal with this project is to advance the understanding of Memetics in a specific social network and conduct a serious research experiment, which I believe fits more adequately within the academic realm. Another point worth considering is that, when developing this prototype, Blogviz was intended to work with real-time data, in the form of hourly updated XML RSS feeds. This idea changed afterwards, but it was a crucial consideration in the development of this prototype. Prototype #4 – Default First Page
  • 57. In the visual schema of the previous image, circles represent topics; the diameter corresponds to the total number of adopting blogs; and the colors, pink and green, denote, respectively, a decreasing or increasing course. Time is again mapped to the X-axis: the closer a circle is to the right edge of the window, the more recent its last dispatch. The Y-axis position of each circle helps reinforce its level of adoption. The main interaction in this fourth prototype was based on a simple flow. The default first page allowed a quick view of the general pattern by showing the overall state of current topics’ popularity. If one decided to investigate more deeply the structure and evolution of a particular topic, one would be taken to a sequence of examination methods. The following images illustrate some of the techniques proposed. Prototype #4 – Blogs’ evolutionary paths through time Prototype #4 – Plotting blogs according to time/popularity
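The visual schema of the default first page described above (circle diameter for the number of adopting blogs, pink or green for a decreasing or increasing course, horizontal position for the recency of the last dispatch) can be illustrated with a minimal sketch. It is not the code of the prototype: the `Topic` fields, the `encode_topic` function, the window size, the scaling limits, the rule that an unchanged count is drawn in pink, and the choice to place more adopted topics higher on the Y-axis are all assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    adopting_blogs: int            # total number of blogs that adopted the topic
    previous_adopting_blogs: int   # hypothetical: count at the previous update
    hours_since_last_post: float   # hypothetical: age of the topic's last dispatch


def encode_topic(topic, width=800, height=400,
                 max_blogs=500, max_hours=24.0, max_diameter=80.0):
    """Map one topic to circle attributes (all scale limits are assumed values)."""
    # Share of the assumed maximum adoption, clamped to 1.0.
    share = min(topic.adopting_blogs / max_blogs, 1.0)
    # Diameter grows with the total number of adopting blogs.
    diameter = share * max_diameter
    # Green = increasing course, pink = decreasing course (from the text);
    # drawing an unchanged count in pink is an assumption.
    color = "green" if topic.adopting_blogs > topic.previous_adopting_blogs else "pink"
    # The more recent the last dispatch, the closer to the right edge.
    recency = 1.0 - min(topic.hours_since_last_post / max_hours, 1.0)
    x = recency * width
    # Assumption: more adopted topics sit higher (smaller y in screen coordinates).
    y = (1.0 - share) * height
    return {"x": x, "y": y, "diameter": diameter, "color": color}


circle = encode_topic(Topic(adopting_blogs=250, previous_adopting_blogs=200,
                            hours_since_last_post=6.0))
print(circle)
```

In this sketch a topic adopted by half of the assumed maximum, growing, and last posted six hours ago is drawn as a green circle of half the maximum diameter, three quarters of the way towards the right edge.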
  • 58. Prototype #4 – Detailed View Prototype #4 – Detailed View Prototype #4 – Blogs’ adoption represented by a Tree Map Prototype #4 – Blogs’ analysis by Theme and Generator Prototype #4 – Blogs’ relationship analysis 52
  • 59. 6.7 Final Application A major shift in the development of Blogviz was the decision not to incorporate real-time data in the backend of the application. As previously stated, in my fourth prototype I concentrated mostly on developing a visualization schema that would expose current trends in the topics diffusion process, by reading data from hourly updated XML feeds. It would basically display the most adopted topics spreading in the blogosphere at any given time. Even if the application allowed an extended breakdown of each topic, beyond a quick view of present information tendencies, it considered only a restricted number of topics. I believe Blogviz’s concept, at that phase, was trying to incorporate too many features, or levels of analysis, without being able to develop any one of them efficiently. It was also becoming a trend analysis tool rather than a comprehensive model of topics distribution. I wanted Blogviz to become a serious visualization study on information diffusion in blogspace, and not so much a marketing application. I still believe there’s enormous potential in visualizing popular topics with real-time data integration, and that might be something Blogviz will incorporate in the future. However, I first wanted to better understand the topics’ inner structure and evolution through time. This change in Blogviz’s progress also coincided with a parallel immersion in the domains of Epidemiology and Diffusion of Innovations Theory. I never imagined that an apparently minor adjustment would require such a drastic turnaround in the project’s conceptualization. Until then, Blogviz had been dealing with a very restricted and manageable time span. Real-time data visualization was constrained to one day or, at most, one week. In contrast, by aiming at an adaptive model, the critical goal was to come up with a visualization method that could easily accommodate varying time spans and still be consistent. 
Another crucial problem was to visualize a high number of topics in a very tight space. I had to come up with a visualization model that would address both problems. First, it had to be flexible enough to embrace distinct time spans while maintaining uniformity throughout the process. Second, it had to include a high number of topics and still allow an immediate understanding of both the overall pattern and the individual life cycle of each topic.
  • 60. While looking for inspiration in diverse sources, I came across an elucidating diagram by E. J. Marey, in Edward Tufte’s The Visual Display of Quantitative Information, that resolved particularly well many of the challenges I was facing. Original Image: E. J. Marey, La Méthode Graphique (Paris, 1885), p.20. Source: Tufte, Edward R., The Visual Display of Quantitative Information The preceding image illustrates Marey’s graphical train schedule for Paris and Lyon in the 1880s. The X-axis represents Time, measured in hours, and maintains the same scale on both the top edge (corresponding to departures and arrivals at Paris) and the bottom edge (for departures and arrivals at Lyon). The remaining horizontal lines represent other train stations between Paris and Lyon. The diagonal lines represent different trains, leaving from and arriving at the two main stations, and the horizontal breaks in those lines represent waiting time at secondary stations. This chart greatly influenced the following steps of my project. I believe it is an extraordinary example of information visualization, where time and pattern become one intrinsic entity, allowing a substantial understanding of the data dynamics in one brief look. I applied a modified version of this concept to Blogviz, where the lines represent topics and the time scale is measured in days. Unlike Marey’s graph, Blogviz’s model doesn’t impose any type of constraint on the Y-axis, so the overall height of the main window is rather arbitrary. The following image represents the main visualization window for topics’ evolution within the Blogviz environment.
  • 61. Blogviz’s topics visualization – Topic Lines and Time Scale The interesting characteristic of this model is that, as in the Paris/Lyon train schedule example, the angle of each line has a specific meaning. This happens because both the top and bottom edges of the window maintain the same time scale. Therefore, the steeper the line (the wider its angle with the horizontal edge), the shorter the duration, in this case the topic’s duration. In the image above, for example, one may see a line, close to the center of the window, which seems to be almost vertical; this means that the life cycle of that particular topic was very short. This feature is even more relevant for topic lines that have either the starting or ending point outside the present timeframe. I conducted a small experiment within the same model, where the lines, instead of being placed diagonally, were drawn horizontally. This method was probably even more successful when the lines had both their starting and ending points inside the selected time span. However, when a topic line had its first or last day of spreading outside this frame, there was no way to tell how many days it extended beyond it. What the diagonal alignment facilitates is a full understanding of the topic’s life cycle, even when it spreads outside the present time span. To better understand the intricacies of this visualization model, the following images illustrate the four possible life cycles for every topic line, within each timeframe, and the way they are represented.
  • 62. Topic with first and last day of spreading within the current time span Topic with first day of spreading outside the current time span Topic with last day of spreading outside the current time span 56
  • 63. Topic with first and last day of spreading outside the current time span The angle of a line whose dates fall outside the timeframe is predicted by multiplying the number of days (topic duration) by the number of pixels of each day parcel. So if a specific topic line has its starting point (first day of spreading) within the present timeframe and its last day outside of it, and its total duration is 64 days, the system multiplies 64 by 12 (the number of pixels of a day parcel), measures the resulting 768 pixels from the starting point and dynamically draws a line to the resulting end point. Another feature of this visualization method, further explained in the following Blogviz Interface section, refers to the brightness or color saturation of each line. In Blogviz, the default setting for the lines’ brightness depicts the total number of adopting blogs. This allows for a comprehensible insight when evaluating the overall pattern. At a glance, one is able to identify the life cycle of each topic and also the number of blogs that adopted it. I like to think of the visual representation of this model as a metaphor for a window overlooking cyberspace, continuously crossed by flowing lines of information.
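The end-point calculation and the four life cycles described above can be put into a short sketch. The 12-pixel day parcel and the 64-day example come from the text; everything else is an assumption made for illustration: that the first day sits on the top edge and the last day on the bottom edge, the window dimensions, and the names `topic_line`, `life_cycle` and `angle_degrees`, none of which come from the actual Blogviz implementation.

```python
import math
from dataclasses import dataclass

PIXELS_PER_DAY = 12  # width of one day parcel, as in the text's example


@dataclass
class TopicLine:
    start_x: int  # x of the first day of spreading (assumed to sit on the top edge)
    end_x: int    # x of the last day of spreading (assumed to sit on the bottom edge)


def topic_line(first_day_index, duration_days, pixels_per_day=PIXELS_PER_DAY):
    """Place a topic line; first_day_index counts days from the window's left
    edge and may be negative when the topic started before the timeframe."""
    start_x = first_day_index * pixels_per_day
    # End point = starting point + duration in days * pixels per day parcel.
    end_x = start_x + duration_days * pixels_per_day
    return TopicLine(start_x, end_x)


def life_cycle(line, window_width):
    """Classify a line into one of the four cases shown in the figures."""
    starts_inside = 0 <= line.start_x <= window_width
    ends_inside = 0 <= line.end_x <= window_width
    if starts_inside and ends_inside:
        return "both inside"
    if starts_inside:
        return "last day outside"
    if ends_inside:
        return "first day outside"
    return "both outside"


def angle_degrees(line, window_height):
    """Angle between the line and the horizontal edge: 90 means vertical."""
    return math.degrees(math.atan2(window_height, line.end_x - line.start_x))


# The example from the text: a 64-day topic starting on day 10 of a 30-day window.
line = topic_line(first_day_index=10, duration_days=64)
print(line, life_cycle(line, window_width=30 * PIXELS_PER_DAY))
```

Because the horizontal distance is the only thing that changes with duration, a short topic yields a nearly vertical line and a long one a shallow line, which is why the angle stays readable even when one end of the line falls outside the window.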