[CPCC] SEMINAR: Uniform Sampling of Facebook 3/8 10 AM

Ender Ayanoglu ayanoglu at uci.edu
Mon Mar 1 12:17:23 PST 2010


                             CPCC SEMINAR

                     Uniform Sampling of Facebook

                                   by

                              Minas Gjoka

                        March 8, 2010, Monday
                                 10 AM
                        Engineering Gateway 3161


                                  ABSTRACT

With more than 250 million active users, Facebook is currently one of
the most important online social networks. Our goal is to obtain a
representative (unbiased) sample of Facebook users by crawling its
social graph. In this quest, we consider and implement several
candidate techniques. Two approaches that are found to perform well
are the Metropolis-Hastings random walk (MHRW) and a re-weighted random
walk (RWRW). Both have pros and cons, which we demonstrate through a
comparison to each other as well as to the "ground truth" obtained
through true uniform sampling of Facebook userIDs. In contrast, the
traditional Breadth-First-Search (BFS) and Random Walk (RW) perform
quite poorly, producing substantially biased results. In addition to
offline performance assessment, we introduce online formal convergence
diagnostics to assess sample quality during the data collection
process.  We show how these can be used to effectively determine when
a random walk sample is of adequate size and quality for subsequent
use. Using these methods, we collect the first, to the best of our
knowledge, unbiased sample of Facebook. Finally, we use one of our
representative datasets, collected through MHRW, to characterize
several key properties of Facebook.
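
As a rough illustration of the two walk-based samplers named above, the
sketch below shows one common textbook formulation of each. It is not the
speaker's implementation. The function names, the `neighbors` callable,
and the toy interface are assumptions made for this example; the graph is
assumed to be undirected and connected, with no isolated nodes.

```python
import random


def mhrw_sample(neighbors, start, num_steps, rng=None):
    """Metropolis-Hastings random walk targeting the uniform distribution.

    `neighbors` is a hypothetical callable returning a node's neighbor list.
    A proposed move from u to v is accepted with probability
    min(1, deg(u) / deg(v)); otherwise the walk stays at u.
    """
    rng = rng or random.Random()
    current = start
    samples = []
    for _ in range(num_steps):
        candidate = rng.choice(neighbors(current))
        ratio = len(neighbors(current)) / len(neighbors(candidate))
        if rng.random() < ratio:
            current = candidate
        samples.append(current)  # rejected moves repeat the current node
    return samples


def rw_sample(neighbors, start, num_steps, rng=None):
    """Simple random walk; visits nodes in proportion to their degree."""
    rng = rng or random.Random()
    current = start
    samples = []
    for _ in range(num_steps):
        current = rng.choice(neighbors(current))
        samples.append(current)
    return samples


def rwrw_mean(samples, neighbors, value):
    """Re-weighted estimate of the mean of value(v) over all nodes.

    Each simple-random-walk sample is weighted by 1 / deg(v) to cancel
    the walk's bias toward high-degree nodes.
    """
    weights = [1.0 / len(neighbors(v)) for v in samples]
    weighted = sum(w * value(v) for w, v in zip(weights, samples))
    return weighted / sum(weights)
```

In this sketch the MHRW corrects the degree bias during the walk itself,
by rejecting some moves, while the RWRW corrects it afterwards, when the
estimate is computed. Neither function discards an initial burn-in
portion or checks convergence, both of which matter in practice.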


                          SPEAKER'S BIOGRAPHY

Minas Gjoka received the B.S. degree in Computer Science from the Athens
University of Economics and Business, Greece, in 2005 and the M.S.
degree in Networked Systems from the University of California, Irvine,
in 2008. He is currently a Ph.D. student in the EECS Department at the
University of California, Irvine. His research interests include online
social networks, peer-to-peer systems, network measurements, network
protocols, and internet modeling.

