Netgroup, October 7, 2010
Networks as a Motivating Domain for Computer Science Education
Jeff Forbes
http://harambeenet.org/
http://www.cs.duke.edu/forbes

Outline
Motivation
• State of Computer Science Education
• Looking inward & outward to improve
Project Overview
• Modules for Networks
• Questions asked
• Tools developed
Goals for this workshop

Acknowledgements
Duke CS Education Group: Owen Astrachan, Susan Rodger, Robert Duvall
HarambeeNet Researchers: Ben Spain, Dametrious Peyton, Beth Trushkowsky, Zach Marshall, Samantha Jones, Diana Ni, Dave Stecher, Martin Azizyan, Jonathan Mathew, Chris Carlon, Jian-Wei Gan, Tiphany Jackson, Andrea Scripa
Drawn from the work of:
• Eytan Adar, UW
• Lada Adamic, Michigan
• John Breese and David Heckerman, Microsoft Research
• Marti Hearst, UC Berkeley
• Michael Kearns, UPenn
• Jon Kleinberg, Cornell
Funders: NSF, Duke CIT

What motivated our work?
What should our concerns be for those choosing to major in Computer Science? Take CS courses?
• courses, research, jobs, …
Should we be concerned by the precipitous decline in those taking our courses?
• majors, technical students, non-technical …
What can we do to ensure the ongoing success of our academic discipline?
• Look inward, look to others

Genesis of Our Project
Broadening Participation
• Increase number of students in a course
• Increase number of majors
• Extend approach across levels and institutions
Non-traditional computer science examples
• Languages aren’t enough
Options besides programming
• Leverage mathematics and sciences

Goals for Project
Provide rich and profound area of applications
• Examples for other disciplines
• Convey part of what computer science is (and …)
Develop resources for our students that reflect what they’re interested in, but are relevant to our discipline
Enrich our own studies while doing the same for others

Why Networks & Social Media?
Logs
• Huge amount of data available about human interaction
• e.g., Netflix, Audioscrobbler, Facebook, Twitter, etc.
Leverage
• Interest and use of web-based social networks
  – 700 billion minutes per month spent on Facebook!
• Can make a difference for substantial societal problems
Laws
• Privacy & policy concerns are real and nuanced
• Data by the people for the people

Project goals
Build community around this approach
Develop, evaluate, and disseminate curricular modules

The Advisory Board
» Eytan Adar is an Assistant Professor in the School of Information at the University of Michigan.
» Noshir Contractor is the Jane S. & William J. White Professor of Behavioral Sciences; Professor of Industrial Engineering & Management Science, McCormick School of Engineering; Professor of Communication Studies, School of Communication; and Professor of Management & Organizations, Kellogg School of Management at Northwestern University.
» Jennifer Golbeck is an Assistant Professor in the College of Information Studies and was formerly the Research Director for the Joint Institute for Knowledge Discovery (JIKD) at the University of Maryland.

Advisory Board Continued
» Balachander Krishnamurthy is a researcher at AT&T Labs. His main focus of research of late is in the areas of unwanted traffic, Internet measurements, and Internet protocols.
» Deepak Kumar is a Professor of Computer Science at Bryn Mawr College working in Artificial Intelligence, Cognitive Science, Evolutionary Computation and other areas.
» Ellen Spertus is an Associate Professor of Computer Science at Mills College and a part-time software engineer at Google.
» Fred Stutzman is a Ph.D. student at the School of Information and Library Science at UNC-Chapel Hill and Co-Founder of claimID.com.

Faculty Learning Community
Build interdisciplinary, cross-institutional community centered around teaching
Discuss exemplars in network science education and applications
• What are great ideas in network science?
• What problems best encapsulate these great ideas?
Contribute to development or evaluation of modules

The Faculty Learning Community 2007-2008
Working to Create a Bank of Viable Science of Networks Modules
» Owen Astrachan, Facilitator, Professor of the Practice of Computer Science
» David Banks, Professor of the Practice of Statistics
» Jonathon Cummings, Associate Professor of Management, Fuqua School of Business
» Jeff Forbes, Associate Professor of the Practice of Computer Science

FLC Continued
» James Moody, Associate Professor of Sociology
» Susan Rodger, Professor of the Practice of Computer Science
» Joshua Socolar, Associate Professor of Physics

The Faculty Learning Community 2009-2010
Expanding the Community
• UNC
  – Ketan Mayer-Patel, Associate Professor of Computer Science
  – Gary Marchionini, Professor of Information and Library Science
• NCCU
  – Cameron Seay, Computer and Information Systems

The Faculty Learning Community 2009-2010
Expanding the Community
• NCSU
  – Steve McDonald, Assistant Professor of Sociology
• NC A&T SU
  – Ed Carr, Assistant Professor of Computer Science

Science of Networks Courses
• Networked Life (UPenn CS)
• Networks (Cornell CS/Econ/Soc/InfoSci)
• Introduction to Networks (Michigan School of Information)
• Social Networks 101 (Northwestern)
• Google: The Computer Science Within and its Impact on Society (Duke)
• Seminar on Social Networks (Duke)
• Information Technology (UMaryland)
• Online Social Networks (UNC)
• The Structure of Information Networks (Cornell)
• Networks and Complexity in Social Systems (Columbia)
• Social Network Analysis (UToronto)
• Networks and Complexity (UCalifornia, Irvine)
• Algorithms, Game Theory and the Internet (Berkeley)
• Graphs and Networks in Systems Biology (Penn State)
• Network Theory (UMich)
• Scaling in Networks (Columbia)
• Structural Data Mining (UIndiana)
• Networks (UPatras, Greece)
• Information Retrieval (UMich)
• Complex Human Networks Reading Group (MIT)
• Recommender Systems (Virginia Tech)
• Social Network Analysis (UEssex)
• Create Engaging Web Applications Using Metrics and Learning on Facebook (Stanford)
• Computer Networks (UMich)
• Information Retrieval, Discovery and Delivery (Princeton)
• Scaling, Power Laws and Small World Phenomena in Networks (UMass)
• Information Retrieval (Northeastern)
Arrange courses and evaluate their merits in helping build modules along three dimensions: genre, level, and theme

What can we do with real data?
What is the center of a graph?
• From rumor mills to terrorists
• How do we detect important agents?
What are the scale issues?
• What algorithms are feasible for large graphs?
• Computing’s contribution?
Visualizing data

Questions
Structure: Who is the most central agent in a network?
Structure: What are the factors that lead people to trust each other?
Algorithms & Visualization: How can we analyze large networks?
How to share/store information efficiently among local groups?
Dynamics: How do networks grow and evolve?
Information networks: What does the music interest network look like?

Themes
1. Can networks influence behavior?
2. Which characteristics of networks matter or are desirable (e.g., strong/weak ties, centrality, etc.)?
3. Scale!
4. Boundary specification. How do you define who is in a network?
5. Dynamic vs. static processes
Context:
6. Gather data and then ask questions
7. Simulate processes on networks
8. Actual experiments on the classroom network

Network questions
1. Mapping university social network (design experiment based on DARPA Network Challenge)
2. How do you use a network to determine identity?
3. Using data (Wikipedia article links, information traversal)
4. Local vs. global emergent phenomena

Data yields a number of questions
Is popular culture really making us smarter?
How do we find a graph’s diameter?
• Maximal shortest path between any pair of vertices
What is the center of a graph?
Visualization and analysis of networks
• GUESS developed by Eytan Adar
• Gython interpreter adapted for Duke GUESS
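The diameter and center questions above can be made concrete with a small sketch. It assumes an unweighted, undirected, connected graph stored as an adjacency dictionary; the function names and the toy path graph are illustrative and are not part of GUESS or any project tool. The center is taken here to be the set of vertices with the smallest eccentricity, which is one common definition among several.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop counts from source to every reachable vertex (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def eccentricities(graph):
    """For each vertex, the distance to the vertex farthest from it."""
    return {v: max(bfs_distances(graph, v).values()) for v in graph}

def diameter(graph):
    """Maximal shortest path between any pair of vertices."""
    return max(eccentricities(graph).values())

def center(graph):
    """Vertices whose eccentricity is smallest (assumed definition of 'center')."""
    ecc = eccentricities(graph)
    radius = min(ecc.values())
    return sorted(v for v, e in ecc.items() if e == radius)

# Toy example (hypothetical data): the path a - b - c - d - e.
toy = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
print(diameter(toy), center(toy))
```

This runs one breadth-first search per vertex, so the cost grows roughly with the number of vertices times the number of edges, which is where the scale questions raised earlier start to matter.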

iPods and social networks
Audioscrobbler
Collaborative filtering
What is a neighbor?
What is the network?

Recommending papers
Can we effectively facilitate collaboration within a research community with a citation database?

FaceTrust
How do we assess the credibility of identity statements made by online users?

Modules
Independent unit in a course
• 1 to 3 weeks in a course
Centered around problems and questions, not concepts
Content
• Technical background
• Social and philosophical implications
• Data sources and tools
Developed and tested across disciplines
Ultimately will be published as Open Educational Resource

Common Threads
Position
• How does your position within a network advantage or disadvantage you?
• Centrality (closeness, betweenness, degree, etc.)
Scale
• Computing properties for large networks
• Dealing with incomplete or inaccurate information
• Hard to visualize
• Longitudinal studies: adding time as a dimension can complicate things

Building a module
What question are you answering?
What will students do?
What will students need to solve the problem?
• Data sources & tools
What concepts from networks will they encounter in solving the problem?
What are the goals?

First Module
Can we discover research communities given online faculty CVs?
• Based on co-authorship data for Duke professors in the sciences & engineering, can we detect communities defined by departmental boundaries?
• Do some professors play special roles in establishing the community structure?
• Are there any identifiable communities that are interdepartmental in nature?

First Module
Engages sociological, statistical, computational, and pedagogical concepts and questions
• Centrality
• Modularity
• Social capital
• Dealing with incomplete or inaccurate data
• Information integration
• Algorithm efficiency

Detecting Communities
Scrape pages from Faculty Database System, standardize entries, and upload entries into bibliographic database
Generate map of authors to coauthors
Create graph where two authors are connected if they share a coauthor
From co-authorship graph, use community structure algorithm [Clauset, Newman, Moore] to discover community structure
Co-authorship graph
CoBib
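The middle two steps of this pipeline (the author-to-coauthor map and the graph built from it) can be sketched as follows. This is a minimal illustration under stated assumptions: each bibliography entry has already been reduced to a list of standardized author names, the names and papers are invented, and the edge rule follows the wording on the slide (two authors are linked when they share a coauthor). The scraping step and the Clauset-Newman-Moore community step are not shown.

```python
from collections import defaultdict
from itertools import combinations

def coauthor_map(papers):
    """Map each author to the set of people they have published with."""
    coauthors = defaultdict(set)
    for authors in papers:
        for a in authors:
            coauthors[a].update(x for x in authors if x != a)
    return dict(coauthors)

def shared_coauthor_edges(coauthors):
    """Link two authors when their coauthor sets overlap (the slide's rule)."""
    edges = set()
    for a, b in combinations(sorted(coauthors), 2):
        if coauthors[a] & coauthors[b]:
            edges.add((a, b))
    return edges

# Hypothetical bibliography: one author list per paper.
papers = [
    ["Ada", "Ben", "Cy"],
    ["Cy", "Dee"],
    ["Eve", "Fay"],
]
cmap = coauthor_map(papers)
edges = shared_coauthor_edges(cmap)
print(sorted(edges))
```

Under this rule Cy and Dee, who wrote a paper together but have no coauthor in common, are not linked. A variant that links direct coauthors instead would add that edge, so the choice of rule is worth discussing with students before any community detection is run.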

Latest Modules
Hollywood Hookup
• How can we measure the “romantic extroversion” of an individual?
• How do we assess the quality of the data?
Sex Differences in Social Connectedness
• Use the General Social Survey and a student-generated survey to assess properties of the students’ networks
Fakebook!
• How can you cluster users based on their type of profile?
  – Masqueraders, sharers, etc.
• What would behavior on a fake social network reveal about each individual?

Spam Detection in Twitter
How can you identify spammers on Twitter?
Use network structure
• What is the Twitter network structure? How would you discover it?
• What communities exist? Are there clusters of spammers and non-spammers?
  – Bipartite graph?
Use tweet content
• What are the informative features?
• How do you determine similarity between tweets?
• What are the patterns of behavior of spammers?
  – Distribution of tweet frequency, length
Computing skills & tools: How do you work with the Twitter API to download necessary information?

Musical similarity
How do we find one’s musical neighbors?
• Given two playlists, return a value indicating their similarity
What does the network structure tell you about a community’s tastes?
• Whose tastes are most central?
  – What measure of centrality makes the most sense here?
• What is the network centralization?
How do you visualize this network?
Computing skills & tools: How do you parse iTunes files?
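One possible answer to "given two playlists, return a value indicating their similarity" is the Jaccard index: the number of shared tracks divided by the number of distinct tracks across both lists. The sketch below assumes playlists are plain lists of track identifiers; the listener names, track names, and function names are invented for illustration, and other measures (weighting by play count, for example) would be equally reasonable.

```python
def playlist_similarity(playlist_a, playlist_b):
    """Jaccard index of two playlists: shared tracks / all distinct tracks."""
    a, b = set(playlist_a), set(playlist_b)
    if not a and not b:
        return 0.0  # assumed convention for two empty playlists
    return len(a & b) / len(a | b)

def musical_neighbors(playlists, listener, k=2):
    """The k listeners whose playlists are most similar to the given listener's."""
    scores = [
        (playlist_similarity(playlists[listener], tracks), other)
        for other, tracks in playlists.items()
        if other != listener
    ]
    # Highest similarity first; ties broken alphabetically by name.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scores[:k]]

# Hypothetical data.
playlists = {
    "ann": ["x", "y", "z"],
    "bob": ["y", "z", "w"],
    "cat": ["q"],
}
print(playlist_similarity(playlists["ann"], playlists["bob"]))
print(musical_neighbors(playlists, "ann", k=1))
```

Treating each similarity score above some threshold as an edge turns the set of playlists into a network, which is one way to get at the centrality questions on the slide.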

Transmission models
How are pathogens transmitted from one actor to another?
• Pathogen may be an idea
  – How do individuals influence each other’s opinions, ideology, and actions?
How do you model the network to effectively answer your question?
• In an STD network, friendship networks might not matter, but they may be important in studying the social influence on depression and anxiety
Computing skills & tools: How do you turn web pages into your dataset?
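A transmission model can be simulated in a few lines once the network is chosen. The sketch below uses one simple assumed model (an independent-cascade style process, in which each newly infected actor gets a single chance to infect each susceptible neighbor with a fixed probability). The graph, the function name, and the probability values are illustrative, not taken from the module.

```python
import random

def simulate_spread(graph, seeds, p_transmit, rng):
    """Spread a 'pathogen' (or an idea) from the seed actors through the network.

    Each newly infected actor gets one chance to infect each neighbor that is
    still susceptible. Returns the final infected set and the number of rounds
    in which at least one new infection occurred.
    """
    infected = set(seeds)
    frontier = list(seeds)
    rounds = 0
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph[u]:
                if v not in infected and rng.random() < p_transmit:
                    infected.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
        if frontier:
            rounds += 1
    return infected, rounds

# Hypothetical contact network: a chain a - b - c - d plus an isolated actor e.
contacts = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": [],
}
rng = random.Random(2010)  # seeded so that repeated runs match
print(simulate_spread(contacts, ["a"], 0.5, rng))
```

Swapping in a different edge set (sexual contacts versus friendships, say) while keeping the same process is a direct way to explore how much the modeling choice on the slide changes the outcome.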

Hollywood Hookup
How do the relationships of Hollywood actors and actresses differ from those of the average student?
• How can we measure the “romantic extroversion” of an individual?
• How do we assess the quality of the data?
Survey
How can statistical modeling iteratively construct formulae that provide useful and meaningful approximations to the observed networks, enabling insight into the processes that produced those networks?

Fakebook
Facebook is great, but not suitable for courses due to privacy and legal concerns
Enter Fakebook
• How can you cluster users based on their type of profile?
  – Masqueraders, sharers, etc.
• What would behavior on a fake social network reveal about each individual?
• What can we learn about the Social Graph?
Facebook is creating a sandboxed version for this purpose

Course Themes
Network Structure
• Graph theory & algorithms
Network Behavior
• Game theory, auctions, markets
Network Applications
• Web search & markets
• Network effects and power laws
• Modeling epidemics
• Aggregate behavior and prediction markets
How does the science of networks shed light on how social, technological, and natural entities are structured and connected?

Topics
1. Graph Theory
   • Social networks, weak ties, homophily, structural balance
2. Game Theory
   • Nash equilibrium; examples from auctions, traffic
3. Strategic Interaction on Networks
   • markets, matchings, network exchange theory
4. Information Networks and the Web
   • Web structure, Web search, sponsored search markets
5. Network dynamics: population models
   • information cascades, positive externalities, power laws
6. Network dynamics: structural models
   • diffusion of innovations, small-world phenomena, epidemics
7. Institutions and Aggregate Behavior
   • markets and information, voting, property rights

CS Themes
Algorithms: breadth-first search, strongly connected components, bipartite matching, weighted assignment.
Algorithmic game theory: traffic and congestion, design of auctions and truthful mechanisms, sponsored search.
Architecture of the Web: the idea of associative memory, search engines (crawl/index/process queries/advertise).
Social computing: reputation systems, recommendation systems, ranking systems, prediction markets.
Analyzing network datasets: community detection, hubs/authorities/PageRank.
Multi-agent systems: modeling systems of interacting agents, modeling agents as Bayesian reasoners.

Course Approach
In-class group exercises, problem sets, exams
Networks in the News
• Students post entries on current events relating to networks
Project: Network visualization & analysis
• Recognize common graph structures
• Effectively apply visualization techniques to answer questions about data
In-class games
• Identify rational behavior
• Reflect on how we are connected and how communication has changed

Target Audience
Students who may have had no intention of taking a course in computing
• What will be our analogue of intro psych / intro econ / intro political science?
• This course attempts this in the context of current topics, but on a foundation of technical content in CS and economics.
Intended for students interested in the social and natural sciences
No programming background required
• Math background at level of AP, but…
Pilot in Spring 2010. Gearing up for Spring 2011

Getting involved
Are you doing work relevant to networks in education?
• Problems, data sources
• Course materials
• Willing to give a guest lecture on your research?
Can provide honoraria or travel funding to present relevant work
Tell students about our Networks course

Thanks
http://www.cs.duke.edu/forbes
HarambeeNet Project
http://harambeenet.org

Sample Blog Post
I'm Related to Kevin Bacon?
Overview of the Oracle of Bacon: In class we have talked a lot about social and computer networks and all of their component parts. We have learned many important aspects of networks and what makes them operate. One of the most interesting and complex notions is that of centrality and how one can go about calculating centrality within a social network. The Oracle of Bacon is one of the best examples of a project that has created an elaborate social network around the central figure of Kevin Bacon. However, it is interesting that the site shows that Kevin Bacon is actually not the center of the Hollywood network; in fact, there are 1,048 actors who would make better centers than Bacon. Here is a breakdown of the best and worst centers of the Hollywood network. Although the only other actor mentioned who would make a better center is Sean Connery, we can speculate about what makes a great center. A good center would have to be an older actor who has appeared in many movies and many varieties of movies, has appeared in large productions with many actors, and has worked overseas. Alternatively, a bad center would be young and have appeared in only one type of movie, or only one movie in total!

Why is the Oracle of Bacon Interesting to us?
• In reality, the game is an example of the small world phenomenon. The small world phenomenon was researched by Stanley Milgram as he examined the average path length for social networks of people in the United States. The phenomenon is that paths between nodes are typically much shorter than expected, which the game illustrates. The Oracle of Bacon game was designed by computer scientists at the University of Virginia in order to create an engaging way of dealing with the small world phenomenon. The program for calculating a Bacon number was developed by mapping networks from http://imdb.com/ (the database of movie and actor information).
Other related points
• Here is the original paper by Stanley Milgram, upon which all of this information is based. The game works to find links between different actors and find the degree of separation from Bacon. It is amazing that almost any actor, no matter how obscure, can be linked to Bacon within six degrees, and the average is under three links (2.960).
• It is also interesting to look at the earlier examples of the small world phenomenon, which inspired the Oracle of Bacon. Erdos numbers refer to the number of co-authorship links separating a mathematician from Paul Erdos, a Hungarian mathematician famous for collaboration. The Erdos Number Project gives details similar to the Oracle of Bacon about the amount of connectivity within the network of mathematicians. In this network the median Erdos number is 5; the mean is 4.65, and the standard deviation is 1.21. This shows that there is slightly less connectivity, but a high degree of centrality.
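Bacon numbers and Erdos numbers are both shortest-path distances in a collaboration network, so they can be computed with a breadth-first search. The sketch below is only an illustration of that idea and is not the Oracle of Bacon's actual program: it assumes the data is a dictionary from film title to cast list, and the film and actor names are placeholders rather than IMDb records.

```python
from collections import deque

def bacon_numbers(casts, start):
    """Number of shared-film links from `start` to every reachable actor.

    `casts` maps a film title to the list of actors in it. Actors who cannot
    be reached from `start` are left out of the result.
    """
    films_of = {}
    for film, actors in casts.items():
        for actor in actors:
            films_of.setdefault(actor, []).append(film)

    number = {start: 0}
    queue = deque([start])
    seen_films = set()
    while queue:
        actor = queue.popleft()
        for film in films_of.get(actor, []):
            if film in seen_films:
                continue  # this cast was already expanded from a closer actor
            seen_films.add(film)
            for costar in casts[film]:
                if costar not in number:
                    number[costar] = number[actor] + 1
                    queue.append(costar)
    return number

# Placeholder data, not real filmographies.
casts = {
    "Film 1": ["Star", "A"],
    "Film 2": ["A", "B"],
    "Film 3": ["B", "C"],
    "Film 4": ["Z"],
}
print(bacon_numbers(casts, "Star"))
```

Averaging the resulting numbers over all reachable actors gives the kind of figure quoted in the post, and repeating the search from every actor is one way to compare candidate "centers".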

Here is a visualization of the Erdos Network.
More recent centrality work
• There are many examples of researchers who have dealt with the six degrees theory in their analysis of the small-world phenomenon, including Judith Kleinfeld. Her paper, Could It Be a Big World After All? The `Six Degrees of Separation’ Myth (Society, April 2002), deals with a lot of the important ideas discussed above. Kleinfeld argues that the initial data used to create the notion of the small-world phenomenon was actually skewed and that there might actually be less connectivity between people than was previously believed. This paper was published in 2002, and it does not seem to have garnered a large amount of debate amongst the scholarly community. It seems that more work and experimentation needs to be done in this field before claims can be made about the connectedness of the actual world. Although Kleinfeld and others made some really interesting points initially, unfortunately the computer science world seems focused on novelty, not finishing work on a phenomenon, so it may be a while before all of our questions are answered!

Collaborative Filtering
Goal: predict the utility of an item to a particular user based on a database of user profiles
• User profiles contain user preference information
• Preference may be explicit or implicit
  – Explicit means that a user votes explicitly on some scale
  – Implicit means that the system interprets user behavior or selections to impute a vote
Problems
• Missing data: voting is neither complete nor uniform
• Preferences may change over time
• Interface issues
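The goal stated above (predict a user's vote on an item from a database of profiles) can be sketched with one simple memory-based scheme: weight every other user's vote on the item by how similar that user's profile is to the target user's. The sketch assumes explicit votes on a numeric scale, uses cosine similarity over co-rated items as the weight, and runs on invented profiles; it is one of many possible variants, and it handles missing data only by skipping users who did not vote on the item.

```python
from math import sqrt

def cosine_similarity(votes_a, votes_b):
    """Cosine of the two users' vote vectors, restricted to items both rated."""
    shared = set(votes_a) & set(votes_b)
    if not shared:
        return 0.0
    dot = sum(votes_a[i] * votes_b[i] for i in shared)
    norm_a = sqrt(sum(votes_a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(votes_b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_vote(profiles, user, item):
    """Similarity-weighted average of other users' votes on `item`.

    Returns None when no similar user has voted on the item.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for other, votes in profiles.items():
        if other == user or item not in votes:
            continue
        weight = cosine_similarity(profiles[user], votes)
        weighted_sum += weight * votes[item]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None

# Hypothetical explicit votes on a 1-5 scale.
profiles = {
    "u1": {"a": 5, "b": 1},
    "u2": {"a": 5, "b": 1, "c": 4},
    "u3": {"a": 1, "b": 5, "c": 2},
}
print(predict_vote(profiles, "u1", "c"))
```

Here u1's predicted vote on item c lands closer to u2's vote than to u3's, because u2's profile agrees with u1's on the items they both rated.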

Network Models (Barabasi)
Differences between Internet, Kazaa, Chord
• Building, modeling, predicting
Static networks, Dynamic networks
• Modeling and simulation
Random and Scale-free
• Implications?
Structure and Evolution
• Modeling via Touchgraph
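The contrast between random and scale-free growth can be explored with two tiny generators. This is a simplified sketch under stated assumptions: each new node adds exactly one link, the "scale-free" generator picks its target with probability proportional to current degree (preferential attachment, in the spirit of the Barabasi-Albert model), and the "random" generator picks a target uniformly. Function names and sizes are illustrative.

```python
import random
from collections import Counter

def preferential_attachment(n, rng):
    """Grow a graph on nodes 0..n-1 (n >= 2); newcomers favor high-degree nodes."""
    edges = [(0, 1)]
    endpoints = [0, 1]  # each node appears once per incident edge
    for new in range(2, n):
        target = rng.choice(endpoints)  # chance proportional to degree
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges

def uniform_attachment(n, rng):
    """Same growth process, but each newcomer picks an existing node uniformly."""
    edges = [(0, 1)]
    for new in range(2, n):
        edges.append((new, rng.randrange(new)))
    return edges

def degree_counts(edges):
    """Degree of every node that appears in the edge list."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees

rng = random.Random(7)  # seeded so that repeated runs match
pa_edges = preferential_attachment(200, rng)
ua_edges = uniform_attachment(200, rng)
print(max(degree_counts(pa_edges).values()), max(degree_counts(ua_edges).values()))
```

Comparing the two degree distributions, for instance by plotting a histogram of `degree_counts` on log-log axes, is one way to open the "Implications?" discussion. Results vary from run to run unless the generator is seeded.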

Physical Networks
The Internet
• Vertices: Routers
• Edges: Physical connections
Another layer of abstraction
• Vertices: Autonomous systems
• Edges: peering agreements
• Both a physical and business network
Other examples
• US Power Grid
• Interdependence and August 2003 blackout

Business & Economic Networks
Example: eBay bidding
• vertices: eBay users
• links: represent bidder-seller or buyer-seller
• fraud detection: bidding rings
Example: corporate boards
• vertices: corporations
• links: between companies that share a board member
Example: corporate partnerships
• vertices: corporations
• links: represent formal joint ventures
Example: goods exchange networks
• vertices: buyers and sellers of commodities
• links: represent “permissible” transactions

Content Networks
Example: Document similarity
• Vertices: documents on web
• Edges: Weights defined by similarity
• See TouchGraph GoogleBrowser
Conceptual network: thesaurus
• Vertices: words
• Edges: synonym relationships

Social networks
Example: Acquaintanceship networks
• vertices: people in the world
• links: have met in person and know last names
• hard to measure
Example: scientific collaboration
• vertices: math and computer science researchers
• links: between coauthors on a published paper
• Erdos numbers: distance to Paul Erdos
• Erdos was definitely a hub or connector; had 507 coauthors
• How do we navigate in such networks?