Netgroup, October 7, 2010 Networks as a Motivating Domain for Computer Science Education Jeff Forbes...

Netgroup , October 7, 2010

Networks as a Motivating Domain for Computer Science Education

Jeff Forbes
http://harambeenet.org/
http://www.cs.duke.edu/forbes

Outline

• Motivation
  – State of Computer Science Education
  – Looking inward & outward to improve
• Project Overview
  – Modules for Networks
  – Questions asked
  – Tools developed
• Goals for this workshop

Acknowledgements

• Duke CS Education Group: Owen Astrachan, Susan Rodger, Robert Duvall
• HarambeeNet Researchers: Ben Spain, Dametrious Peyton, Beth Trushkowsky, Zach Marshall, Samantha Jones, Diana Ni, Dave Stecher, Martin Azizyan, Jonathan Mathew, Chris Carlon, Jian-Wei Gan, Tiphany Jackson, Andrea Scripa
• Drawn from the work of:
  – Eytan Adar, UW
  – Lada Adamic, Michigan
  – John Breese, David Heckerman, Microsoft Research
  – Marti Hearst, UC Berkeley
  – Michael Kearns, UPenn
  – Jon Kleinberg, Cornell
• Funders: NSF, Duke CIT

If you don’t take a course in Computer Science, you won’t major in it.

Enrollments in computing: from the past to the present

NYTimes in 1984

Who's going to College?

What’s happening more recently?

Is everything copacetic in CS?

Our response…

What motivated our work?

• What should our concerns be for those choosing to major in Computer Science? Take CS courses?
  – courses, research, jobs, …
• Should we be concerned by the precipitous decline in those taking our courses?
  – majors, technical students, non-technical …
• What can we do to ensure the ongoing success of our academic discipline?
  – Look inward, look to others

Genesis of Our Project

• Broadening Participation
  – Increase number of students in a course
  – Increase number of majors
  – Extend approach across levels and institutions
• Non-traditional computer science examples
  – Languages aren’t enough
  – Options besides programming
  – Leverage mathematics and sciences

Goals for Project

• Provide rich and profound area of applications
  – Examples for other disciplines
  – Convey part of what computer science is (and …)
• Develop resources for our students that reflect what they’re interested in, but are relevant to our discipline
• Enrich our own studies while doing the same for others

A Future for Computer Science?

Why Networks & Social Media?

• Logs
  – Huge amount of data available about human interaction, e.g., Netflix, Audioscrobbler, Facebook, Twitter, etc.
• Leverage
  – Interest and use of web-based social networks
    • 700 billion minutes per month spent on Facebook!
  – Can make a difference for substantial societal problems
• Laws
  – Privacy & policy concerns are real and nuanced
  – Data by the people for the people

Project goals

Build community around this approach

Develop, evaluate, and disseminate curricular modules

How do we create the materials and build the community?

The Advisory Board.

» Eytan Adar is an Assistant Professor in the School of Information at the University of Michigan

» Noshir Contractor is Jane S. & William J. White Professor of Behavioral Sciences; Professor of Industrial Engineering & Management Science, McCormick School of Engineering; Professor of Communication Studies, School of Communication; and Professor of Management & Organizations, Kellogg School of Management at Northwestern University.

» Jennifer Golbeck is an Assistant Professor in the College of Information Studies and was formerly the Research Director for the Joint Institute for Knowledge Discovery (JIKD) at the University of Maryland.

Advisory Board Continued

» Balachander Krishnamurthy is a researcher at AT&T Labs. His main focus of research of late is in the areas of unwanted traffic, Internet measurements, and Internet protocols.

» Deepak Kumar is a Professor of Computer Science at Bryn Mawr College working in Artificial Intelligence, Cognitive Science, Evolutionary Computation and other areas.

» Ellen Spertus is an Associate Professor of Computer Science at Mills College and a part-time software engineer at Google.

» Fred Stutzman is a Ph.D. student at the School of Information and Library Science at UNC-Chapel Hill and Co-Founder of claimID.com.

Faculty Learning Community

• Build interdisciplinary, cross-institutional community centered around teaching
• Discuss exemplars in network science education and applications
  – What are great ideas in network science?
  – What problems best encapsulate these great ideas?
• Contribute to development or evaluation of modules

The Faculty Learning Community 2007-2008
Working to Create a Bank of Viable Science of Networks Modules

» Owen Astrachan, Facilitator, Professor of the Practice of Computer Science

» David Banks, Professor of the Practice of Statistics

» Jonathon Cummings, Associate Professor of Management, Fuqua School of Business

» Jeff Forbes, Associate Professor of the Practice of Computer Science

FLC Continued

» James Moody, Associate Professor of Sociology

» Susan Rodger, Professor of the Practice of Computer Science

» Joshua Socolar, Associate Professor of Physics

The Faculty Learning Community 2009-2010
Expanding the Community

• UNC
  – Ketan Mayer-Patel, Associate Professor of Computer Science
  – Gary Marchionini, Professor of Information and Library Science
• NCCU
  – Cameron Seay, Computer and Information Systems
• NCSU
  – Steve McDonald, Assistant Professor of Sociology
• NC A&T SU
  – Ed Carr, Assistant Professor of Computer Science

What have we focused on so far?

Science of Networks Courses

• Networked Life (UPenn CS)
• Networks (Cornell CS/Econ/Soc/InfoSci)
• Introduction to Networks (Michigan School of Information)
• Social Networks 101 (Northwestern)
• Google: The Computer Science Within and its Impact on Society (Duke)
• Seminar on Social Networks (Duke)
• Information Technology (UMaryland)
• Online Social Networks (UNC)
• The Structure of Information Networks (Cornell)
• Networks and Complexity in Social Systems (Columbia)
• Social Network Analysis (UToronto)
• Networks and Complexity (UCalifornia, Irvine)
• Algorithms, Game Theory and the Internet (Berkeley)
• Graphs and Networks in Systems Biology (Penn State)
• Network Theory (UMich)
• Scaling in Networks (Columbia)
• Structural Data Mining (UIndiana)
• Networks (UPatras, Greece)
• Information Retrieval (UMich)
• Complex Human Networks Reading Group (MIT)
• Recommender Systems (Virginia Tech)
• Social Network Analysis (UEssex)
• Create Engaging Web Applications Using Metrics and Learning on Facebook (Stanford)
• Computer Networks (UMich)
• Information Retrieval, Discovery and Delivery (Princeton)
• Scaling, Power Laws and Small World Phenomena in Networks (UMass)
• Information Retrieval (Northeastern)

Arrange courses and evaluate their merits in helping build modules in three areas: genre, level, and theme.

What can we do with real data?

• What is the center of a graph?
  – From rumor mills to terrorists
  – How do we detect important agents?
• What are the scale issues?
  – What algorithms are feasible for large graphs?
  – Computing’s contribution?
• Visualizing data

Questions

Structure: Who is the most central agent in a network?

Structure: What are the factors that lead people to trust each other?

Algorithms & Visualization: How can we analyze large networks?

How to share/store information efficiently among local groups?

Dynamics: How do networks grow and evolve?

Information networks: What does the music interest network look like?

Themes

1. Can networks influence behavior?
2. Which characteristics of networks matter or are desirable (e.g., strong/weak ties, centrality, etc.)?
3. Scale!
4. Boundary specification. How do you define who is in a network?
5. Dynamic vs. static processes

Context:
6. Gather data and then ask questions
7. Simulate processes on networks
8. Actual experiments on the classroom network

Network questions

1. Mapping university social network (design experiment based on DARPA Network Challenge)

2. How do you use a network to determine identity?

3. Using data (Wikipedia article links, information traversal)

4. Local vs. global emergent phenomena

Data yields a number of questions

• Is popular culture really making us smarter?
• How do we find a graph’s diameter?
  – Maximal shortest path between any pair of vertices
• What is the center of a graph?

Visualization and analysis of networks

• GUESS developed by Eytan Adar
• Gython interpreter adapted for Duke GUESS
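The diameter and center questions can be tried out on a small graph before moving to a tool like GUESS. The following is a minimal sketch, assuming an unweighted, connected graph stored as a dictionary of adjacency lists; the function names and the toy path graph are illustrative and not part of the original materials. It takes the center to be the set of vertices with the smallest eccentricity, which is one common definition among several.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop counts from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def eccentricities(graph):
    """Largest shortest-path distance from each vertex (connected graph assumed)."""
    return {v: max(bfs_distances(graph, v).values()) for v in graph}

def diameter(graph):
    """Maximal shortest path between any pair of vertices."""
    return max(eccentricities(graph).values())

def center(graph):
    """Vertices whose eccentricity equals the radius (the minimum eccentricity)."""
    ecc = eccentricities(graph)
    radius = min(ecc.values())
    return sorted(v for v, e in ecc.items() if e == radius)

# Hypothetical toy graph: the path a-b-c-d-e
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(diameter(path), center(path))
```

Running one breadth-first search per vertex costs roughly (vertices × edges) time, which is part of why the scale questions raised earlier matter for large graphs.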

iPods and social networks

• Audioscrobbler
• Collaborative filtering
• What is a neighbor?
• What is the network?

Recommending papers

Can we effectively facilitate collaboration within a research community with a citation database?

FaceTrust

How do we assess the credibility of identity statements made by online users?

PGP for Facebook

How can we establish a web of trust? Key-signing party!

Creating a module

Modules

• Independent unit in a course
  – 1 to 3 weeks in a course
• Centered around problems and questions, not concepts
• Content
  – Technical background
  – Social and philosophical implications
  – Data sources and tools
• Developed and tested across disciplines
• Ultimately will be published as Open Educational Resource

Common Threads

• Position
  – How does your position within a network advantage or disadvantage you?
  – Centrality (closeness, betweenness, degree, etc.)
• Scale
  – Computing properties for large networks
  – Dealing with incomplete or inaccurate information
  – Hard to visualize
  – Longitudinal studies: adding time as a dimension can complicate things
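Two of the centrality measures named above are short enough to compute by hand or in a few lines. This is a sketch under stated assumptions: an undirected, connected, unweighted graph held as a dictionary of neighbor lists, with degree normalized by (n − 1) and closeness defined as (number of other reachable vertices) / (sum of distances to them). The function names and the star graph are hypothetical; betweenness is left out because it needs a longer algorithm.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of the other vertices each vertex is directly tied to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """Reachable-vertex count divided by total hop distance, per vertex."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

# Hypothetical star network: one hub tied to three otherwise unconnected people
star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
print(degree_centrality(star))
print(closeness_centrality(star))
```

In the star, the hub scores highest on both measures, which gives students a concrete case of position conferring advantage.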

Building a module

• What question are you answering?
• What will students do?
• What will students need to solve the problem?
  – Data sources & tools
• What concepts from networks will they encounter in solving the problem?
• What are the goals?

First Module

• Can we discover research communities given online faculty CVs?
  – Based on co-authorship data for Duke professors in the sciences & engineering, can we detect communities defined by departmental boundaries?
  – Do some professors play special roles in establishing the community structure?
  – Are there any identifiable communities that are interdepartmental in nature?

First Module

• Engages sociological, statistical, computational, and pedagogical concepts and questions
  – Centrality
  – Modularity
  – Social capital
  – Dealing with incomplete or inaccurate data
  – Information integration
  – Algorithm efficiency

Detecting Communities

• Scrape pages from Faculty Database System, standardize entries, and upload entries into bibliographic database
• Generate map of authors to coauthors
• Create graph where two authors are connected if they share a coauthor
• From co-authorship graph, use community structure algorithm [Clauset, Newman, Moore] to discover community structure

Co-authorship graph

CoBib
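The middle steps of this pipeline can be sketched without the scraping or the bibliographic database. The example below assumes the input is already a list of author lists, one per paper, and links two authors when they appear on the same paper (the usual co-authorship convention, and an assumption here). It does not implement the Clauset-Newman-Moore algorithm; it only computes the modularity score Q that such algorithms try to maximize, so students can compare candidate partitions by hand. All names and data are hypothetical.

```python
from itertools import combinations

def coauthor_graph(papers):
    """Map each author to the set of people they share at least one paper with."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set())
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def modularity(graph, communities):
    """Modularity Q of a partition given as a list of disjoint vertex sets."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2  # number of edges
    q = 0.0
    for group in communities:
        # Edges with both endpoints inside the group (each counted twice, then halved)
        internal = sum(1 for u in group for v in graph[u] if v in group) / 2
        degree_sum = sum(len(graph[u]) for u in group)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q

# Hypothetical author lists: two tight groups joined by one cross-group paper
papers = [["A", "B", "C"], ["D", "E", "F"], ["C", "D"]]
graph = coauthor_graph(papers)
print(modularity(graph, [{"A", "B", "C"}, {"D", "E", "F"}]))
print(modularity(graph, [{"A", "B", "C", "D", "E", "F"}]))
```

A partition that follows the two groups scores higher than lumping everyone together, which mirrors the department-versus-school comparison the module asks about.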

The Department vs. the School

Latest Modules

• Hollywood Hookup
  – How can we measure the “romantic extroversion” of an individual?
  – How do we assess the quality of the data?
• Sex Differences in Social Connectedness
  – Use the General Social Survey and a student-generated survey to assess properties of the student’s networks
• Fakebook!
  – How can you cluster users based on their type of profile?
    • Masqueraders, sharers, etc.
  – What would behavior on a fake social network reveal about each individual?

Spam Detection in Twitter

• How can you identify spammers on Twitter?
  – Use network structure
    • What is the Twitter network structure? How would you discover it?
    • What communities exist? Are there clusters of spammers and non-spammers?
      – Bipartite graph?
  – Use tweet content
    • What are the informative features?
    • How do you determine similarity between tweets?
    • What are the patterns of behavior of spammers?
      – Distribution of tweet frequency, length
• Computing skills & tools: How do you work with Twitter API to download necessary information?
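One possible answer to the tweet-similarity question is cosine similarity over word counts. The sketch below assumes the tweets have already been downloaded as plain strings, so no Twitter API calls appear; the tokenizer, the function names, and the sample strings are illustrative choices, and other similarity measures would be equally reasonable for the module.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercased word, hashtag, and mention counts for one tweet."""
    return Counter(re.findall(r"[a-z0-9#@']+", text.lower()))

def cosine_similarity(tweet_a, tweet_b):
    """Cosine of the angle between the two tweets' word-count vectors (0.0 to 1.0)."""
    ca, cb = tokens(tweet_a), tokens(tweet_b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(c * c for c in ca.values()))
    norm_b = math.sqrt(sum(c * c for c in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical tweets
print(cosine_similarity("win free tickets", "free tickets here"))
print(cosine_similarity("lunch with friends", "free tickets here"))
```

Accounts whose tweets are nearly identical to one another (similarity close to 1) are one pattern students might look for alongside tweet frequency and length.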

Musical similarity

• How do we find one’s musical neighbors?
  – Given two playlists, return a value indicating their similarity
• What does the network structure tell you about a community’s tastes?
  – Whose tastes are most central?
    • What measure of centrality makes the most sense here?
  – What is the network centralization?
• How do you visualize this network?
• Computing skills & tools: How do you parse iTunes files?
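"Given two playlists, return a value indicating their similarity" leaves the measure open. A simple starting point is the Jaccard index: shared tracks divided by all distinct tracks. The sketch assumes playlists have already been parsed into lists of track identifiers (the iTunes parsing step is not shown); the function names and sample data are hypothetical.

```python
def playlist_similarity(playlist_a, playlist_b):
    """Jaccard index: shared tracks / distinct tracks across both playlists."""
    a, b = set(playlist_a), set(playlist_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def musical_neighbors(playlists, user, k=2):
    """The k other users whose playlists are most similar to the given user's."""
    scores = [
        (playlist_similarity(playlists[user], tracks), other)
        for other, tracks in playlists.items()
        if other != user
    ]
    # Highest similarity first; ties broken alphabetically for repeatability
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scores[:k]]

# Hypothetical playlists keyed by user, with track ids as strings
playlists = {
    "u1": ["t1", "t2", "t3"],
    "u2": ["t2", "t3", "t4"],
    "u3": ["t9"],
}
print(playlist_similarity(playlists["u1"], playlists["u2"]))
print(musical_neighbors(playlists, "u1", k=1))
```

Treating each similarity above some threshold as an edge turns the playlists into a network, which is where the centrality and visualization questions come in.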

Transmission models

• How are pathogens transmitted from one actor to another?
  – Pathogen may be an idea
    • How do individuals influence each other’s opinions, ideology, and actions?
• How do you model the network to effectively answer your question?
  – In an STD network, friendship networks might not matter, but they may be important in studying the social influence on depression and anxiety
• Computing skills & tools: How do you turn web pages into your dataset?
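A transmission question can be explored by simulation once the network is chosen. The following is a minimal sketch of a susceptible-infected process, assuming discrete time steps, a fixed per-contact transmission probability, and no recovery; these modeling choices, the function name, and the chain network are all assumptions for illustration rather than the module's prescribed model.

```python
import random

def simulate_spread(graph, seeds, p_transmit, steps, rng=None):
    """Spread from the seed set; returns the final infected set and counts per step."""
    rng = rng or random.Random(0)
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        newly = set()
        for u in infected:
            for v in graph[u]:
                # Each infected-susceptible contact transmits independently
                if v not in infected and rng.random() < p_transmit:
                    newly.add(v)
        infected |= newly
        history.append(len(infected))
    return infected, history

# Hypothetical contact network: a chain 0-1-2-3
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
final, counts = simulate_spread(chain, seeds={0}, p_transmit=0.5, steps=5)
print(sorted(final), counts)
```

Swapping in a friendship network versus a sexual-contact network, with everything else held fixed, is one way to let students see how the choice of network changes the answer.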

Hollywood Hookup

• How do the relationships of Hollywood actors and actresses differ from those of the average student?
  – How can we measure the “romantic extroversion” of an individual?
  – How do we assess the quality of the data?
• Survey
• How can statistical modeling iteratively construct formulae that provide useful and meaningful approximations to the observed networks, enabling insight into the processes that produced those networks?

Fakebook

• Facebook is great, but not suitable for courses due to privacy and legal concerns
• Enter Fakebook
  – How can you cluster users based on their type of profile?
    • Masqueraders, sharers, etc.
  – What would behavior on a fake social network reveal about each individual?
  – What can we learn about the Social Graph?
• Facebook is creating a sandboxed version for this purpose

A new course:

CompSci 96: Science of Networks

Course Themes

• Network Structure
  – Graph theory & algorithms
• Network Behavior
  – Game theory, auctions, markets
• Network Applications
  – Web search & markets
  – Network effects and power laws
  – Modeling epidemics
  – Aggregate behavior and prediction markets

How does the science of networks shed light on how social, technological, and natural entities are structured and connected?

Topics

1. Graph Theory
   – Social networks, weak ties, homophily, structural balance
2. Game Theory
   – Nash equilibrium; examples from auctions, traffic
3. Strategic Interaction on Networks
   – Markets, matchings, network exchange theory
4. Information Networks and the Web
   – Web structure, Web search, sponsored search markets
5. Network dynamics: population models
   – Information cascades, positive externalities, power laws
6. Network dynamics: structural models
   – Diffusion of innovations, small-world phenomena, epidemics
7. Institutions and Aggregate Behavior
   – Markets and information, voting, property rights

CS Themes

• Algorithms: breadth-first search, strongly connected components, bipartite matching, weighted assignment.
• Algorithmic game theory: traffic and congestion, design of auctions and truthful mechanisms, sponsored search.
• Architecture of the Web: the idea of associative memory, search engines (crawl/index/process queries/advertise).
• Social computing: reputation systems, recommendation systems, ranking systems, prediction markets.
• Analyzing network datasets: community detection, hubs/authorities/PageRank.
• Multi-agent systems: modeling systems of interacting agents, modeling agents as Bayesian reasoners.

Course Approach

• In-class group exercises, problem sets, exams
• Networks in the News
  – Students post entries on current events relating to networks
• Project: Network visualization & analysis
  – Recognize common graph structures
  – Effectively apply visualization techniques to answer questions about data
• In-class games
  – Identify rational behavior
  – Reflect on how we are connected and how communication has changed

Target Audience

• Students who may have had no intention of taking a course in computing
  – What will be our analogue of intro psych / intro econ / intro political science?
  – This course attempts this in the context of current topics, but on a foundation of technical content in CS and economics.
• Intended for students interested in the social and natural sciences
  – No programming background required
  – Math background at level of AP, but…
• Pilot in Spring 2010. Gearing up for Spring 2011

Getting involved

• Are you doing work relevant to networks in education?
  – Problems, data sources
  – Course materials
  – Willing to give a guest lecture on your research?
• Can provide honoraria or travel funding to present relevant work
• Tell students about our Networks course

Thanks

http://www.cs.duke.edu/forbes

HarambeeNet Project
http://harambeenet.org

Sample Blog Post

I'm Related to Kevin Bacon?

Overview of the Oracle of Bacon: In class we have talked a lot about social and computer networks and all of their component parts. We have learned many important aspects of networks and what makes them operate. One of the most interesting and complex notions is that of centrality and how one can go about calculating centrality within a social network. The Oracle of Bacon is one of the best examples of a project that has created an elaborate social network around the central figure of Kevin Bacon. However, it is interesting that the site shows Kevin Bacon is actually not the center of the Hollywood network; in fact, there are 1,048 actors who would make better centers than Bacon. Here is a breakdown of the best and worst centers of the Hollywood network. Although the only other actor mentioned who would make a better center is Sean Connery, it can be speculated as to what makes a great center. A good center would have to be an older actor, have appeared in many movies and many varieties of movies, have appeared in large productions with many actors, and have worked overseas. Alternatively, a bad center would be young, have appeared in only one type of movie, or one movie in general!

Why is the Oracle of Bacon Interesting to us?

• In reality, the game is an example of the small world phenomenon. The small world phenomenon was researched by Stanley Milgram as he examined the average path length for social networks of people in the United States. The phenomenon is that paths between nodes tend to be shorter than expected, which the game illustrates. The Oracle of Bacon game was designed by computer scientists at the University of Virginia in order to create an engaging way of dealing with the small world phenomenon. The program for calculating a Bacon number was developed by mapping networks from http://imdb.com/ (the database of movie and actor information).

Other related points

• Here is the original paper by Stanley Milgram, upon which all of this information is based. The game works to find links between different actors and find the degree of separation from Bacon. It is amazing that almost any actor, no matter how obscure, can be linked to Bacon within six degrees and the average is under three links (2.960).

• It is also interesting to look at the earlier examples of the small world phenomenon, which inspired the Oracle of Bacon. Erdos numbers refer to the number of co-authorship links separating a mathematician from Paul Erdos, a Hungarian mathematician famous for collaboration. The Erdos number project gives details similar to the Oracle of Bacon about the amount of connectivity within the network of mathematicians. In this network the median Erdos number is 5, the mean is 4.65, and the standard deviation is 1.21. This shows that there is slightly less connectivity, but a high degree of centrality.
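The degree-of-separation calculation described in the post is a breadth-first search over actors linked by shared films. The sketch below assumes a small dictionary from film titles to cast lists rather than the IMDb data, and it is not the Oracle of Bacon's actual program; the function name and placeholder labels are hypothetical. The same function yields Erdos-style numbers if films are replaced by papers and casts by author lists.

```python
from collections import deque

def separation_numbers(casts, center):
    """Bacon-style numbers: fewest shared-film links from center to each actor."""
    films_of = {}
    for film, actors in casts.items():
        for actor in actors:
            films_of.setdefault(actor, []).append(film)
    number = {center: 0}
    queue = deque([center])
    while queue:
        actor = queue.popleft()
        for film in films_of.get(actor, []):
            for costar in casts[film]:
                if costar not in number:
                    number[costar] = number[actor] + 1
                    queue.append(costar)
    return number  # actors with no path to center are simply absent

# Hypothetical casts using placeholder labels, not real filmographies
casts = {
    "film1": ["center", "p"],
    "film2": ["p", "q"],
    "film3": ["q", "r"],
    "film4": ["s", "t"],
}
numbers = separation_numbers(casts, "center")
linked = [n for actor, n in numbers.items() if actor != "center"]
print(numbers, sum(linked) / len(linked))
```

Repeating the search from every actor and comparing the average numbers is how one would rank candidate centers against each other.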

Here is a visualization of the Erdos Network.

More recent centrality work

• There are many examples of computer scientists who have dealt with the six degrees theory in their analysis of the small-world phenomenon, including Jon Kleinberg. His paper, Could it be a Big World After All? The ‘Six Degrees of Separation’ Myth (Society, April 2002), deals with a lot of the important ideas discussed above. Kleinberg argues that the initial data used to create the notion of the small-world phenomenon was actually skewed, and that there might be less connectivity between people than was previously believed. This paper was published in 2002, and it does not seem to have garnered a large amount of debate amongst the scholarly community. It seems that more work and experimentation needs to be done in this field in order to make claims about the connectedness of the actual world. Although Kleinberg and others made some really interesting points initially, unfortunately the computer science world seems focused on novelty, not finishing work on a phenomenon, so it may be a while before all of our questions are answered!

Collaborative Filtering

• Goal: predict the utility of an item to a particular user based on a database of user profiles
  – User profiles contain user preference information
  – Preference may be explicit or implicit
    • Explicit means that a user votes explicitly on some scale
    • Implicit means that the system interprets user behavior or selections to impute a vote
• Problems
  – Missing data: voting is neither complete nor uniform
  – Preferences may change over time
  – Interface issues
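The prediction goal stated above can be made concrete with a common memory-based scheme: start from the active user's mean vote and adjust it by other users' deviations from their own means, weighted by how similarly they vote. The sketch assumes explicit numeric votes stored as nested dictionaries and uses Pearson correlation over co-rated items as the weight; these are illustrative choices, not necessarily the method used in the course, and the names and data are hypothetical. Missing data shows up directly: users who have not voted on the item are skipped.

```python
import math

def mean_vote(votes):
    """Average of all votes a user has cast."""
    return sum(votes.values()) / len(votes)

def pearson(votes_a, votes_b):
    """Correlation of two users' votes over the items both have rated."""
    common = votes_a.keys() & votes_b.keys()
    if len(common) < 2:
        return 0.0
    mean_a = sum(votes_a[i] for i in common) / len(common)
    mean_b = sum(votes_b[i] for i in common) / len(common)
    cov = sum((votes_a[i] - mean_a) * (votes_b[i] - mean_b) for i in common)
    var_a = sum((votes_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((votes_b[i] - mean_b) ** 2 for i in common)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0

def predict_vote(profiles, user, item):
    """User's mean vote plus the similarity-weighted deviations of other voters."""
    base = mean_vote(profiles[user])
    weighted, total_weight = 0.0, 0.0
    for other, votes in profiles.items():
        if other == user or item not in votes:
            continue  # missing data: this user gives no evidence about the item
        w = pearson(profiles[user], votes)
        weighted += w * (votes[item] - mean_vote(votes))
        total_weight += abs(w)
    if total_weight == 0:
        return base
    return base + weighted / total_weight  # not clamped to the voting scale

# Hypothetical explicit votes on a 1-5 scale
profiles = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 3, "d": 5},
    "cy": {"a": 2, "b": 4},
}
print(predict_vote(profiles, "ann", "d"))
```

Because the result is not clamped, a prediction can fall outside the voting scale, which is a useful prompt for discussing how to present predictions to users.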

Network Models (Barabasi)

• Differences between Internet, Kazaa, Chord
  – Building, modeling, predicting
• Static networks, Dynamic networks
  – Modeling and simulation
• Random and Scale-free
  – Implications?
• Structure and Evolution
  – Modeling via Touchgraph
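The contrast between random and scale-free networks can be seen by generating one of each and comparing degree distributions. This is a rough sketch, assuming an Erdos-Renyi style random graph and a simplified preferential-attachment growth rule in the spirit of Barabasi's model; the function names, parameters, and seeding clique are illustrative assumptions, not a faithful reproduction of any published generator.

```python
import random
from collections import Counter

def random_graph(n, p, rng):
    """Each pair of vertices is linked independently with probability p."""
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                edges.add((u, v))
    return edges

def preferential_attachment(n, m, rng):
    """Grow a graph: each new vertex links to m distinct existing vertices,
    chosen with probability proportional to their current degree."""
    edges = set()
    # Start from a clique on m + 1 vertices so every vertex has positive degree
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            edges.add((u, v))
    endpoints = [x for e in edges for x in e]  # one entry per edge endpoint
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.add((t, new))
            endpoints.extend((t, new))
    return edges

def degree_counts(edges):
    """How many vertices have each degree (isolated vertices are not counted)."""
    degree = Counter(x for e in edges for x in e)
    return Counter(degree.values())

rng = random.Random(1)
print(sorted(degree_counts(random_graph(200, 0.02, rng)).items()))
print(sorted(degree_counts(preferential_attachment(200, 2, rng)).items()))
```

Students can plot both outputs on log-log axes: the growth model tends to produce a few very high-degree hubs, while the random graph's degrees cluster around the average.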

Physical Networks

• The Internet
  – Vertices: Routers
  – Edges: Physical connections
• Another layer of abstraction
  – Vertices: Autonomous systems
  – Edges: peering agreements
  – Both a physical and business network
• Other examples
  – US Power Grid
  – Interdependence and August 2003 blackout

What does the Internet look like?

US Power Grid

Business & Economic Networks

• Example: eBay bidding
  – Vertices: eBay users
  – Links: represent bidder-seller or buyer-seller
  – Fraud detection: bidding rings
• Example: corporate boards
  – Vertices: corporations
  – Links: between companies that share a board member
• Example: corporate partnerships
  – Vertices: corporations
  – Links: represent formal joint ventures
• Example: goods exchange networks
  – Vertices: buyers and sellers of commodities
  – Links: represent “permissible” transactions

Content Networks

• Example: Document similarity
  – Vertices: documents on web
  – Edges: Weights defined by similarity
  – See TouchGraph GoogleBrowser
• Conceptual network: thesaurus
  – Vertices: words
  – Edges: synonym relationships

Enron

Social networks

• Example: Acquaintanceship networks
  – Vertices: people in the world
  – Links: have met in person and know last names
  – Hard to measure
• Example: scientific collaboration
  – Vertices: math and computer science researchers
  – Links: between coauthors on a published paper
  – Erdos numbers: distance to Paul Erdos
  – Erdos was definitely a hub or connector; had 507 coauthors
  – How do we navigate in such networks?

Acquaintanceship & more