Source: library.iitd.ac.in/arpit/Week 10- Module 1-...
Social Network Analysis and Big Scholarly Data
Yogendra Singh, University Librarian
Swami Rama Himalayan University
Email - [email protected]
After this talk
• You should have a basic knowledge of Social Network Analysis
• You should have a basic understanding of Big Scholarly Data
• You should understand how Social Network Analysis can be applied to Big Scholarly Data
Social Network
• Society is made of individuals (like a wife and a husband). They are known as nodes/actors/vertices in Social Network parlance
• Individuals have relations
• Information flows through these relations
• These relations are known as ties/links/edges in Social Network parlance
• A social network is a network (group) of individuals who have a certain type of relation for a particular information flow
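To make this vocabulary concrete, here is a minimal Python sketch, assuming an undirected network stored as a dictionary that maps each node (individual) to the set of nodes it has ties with. The function name `build_network` and all names and ties are hypothetical.

```python
from collections import defaultdict


def build_network(ties):
    """Build an undirected social network as {node: set of neighbours}.

    Each tie is a (person, person) pair; information can flow both ways,
    so every tie is recorded in both directions.
    """
    network = defaultdict(set)
    for a, b in ties:
        if a == b:
            continue  # ignore self-ties
        network[a].add(b)
        network[b].add(a)
    return dict(network)


# Hypothetical individuals (nodes) and their relations (ties).
ties = [("Ravi", "Sita"), ("Sita", "Asha"), ("Asha", "Vikram"), ("Ravi", "Asha")]
network = build_network(ties)
print(sorted(network["Asha"]))  # ['Ravi', 'Sita', 'Vikram']
```

Storing neighbours as sets keeps duplicate ties out and makes "are these two linked?" a constant-time lookup.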
Look at this picture
Now the big question is
• Who matters in this crowd? There could be different answers depending upon different points of view
• The analysis of this type of social network using graph theory is called Social Network Analysis
• Since scholarly networks are networks of people (co-authors), SNA can well be applied to large scholarly data, or Big Scholarly Data
Big Scholarly Data (BSD)
• BSD refers to the millions of scholarly records available today due to tremendous changes in the scholarly communication cycle
• BSD may include
• E-books, articles, reports, standards, patents, etc., published by major commercial and not-for-profit organizations, e.g., sciencedirect.com, tandfonline.com, doaj.org
• Abstracting and indexing databases: Scopus, Web of Science, EBSCO, Google Scholar
• Academic social networks: Academia, ResearchGate, Mendeley, etc.
• Many other types of scholarly data
Three major scholarly data providers
| Sl. No. | Brand name | Publisher | Coverage | No. of records |
|---|---|---|---|---|
| 1 | Google Scholar | Google | Full universe of knowledge / all formats | 350+ million |
| 2 | Web of Science | Clarivate Analytics | Bibliographic information, including citations, abstract and other details | 90+ million |
| 3 | Scopus | Elsevier Science | Bibliographic information, including citations, abstract and other details | 75+ million |
Big data analysis methods
• Statistical analysis: suitable for smaller datasets
• Scholarly text mining: can be used with big data
• Scholarly Network Analysis (or Social Network Analysis)
Scholarly Network Analysis / Social Network Analysis: important measures include
• Average path length
• Clustering coefficient
• Centralities
Average Path Length
• Average path length: the average shortest-path distance between any two nodes in a network
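As a minimal sketch, assuming an undirected, connected network stored as a dictionary of neighbour sets, average path length can be computed by running a breadth-first search from every node and averaging the hop distances over all pairs of distinct nodes. The helper names and the four-node chain are hypothetical; NetworkX's `average_shortest_path_length` computes the same quantity.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def average_path_length(graph):
    """Mean shortest-path length over all pairs of distinct nodes.

    Assumes an undirected, connected graph with at least two nodes.
    """
    n = len(graph)
    total = 0
    for source in graph:
        dist = bfs_distances(graph, source)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(dist.values())  # distance to itself is 0
    # Each unordered pair was counted once from each end: n * (n - 1) ordered pairs.
    return total / (n * (n - 1))


# Hypothetical chain A - B - C - D.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(round(average_path_length(graph), 3))  # 1.667
```

For the chain, the six pairwise distances sum to 10, giving 10 / 6 ≈ 1.667.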
Clustering Coefficient
• Clustering coefficient: a measure of the degree to which nodes in a graph tend to cluster together
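One common way to make this precise is the local clustering coefficient: the fraction of pairs of a node's neighbours that are themselves linked, averaged over all nodes for a network-level value. The sketch below assumes an undirected graph stored as neighbour sets and scores nodes with fewer than two neighbours as 0 (the same convention NetworkX's `average_clustering` uses by default); the example graph is hypothetical.

```python
from itertools import combinations


def local_clustering(graph, node):
    """Fraction of pairs of node's neighbours that are linked to each other."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # no neighbour pairs: scored 0 by convention
    linked_pairs = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return linked_pairs / (k * (k - 1) / 2)


def average_clustering(graph):
    """Network-level clustering coefficient: mean of the local values."""
    return sum(local_clustering(graph, node) for node in graph) / len(graph)


# Hypothetical triangle A-B-C with an extra node D attached to C.
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(round(local_clustering(graph, "C"), 3))  # 0.333
print(round(average_clustering(graph), 3))     # 0.583
```

C has three neighbour pairs but only A-B is linked, so its local value is 1/3; A and B sit in a closed triangle and score 1.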
Centrality Measures
• Centrality Measures: They measure how central (important) a node is in a network
Network Centrality
• Which individuals (nodes) are important (central)?
• The measurement of importance is called centrality in SNA
• Centrality may mean different things to different people and in different contexts
Why are Centrality and Centralization Important?
• Access to information and ideas
• Interaction among members of the network
• Control the flow of information, resources, and other network content
• Visibility
• Ability to act together collectively
Multiple Ways to Calculate Centrality
• Degree
• Closeness
• Betweenness
• Eigenvector
Calculating Centrality
• Degree – Proportional to the number of other nodes to which a node is linked: the number of links divided by (n-1).
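A minimal sketch of this normalisation, assuming an undirected graph stored as a dictionary of neighbour sets (the example network is hypothetical). Dividing by (n-1), the largest number of links a node could have, puts every score between 0 and 1; NetworkX's `degree_centrality` uses the same normalisation.

```python
def degree_centrality(graph):
    """Number of links of each node divided by (n - 1), the maximum possible."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


# Hypothetical network: "Hub" is linked to every other node.
graph = {"Hub": {"A", "B", "C"}, "A": {"Hub"}, "B": {"Hub", "C"}, "C": {"Hub", "B"}}
centrality = degree_centrality(graph)
print(centrality["Hub"])  # 1.0
```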
• Closeness – The sum of geodesic distances (shortest paths) to all other points in the graph. Divide by (n-1), then invert.
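The recipe above (sum the geodesic distances, divide by (n-1), invert) can be sketched as follows, assuming a connected, undirected graph stored as neighbour sets and using breadth-first search for the shortest paths. The three-node example is hypothetical; for connected graphs NetworkX's `closeness_centrality` yields the same values.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closeness_centrality(graph):
    """Closeness per node; assumes a connected, undirected graph."""
    n = len(graph)
    scores = {}
    for node in graph:
        dist = bfs_distances(graph, node)
        if len(dist) != n:
            raise ValueError("graph must be connected")
        total = sum(dist.values())            # sum of geodesic distances
        scores[node] = 1 / (total / (n - 1))  # divide by (n - 1), then invert
    return scores


# Hypothetical chain A - B - C: B is one step from everyone.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
closeness = closeness_centrality(graph)
print(closeness["B"])  # 1.0
```

The end nodes A and C need 1 + 2 = 3 steps in total, so their closeness is 1 / (3 / 2) ≈ 0.667.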
• Betweenness – The extent to which a particular point lies ‘between’ other points in the graph; how many shortest paths (geodesics) is it on? A measure of brokerage or gatekeeping.
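Counting the shortest paths through each node naively is expensive, so the sketch below uses Brandes' algorithm for an unweighted, undirected graph stored as neighbour sets. It returns raw (un-normalised) scores; NetworkX's `betweenness_centrality` also implements Brandes' algorithm but normalises by default. The "Broker" network is a hypothetical illustration of gatekeeping.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm: raw betweenness for an unweighted, undirected graph."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in reverse BFS order.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair was seen from both endpoints.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical: groups {A, B} and {C, D} are connected only through Broker.
graph = {
    "A": {"B", "Broker"},
    "B": {"A", "Broker"},
    "Broker": {"A", "B", "C", "D"},
    "C": {"D", "Broker"},
    "D": {"C", "Broker"},
}
bc = betweenness_centrality(graph)
print(bc["Broker"])  # 4.0
```

Broker lies on the only shortest path for each of the four cross-group pairs (A-C, A-D, B-C, B-D), while no other node lies between anyone.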
• Eigenvector – A weighted measure of centrality that takes into account the centrality of the other nodes to which a node is connected. That is, being connected with other central nodes increases centrality, e.g., the secretary of a powerful person. Google's PageRank algorithm is based on a variation of this approach.
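A minimal sketch using power iteration, assuming a connected, undirected graph stored as neighbour sets: each node's score is repeatedly replaced by a sum driven by its neighbours' scores and then rescaled. Iterating with (A + I) rather than the adjacency matrix A alone is an implementation choice made here so the loop also converges on bipartite graphs such as trees; the shift does not change the leading eigenvector. NetworkX's `eigenvector_centrality` also uses power iteration. The office network is hypothetical.

```python
import math


def eigenvector_centrality(graph, max_iter=1000, tol=1e-10):
    """Power iteration: each score is driven by the scores of a node's neighbours."""
    x = dict.fromkeys(graph, 1.0)
    for _ in range(max_iter):
        # (A + I) x: own score plus the sum of the neighbours' scores
        new = {v: x[v] + sum(x[u] for u in graph[v]) for v in graph}
        norm = math.sqrt(sum(s * s for s in new.values()))
        new = {v: s / norm for v, s in new.items()}  # keep unit length
        if sum(abs(new[v] - x[v]) for v in graph) < tol:
            return new
        x = new
    raise RuntimeError("power iteration did not converge")


# Hypothetical office: Secretary and Intern each have exactly one tie,
# but Secretary's tie is to the well-connected Boss.
graph = {
    "Boss": {"A", "B", "C", "Secretary"},
    "A": {"Boss"},
    "B": {"Boss"},
    "C": {"Boss", "Intern"},
    "Secretary": {"Boss"},
    "Intern": {"C"},
}
scores = eigenvector_centrality(graph)
print(scores["Secretary"] > scores["Intern"])  # True
```

Secretary and Intern have the same degree, yet Secretary scores higher because her single tie leads to a more central node.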
Network Analysis Tools Applied to BSD
| Software | Access | Platform | Language | Description |
|---|---|---|---|---|
| CiteSpace | Free | Windows, macOS | Java | Visualizing and analyzing trends and patterns in scientific literature; knowledge domain visualization; best for WoS datasets |
| Gephi | Free | Windows, Linux, macOS | Java | Exploratory data analysis; social network analysis; link analysis |
| iGraph | Free | Windows, macOS | C/R/Python/Perl | A collection of network analysis tools with the emphasis on efficiency, portability and ease of use |
| NetworkX | Free | Windows, macOS | Python | Creation, manipulation, and investigation of the structure, dynamics, and functions of complex networks |
| Pajek | Free | Windows, macOS | C/R | Analysis and visualization of large networks having some thousands or even millions of vertices |
Types of Scholarly Networks That Can Be Generated by Applying SNA to BSD
• Co-Author Network (personal, organizational or geographic)
• Co-Word Network
• Co-Citation Network
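All three network types can be built the same way: count how often two items appear together in one record. The sketch below is a minimal illustration under that assumption; the function name and keyword lists are hypothetical. For an organizational or geographic co-author network, one would first map each author to an institution or country and then count pairs of those instead.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_network(records):
    """Weighted undirected edges: how many records each pair of items shares.

    Co-author network: each record is one paper's author list.
    Co-word network: each record is one paper's keyword list.
    Co-citation network: each record is one paper's reference list, so two
    works are linked whenever the same paper cites both.
    """
    weights = Counter()
    for items in records:
        # sorted(set(...)) drops duplicates and gives each pair a fixed order
        for a, b in combinations(sorted(set(items)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical keyword lists from three papers (a co-word network).
records = [
    ["social network analysis", "bibliometrics"],
    ["bibliometrics", "scopus", "social network analysis"],
    ["scopus", "citation analysis"],
]
edges = co_occurrence_network(records)
print(edges[("bibliometrics", "social network analysis")])  # 2
```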
BSD Analysis Applications
• Scientific Impact Evaluation: article impact, author impact, journal impact and institutional impact
BSD Analysis Application - Academic Recommendations
• Literature Recommendations
• Expert Recommendations
• Collaboration Recommendations
• Priority Recommendations
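The slides do not say how recommendations are computed. As one simple possibility, collaboration recommendations can be sketched with the "common neighbours" heuristic on a co-author network: suggest authors who are not yet co-authors but share many co-authors with the target. The function name, graph and author names below are hypothetical.

```python
def recommend_collaborators(graph, author, top_n=3):
    """Rank authors not yet linked to `author` by the number of shared co-authors."""
    current = graph[author]
    scores = {}
    for candidate in graph:
        if candidate == author or candidate in current:
            continue  # skip self and existing co-authors
        shared = len(current & graph[candidate])
        if shared:
            scores[candidate] = shared
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical co-author network.
graph = {
    "Singh": {"Kumar", "Rao"},
    "Kumar": {"Singh", "Das", "Rao"},
    "Rao": {"Singh", "Kumar", "Das"},
    "Das": {"Kumar", "Rao", "Mehta"},
    "Mehta": {"Das"},
}
print(recommend_collaborators(graph, "Singh"))  # [('Das', 2)]
```

Das has never written with Singh but has worked with both of Singh's co-authors, which makes Das the strongest candidate.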
Scholarly Data Analysis: Steps
• Data collection: download the desired dataset from an appropriate source
• Data cleaning: the most difficult task, as the same name, institute or department is represented in different ways, even by the same individual
• Create a graph using the data
• Use the graph for further processing
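Part of the cleaning step can be automated by grouping spellings that reduce to the same normalised key. The sketch below is loosely modelled on the published description of OpenRefine's "fingerprint" keying (lowercase, strip punctuation and accents, sort the unique tokens); it is not OpenRefine's own code, and the function names and affiliation strings are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Normalise a string so trivially different spellings share one key."""
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")  # drop accents
    value = re.sub(r"[^\w\s]", " ", value.lower())           # drop punctuation
    tokens = sorted(set(value.split()))                       # ignore word order
    return " ".join(tokens)


def cluster_variants(values):
    """Groups of original strings that share a fingerprint (size > 1 only)."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


affiliations = [
    "Indian Institute of Technology, Roorkee",
    "Indian Institute of Technology Roorkee",
    "Roorkee, Indian Institute of Technology",
    "IIT Roorkee",
]
print(cluster_variants(affiliations))
```

The first three variants collapse into one group, but "IIT Roorkee" does not match, which is exactly the kind of case that still has to be resolved manually.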
Source and Software Needed
• An appropriate data source to download the desired datasets: I used Scopus to download research data of IIT Roorkee
• A software tool to clean the data: I used OpenRefine, an open-source tool, to clean the data; however, quite a bit was done manually
• A software tool to create the graph: the online tool Table2Net was used
• A tool to process the graph and obtain the necessary measures: Gephi was used for this purpose
Steps in Analysing BSD through SNA
• Download a dataset
• Clean the data with cleaning software such as OpenRefine, and manually
• Create a graph file with an online scientific-network creation tool such as Table2Net or Scopus2Net
• Analyse the graph file in network analysis software such as Gephi, which can calculate the SNA measures discussed above
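If an online tool is not available, the "create graph file" step can also be sketched in a few lines: turn exported records into a weighted co-author edge list saved as CSV, which Gephi's spreadsheet importer accepts as an edges table with `Source`, `Target` and `Weight` columns. The `Authors` column name and the `;` separator are assumptions about the export format and should be checked against the actual Scopus file; the rows below are hypothetical and shaped like what `csv.DictReader` yields.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def edges_from_records(rows, column="Authors", sep=";"):
    """Count co-authorship links; `column` and `sep` are assumed export details."""
    weights = Counter()
    for row in rows:
        names = sorted({n.strip() for n in row[column].split(sep) if n.strip()})
        for a, b in combinations(names, 2):
            weights[(a, b)] += 1
    return weights


def write_gephi_edge_csv(weights, out):
    """Write an undirected, weighted edge list with a Source/Target/Weight header."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), w in sorted(weights.items()):
        writer.writerow([a, b, w])


# Hypothetical exported records.
rows = [
    {"Authors": "Singh Y.; Kumar A."},
    {"Authors": "Kumar A.; Singh Y.; Rao P."},
]
buf = io.StringIO()  # use open("edges.csv", "w", newline="") for a real file
write_gephi_edge_csv(edges_from_records(rows), buf)
print(buf.getvalue())
```

The output is a header row followed by one row per author pair, for example `Kumar A.,Singh Y.,2` for the two papers those authors share.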
Conclusion
• Application of social network analysis tools to Big Scholarly Data is going to be a big area of interest for scientometricians, as very large volumes of BSD are generated daily.
• These measures can be used to evaluate authors, institutions, subject areas or countries objectively.
• Special areas of interest and possible collaboration opportunities can be easily identified.
• As the impact of publications can be easily identified, this analysis can have a great impact on policy making.
• Librarians can also use SNA for analyzing in-house generated data such as circulation, reference data, even footfall data.
THANK YOU