Teaching information: from Google Search to Big Data
Ohio Business Teachers Association Professional Development Conference
October 3, 2014
By Martin Patrick
Teaching Information
Teaching information: from Google Search to Big Data by Martin I. Patrick is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
What is information?
Image credit: http://commons.wikimedia.org/wiki/File:Machine_language.jpeg
• Wisdom
• Knowledge
• Information
• Symbols
• Data
The ‘net generation’
Sometimes called digital natives, these are generally people born after 1982, so the group is not necessarily the same as “Millennials,” though Millennials are included.
Some assumptions made in popular literature and common discourse:
• Born into a networked world, therefore:
• Innate understanding of the web and information
• Different ways of behaving socially and academically
• Different ways of learning and making sense
The ‘net generation’
Realities found in scholarly literature:
• Net generation students are not more comfortable with information technology overall than non-net generation students (Bullen, et al., 2011)
• Fewer than 20% of net generation students score as information literate, and there is no correlation between GPA/standardized test scores and information literacy (Gross & Latham, 2009)
What do librarians & employers think about the net generation?
• Community college learners are not computer literate
• On one test, 1.2% of net gen students scored at the mastery level of computer literacy
• Further, another study found that there is “considerable deficiency” in information literacy among first-year college students
• Once again, the studies reveal no correlation between student perception of competence and actual competence
Duke, C. M. (2011). Computer Literacy Skills of Net Generation Learners (Doctoral dissertation, Texas A&M University).
What do librarians & employers think about the net generation?
A study commissioned by Bentley University that surveyed students, graduates, employers, educators, and parents found:
• 45% of employers and 39% of recruiters would give net gen students a C or lower for tech skills, math, and writing.
Study: Millennials And Employers Disagree On Path To Success. (2014, January 28). http://www.forbes.com/sites/robasghar/2014/01/28/study-millennials-and-employers-disagree-on-path-to-success/
What do librarians & employers think about the net generation?
Hootsuite, one of the biggest social media firms out there, found that:
• “millennials often have an intuitive understanding” of what will do well on social media, but they don’t know “what data to look for, where to find it, and what to do with it” to analyze effectiveness.
Holmes, R. (2014, April). 5 Social Media Skills Millennials Lack. Retrieved from http://blog.hootsuite.com/5-social-media-skills-millennials-lack/
What do researchers & employers think about the net generation?
Pew has found:
• “Very few teachers rate their students ‘excellent’ on any of the research skills included in the survey.”
• Only about one-quarter of teachers surveyed rate their students “excellent” or “very good” at appropriate and effective search queries.
• 78% of teachers rate their students’ determination to find hard-to-find data as poor or fair.
Purcell, K., Rainie, L., Heaps, A., Buchanan, J., Friedrich, L., Jacklin, A., . . . Zickuhr, K. (2012, November 1). How Teens Do Research in the Digital World | Pew Research Center's Internet & American Life Project. Retrieved from http://www.pewinternet.org/2012/11/01/how-teens-do-research-in-the-digital-world
What do researchers & employers think about the net generation?
Project Information Literacy found:
• Employers place a high premium on searching online, using tools other than search engines, and identifying the best solution.
• Employers have found most net gen employees deliver the quickest answer they can, using a search engine, a few keywords, and the first few pages.
• Employers are surprised that students don’t use traditional forms of research (calling someone, or looking through an annual report).
• “[t]here is a distinct difference between the information competencies and strategies today’s graduates bring with them to the workplace and the broader skill set that more seasoned employers need and expect.”
Head, A. J. (2012). Learning curve: how college graduates solve information problems once they join the workforce. Retrieved from Project Information Literacy website: http://journalistsresource.org/wp-content/uploads/2013/01/PIL_fall2012_workplaceStudy_FullReport.pdf
Myths of the ‘net generation’
• Everything on the net is indexed by Google
• Ease of access = quality
• Google’s ranking algorithm is perfect
• Information = knowledge
• Therefore, the net generation has mastered the internet
Myth 1: Google finds everything out there
Google crawls “publicly available webpages” (http://www.google.com/intl/en-US/insidesearch/howsearchworks/crawling-indexing.html), also known as the “Surface Web,” and ranks them using what it calls PageRank, which originally mimicked academic citation by treating a link to a page as a citation of it. Google estimates that its index holds over 100,000,000 gigabytes (100 petabytes) of information and that it has spent over one million computing hours building it.
Reality 1: Google finds as little as 1/500th of online information
However, the “Deep Web” is estimated to contain as much as 500 times the amount of information that Google indexes, or roughly 50,000 petabytes (http://oedb.org/ilibrarian/invisible-web/).
[Chart: Google vs. the Deep Web. Google: 1; Deep Web: 500]
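The arithmetic behind the 1/500th figure can be checked in a few lines. This is only a sketch using the two estimates quoted on the slides (a 100,000,000 gigabyte index and a 500x multiplier); the constant and function names are made up for illustration, and decimal units (1 petabyte = 1,000,000 gigabytes) are assumed.

```python
# Estimates quoted on the slides; both are rough, published guesses.
GOOGLE_INDEX_GB = 100_000_000   # Google's stated index size in gigabytes
DEEP_WEB_MULTIPLIER = 500       # "as much as 500 times" (upper estimate)
GB_PER_PB = 1_000_000           # assumption: decimal units


def gigabytes_to_petabytes(gigabytes):
    """Convert gigabytes to petabytes using decimal units."""
    return gigabytes / GB_PER_PB


def deep_web_estimate_pb(index_gb, multiplier):
    """Upper estimate of the Deep Web's size, in petabytes."""
    return gigabytes_to_petabytes(index_gb) * multiplier


google_pb = gigabytes_to_petabytes(GOOGLE_INDEX_GB)
deep_pb = deep_web_estimate_pb(GOOGLE_INDEX_GB, DEEP_WEB_MULTIPLIER)
print(f"Google index: {google_pb:,.0f} PB")
print(f"Deep Web (upper estimate): {deep_pb:,.0f} PB")
print(f"Google's share: 1/{deep_pb / google_pb:.0f}")
```

Students can change the multiplier to see how sensitive the "1/500th" headline is to the underlying estimate.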
Reality 1: Google finds as little as 1/500th of online information
http://commons.wikimedia.org/wiki/File:DeepWebDiagram.png
Reality 1: Google finds as little as 1/500th of online information
So, what is the deep web?
• Library Databases
• Company Intranets
• Password-, or CAPTCHA-, protected or subscription based content (“proprietary web”)
• Anything that is not linked to by any other page
• Anything existing behind non-http:// based links (like FTP)
• Files that can’t be internally indexed
• Any website with a robots.txt file that asks for certain pages or directories, or the entire web site, to not be indexed (“private web”)
• Big data sets, like census information, historical stock quotes, etc.
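The robots.txt item in the list above can be demonstrated in class with Python's standard `urllib.robotparser` module. The rules, the `example.org` URLs, and the `can_crawl` helper below are all hypothetical; this is a sketch of how a well-behaved crawler decides what to skip, not a description of how Google's crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks every crawler to stay out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""


def can_crawl(robots_txt, url, user_agent="ExampleBot"):
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in (
    "https://example.org/index.html",
    "https://example.org/private/report.html",
):
    status = "may be indexed" if can_crawl(ROBOTS_TXT, url) else "asked not to index"
    print(f"{url}: {status}")
```

Note that robots.txt is a request, not a lock: the page still exists and is reachable by anyone with the link, which is part of why it counts as "deep web."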
Myth 2: Ease of Access is Synonymous with Quality
• Studies have found that as many as 94% of students in college start with Google and end with Google
• They perceive library databases as complicated compared to Google
• But what have libraries started doing? Single search box
Reality 2: Easy is not Always Better
Myth 3: Google’s Ranking Algorithm is Neutral and Shows the Best Results
• Nearly all students stop within the first 10 results, or the end of the first page of search results.
• Google’s ranking is based on how many pages link to a page and on the resulting ‘web’ of links to and from that page. In a way, this is “academic” in that it ranks a page by how many times it has been “cited.”
• But, and this is key, the algorithm also takes into account what Google knows about you. Even if not signed in, Google can tell geographically where your IP address is.
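The "cited pages rank higher" idea from the bullets above can be sketched with the simplified textbook version of PageRank. This is an illustration only: the three-page link graph is made up, the damping value of 0.85 is the commonly cited default, and Google's production ranking (including the personalization described above) is far more complex and not public.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank equally to the pages it "cites".
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: pages "a" and "b" both link to "c"; "c" links to "a".
example_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
for page, score in sorted(pagerank(example_links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page "c" comes out on top because two pages cite it, and "b" comes out last because nothing links to it, which mirrors the academic citation analogy.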
Reality 3: Google’s Algorithm is Tailored to You
Incognito Window (not logged in)
Logged in with Miami Google Apps Account
Tailored results based on what Google knows about me
Reality 3: Google’s Algorithm is Tailored to You (and up for sale)
Google shows my signed-in account an ad for John Boehner because Google knows I work in Boehner’s district
Reality 3: Google’s Algorithm is Tailored to You (and up for sale)
• Google’s Eric Schmidt: the “product I’ve always wanted to build” will “guess what I’m trying to type.”
• This depends on tailoring.
• Google’s Larry Page: “The ultimate search engine would understand exactly what you mean and give back exactly what you want.”
• This is the product that Google has built: it returns results you want, not unbiased results.
• Yahoo’s Tapan Bhat: “The future of the web is about personalization… weaving the web together in a way that is smart and personalized for the user.”
From Pariser, E. (2011). The filter bubble: What the Internet is hiding from you. New York: Penguin Press.
Myth 4: All Information = Authoritative and Accurate Knowledge
• As we saw from the Project Information Literacy report, net gen students will go for the quickest results, not the best results.
• Google is the quickest: as we saw earlier, it completed the search for “21st century workplace skills” in 0.37 seconds.
Reality 4: Information Needs to be Analyzed
A search for “barack obama vs mitt romney” returned:
Meta-myth: The ‘net generation’ has mastered the internet
Reality: The ‘net generation’ has not even come close to mastery
• Students are ignoring tons of information
• The information they do find might not be relevant in any way
• Students don’t know how to analyze the information they do find
• Employers are noticing this, and they are concerned
Teaching Information
• Some assignment ideas
• Everything beyond this point relies on resources that are free, or at most require an Ohio public library card (which is free!)
To Google and Beyond!
Broad Idea: Compare Google, Bing, and Yahoo!
Outcome: Understand that search engines search and rank content differently.
1. Have students search Google, Bing, and Yahoo! for the same set of keywords (“solar energy”) and write about the differences.
• Have them also write about why the results on page four aren’t relevant to them (if they can!)
2. Have students explore Google, Bing, and Yahoo! to try to find information about advanced search capabilities and then redo the search with this information.
3. You can also throw in a metasearch engine like dogpile.com
To Google and Beyond! Accessing advanced search options
Google
Bing????
Yahoo!
To Google and Beyond! Accessing advanced search options
Bing does have advanced search options, but you have to know Boolean operators and use them in the search box:
Bing also has operators for file types, IP address ranges, language, site-specific searches, etc.
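One way to show students what "using operators in the search box" means is to assemble the query string piece by piece. The sketch below assumes the `site:` and `filetype:` operators and the uppercase `NOT` keyword from Bing's published operator list (worth confirming against Bing's current help pages before class); the helper names and the `energy.gov` example are made up for illustration.

```python
from urllib.parse import urlencode


def build_bing_query(phrase, site=None, filetype=None, exclude=None):
    """Assemble a search-box query from a phrase plus optional operators."""
    parts = [f'"{phrase}"']                    # quotes request an exact phrase
    if site:
        parts.append(f"site:{site}")           # restrict results to one site
    if filetype:
        parts.append(f"filetype:{filetype}")   # restrict to one file type
    if exclude:
        parts.append(f"NOT {exclude}")         # Boolean NOT, in uppercase
    return " ".join(parts)


def bing_search_url(query):
    """Turn a query string into a URL that can be pasted into a browser."""
    return "https://www.bing.com/search?" + urlencode({"q": query})


query = build_bing_query("solar energy", site="energy.gov", filetype="pdf")
print(query)
print(bing_search_url(query))
```

Having students type the printed query directly into the search box, then remove one operator at a time, makes the effect of each operator visible.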
Moving on to the good stuff
What is the Ohio Web Library? Ohioweblibrary.org
• “Premium information purchased for Ohioans by Ohio libraries”
• The Ohio Web Library includes: popular magazines, trade publications, scholarly research journals, newspapers, encyclopedias, dictionaries, speeches, poems, plays, maps, satellite images of Ohio, and more
How to get access?
• Just need an Ohio public library card
• Available to anyone who can show they live, work, or go to school in Ohio
• Or, access it from on-site at a public, academic, or school library
Moving on to the good stuff
What is the Ohio Web Library?
Access to resources like Academic Search Premier, Business Source Premier, MasterFILE Premier, Newspaper Source, several World Book resources, and more!
You can also access resources through this portal that your local library may have purchased. For instance, my public library has access to Consumer Reports, Morningstar Investment Research, Standard & Poor’s Net Advantage tool, and a small engine repair center.
Going Deeper: Access Issues
Broad Idea: Check Deep Web Links Against Google
Outcome: Realize that Google doesn’t index and search some things; URLs are not necessarily permanent
1. Have students go to google.com and enter a search term for a basic, surface-web-level webpage like ohioweblibrary.org.
• Point out that Google returns the actual site as the first search result.
2. Have students use a library database from the Ohio Web Library and search for anything of interest.
3. Copy the URL of the results page to the clipboard.
4. Go to google.com and paste the URL into the search box.
5. Google will report that the search “did not match any documents.”
6. Have a discussion about why Google can’t find the database results page (the database results page is dynamic, and just came into being).
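For the discussion in the last step, it can help to take a results URL apart and look at what is in it. The sketch below uses a made-up database URL with a session ID and a search term in its query string; real library databases build their URLs in different ways, and the presence of a query string is only a rough hint that a page is generated on demand.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url):
    """Split a URL into its host, path, and query-string parameters."""
    parts = urlsplit(url)
    return {
        "host": parts.netloc,
        "path": parts.path,
        "params": parse_qs(parts.query),
    }


def looks_dynamic(url):
    """Rough heuristic: a query string suggests the page is built on request."""
    return bool(urlsplit(url).query)


# Hypothetical URLs: a database results page and a plain home page.
results_url = "https://search.example.org/results?sid=abc123&query=solar+energy"
home_url = "https://ohioweblibrary.org/"

print(describe_url(results_url))
print(looks_dynamic(results_url), looks_dynamic(home_url))
```

The session ID is the key talking point: the page was assembled for one visitor's search, so there was never a fixed page for a crawler to find.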
Going Deeper: Access Issues
Google finds the website; you can search for a URL in Google and Google will find it as a result
Going Deeper: Access Issues
You can see I have the tab open with the database search results – that page still exists but Google can’t find it
Going Deeper: Evaluation Issues
Broad Idea: Compare Google with a Library Database
Outcome: Understand that search engines and databases search and rank content differently.
1. Have students search Google and one of the library databases for the same set of keywords (“solar energy”) and write about the differences.
• How many resources are the same?
2. Have students explore the library database’s advanced search options and try the same search with some of the facets/filters turned on.
3. Have students also write about which set of results they would prefer to use for an assignment and why.
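The "how many resources are the same?" question in the assignment above can be answered by hand, or students can paste their two result lists into a short script. This is a minimal sketch: the URLs are placeholders, and the normalization (ignoring http/https, a leading "www.", and a trailing slash) is an assumption about what should count as "the same" resource.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path so trivial differences don't hide a match."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def shared_results(results_a, results_b):
    """Return the normalized URLs that appear in both result lists."""
    set_a = {normalize(url) for url in results_a}
    set_b = {normalize(url) for url in results_b}
    return sorted(set_a & set_b)


# Placeholder result lists; students would paste in their own.
google_results = ["https://www.example.org/solar/", "https://example.com/news"]
database_results = ["http://example.org/solar", "https://example.net/journal"]

overlap = shared_results(google_results, database_results)
print(f"{len(overlap)} shared result(s): {overlap}")
```

A small overlap is the expected outcome and makes a good starting point for the written comparison.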
What we don’t mean by Big Data
So what then is Big Data?
IBM says that 90% of the information in the world today has been created in the last two years. They use as examples:
• Data from sensors used to gather climate information
• Social media posts
• Digital pictures and videos
• Cell phone GPS signals
http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html
So what then is Big Data?
The 3 (or 4) V’s of Big Data
• Volume
• Variety
• Velocity
• Veracity
http://www.villanovau.com/resources/bi/what-is-big-data/
Does Big Data Matter?
From Forbes: (http://www.forbes.com/sites/gregsatell/2013/10/11/why-big-data-matters/)
“big data is important because it will transform how we manage our enterprises. For most of the 20th century, business leaders relied on “scientific” studies and “statistical significance” to determine what information they could trust. Now, technology is making those assumptions obsolete and the practice of management will never be the same.”
Does Big Data Matter?
From Harvard Business Review: (http://hbr.org/2012/10/big-data-the-management-revolution/ar)
“Smart leaders across industries will see using big data for what it is: a management revolution. But as with any other major change in business, the challenges of becoming a big data–enabled organization can be enormous and require hands-on—or in some cases hands-off—leadership. Nevertheless, it’s a transition that executives need to engage with today.”
How in the heck can we teach big data in high school?
1. Big data is often synonymous with “data science” when used for data sets in the hard sciences, like genetics and public health
2. We don’t have to use the size of data that IBM is talking about
3. It is kind of like statistics
The first place to find resources, including data and lesson plans, is the US Census Bureau.
Census Resources
Its two lesson plans cover introducing government data for World Statistics Day (October 20, 2015) and Congressional apportionment using Census data.
http://www.census.gov/schools/materials_for_schools/lessons_and_maps.html
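The apportionment lesson lends itself to a small data exercise. Seats in the U.S. House are assigned with the method of equal proportions (also called Huntington-Hill): every state gets one seat, and each remaining seat goes to the state with the highest priority value, population / sqrt(n(n+1)), where n is the number of seats the state already holds. The sketch below is an illustration, not the Census Bureau's own code; the three "states" and their populations are made up, and students would substitute real Census figures and 435 seats.

```python
import heapq
from math import sqrt


def apportion(populations, seats):
    """Assign seats with the method of equal proportions (Huntington-Hill)."""
    if seats < len(populations):
        raise ValueError("need at least one seat per state")
    allocation = {state: 1 for state in populations}  # one seat guaranteed
    # heapq is a min-heap, so priorities are stored as negative numbers.
    heap = [(-pop / sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)   # state with the highest priority
        allocation[state] += 1
        n = allocation[state]
        next_priority = populations[state] / sqrt(n * (n + 1))
        heapq.heappush(heap, (-next_priority, state))
    return allocation


# Hypothetical populations for three made-up states sharing 7 seats.
example = {"State A": 1000, "State B": 500, "State C": 250}
print(apportion(example, seats=7))
```

Asking students to change one population slightly and watch a seat move between states is a quick way to connect Census counts to political representation.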
NASA Resources
NASA also has open lesson plans and data sets to use: “MY NASA DATA is an online avenue whereby educators can bring NASA data into their classroom and provide students with real-world science experiences. The website offers a growing collection of over 120 standard based lesson plans to help teachers get started with data exploration.”
Accessed at: http://mynasadata.larc.nasa.gov/lesson-plans/lesson-plans-hs-educators/
Further Reading & References
Slide show and links to articles cited available at: http://martinipatrick.com/obta/