Teaching information: from Google Search to Big Data

Ohio Business Teachers Association Professional Development Conference, October 3, 2014. By Martin Patrick. Teaching information: from Google Search to Big Data by Martin I. Patrick is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

description

The Internet is the biggest store of information the world has ever known and will be more and more central to economic activity in the future. All this information and activity comes at a price: surveys routinely show that employers are underwhelmed by young people’s information skills. In this session we will explore web-based resources that can help students better master information technology and skills using resources freely available online. Together we will talk about ideas to use these resources to augment curricula, and briefly explore the next big thing in information: Big Data.

Transcript of Teaching information: from Google Search to Big Data

Page 1: Teaching information: from Google Search to Big Data

Ohio Business Teachers Association Professional Development Conference

October 3, 2014

By Martin Patrick

Teaching Information

Teaching information: from Google Search to Big Data by Martin I. Patrick is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Page 2: Teaching information: from Google Search to Big Data

What is information?

Image credit: http://commons.wikimedia.org/wiki/File:Machine_language.jpeg

• Wisdom

• Knowledge

• Information

• Symbols

• Data

Page 3: Teaching information: from Google Search to Big Data

The ‘net generation’

Sometimes called digital natives, these are generally people born after 1982, so the term is not necessarily synonymous with “Millennials,” though Millennials are included.

Some assumptions made in popular literature and common discourse:

• Born into a networked world, therefore:

• Innate understanding of the web and information

• Different ways of behaving socially and academically

• Different ways of learning and making sense

Page 4: Teaching information: from Google Search to Big Data

The ‘net generation’

Realities found in scholarly literature:

• Net generation students are not more comfortable with information technology overall than non-net generation students (Bullen et al., 2011)

• Less than 20% of the net generation scores as information literate, and there is no correlation between GPA/standardized test scores and information literacy (Gross & Latham, 2009)

Page 5: Teaching information: from Google Search to Big Data

What do librarians & employers think about the net generation?

• Community college learners are not computer literate

• On one test, 1.2% of net gen students scored at the mastery level of computer literacy

• Further, another study found that there is “considerable deficiency” in information literacy among first-year college students

• Once again, the studies reveal no correlation between student perception of competence and actual competence

Duke, C. M. (2011). Computer Literacy Skills of Net Generation Learners (Doctoral dissertation, Texas A&M University).

Page 6: Teaching information: from Google Search to Big Data

What do librarians & employers think about the net generation?

A study commissioned by Bentley University that surveyed students, graduates, employers, educators, and parents found:

• 45% of employers and 39% of recruiters would give net gen students a C or lower for tech skills, math, and writing.

Study: Millennials And Employers Disagree On Path To Success. (2014, January 28). http://www.forbes.com/sites/robasghar/2014/01/28/study-millennials-and-employers-disagree-on-path-to-success/

Page 7: Teaching information: from Google Search to Big Data

What do librarians & employers think about the net generation?

Hootsuite, one of the biggest social media firms out there, found that:

• “millennials often have an intuitive understanding” of what will do well on social media, but they don’t know “what data to look for, where to find it, and what to do with it” to analyze effectiveness.

Holmes, R. (2014, April). 5 Social Media Skills Millennials Lack. Retrieved from http://blog.hootsuite.com/5-social-media-skills-millennials-lack/

Page 8: Teaching information: from Google Search to Big Data

What do researchers & employers think about the net generation?

Pew has found:

• “Very few teachers rate their students ‘excellent’ on any of the research skills included in the survey.”

• Only about one-quarter of teachers surveyed rate their students “excellent” or “very good” at using appropriate and effective search queries.

• 78% of teachers rate their students’ determination to find hard-to-find data as poor or fair.

Purcell, K., Rainie, L., Heaps, A., Buchanan, J., Friedrich, L., Jacklin, A., . . . Zickuhr, K. (2012, November 1). How Teens Do Research in the Digital World | Pew Research Center's Internet & American Life Project. Retrieved from http://www.pewinternet.org/2012/11/01/how-teens-do-research-in-the-digital-world

Page 9: Teaching information: from Google Search to Big Data

What do researchers & employers think about the net generation?

Project Information Literacy found:

• Employers place a high premium on searching online, using tools other than search engines, and identifying the best solution.

• Employers have found most net gen employees deliver the quickest answer they can, using a search engine, a few keywords, and the first few pages.

• Employers are surprised that students don’t use traditional forms of research (calling someone, or looking through an annual report).

• “[t]here is a distinct difference between the information competencies and strategies today’s graduates bring with them to the workplace and the broader skill set that more seasoned employers need and expect.”

Head, A. J. (2012). Learning curve: how college graduates solve information problems once they join the workforce. Retrieved from Project Information Literacy website: http://journalistsresource.org/wp-content/uploads/2013/01/PIL_fall2012_workplaceStudy_FullReport.pdf

Page 10: Teaching information: from Google Search to Big Data

Myths of the ‘net generation’

• Everything on the net is indexed by Google

• Ease of access = quality

• Google’s ranking algorithm is perfect

• Information = knowledge

• Therefore, net generation has mastered the internet

Page 11: Teaching information: from Google Search to Big Data

Myth 1: Google finds everything out there

Google crawls “publicly available webpages” (http://www.google.com/intl/en-US/insidesearch/howsearchworks/crawling-indexing.html), also known as the “Surface Web,” and ranks them using what it calls PageRank, which originally mimicked the way academic citations link resources. Google estimates that its index covers over 100,000,000 gigabytes (100 petabytes) of information and that it has spent over one million computing hours building it.

Page 12: Teaching information: from Google Search to Big Data

Reality 1: Google finds as little as 1/500th of online information

However, the “Deep Web” is estimated to contain as much as 500 times the amount of information that Google indexes, or roughly 50,000 petabytes (http://oedb.org/ilibrarian/invisible-web/).

[Chart: Google vs. the Deep Web. Google = 1; Deep Web = 500]
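The 1-to-500 comparison can be checked with a little arithmetic. The sketch below assumes decimal units (1 petabyte = 1,000,000 gigabytes) and takes the 100,000,000-gigabyte index size and the 500x multiplier cited on these slides at face value; the variable names are illustrative only.

```python
# Assumption: decimal (SI) units, so 1 petabyte = 1,000,000 gigabytes.
GB_PER_PB = 1_000_000

google_index_gb = 100_000_000   # Google's own estimate of its index size
deep_web_multiplier = 500       # upper-end estimate cited for the Deep Web

google_index_pb = google_index_gb / GB_PER_PB
deep_web_pb = google_index_pb * deep_web_multiplier
google_share = 1 / deep_web_multiplier

print(f"Google index: {google_index_pb:,.0f} PB")
print(f"Deep Web (upper estimate): {deep_web_pb:,.0f} PB")
print(f"Google's share relative to the Deep Web: {google_share:.1%}")
```

Students could rerun this with a lower multiplier to see how sensitive the "1/500th" claim is to the estimate used.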

Page 13: Teaching information: from Google Search to Big Data

Reality 1: Google finds as little as 1/500th of online information

http://commons.wikimedia.org/wiki/File:DeepWebDiagram.png

Page 14: Teaching information: from Google Search to Big Data

Reality 1: Google finds as little as 1/500th of online information

So, what is the deep web?

• Library databases

• Company intranets

• Password- or CAPTCHA-protected content, or subscription-based content (“proprietary web”)

• Anything that is not linked to by any other page

• Anything existing behind non-http:// based links (like FTP)

• Files that can’t be internally indexed

• Any website with a robots.txt file that asks for certain pages or directories, or the entire web site, to not be indexed (“private web”)

• Big data sets, like census information, historical stock quotes, etc.
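The robots.txt item can be made concrete with the parser in Python's standard library. This is a minimal sketch: the rules and the example.org URLs are made up for illustration, and strictly speaking robots.txt tells well-behaved crawlers what not to fetch rather than guaranteeing that a page stays out of an index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an imaginary site (not a real file).
rules = """
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse the rules directly; no network request is made

# Ask whether a crawler identifying as "*" may fetch each (made-up) URL.
public_ok = parser.can_fetch("*", "https://example.org/index.html")
private_ok = parser.can_fetch("*", "https://example.org/private/report.html")

print("Crawler may fetch the home page:", public_ok)
print("Crawler may fetch the private report:", private_ok)
```

In class, students could look up the real robots.txt of a site they use and predict which sections a search engine is asked to skip.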

Page 15: Teaching information: from Google Search to Big Data

Myth 2: Ease of Access is Synonymous with Quality

• Studies have found that as many as 94% of students in college start with Google and end with Google

• They perceive library databases as complicated compared to Google

• But what have libraries started doing? Single search box

Page 16: Teaching information: from Google Search to Big Data

Reality 2: Easy is not Always Better

Page 17: Teaching information: from Google Search to Big Data

Reality 2: Easy is not Always Better

Page 18: Teaching information: from Google Search to Big Data

Myth 3: Google’s Ranking Algorithm is Neutral and Shows the Best Results

• Nearly all students stop within the first 10 results, or the end of the first page of search results.

• Google’s ranking is based on how many pages link to a page, and on the resulting ‘web’ of links to and from that page. In a way, this is “academic” in that it ranks pages based on how many times they have been “cited.”

• But, and this is key, the algorithm also takes into account what Google knows about you. Even if you are not signed in, Google can tell roughly where your IP address is located geographically.
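The "citation" idea behind link-based ranking can be sketched in a few lines. What follows is a simplified textbook version of PageRank, not Google's production algorithm, which uses many more signals (including the personalization described above). The four page names, the damping value of 0.85, and the iteration count are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: links maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it "cites".
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical mini-web: A, B, and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
for page, score in sorted(rank.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page C comes out on top because it is "cited" most often, and A ranks second because its single incoming link comes from a highly ranked page.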

Page 19: Teaching information: from Google Search to Big Data

Reality 3: Google’s Algorithm is Tailored to You

Incognito window (not logged in) vs. logged in with Miami Google Apps account

Tailored results based on what Google knows about me

Page 20: Teaching information: from Google Search to Big Data

Reality 3: Google’s Algorithm is Tailored to You (and up for sale)

Google shows my signed-in account an ad for John Boehner because Google knows I work in Boehner’s district

Page 21: Teaching information: from Google Search to Big Data

Reality 3: Google’s Algorithm is Tailored to You (and up for sale)

• Google’s Eric Schmidt: the “product I’ve always wanted to build” will “guess what I’m trying to type.”

• This depends on tailoring.

• Google’s Larry Page: “The ultimate search engine would understand exactly what you mean and give back exactly what you want.”

• This is the product that Google has built: it returns results you want, not unbiased results.

• Yahoo’s Tapan Bhat: “The future of the web is about personalization… weaving the web together in a way that is smart and personalized for the user.”

From Pariser, E. (2011). The filter bubble: What the Internet is hiding from you. New York: Penguin Press.

Page 22: Teaching information: from Google Search to Big Data

Myth 4: All Information = Authoritative and Accurate Knowledge

• As we saw from the Project Information Literacy report, net gen students will go for the quickest results, not the best results.

• Google is the quickest: as we saw earlier, it completed the search for “21st century workplace skills” in 0.37 seconds.

Page 23: Teaching information: from Google Search to Big Data

Reality 4: Information Needs to be Analyzed

A search for “barack obama vs mitt romney” returned:

Page 24: Teaching information: from Google Search to Big Data

Reality 4: Information Needs to be Analyzed

A search for “barack obama vs mitt romney” returned:

Page 25: Teaching information: from Google Search to Big Data

Reality 4: Information Needs to be Analyzed

A search for “barack obama vs mitt romney” returned:

Page 26: Teaching information: from Google Search to Big Data

Meta-myth: The ‘net generation’ has mastered the internet

Page 27: Teaching information: from Google Search to Big Data

Reality: The ‘net generation’ has not even come close to mastery

• Students are ignoring tons of information

• The information they do find might not be relevant in any way

• Students don’t know how to analyze the information they do find

• Employers are noticing this, and they are concerned

Page 28: Teaching information: from Google Search to Big Data

Teaching Information

• Some assignment ideas

• Everything beyond this point relies on resources that are free, or at most require an Ohio public library card (which is free!)

Page 29: Teaching information: from Google Search to Big Data

To Google and Beyond!

Broad Idea: Compare Google, Bing, and Yahoo!

Outcome: Understand that search engines search and rank content differently.

1. Have students search Google, Bing, and Yahoo! for the same set of keywords (“solar energy”) and write about the differences.

• Have them also write about why the results on page four aren’t relevant to them (if they can!)

2. Have students explore Google, Bing, and Yahoo! to try to find information about advanced search capabilities and then redo the search with this information.

3. You can also throw in a metasearch engine like dogpile.com

Page 30: Teaching information: from Google Search to Big Data

To Google and Beyond!

Page 31: Teaching information: from Google Search to Big Data

To Google and Beyond!

Page 32: Teaching information: from Google Search to Big Data

To Google and Beyond! Accessing advanced search options

Google / Bing (????) / Yahoo!

Page 33: Teaching information: from Google Search to Big Data

To Google and Beyond! Accessing advanced search options

Page 34: Teaching information: from Google Search to Big Data

To Google and Beyond! Accessing advanced search options

Bing does have advanced search options, but you have to know Boolean operators and use them in the search box:

Bing also has operators for file types, IP address ranges, language, site-specific searches, etc.
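As a rough illustration of typing operators straight into the search box, the sketch below assembles a query that combines a Boolean AND with Bing's site: and filetype: operators, then turns it into a results URL. The helper name bing_search_url and the sample keywords are made up for this example; only the operators and the ordinary bing.com/search?q= address are taken as given.

```python
from urllib.parse import urlencode

def bing_search_url(terms, site=None, filetype=None):
    """Combine keywords with optional operators; return (query, results URL)."""
    parts = [terms]
    if site:
        parts.append(f"site:{site}")          # limit results to one site or domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # limit results to one file type
    query = " ".join(parts)
    return query, "https://www.bing.com/search?" + urlencode({"q": query})

# Hypothetical classroom search: PDFs on .gov sites about solar energy in Ohio.
query, url = bing_search_url('"solar energy" AND ohio', site="gov", filetype="pdf")
print(query)
print(url)
```

Students can paste the printed query into the Bing search box and compare it with the same keywords typed without any operators.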

Page 35: Teaching information: from Google Search to Big Data

Moving on to the good stuff

What is the Ohio Web Library? Ohioweblibrary.org

• “Premium information purchased for Ohioans by Ohio libraries”

• The Ohio Web Library includes: popular magazines, trade publications, scholarly research journals, newspapers, encyclopedias, dictionaries, speeches, poems, plays, maps, satellite images of Ohio, and more

How to get access?

• Just need an Ohio public library card

• Available to anyone who can show they live, work, or go to school in Ohio

• Or, access it from on-site at a public, academic, or school library

Page 36: Teaching information: from Google Search to Big Data

Moving on to the good stuff

What is the Ohio Web Library?

Access to resources like Academic Search Premier, Business Source Premier, MasterFILE Premier, Newspaper Source, several World Book resources, and more!

You can also access resources through this portal that your local library may have purchased. For instance, my public library has access to Consumer Reports, Morningstar Investment Research, Standard & Poor’s Net Advantage tool, and a small engine repair center.

Page 37: Teaching information: from Google Search to Big Data

Going Deeper: Access Issues

Broad Idea: Check Deep Web Links Against Google

Outcome: Realize that Google doesn’t index and search some things; URLs are not necessarily permanent

1. Have students go to google.com and enter a search term for a basic, surface-web-level webpage like ohioweblibrary.org.

• Point out that Google returns the actual site as the first search result.

2. Have students use a library database from the Ohio Web Library and search for anything of interest.

3. Copy the URL of the results page to the clipboard.

4. Go to google.com and paste the URL into the search box.

5. Google will return that the search “did not match any documents.”

6. Have a discussion about why Google can’t find the database results page (the database results page is dynamic, and just came into being).

Page 38: Teaching information: from Google Search to Big Data

Going Deeper: Access Issues

Google finds the website; you can search for a URL in Google and Google will find it as a result

Page 39: Teaching information: from Google Search to Big Data

Going Deeper: Access Issues

You can see I have the tab open with the database search results – that page still exists but Google can’t find it

Page 40: Teaching information: from Google Search to Big Data

Going Deeper: Evaluation Issues

Broad Idea: Compare Google with a Library Database

Outcome: Understand that search engines and databases search and rank content differently.

1. Have students search Google and one of the library databases for the same set of keywords (“solar energy”) and write about the differences.

• How many resources are the same?

2. Have students explore the library database’s advanced search options and try the same search with some of the facets/filters turned on.

3. Have students also write about which set of results they would prefer to use for an assignment and why.
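For classes that want to quantify "how many resources are the same," one possible approach is to treat each result list as a set and measure the overlap. The sketch below is only an illustration: the function name compare_results is hypothetical and every URL is a made-up placeholder, not a real search result.

```python
def compare_results(results_a, results_b):
    """Return the shared results and the overlap ratio (shared / all distinct)."""
    set_a, set_b = set(results_a), set(results_b)
    shared = set_a & set_b
    union = set_a | set_b
    overlap = len(shared) / len(union) if union else 0.0
    return shared, overlap

# Hypothetical placeholder URLs standing in for what students copy down.
google_results = [
    "https://example.org/solar-basics",
    "https://example.com/solar-panels-sale",
    "https://example.org/energy-report",
]
database_results = [
    "https://example.org/energy-report",
    "https://example.edu/journal/solar-efficiency",
    "https://example.edu/journal/photovoltaics",
]

shared, overlap = compare_results(google_results, database_results)
print("Shared results:", sorted(shared))
print(f"Overlap: {overlap:.0%}")
```

A low overlap gives students a concrete starting point for writing about why the two tools surface different material.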

Page 41: Teaching information: from Google Search to Big Data

What we don’t mean by Big Data

Page 42: Teaching information: from Google Search to Big Data

So what then is Big Data?

IBM says that 90% of the information in the world today has been created in the last two years. They use as examples:

• Data from sensors used to gather climate information

• Social media posts

• Digital pictures and videos

• Cell phone GPS signals

http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html

Page 43: Teaching information: from Google Search to Big Data

So what then is Big Data?

The 3 (or 4) V’s of Big Data

• Volume

• Variety

• Velocity

• Veracity

http://www.villanovau.com/resources/bi/what-is-big-data/

Page 44: Teaching information: from Google Search to Big Data

Does Big Data Matter?

From Forbes: (http://www.forbes.com/sites/gregsatell/2013/10/11/why-big-data-matters/)

“big data is important because it will transform how we manage our enterprises. For most of the 20th century, business leaders relied on “scientific” studies and “statistical significance” to determine what information they could trust. Now, technology is making those assumptions obsolete and the practice of management will never be the same.”

Page 45: Teaching information: from Google Search to Big Data

Does Big Data Matter?

From Harvard Business Review: (http://hbr.org/2012/10/big-data-the-management-revolution/ar)

“Smart leaders across industries will see using big data for what it is: a management revolution. But as with any other major change in business, the challenges of becoming a big data–enabled organization can be enormous and require hands-on—or in some cases hands-off—leadership. Nevertheless, it’s a transition that executives need to engage with today.”

Page 46: Teaching information: from Google Search to Big Data

How in the heck can we teach big data in high school?

1. Big data is often synonymous with “data science” when used for data sets in the hard sciences, like genetics and public health

2. We don’t have to use the size of data that IBM is talking about

3. It is kind of like statistics

The first place to find resources, including data and lesson plans, is the US Census Bureau.

Page 47: Teaching information: from Google Search to Big Data

Census Resources

Their two lesson plans include an exercise in introducing government data for World Statistics Day (October 20, 2015) and Congressional apportionment using Census data.

http://www.census.gov/schools/materials_for_schools/lessons_and_maps.html
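To give a sense of what an apportionment exercise can look like in code, here is a minimal sketch of the method of equal proportions (Huntington-Hill), which the United States has used to apportion House seats since the 1940s. Whether the Census lesson plan follows this exact approach is not stated on the slide, and the three states and their populations below are invented for illustration, not Census figures.

```python
import heapq
from math import sqrt

def apportion(populations, total_seats):
    """Assign seats by the method of equal proportions (Huntington-Hill)."""
    if total_seats < len(populations):
        raise ValueError("need at least one seat per state")
    seats = {state: 1 for state in populations}  # every state starts with one seat
    # Priority for a state's next seat is population / sqrt(n * (n + 1)),
    # where n is its current seat count. heapq is a min-heap, so negate it.
    heap = [(-pop / sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(total_seats - len(populations)):
        _, state = heapq.heappop(heap)  # state with the highest priority
        seats[state] += 1
        n = seats[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return seats

# Hypothetical populations (made up for the example).
populations = {"State A": 6000, "State B": 3000, "State C": 1000}
result = apportion(populations, total_seats=10)
print(result)
```

Students can swap in real state populations from the Census Bureau and a 435-seat total to compare their output with the published apportionment.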

Page 48: Teaching information: from Google Search to Big Data

NASA Resources

NASA also has open lesson plans and data sets to use: “MY NASA DATA is an online avenue whereby educators can bring NASA data into their classroom and provide students with real-world science experiences. The website offers a growing collection of over 120 standard based lesson plans to help teachers get started with data exploration.”

Accessed at: http://mynasadata.larc.nasa.gov/lesson-plans/lesson-plans-hs-educators/

Page 49: Teaching information: from Google Search to Big Data

Further Reading & References

Slide show and links to articles cited available at: http://martinipatrick.com/obta/

Teaching information: from Google Search to Big Data by Martin I. Patrick is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.