Computing and the Web Databases: Controlling the Information Deluge.

Transcript of Computing and the Web Databases: Controlling the Information Deluge.

Page 1: Computing and the Web Databases: Controlling the Information Deluge.

Computing and the Web
Databases: Controlling the Information Deluge

Page 2: Computing and the Web Databases: Controlling the Information Deluge.

Overview

Information Overload
Overview of Data Collection
Retrieving Data
Visualization of Information
Transforming Data into Information
Database Structure & Design
Software Applications

Page 3: Computing and the Web Databases: Controlling the Information Deluge.

Information Overload

The sheer volume of data that is available
– Processed multiple ways to provide information
– Much of the data / information is interrelated
It is projected that 1 terabyte of data is put on the internet each day
– Assuming that a 500-page book takes 4,750 MB
– 1 terabyte would hold 220 books
– That's the equivalent of 6,600 books per month
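The book arithmetic on this slide can be checked with a short sketch. It assumes that "4,750M" means 4,750 megabytes, that a terabyte is counted in binary units (1,024 × 1,024 MB), and that a month has 30 days; none of these is stated on the slide, but together they reproduce its figures.

```python
# Assumptions (not stated on the slide): "M" means megabytes, a terabyte is
# 1,024 * 1,024 MB, and a month has 30 days.
MB_PER_TERABYTE = 1024 * 1024
MB_PER_BOOK = 4750
DAYS_PER_MONTH = 30

# Whole books that fit in one terabyte (one day of new data).
books_per_terabyte = MB_PER_TERABYTE // MB_PER_BOOK

# One terabyte per day, accumulated over a month.
books_per_month = books_per_terabyte * DAYS_PER_MONTH

print(books_per_terabyte, books_per_month)  # 220 6600
```

With a decimal terabyte (1,000,000 MB) the same division gives 210 books per day, so the slide's 220 appears to rest on the binary definition.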

Page 4: Computing and the Web Databases: Controlling the Information Deluge.

Information Overload

Relationships based upon the data are more complicated due to computers and automation
Example: Grocery store checkout
– Item is scanned by bar code reader
– Bar code is looked up and resolved to item name
– Current price of item is looked up
– Inventory is automatically updated to reflect sale
– If inventory drops below established levels, an automatic order can be generated to replenish the inventory levels of the item
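The checkout steps above could be sketched roughly as follows. The `Item` fields, the `scan` function, the bar code value, and the reorder quantities are all hypothetical names and numbers chosen for illustration; a real point-of-sale system would keep this data in a database rather than in memory.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    price: float
    stock: int
    reorder_level: int     # hypothetical "established level"
    reorder_quantity: int  # hypothetical size of a replenishment order


def scan(catalog, barcode, pending_orders):
    """Process one scanned bar code and return (item name, current price)."""
    item = catalog[barcode]   # resolve bar code to the item record
    item.stock -= 1           # update inventory to reflect the sale
    # Generate one replenishment order when stock drops below the level.
    if item.stock < item.reorder_level and barcode not in pending_orders:
        pending_orders[barcode] = item.reorder_quantity
    return item.name, item.price


catalog = {"0001": Item("Milk 1L", 1.99, stock=3, reorder_level=3, reorder_quantity=24)}
pending_orders = {}
print(scan(catalog, "0001", pending_orders))  # ('Milk 1L', 1.99)
print(pending_orders)                         # {'0001': 24}
```

A single scan touches three related pieces of data (name, price, inventory) and may create a fourth (the order), which is the kind of interrelationship the slide describes.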

Page 5: Computing and the Web Databases: Controlling the Information Deluge.

Overview of Data Collection

Previous methods of data collection
– Survey / inventory: paper & pencil
– Keypunch cards
– Optical mark scan forms
Current methods of data collection
– Bar codes & bar code scanners
– Proximity devices (E-ZPass)
– Data probes
– Satellite sensing (imagery, infrared, etc.)
– Intelligent devices generating “alerts”

Page 6: Computing and the Web Databases: Controlling the Information Deluge.

Retrieving Data

How data will be retrieved will figure prominently in determining how to store it
– Who will use it (single or multiple users)
– How often will it be needed
– How will it be utilized (select information or all)
Example: Library system
– All books have a unique identifier
– All patrons have separate identification numbers
– Indexes are built based upon book or patron number
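As an illustration of the library example, the sketch below builds two in-memory indexes over the same loan records, one keyed by book number and one by patron number. The record layout, the identifiers, and the `build_indexes` helper are invented for this example; a real library system would rely on database indexes instead.

```python
from collections import defaultdict


def build_indexes(loans):
    """Build lookup tables from (book_id, patron_id) loan records."""
    by_book = {}                    # book number -> patron holding it
    by_patron = defaultdict(list)   # patron number -> books on loan
    for book_id, patron_id in loans:
        by_book[book_id] = patron_id
        by_patron[patron_id].append(book_id)
    return by_book, dict(by_patron)


# Hypothetical loan records.
loans = [("B001", "P100"), ("B002", "P100"), ("B003", "P200")]
by_book, by_patron = build_indexes(loans)

print(by_book["B002"])    # P100
print(by_patron["P100"])  # ['B001', 'B002']
```

Each index answers one kind of question directly ("who has this book?" or "what does this patron have?") without scanning every record, which is why the expected retrieval pattern shapes how the data is stored.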

Page 7: Computing and the Web Databases: Controlling the Information Deluge.

Retrieving Data

Complexity of data is growing with time
Reducing everything to a number is not “natural” in relation to our brains
Example: FBI fingerprint files
– Database contains over 30 million sets of prints
– Fingerprints can be divided into 1,024 groups
– Before computers: 1,400 technicians / 24,000 requests per day
– After computers: 700 technicians / 30,000 requests per day
– Nature of fingerprints makes it difficult to reduce them to a number
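The before-and-after figures are easier to compare as requests handled per technician. The sketch below uses only the numbers on the slide; the variable names are arbitrary.

```python
# Figures taken from the slide.
techs_before, requests_before = 1_400, 24_000
techs_after, requests_after = 700, 30_000

# Requests handled per technician per day.
per_tech_before = requests_before / techs_before
per_tech_after = requests_after / techs_after

# Relative gain in requests per technician.
gain = per_tech_after / per_tech_before

print(round(per_tech_before, 1), round(per_tech_after, 1), round(gain, 2))
# 17.1 42.9 2.5
```

Half the staff handled 25% more requests, so each technician processed about two and a half times as many requests per day.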

Page 8: Computing and the Web Databases: Controlling the Information Deluge.

Visualization of Information

Too much data can be a liability instead of an asset
Need exists to reduce or simplify the data
Visualization allows large amounts of data to be represented in a graphical way
– Superimpose values on charts, maps, or graphs
– Example: population growth chart in book
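A minimal sketch of the idea, using a plain-text bar chart so it runs without a plotting library. The `bar_chart` helper and the values passed to it are hypothetical placeholders, not the population figures from the book's chart.

```python
def bar_chart(values, width=40):
    """Render a {label: number} mapping as a text bar chart."""
    largest = max(values.values())
    lines = []
    for label, value in values.items():
        # Scale each bar relative to the largest value.
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:>6} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical values for illustration only.
print(bar_chart({"A": 1, "B": 2, "C": 4}, width=20))
```

Even in this crude form, the relative sizes are visible at a glance, which is harder to see in a column of raw numbers.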

Page 9: Computing and the Web Databases: Controlling the Information Deluge.

Transforming Data into Information

Data transformation is another means of simplifying large amounts of data
Statistics is the most commonly employed means of data transformation
– Percent
– Probability
– Sampling: limited selection (beware skewing)
– Normal distribution
– Correlation
  Connection of two related pieces of information
  False correlation looks normal but is erroneous
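Two of the transformations listed above, percent and correlation, can be sketched with Python's standard library. The helper names and the sample numbers are made up for illustration, and the correlation shown is the Pearson coefficient, which is one common measure and not necessarily the one the slides have in mind.

```python
import math
import statistics


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical paired measurements.
hours = [1, 2, 3, 4]
score = [2, 4, 6, 8]

print(percent(1, 4))  # 25.0
print(round(pearson(hours, score), 6))
```

A coefficient near 1 or -1 only says the two series move together in the sample; it does not show that one causes the other, which is how a false correlation can look normal while being erroneous.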