Mdst3705 2013-02-05-databases

From Data Structures to Databases Prof. Alvarado MDST 3703 5 February 2013


Transcript of Mdst3705 2013-02-05-databases

Page 1: Mdst3705 2013-02-05-databases

From Data Structures to Databases

Prof. Alvarado

MDST 3703

5 February 2013

Page 2: Mdst3705 2013-02-05-databases

Business

• Quiz 1
– To be posted this evening
– Due Thursday evening
– Covers content before Databases
– End-of-week reflections still due

• Blogging
– Please remember to be timely

• Safari Resources
– If you can’t access, try going through the Library page

Page 3: Mdst3705 2013-02-05-databases

Review

• Building as knowing
– Ramsay’s point in “On Building”

• DH as cultural reverse engineering
– Finding the rules in the patterns
– Texts and images are the patterns in question

• Reverse engineering is like building
– Same process in reverse (deconstruction)
– Also requires building other things, like databases to store stuff

Page 4: Mdst3705 2013-02-05-databases

For example, in Studio on Thursday we began to reverse engineer Plato’s Republic. The next step in our exercise was to parse the text into “words” and organize them in a list using an array

Page 5: Mdst3705 2013-02-05-databases

By the way, were we actually grabbing words?

Page 6: Mdst3705 2013-02-05-databases

Not really: we were finding substrings, letter patterns that could also exist within other words (e.g. “cavern”)

Also, these patterns did not match synonyms or pronouns (e.g. “this”) that stand for the same thing as the word in question

This is the difference between SYNTAX and SEMANTICS
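A minimal sketch of the substring problem, assuming a made-up sample line (not taken from the Studio text). It compares a raw substring count with a whole-word match using word boundaries:

```php
<?php
// Hypothetical sample line, invented for illustration.
$line = 'They live in a cavern, a kind of cave, and this is their home.';

// Syntax only: count every occurrence of the letter pattern "cave".
$substringHits = substr_count(strtolower($line), 'cave');

// Slightly better: require word boundaries, so "cavern" no longer matches.
$wordHits = preg_match_all('/\bcave\b/i', $line);

echo "Substring matches: $substringHits\n"; // counts "cavern" and "cave"
echo "Whole-word matches: $wordHits\n";     // counts only "cave"
```

Neither approach notices that “this” refers to the cave. That part is semantics.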

Page 7: Mdst3705 2013-02-05-databases

Syntax = sequences of signs

Semantics = meanings of signs

Semantics is much harder for computers to grasp than syntax

In fact, some think that semantics is beyond the capacity of any computer

Page 8: Mdst3705 2013-02-05-databases

Getting back to PHP

We can use arrays to model the text. So, within a FOREACH loop iterating through the lines of a text and parsing each line for “words,” we could do the following:

$words[$word]++;
$words[] = $word;
$lines[$lineNumber][] = $word;

Each method suggests a different model

Page 9: Mdst3705 2013-02-05-databases

More about PHP Arrays

• Arrays can be added to like so:
$myArray[] = $newItem;

• Arrays can use strings as well as numbers as indices, e.g.
$myArray[3] = 'foo';
$myArray['person'] = 'Bob';

• Array items may also point to arrays, creating multidimensional arrays
$myArray['person'] = array();
$myArray['person']['Bob'] = $something;

Page 10: Mdst3705 2013-02-05-databases

Arrays with string indices are called “associative arrays” in PHP

Arrays of arrays can be used to create data structures like trees and grids
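A short sketch of both ideas, with hypothetical values throughout (the names and the second course are invented for illustration):

```php
<?php
$myArray = [];
$myArray[] = 'first';        // appended at numeric index 0
$myArray[] = 'second';       // appended at numeric index 1
$myArray['person'] = 'Bob';  // string index, so this is an associative entry

// An array item can hold another array: a small tree about one person.
$people = [];
$people['Bob'] = ['role' => 'student', 'courses' => ['MDST 3703']];
$people['Bob']['courses'][] = 'PHIL 1000'; // hypothetical second course

echo $myArray[1], "\n";                       // second
echo $myArray['person'], "\n";                // Bob
echo count($people['Bob']['courses']), "\n";  // 2
```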

Page 11: Mdst3705 2013-02-05-databases

Read Chapter 5 of PHP: The Good Parts to learn more about arrays (see link in Resources on the course blog)

Also, the PHP manual is always a good place to look:
http://php.net/manual/en/language.types.array.php

Page 12: Mdst3705 2013-02-05-databases

Arrays as Data Structures

• PHP arrays can be used to create data structures to model things, like texts, e.g.
$words[$word]++;
$words[] = $word;
$lines[$lineNumber][] = $word;

• These three create the following
1. A simple list of word types (and their counts)
2. A list of each word in order (position and word)
3. A grid of line numbers and words
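The first two models can be sketched as follows. The two-line sample text is a stand-in, and splitting on whitespace is only a rough assumption about what counts as a “word”:

```php
<?php
// Hypothetical two-line sample standing in for the course text.
$text = "the cave is dark\nthe sun is bright";

$counts  = [];  // model 1: word type => number of occurrences
$ordered = [];  // model 2: position => word

foreach (explode("\n", $text) as $line) {
    foreach (preg_split('/\s+/', trim($line)) as $word) {
        if (!isset($counts[$word])) {
            $counts[$word] = 0;  // avoids an undefined-index notice on ++
        }
        $counts[$word]++;
        $ordered[] = $word;
    }
}

print_r($counts);
print_r($ordered);
```

The counts array forgets word order, and the ordered list forgets nothing but makes counting harder. Each structure answers a different question about the text.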

Page 13: Mdst3705 2013-02-05-databases

Here is an example of how we would create the third kind of data structure. This would store a grid of words.
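The slide's own listing is not preserved in this transcript. A minimal sketch of one way to build such a grid, assuming a short sample text and whitespace-separated “words”, might look like this:

```php
<?php
// Sample text chosen for illustration; not the slide's original listing.
$text = "I went down yesterday\nto the Piraeus with Glaucon";

$lines = [];
foreach (explode("\n", $text) as $lineNumber => $line) {
    foreach (preg_split('/\s+/', trim($line)) as $word) {
        // Row = line number, column = word position within the line.
        $lines[$lineNumber][] = $word;
    }
}

echo $lines[1][2], "\n"; // line 1, word 2 (both counted from 0)
```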

Page 14: Mdst3705 2013-02-05-databases

And it would store the text in a grid something like this one …

These vertical numbers are the first dimension of the array (Y)

These horizontal numbers are the second dimension of the array (X)

Page 15: Mdst3705 2013-02-05-databases

In this model, a text is a grid of words, each with an X and Y coordinate

Is this the only way to represent a text?

Is it the most accurate?

Page 16: Mdst3705 2013-02-05-databases

Texts can also be represented as trees

Page 17: Mdst3705 2013-02-05-databases

Document Elements and Structures

Play
– Act +
  • Scene +
    – Line +

Book
– Chapter +
  • Verse +

Letter
– Heading
  • Return Address
  • Date
  • Recipient Info
    – Name
    – Title
    – Address
– Content
  • Salutation
  • Paragraph +
  • Closing

Page 18: Mdst3705 2013-02-05-databases

XML is designed to represent text

Page 19: Mdst3705 2013-02-05-databases

What are some differences between trees and tables?

Page 20: Mdst3705 2013-02-05-databases

Tables are more rigid

Trees allow for indefinite depth

But tables are easier to manipulate

In any case, tables and trees are two major kinds of data structure that you will encounter …
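A rough sketch of the contrast, using invented data. The table has fixed columns and is easy to filter. The tree can nest to any depth, so working with it takes recursion:

```php
<?php
// A table: every row has the same fixed columns (hypothetical rows).
$table = [
    ['act' => 1, 'scene' => 1, 'line' => 'First line'],
    ['act' => 1, 'scene' => 2, 'line' => 'Second line'],
];

// A tree: each node may hold any number of children, nested to any depth.
$tree = [
    'name' => 'Play',
    'children' => [
        ['name' => 'Act 1', 'children' => [
            ['name' => 'Scene 1', 'children' => [
                ['name' => 'Line 1', 'children' => []],
            ]],
        ]],
    ],
];

// Depth of a node: 1 for a leaf, otherwise 1 plus the deepest child.
function depth(array $node): int
{
    $deepest = 0;
    foreach ($node['children'] as $child) {
        $deepest = max($deepest, depth($child));
    }
    return 1 + $deepest;
}

// Table lookups are simple: filter rows by a column value.
$sceneTwo = array_filter($table, function ($row) {
    return $row['scene'] === 2;
});

echo depth($tree), "\n";     // Play > Act > Scene > Line
echo count($sceneTwo), "\n"; // rows in scene 2
```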

Page 21: Mdst3705 2013-02-05-databases

Speaking of trees … what is this?

Page 22: Mdst3705 2013-02-05-databases

". . . the tree of nature and logic by the thirteenth-century poet, philosopher, and missionary Ramon Lull. The main trunk supports a version of the tree of Porphyry, which illustrates Aristotle's categories. The ten leaves on the right represent ten types of questions, and the ten leaves on the left are keyed to a system of rotating disks for generating answers. Such diagrams and disks comprise Lull's Ars Magna (Great Art), which was the first attempt to develop mechanical aids to reasoning. It served as an inspiration to the pioneer in symbolic logic, Gottfried Wilhelm Leibniz."

John Sowa, explaining the cover art for Knowledge Representation

Tree of Logic (and a primitive computer)

Page 23: Mdst3705 2013-02-05-databases
Page 24: Mdst3705 2013-02-05-databases
Page 25: Mdst3705 2013-02-05-databases

What is this tree an example of?

Page 26: Mdst3705 2013-02-05-databases

The tree is a “knowledge representation” (KR)

Page 27: Mdst3705 2013-02-05-databases

A KR is a model that comprises

1. A set of categories (aka Ontology)
Names and relationships between names

2. A set of inference rules (aka Logic)
A method of traversing names and relations

3. A medium for computation
A medium for producing inferences

4. A language for expressing these things
Such as a programming or markup language

Page 28: Mdst3705 2013-02-05-databases

Ontologies are systems of categories rooted in world views

Page 29: Mdst3705 2013-02-05-databases

Ontologies consist of categories and their relationships

These are often mapped onto physical things – the human body, or trees – as part of our cognitive model

Page 30: Mdst3705 2013-02-05-databases

The tree as body as society among the Umeda of New Guinea

Page 31: Mdst3705 2013-02-05-databases

Logic is a name for the systematic unpacking of ontologies in discourse …

Page 32: Mdst3705 2013-02-05-databases

Here is a sample ontology, one very similar to Aristotle’s

Page 33: Mdst3705 2013-02-05-databases

And this is a syllogism, the basic unit of reasoning in classical logic

How is it related to the tree?

Page 34: Mdst3705 2013-02-05-databases

The sentences in the syllogism stand for the traversal of the tree that represents an implicit ontology
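One way to picture this, as a sketch only: store each name with its parent category and walk upward. The ontology below is a hypothetical miniature, not the one on the slide, and `isA` is an illustrative helper name:

```php
<?php
// Hypothetical ontology: each name points to its parent category.
$parentOf = [
    'Socrates' => 'Man',
    'Plato'    => 'Man',
    'Man'      => 'Mortal',
    'Mortal'   => 'Being',
];

// Walk up the tree from $name; succeed if we pass through $category.
function isA(array $parentOf, string $name, string $category): bool
{
    while (isset($parentOf[$name])) {
        $name = $parentOf[$name];
        if ($name === $category) {
            return true;
        }
    }
    return false;
}

// "Socrates is a man" and "all men are mortal" are two steps up the tree;
// the conclusion is the whole path.
var_dump(isA($parentOf, 'Socrates', 'Mortal')); // bool(true)
var_dump(isA($parentOf, 'Mortal', 'Man'));      // bool(false)
```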

Page 35: Mdst3705 2013-02-05-databases

Reasoning always implies an ontology

Ontologies are often unexpressed

Ontologies often conflict with each other

(Digital) Humanists excavate or reverse engineer these ontologies

Page 36: Mdst3705 2013-02-05-databases

Now, a KR for a computer has to be an operationalized KR

How would we express a syllogism in PHP?

Page 37: Mdst3705 2013-02-05-databases

One way is to convert the tree into an array

Page 38: Mdst3705 2013-02-05-databases

[Figure: array diagram with index labels 0 1 2 3 4]

But, given such an array, how can we find out if Socrates is mortal?

How do we find if the following is set:

Page 39: Mdst3705 2013-02-05-databases

We’d have to do some complicated nested looping to find the answer …
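A sketch of what that looping could look like, assuming a hypothetical nested array (the slide's actual array is not preserved here). If we already know the full path, `isset` is enough. If we do not, we have to search every branch:

```php
<?php
// Hypothetical ontology as a nested array: keys are names, values are children.
$ontology = [
    'Being' => [
        'Mortal' => [
            'Man' => ['Socrates' => [], 'Plato' => []],
        ],
        'Immortal' => [
            'God' => ['Zeus' => []],
        ],
    ],
];

// Easy only when the whole path is known in advance.
var_dump(isset($ontology['Being']['Mortal']['Man']['Socrates'])); // bool(true)

// Otherwise: search the tree for a key and return its children (or null).
function subtree(array $tree, string $key)
{
    foreach ($tree as $name => $children) {
        if ($name === $key) {
            return $children;
        }
        $found = subtree($children, $key);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}

// Is $name somewhere beneath $category?
function isUnder(array $tree, string $name, string $category): bool
{
    $branch = subtree($tree, $category);
    return $branch !== null && subtree($branch, $name) !== null;
}

var_dump(isUnder($ontology, 'Socrates', 'Mortal')); // bool(true)
var_dump(isUnder($ontology, 'Zeus', 'Mortal'));     // bool(false)
```

Every question walks the whole structure again, which is the inefficiency at issue.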

Page 40: Mdst3705 2013-02-05-databases

So, PHP gives us tools to create an ontology, but not a way to reason efficiently with it

To create more effective KRs, we need the services of a database

Page 41: Mdst3705 2013-02-05-databases

A database is “a system that allows for the efficient storage and retrieval of information”

But beyond this, it also allows us to “represent knowledge”

Given Unsworth’s definition, how must it do this?

Page 42: Mdst3705 2013-02-05-databases

Databases provide a language to define ontologies (schema) and to “unpack” these ontologies via a query language that lets us efficiently search and retrieve data organized by schema

Page 43: Mdst3705 2013-02-05-databases

In this course, we are going to use a relational database to store and access information

Relational databases use a language known as SQL

(pronounced S-Q-L, although some say “sequel”)
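A rough sketch of schema plus query, driven from PHP through PDO. Two assumptions: an in-memory SQLite database stands in for the course's MySQL server so the example is self-contained, and the `is_a` table and its rows are hypothetical:

```php
<?php
// Assumption: SQLite in memory stands in for the course's MySQL server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Schema: the ontology as a table of (child, parent) statements.
$pdo->exec('CREATE TABLE is_a (child TEXT NOT NULL, parent TEXT NOT NULL)');

$insert = $pdo->prepare('INSERT INTO is_a (child, parent) VALUES (?, ?)');
foreach ([['Socrates', 'Man'], ['Plato', 'Man'], ['Man', 'Mortal']] as $row) {
    $insert->execute($row);
}

// Query: is there a two-step path from Socrates to Mortal?
// The join plays the role of the syllogism's middle term ("Man").
$query = $pdo->prepare(
    'SELECT COUNT(*)
       FROM is_a AS minor
       JOIN is_a AS major ON minor.parent = major.child
      WHERE minor.child = ? AND major.parent = ?'
);
$query->execute(['Socrates', 'Mortal']);
$socratesIsMortal = ((int) $query->fetchColumn()) > 0;

echo $socratesIsMortal ? "Socrates is mortal\n" : "Unknown\n";
```

The query states what we want to know and leaves the looping to the database.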

Page 44: Mdst3705 2013-02-05-databases

SQL

• SQL stands for “Structured Query Language”
– NOT invented by Microsoft

• Invented in the 1970s and commercialized in the 1980s
– Probably responsible for new business models like JIT inventories

• Built on Codd’s relational model (1970)
– Implements set theory and formal logic
– Around the time of SGML

Page 45: Mdst3705 2013-02-05-databases

SQL

• A language used by relational databases
– Oracle, SQL Server, Access, etc.

Page 46: Mdst3705 2013-02-05-databases

MySQL

• A very fast, simplified, and easy to use relational database

• A client/server app
– Runs on the internet
– Not a desktop app like Access

• Created by Monty Widenius in the mid 1990s
– Open Source
– A Finn living in Sweden
– Same time as PHP

• Powered the Web 2.0 revolution

Page 47: Mdst3705 2013-02-05-databases

phpMyAdmin

• A PHP interface to MySQL

• Relatively easy to use
– No need to know SQL

• Great to manage databases that your PHP programs will use

• Today you will get started using UVA’s free MySQL server

Page 48: Mdst3705 2013-02-05-databases

The role of PHP