Section 1 # 1 CS 765 1. The Age of Infinite Storage.


Page 1:

Section 1 # 1

CS 765

1. The Age of Infinite Storage

Page 2:

1. The Age of Infinite Storage has begun

Many of us have enough money in our pockets right now to buy all the storage we will be able to fill for the next 5 years.

So having the storage capacity is no longer a problem.

Managing it is a problem (especially when the volume gets large).

How much data is there?

Section 1 # 2

Page 3:

TeraBytes (TBs) are Here

1 TB costs 1k$ to buy

1 TB costs ~300k$/year to own

Management and curation are the expensive part

Searching 1 TB takes hours
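To see why searching 1 TB takes hours, here is a minimal sketch of the arithmetic for one sequential pass over the data. The transfer rates are hypothetical values chosen only to show the scale; the slide does not give one.

```python
# Sketch: time to read 1 TB once, front to back, at a constant transfer rate.
TB = 10**12  # decimal terabyte, in bytes

def scan_hours(num_bytes: int, bytes_per_second: float) -> float:
    """Hours needed to read num_bytes once at a constant transfer rate."""
    return num_bytes / bytes_per_second / 3600

# Assumed (hypothetical) sustained rates in MB/s, not taken from the slide.
for mb_per_s in (25, 50, 100):
    hours = scan_hours(TB, mb_per_s * 10**6)
    print(f"{mb_per_s} MB/s -> {hours:.1f} hours")
```

Even at the fastest assumed rate, a full scan takes close to three hours, which is the motivation for indexing rather than scanning.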

I’m Terrified by TeraBytes

I’m Petrified by PetaBytes

Googi  10^100

. . .

Yotta  10^24

Zetta  10^21

Exa    10^18

Peta   10^15

Tera   10^12

Giga   10^9

Mega   10^6

Kilo   10^3

We are here

I’m completely Exafied by ExaBytes

I’m too old to ever be Zettafied by ZettaBytes, but you may be in your lifetime.

You may be Yottafied by YottaBytes.

You may not be Googified by GoogiBytes, but the next generation may be?
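As a small illustration of the prefix ladder on this slide, the sketch below scales a byte count to the largest prefix that fits. The function name `describe` and the output wording are assumptions made for this example; only the Kilo through Yotta powers come from the slide.

```python
# Decimal prefixes from the slide, as (name, power of ten).
PREFIXES = [
    ("Kilo", 3), ("Mega", 6), ("Giga", 9), ("Tera", 12),
    ("Peta", 15), ("Exa", 18), ("Zetta", 21), ("Yotta", 24),
]

def describe(num_bytes: int) -> str:
    """Scale num_bytes to the largest prefix that does not exceed it."""
    name, exp = "", 0
    for prefix, power in PREFIXES:
        if num_bytes >= 10**power:
            name, exp = prefix, power
    if not name:
        return f"{num_bytes} Bytes"
    return f"{num_bytes / 10**exp:g} {name}Bytes"

print(describe(10**12))       # one terabyte
print(describe(15 * 10**15))  # fifteen petabytes
```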

Section 1 # 3

Page 4:

How much information is there?

Soon everything can be recorded and indexed.

Most of it will never be seen by humans.

Data summarization, trend detection, anomaly detection, and data mining are key technologies.

[Figure: information scale running from Kilo up to Yotta, with points marked "A Photo", "A Book", "Movie", "All books (words)", "All Books MultiMedia", and "Everything! Recorded"]

10^-24 yocto, 10^-21 zepto, 10^-18 atto, 10^-15 femto, 10^-12 pico, 10^-9 nano, 10^-6 micro, 10^-3 milli

Section 1 # 4

Page 5:

First Disk, in 1956 IBM 305 RAMAC

4 MB

50 24” disks

1200 rpm (revolutions per minute)

100 milli-seconds (ms) access time

35k$/year to rent

Included computer & accounting software (tubes, not transistors)

Section 1 # 5

7th Grade

C.S. lab Tech.

Page 6:

10 years later

1.6 meters

30 MB

Section 1 # 6

Page 7:

In 2003, the cost of storage was about 1k$/TB. It’s gone steadily down since then.

[Figure: five snapshots, each with two charts for SCSI and IDE drives: "Price vs disk capacity" ($ against raw disk unit size in GB) and "raw k$/TB" (against disk unit size in GB). Fitted price lines per snapshot:]

12/1/1999: y = 6.7x and y = 17.9x

9/1/2000: y = 3.8x and y = 13x

9/1/2001: y = 2.0x and y = 7.2x

4/1/2002: y = 6x and y = x

11/4/2003: y = 2x and y = x
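The fitted lines in these charts relate price to capacity. Assuming y is in dollars and x is in GB, as the axis labels suggest, the slope is a price in $/GB, and the sketch below converts it to k$/TB. The function name is made up for this example.

```python
def slope_to_k_dollars_per_tb(dollars_per_gb: float) -> float:
    """Convert a fitted price slope in $/GB to k$/TB (decimal units)."""
    gb_per_tb = 1000      # decimal GB per TB
    dollars_per_k = 1000  # dollars per k$
    return dollars_per_gb * gb_per_tb / dollars_per_k

# Slopes read from the fitted lines in the charts (earliest and latest dates).
for slope in (17.9, 6.7, 2.0, 1.0):
    print(f"y = {slope}x  ->  {slope_to_k_dollars_per_tb(slope):.1f} k$/TB")
```

Under this assumption the two factors of 1000 cancel, so a slope read off the price chart is already the figure in k$/TB. The 11/4/2003 line y = x then corresponds to about 1 k$/TB, matching the statement at the top of this slide.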

Section 1 # 7

Page 8:

Disk Evolution

[Figure: disk capacity scale: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta]

Section 1 # 8

Page 9:

Memex

As We May Think, Vannevar Bush, 1945

“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility”

“yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can enter material freely”

Section 1 # 9

Page 10:

Can you fill a terabyte in a year?

Item                                Items/TB   Items/day
a 300 KB JPEG image                 3 M        9,800
a 1 MB Document                     1 M        2,900
a 1 hour, 256 kb/s MP3 audio file   9 K        26
a 1 hour 1 MPEG video               290        0.8
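The table's arithmetic can be reproduced with a short sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 KB = 10^3 bytes) and a 365-day year, which the slide does not state, so the results match the slide's rounded figures only in order of magnitude. The video row is left out because its bit rate is not legible in this transcript.

```python
TB = 10**12          # decimal terabyte, in bytes (assumption)
DAYS_PER_YEAR = 365

def items_to_fill(item_bytes: float) -> tuple[float, float]:
    """Return (items per TB, items per day needed to fill 1 TB in a year)."""
    per_tb = TB / item_bytes
    return per_tb, per_tb / DAYS_PER_YEAR

mp3_hour = 256_000 / 8 * 3600  # one hour at 256 kb/s, in bytes

sizes = {
    "300 KB JPEG image": 300e3,
    "1 MB document": 1e6,
    "1 hour, 256 kb/s MP3": mp3_hour,
}
for name, size in sizes.items():
    per_tb, per_day = items_to_fill(size)
    print(f"{name}: {per_tb:,.0f} items/TB, {per_day:,.1f} items/day")
```

Either way the answer to the slide's question is the same: filling a terabyte in a year takes thousands of photos or documents every day.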

Section 1 # 10

Page 11:

On a Personal Terabyte, How Will We Find Anything?

Need Queries, Indexing, Data Mining, Scalability, Replication…

If you don’t use a DBMS, you will implement one of your own!

The need for Data Mining and Machine Learning is more important than ever!

Of the digital data in existence today,

80% is personal/individual

20% is Corporate/Governmental

DBMS

Section 1 # 11

Page 12:

We’re awash with data!

Network data: 10 terabytes by 2004 ~ 10^13 Bytes

US EROS Data Center archives, Earth Observing System (near Sioux Falls, SD), remotely sensed satellite and aerial imagery data: 15 petabytes by 2007 ~ 10^16 Bytes

National Virtual Observatory (aggregated astronomical data): 10 exabytes by 2010 ~ 10^19 Bytes

Sensor data (including micro- and nano-sensor networks): 10 zettabytes by 2015 ~ 10^22 Bytes

WWW (and other text collections): 10 yottabytes by 2020 ~ 10^25 Bytes

Genomic/Proteomic/Metabolomic data (microarrays, genechips, genome sequences): 10 gazillabytes by 2030 ~ 10^28 Bytes?

Stock market prediction data (prices + all the above?): 10 supragazillabytes by 2040 ~ 10^31 Bytes?

Useful information must be teased out of these large volumes of raw data.

AND these are just some of the 1/5th of data collections that are corporate or governmental. The other 4/5ths of data sets are personal!

I made up these names (gazillabytes, supragazillabytes)! Projected data sizes are overrunning our ability to name their orders of magnitude!

Section 1 # 12

Page 13:

Parkinson’s Law (for data):

Data expands to fill available storage

Disk-storage version of Moore’s Law

Available storage doubles every 9 months!
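A minimal sketch of what a 9-month doubling period implies, assuming smooth exponential growth between doublings. The function name and the time spans are chosen for illustration only.

```python
def growth_factor(months: float, doubling_months: float = 9.0) -> float:
    """Multiplicative growth after `months` if capacity doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)

for years in (1, 3, 5, 10):
    factor = growth_factor(12 * years)
    print(f"{years:>2} years -> x{factor:,.1f}")
```

At this rate, storage grows sixteenfold in three years and roughly a hundredfold in five.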

How do we get the information we need from the massive volumes of data we will have?

Querying (for the information we know is there)

Data mining (for the answers to questions we don't know to ask precisely).

Section 3 # 13

Page 14:

Thank you.

Section 3 # 1