Chapter 5 Paper Outline


  • 8/2/2019 Chapter 5 Paper Outline




    I came across the top 10 largest databases in the world.

    I'm not surprised that the top two are from our own government.

    What did surprise me was that Google was #7, despite the wealth of information it holds.


    Overall, the size of all of these databases is astounding.

    #1: The Library of Congress

    The library has 130 million documents altogether.

    If digitized, the text data would total approximately 20 TB.

    It has 5 million digital documents.

    10,000 items are added to the database each day.

    I searched for "Vietnam" and ran into the 10,000-item result limit.

    The newest document I found in my search was an article from 1991.

    Each search session lasted 5 minutes before I had to renew it.

    #2: The CIA

    The overall size of the database is unknown because of the number of classified files it contains.

    However, portions of it are available to the public, such as The World Factbook and the Freedom of Information Act Electronic Reading Room.

    The Electronic Reading Room makes some (potentially sensitive) government documents available to the public.



    I searched for "Africa" and found 98 items (available in both GIF and PDF formats).

    The database contains statistics on more than 250 countries and entities.


    #3: Amazon

    Database contains 42 TB of data.

    Maintains extensive records on its customers.

    The database gathers and keeps massive amounts of intimate information about its millions of shoppers, including their religion, sexual orientation, ethnicity and income.

    Combines information disclosed voluntarily by customers with facts gleaned from public databases.

    This gives Amazon more detailed information about its customers than any other


    #4: YouTube

    In 2006, it was projected to have 45 TB of data.

    The database is open to anyone who wants to access it, which I find rather astonishing.

    You must request special developer and client keys before accessing the Data API.

    Estimating the size of YouTube's database is particularly difficult because each video varies in size and length.

    Geared toward developers with experience programming in server-side languages, the Data API includes pre-built client libraries that simplify development.

    #5: ChoicePoint



    ChoicePoint's database of 17 billion public records is used for background checks, insurance applications and tenant screening.

    Contains information on 250 million people.

    Database contains 250 TB of personal data.

    The data is sold mostly to the highest bidders, which include our government.

    Much of the company's business is governed by the Fair Credit Reporting Act.


    #6: Sprint

    Has 53 million subscribers.

    The database contains 2.85 trillion data insertions (the most in the world).

    365 million call detail records processed per day.

    However, phone information has been leaked from it.

    Large telecommunication companies like Sprint are notorious for keeping immense databases to track all of the calls taking place on their networks.

    #7: Google

    Google's database contains all of the words that are used in search terms.

    A crawler visits a page, copies its content and follows the links from that page to the pages it links to, repeating this process over and over until it has crawled billions of pages on the web.
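    The crawl loop described above amounts to a breadth-first traversal of the link graph. The following is a minimal Python sketch under stated assumptions. The `WEB` dictionary, its `.example` URLs and the `crawl` function are hypothetical stand-ins, and the sketch is not Google's actual implementation. A real crawler would download pages over HTTP and extract links from their HTML instead of reading them from a dictionary.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to (page text, outgoing links).
# A real crawler would fetch each page over HTTP and parse links from its HTML.
WEB = {
    "a.example": ("home page", ["b.example", "c.example"]),
    "b.example": ("about page", ["a.example", "d.example"]),
    "c.example": ("news page", ["d.example"]),
    "d.example": ("contact page", []),
}


def crawl(start, web, max_pages=1000):
    """Breadth-first crawl: visit a page, copy its content, queue its links."""
    copied = {}              # url -> copied page content, in visit order
    queue = deque([start])   # pages waiting to be visited
    seen = {start}           # never queue the same page twice
    while queue and len(copied) < max_pages:
        url = queue.popleft()
        if url not in web:   # dead link: nothing to copy
            continue
        content, links = web[url]
        copied[url] = content
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return copied


pages = crawl("a.example", WEB)
print(list(pages))  # ['a.example', 'b.example', 'c.example', 'd.example']
```

    The `seen` set matters because many pages link back to one another (here `b.example` links back to `a.example`). Without it the crawler would revisit pages indefinitely. The `max_pages` cap stands in for the practical limits a real crawler places on its work.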

    Like the CIA's database, the size of Google's database is unknown (because it is kept locked in a vault).

    Google searches account for more than 50% of all internet searches.

    The database contains virtual profiles of countless users.



    #8: AT&T

    Database contains 323 TB of data.

    Database has 1.9 trillion phone call records.

    AT&T is so meticulous with its records that it has kept calling data from decades ago, long before the technology to store hundreds of terabytes of data became available.

    #9: NERSC

    The NERSC database encompasses 2.8 PB of information and is operated by more than 2,000 computational scientists.

    The database holds a host of information, including atomic energy research, high-energy physics experiments, simulations of the early universe and more.

    What distinguishes the center is its success in creating an environment that makes these resources effective for scientific research.

    #10: The World Data Centre for Climate

    Largest database in the world.

    220 TB of web data.

    110 TB of climate simulation data.

    6 PB of additional data on magnetic tape.

    The database runs on a computer that cost 35 million euros.
