Agent Technology for Data Analysis Tony Johnson - SLAC 21 st October 1998 WORKSHOP ON SCIENTIFIC...

Page 1: Agent Technology for Data Analysis Tony Johnson - SLAC 21 st October 1998 WORKSHOP ON SCIENTIFIC DATA MANAGEMENT PROBLEMS AND SOLUTIONS.

Agent Technology for Data Analysis

Tony Johnson - SLAC

21st October 1998
WORKSHOP ON SCIENTIFIC DATA MANAGEMENT PROBLEMS AND SOLUTIONS

Motivation and Disclaimer

Many efforts to use supernetworks to link supercomputers to transfer huge datasets

Few efforts to make effective use of existing real-world networks
• Allow university users to access remote data

I am not an agent technology expert
• We do have a prototype application
• I’m hoping some of you are!

Outline

Overview of problem
• Network restraints

Why agent technology?

Why Java
• For Agent Technology?
• For Data Analysis?

Analysis Studio application

More information

What Problem are we trying to solve?

Widely distributed users who need access to petabyte datasets
• Many university users with mediocre networks
• Most universities have no way to handle petabyte data samples

Physicist needs unfettered access to data
• Would like effective use of desktop machine
• Canned analysis won’t do

CPU/data access requirements are infinite

Faster networks?

• Faster networks will not solve our problems anytime soon
• No matter how fast networks are, they are always saturated.
• As networks become saturated, latency becomes high

Why Agent Technology?

By encapsulating the user’s analysis code as a “user agent” we can send it to the data, so wide-area network bandwidth requirements become trivial
• Analysis modules are typically small (<10’s of kBytes)
• HEP output is typically histograms (binned) and scatterplots, which are both small

Possible to do GUI-based analysis of large datasets using a 28.8 modem connection

Give the user the impression his analysis is running locally.
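The "send the code to the data" idea above can be sketched in a few lines of present-day Java. This is an illustration only, not the Java Analysis Studio implementation: the `UserAgent` interface, the `HistogramAgent` class and the `AgentDemo` helper are hypothetical names, and the network hop is imitated by serializing the agent to a byte array and reading it back. The point is that only the small agent travels out and only the binned counts travel back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical contract: the agent sees one event at a time and keeps a small result.
interface UserAgent extends Serializable {
    void processEvent(double[] event);
    int[] result();
}

// Example agent: histogram of the first field of every event.
class HistogramAgent implements UserAgent {
    private final double min;
    private final double max;
    private final int[] bins;

    HistogramAgent(int nBins, double min, double max) {
        this.bins = new int[nBins];
        this.min = min;
        this.max = max;
    }

    public void processEvent(double[] event) {
        double x = event[0];
        if (x < min || x >= max) return;            // ignore out-of-range values
        int i = (int) ((x - min) / (max - min) * bins.length);
        bins[Math.min(i, bins.length - 1)]++;       // clamp against rounding at the edge
    }

    public int[] result() {
        return bins.clone();
    }
}

class AgentDemo {
    // Serialize and deserialize the agent to imitate shipping it over the network.
    static UserAgent ship(UserAgent agent) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(agent);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (UserAgent) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not ship agent", e);
        }
    }

    // Runs next to the data; only the bin counts are returned to the client.
    static int[] runNearData(UserAgent agent, double[][] events) {
        for (double[] event : events) {
            agent.processEvent(event);
        }
        return agent.result();
    }

    public static void main(String[] args) {
        double[][] events = { {0.5}, {1.5}, {1.7}, {3.2}, {9.0} };
        UserAgent remote = ship(new HistogramAgent(4, 0.0, 4.0));
        System.out.println(Arrays.toString(runNearData(remote, events))); // [1, 2, 0, 1]
    }
}
```

However many events the server holds, the reply here is four integers, which is why a slow link is enough for this style of analysis.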

Why Java for Agent Technology?

Java produces machine independent bytecodes
• Trivial to move from one machine to another
• Network handling and Remote Method Invocation (RMI c.f. Corba) built-in
• (Remote) Dynamic loading built-in
• Multithreaded servers easy to write
• Built-in Java “Sandbox” can be used to restrict agents
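Two of the points above, dynamic loading and easy multithreaded servers, can be shown together in a short sketch. It is written in present-day Java and makes several assumptions: the `Agent` interface, the two toy agents and the `AgentHost` class are hypothetical, agents are loaded by class name from the local classpath rather than over the network, and no sandboxing is applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical agent contract; not the Java Analysis Studio API.
interface Agent {
    int run(int[] data);
}

// Two toy agents that the host only ever refers to by class name.
class CountPositive implements Agent {
    public int run(int[] data) {
        int n = 0;
        for (int x : data) if (x > 0) n++;
        return n;
    }
}

class SumAll implements Agent {
    public int run(int[] data) {
        int sum = 0;
        for (int x : data) sum += x;
        return sum;
    }
}

class AgentHost {
    // Dynamic loading: resolve the class by name at run time and instantiate it.
    static Agent load(String className) {
        try {
            return (Agent) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load agent " + className, e);
        }
    }

    // Multithreaded serving: every agent runs on a pool thread against the same data.
    static List<Integer> serve(List<String> classNames, int[] data) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String name : classNames) {
                Agent agent = load(name);
                Callable<Integer> task = () -> agent.run(data);
                pending.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : pending) {
                results.add(future.get());          // results come back in request order
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("Agent failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = { 3, -1, 4, -1, 5 };
        List<String> names = Arrays.asList(
                CountPositive.class.getName(), SumAll.class.getName());
        System.out.println(serve(names, data)); // [3, 10]
    }
}
```

A real agent server would load bytecode sent by the client and restrict what it may do; both steps are left out here to keep the sketch self-contained.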

Why Java for Data Analysis

Easy to learn yet very powerful, fully OO language
• Very wide industry support
• Just In Time compilation = Fast
• Dynamic Optimization = Faster
• Very fast code, load, test, fix cycle
• Built in debugger, including remote debugging
• Numerical functionality good
– Java Grande Forum enhancing numerical support

“Java Analysis Studio”

[Architecture diagram. Labels: Desktop Client (DIM, Local Data); Network Data Server (DIM, Remote Data); Network Data Controller; Distributed Data (six Data Server DIM nodes)]

Demo

Network Performance

[Diagram. Labels: View (Histogram); View Adapter; Model Adapter; Model (Data Source)]

Caching
Prefetching of data
Data clumping, streaming
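As one way to picture the caching and clumping listed above, the sketch below places an adapter between a view and a slow data source. It is an assumption-laden illustration: `DataSource`, `CachingAdapter` and the clump size are hypothetical, and prefetching is only approximated by fetching a whole clump whenever one value in it is requested.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical remote model: every call stands for one network round trip.
interface DataSource {
    double[] fetch(int start, int count);
}

// Adapter that turns many single-value reads into a few clumped remote reads.
class CachingAdapter {
    private final DataSource remote;
    private final int clumpSize;
    private final Map<Integer, double[]> cache = new HashMap<>();
    private int remoteCalls = 0;

    CachingAdapter(DataSource remote, int clumpSize) {
        this.remote = remote;
        this.clumpSize = clumpSize;
    }

    double get(int index) {
        int block = index / clumpSize;
        double[] clump = cache.get(block);
        if (clump == null) {
            // Cache miss: fetch the whole clump so neighbouring reads are served locally.
            clump = remote.fetch(block * clumpSize, clumpSize);
            remoteCalls++;
            cache.put(block, clump);
        }
        return clump[index % clumpSize];
    }

    int remoteCalls() {
        return remoteCalls;
    }
}

class CachingDemo {
    public static void main(String[] args) {
        // Toy source whose value at position i is simply i.
        DataSource source = (start, count) -> {
            double[] out = new double[count];
            for (int i = 0; i < count; i++) out[i] = start + i;
            return out;
        };
        CachingAdapter adapter = new CachingAdapter(source, 4);
        double sum = 0;
        for (int i = 0; i < 10; i++) sum += adapter.get(i);
        System.out.println(sum + " in " + adapter.remoteCalls() + " remote calls"); // 45.0 in 3 remote calls
    }
}
```

Ten reads cost three round trips here instead of ten, which matters most when, as an earlier slide notes, latency on a saturated network is high.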

More Information

Java
• http://java.sun.com

Java Analysis Studio
• http://www-sldnt.slac.stanford.edu/jas

Java Grande Forum (numeric computing in Java)
• http://www.javagrande.org/

Desktop access to remote resources
• http://www-fp.mcs.anl.gov/~gregor/datorr/