Java Analysis Studio & Object Oriented Data Analysis (in Java)

KEK, 25th May 2000. Tony Johnson - SLAC. [email protected]


Page 1: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Java Analysis Studio & Object Oriented Data Analysis (in Java)

KEK

25th May 2000

Tony Johnson - SLAC

[email protected]

Page 2: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Contents

Overview of Java

Why Java for Data Analysis

Java Analysis Studio

Recently added features

Using Java for Reconstruction

Linear Collider Simulation Framework

Is Java fast enough for Data Analysis?

HEP-wide Java libraries

Conclusions

Demo

Page 3: Java Analysis Studio & Object Oriented Data Analysis (in Java)

History of Java

1991 James Gosling at Sun creates Java language (née Oak)

Targeted at consumer electronics: cable set-top boxes, VCR, TV etc.

Goal was reliability not speed

1994 HotJava Web browser written (in Java)

Supports Applets: downloadable programs that run inside web browser

Java licensed by Netscape, Oracle, Microsoft and many others

• Huge hype surrounding “Web Programming language”

1997 Java 1.1 released with many standard libraries

Sun’s mantra becomes “Write Once Run Anywhere”

Enthusiastically supported by all major hardware and many software vendors

Microsoft begins to have second thoughts

1998 Java 2 released, even more standard libraries

Now truly general purpose language

Sun (and DOJ) sue Microsoft

Page 4: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Java Architecture

More than just a Web Tool

Java is a fully functional, platform independent, object-oriented language

Powerful set of machine independent libraries, including GUI library.

Totally Buzzword Compliant

Simple, Object Oriented, Distributed, Dynamic, Robust, Secure, Architecture Neutral, Portable, High Performance, Multithreaded.

Interpreted?

[Diagram: Java source code → compiler → Java “bytecodes” → bytecode interpreter or JIT compiler (Mac, Unix, PC) → machine code]

Compiled + Interpreted.

Dynamic optimization may make Java faster than statically compiled languages (in principle).

Page 5: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Java Features

Simple

But not trivial… you need to read a book

• Syntax very close to C++

No backwards compatibility issues

Some features of C++ which add undue complexity dropped

Good stepping stone to (or from) C++

• Clean and efficient object-oriented language

Language features guide programmer toward reliable programming habits

Robust

• Extensive compile-time checking of code

• Second level of run-time checking of code

• Memory management done by system, not by programmer

• No pointers to mess up (Java uses references rather than pointers)

Chances of a program running as designed, without the need for time-consuming debugging, are greatly increased.

Page 6: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Java Features (continued)

Highly Portable

Java works today on NT, Win95/98, Unix (including Linux), Mac, VMS

• Personal Java - Windows CE, Palm Pilot

Programs written in Java are very portable

• Move to another platform and it just works

Care needed with AWT GUI components (obsolete) and web browsers

Lifetime of HEP experiments > OS lifetime.

• Lifetime of Java > Lifetime of HEP experiment??

Encourages true modularity

Build entire framework for HEP experiment in Java

Abstract away underlying systems (batch system, IO system etc.)

Page 7: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Java Features (continued)

Distributed

Built-in support for Internet protocols, URLs, HTTP, Remote Method Invocation, CORBA, database access etc.

Secure

Bytecode “verifier”, padded cell (c.f. Web Browser)

Multithreaded

Language has direct support for multithreading

Dynamic

Libraries can change without recompiling programs that use them

Can dynamically load and unload code during program execution

Can move objects across the network (agents), or store them in databases and retrieve them later.
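The "dynamically load code" point can be illustrated with a minimal sketch using the standard `Class.forName` reflection call. The interface `AnalysisStep` and the classes `TrackCounter` and `DynamicLoadingSketch` are hypothetical names invented for this illustration; they are not taken from the talk or from JAS.

```java
// Hypothetical plug-in contract, invented for this sketch.
interface AnalysisStep {
    String describe();
}

// A class the caller refers to only by name at run time.
class TrackCounter implements AnalysisStep {
    public String describe() {
        return "counts tracks";
    }
}

class DynamicLoadingSketch {
    // Look up a class by name, then create an instance through its
    // no-argument constructor. Reflection failures are rethrown unchecked.
    static AnalysisStep load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (AnalysisStep) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }
}
```

A caller would write something like `DynamicLoadingSketch.load(TrackCounter.class.getName())`; in practice the class name would more likely come from a configuration file or from user input, which is what lets new code be added without recompiling the program that uses it.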

Page 8: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Java Libraries and APIs

Standard Libraries and APIs

2D + 3D graphics + GUI (Swing) + Imaging + Printing

Database connectivity (JDBC) + ODMG

Collections, IO (Serialization), Data Compression

Networking, Sockets, SSL, CORBA, RMI

Java Beans (components), Help

Multimedia, Sound, Speech

Security, Code Signing, Cryptography

Math, Arbitrary Precision Math

Shared Data (Collaborative Applications)

Huge “Community-Ware” software archive

IBM alone has hundreds of Java resources on its alphaWorks site

Page 9: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Java Tools

Popularity of Java = many tools

• And they are cheap (or even free)

Development Environments (IDEs)

• Editor, Compiler, Debugger, WYSIWYG GUI designer, Source control

Automatic Documentation generators

Memory and CPU Optimizers

• Since debugging time is minimal you might actually have time to use them

Object Modelers

Many commercial sets of components

Page 10: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Java Limitations?

No operator overloading

Annoying for complex numbers, matrices, 3/4-vectors

Perhaps more often abused than sensibly used

Lightweight Objects (value semantics) may overcome this

Bugs sometimes slow to be fixed

Printing and imaging bugs existed for >1 year

Perhaps “Community Source License” will help

Little control over memory allocation

Integration with C++ could be better

Standardization lacking

Sun had promised to submit Java to ISO for standardization, but has so far failed to deliver
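To make the operator-overloading limitation concrete, here is a minimal sketch of a 4-vector type. `FourVector` is a hypothetical class written for this illustration, not the JAS or FreeHEP vector class. Where C++ could define `operator+` and write `a + b`, Java has to use a named method.

```java
// Hypothetical immutable four-vector (E, px, py, pz); invented for this sketch.
final class FourVector {
    final double e, px, py, pz;

    FourVector(double e, double px, double py, double pz) {
        this.e = e;
        this.px = px;
        this.py = py;
        this.pz = pz;
    }

    // Java has no operator+, so addition is an ordinary method.
    FourVector plus(FourVector o) {
        return new FourVector(e + o.e, px + o.px, py + o.py, pz + o.pz);
    }

    // Invariant mass; returned negative if the squared mass is negative.
    double mass() {
        double m2 = e * e - px * px - py * py - pz * pz;
        return m2 >= 0 ? Math.sqrt(m2) : -Math.sqrt(-m2);
    }
}
```

For two back-to-back massless particles, `new FourVector(1, 1, 0, 0).plus(new FourVector(1, -1, 0, 0)).mass()` gives the invariant mass of the pair. Chained sums such as `a.plus(b).plus(c)` remain readable, but longer matrix or complex-number expressions quickly become harder to follow than their operator form.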

Page 11: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Why Java for HEP Computing?

Previous generation of experiments used Fortran + Data Management System (e.g. Jazelle, Zebra, BOS)

Solves Three Problems

Ability to Represent Complex Data Structures

Persistence (i.e. read in and write out complex structures)

Run time access to named data in structures (for analysis)

Now time has marched on and modern experiments use C++

• Represent Complex Data

• Persistence

• Run time access to data

Still need to build (or buy and deploy) data management system (e.g. Root, Objectivity)

Java

• Represent Complex Data

• Persistence (serialization)

• Run time access to data (reflection)

Support built in to language
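The two built-in mechanisms named above, serialization for persistence and reflection for run-time access to named data, can be sketched with the standard `java.io` and `java.lang.reflect` classes. `TrackRecord` and `PersistenceSketch` are hypothetical names for this illustration, and an in-memory byte array stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;

// Hypothetical event data class; implementing Serializable is all that
// is needed for the default persistence mechanism.
class TrackRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    int charge;
    double momentum;

    TrackRecord(int charge, double momentum) {
        this.charge = charge;
        this.momentum = momentum;
    }
}

class PersistenceSketch {
    // Persistence: write an object graph to bytes.
    static byte[] write(Serializable obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(obj);
            out.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Persistence: read the object graph back.
    static Object read(byte[] data) {
        try {
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
            Object obj = in.readObject();
            in.close();
            return obj;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Run-time access: fetch a field by its name, as an analysis tool
    // would when the user asks for "momentum".
    static Object get(Object obj, String fieldName) {
        try {
            Field f = obj.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            return f.get(obj);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A round trip looks like `Object copy = PersistenceSketch.read(PersistenceSketch.write(new TrackRecord(-1, 45.6)))`, after which `PersistenceSketch.get(copy, "momentum")` returns the stored value as a `Double`. No separate data management system is involved, which is the point the slide makes.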

Page 12: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Where would HEP use Java?

GUI systems

Online + control (not really any alternative)

Event Display

Reconstruction + Simulation packages?

Data Analysis tasks

Offline

Online

Event Generators

Page 13: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Java Analysis Studio

Experiment independent analysis tools for High Energy Physics data

Page 14: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Introduction to JAS

JAS starts from experience with SLD interactive data analysis

IDA (Toby Burnett) + SLD extensions

Integrates ideas from

• Reason, Hippodraw, LHC++, Histoscope, …

Exploit advantages of Java

• Cross platform, dynamic loading, GUI, many standard APIs: networking, HTML, etc.

Aim is to solve real life physicist problems

Want to get input from as many people as possible.

System is flexible enough to change.

Page 15: Java Analysis Studio & Object Oriented Data Analysis (in Java)

JAS Overview

Modular Java Toolkit for Analysis of HEP data

Data Format Independent

Experiment Independent

Supports arbitrarily complex analysis modules written in Java

Rich Graphical User Interface (GUI) with:

• Data Explorer

• Flexible Histogram + Scatterplot display

• Histogram manipulation + fitting

• Built-in Editor/Compiler (for writing analysis modules)

• Extensible via plugins

User extensible via Object Oriented APIs

Written entirely in Java so will run on any platform with a Java VM (JDK 1.1 or better)

• Supported: Windows 95/98/NT/2000 + Linux + Solaris

• Works on: DEC + SGI + Mac
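As a rough illustration of what an "analysis module written in Java" does, the sketch below fills a fixed-bin 1-D histogram once per event. It does not use the JAS API: `Histogram1D`, `MomentumAnalysis` and `processEvent` are hypothetical names invented here, and a real JAS module would use the classes that JAS itself provides.

```java
// Hypothetical fixed-width 1-D histogram; not the JAS histogram class.
class Histogram1D {
    private final double min, max;
    private final int[] bins;
    private int underflow, overflow;

    Histogram1D(int nBins, double min, double max) {
        this.bins = new int[nBins];
        this.min = min;
        this.max = max;
    }

    // Add one entry; values outside [min, max) and NaN are counted separately.
    void fill(double x) {
        if (Double.isNaN(x) || x < min) {
            underflow++;
        } else if (x >= max) {
            overflow++;
        } else {
            int i = (int) ((x - min) / (max - min) * bins.length);
            bins[Math.min(i, bins.length - 1)]++; // guard against rounding at the upper edge
        }
    }

    int binContent(int i) {
        return bins[i];
    }

    int entriesOutOfRange() {
        return underflow + overflow;
    }
}

// Hypothetical analysis module: called once per event with the track momenta.
class MomentumAnalysis {
    final Histogram1D momentum = new Histogram1D(10, 0.0, 50.0);

    void processEvent(double[] trackMomenta) {
        for (double p : trackMomenta) {
            momentum.fill(p);
        }
    }
}
```

The module only accumulates; displaying, overlaying and fitting the result is the job of the GUI components listed above, which is what keeps the analysis code independent of the display.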

Page 16: Java Analysis Studio & Object Oriented Data Analysis (in Java)

JAS Components

[Diagram: JAS components. Labels: JASHist (Plot Bean); Fitting Framework (Functions, Fitters); Analysis Framework; GUI Framework; Plugin; Histogram Accumulation; 3-4 Vector Utilities; Data Interface; Histo/Plot Adaptor; Network Adapter; Particle Properties; Jet Finder; PAW, SQL, stdHEP]

Page 17: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Data Access Classes

Analyze local or remote data

User interface independent of data location

Does not assume fast network (works well at 28.8 kbps)

Analysis code moves (transparently) to data

[Diagram labels: Desktop Client, DIM, Local Data; Network Data Server, DIM, Remote Data]

Page 18: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Remote Data Analysis

[Diagram labels: GUI; Data Analysis Engine; User's Java Code; Experiment Interface; Java Compiler + Debugger; Experiment Extensions (Event Display); TCP/IP Network; Padded Cell; C++ Code; Data (Zebra, Jazelle, Paw, Root, Objectivity)]

Page 19: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Distributed Data Analysis

[Diagram labels: Desktop Client; Network Data Server; Network Data Controller; Distributed Data, served by six Data Servers, each with a DIM]

Page 20: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Plot Display Package

1-d/2-d Histogram/ScatterPlot Display

Multiple axes, direct user interaction, overlays, fitting

Page 21: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Java Analysis Studio GUI

Page 22: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Example Analysis Code (TrackRecon)

Page 23: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Demo

Page 24: Java Analysis Studio & Object Oriented Data Analysis (in Java)

New Features

Modular Plot Component

Can be used in other applications

• GUI, servlets

Model-view-controller design

Supports many display styles, 1d, 2d, scatterplot, fitting, slices, user interaction, XML for data interchange with other apps.

jEdit Editor

Full featured program editor

• Syntax highlighting, indenting, bracket matching

Expect to be able to integrate advanced features

• Debugging, auto-completion
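The model-view-controller design mentioned for the plot component can be sketched in a few lines. This is a generic illustration of the pattern, not the JAS plot component: `BinModel`, `TextView` and `FillController` are hypothetical names, and the "view" renders text where the real component draws a plot.

```java
import java.util.ArrayList;
import java.util.List;

// Model: holds the bin contents and notifies listeners when they change.
class BinModel {
    interface Listener {
        void binsChanged(BinModel source);
    }

    private final int[] bins;
    private final List<Listener> listeners = new ArrayList<Listener>();

    BinModel(int nBins) {
        bins = new int[nBins];
    }

    void addListener(Listener l) {
        listeners.add(l);
    }

    int size() {
        return bins.length;
    }

    int get(int i) {
        return bins[i];
    }

    void increment(int i) {
        bins[i]++;
        for (Listener l : listeners) {
            l.binsChanged(this);
        }
    }
}

// View: renders the model as text. A GUI panel or a servlet could listen
// to the same model without the model knowing the difference.
class TextView implements BinModel.Listener {
    String lastRendering = "";

    public void binsChanged(BinModel m) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < m.size(); i++) {
            sb.append(i).append(": ");
            for (int k = 0; k < m.get(i); k++) {
                sb.append('*');
            }
            sb.append('\n');
        }
        lastRendering = sb.toString();
    }
}

// Controller: turns user input into model updates, ignoring invalid bins.
class FillController {
    private final BinModel model;

    FillController(BinModel model) {
        this.model = model;
    }

    void userEntered(int bin) {
        if (bin >= 0 && bin < model.size()) {
            model.increment(bin);
        }
    }
}
```

Because the model knows nothing about how it is displayed, the same data can back an interactive window in one application and an image generated by a servlet in another, which is the reuse the slide describes.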

Page 25: Java Analysis Studio & Object Oriented Data Analysis (in Java)

New Features – HTML support

Page 26: Java Analysis Studio & Object Oriented Data Analysis (in Java)

New Features – WIRED Plugin

Page 27: Java Analysis Studio & Object Oriented Data Analysis (in Java)

New Features – AIDA support

AIDA is an attempt to standardize the HEP histogram interface

Abstract interface

• C++ and Java supported

Multiple implementations

• JAS now supports AIDA interface

• Now possible to create JAS histograms from C++

[Diagram: C++ Program → AIDA → JNI → Java AIDA → JAS]

Page 28: Java Analysis Studio & Object Oriented Data Analysis (in Java)

New Features – G4 interface

Page 29: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Future Features - 3D Support

Page 30: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Usage

BaBar using for Online Monitoring

Using Online Monitoring API

HTML Pages with embedded plots

Custom Overlays

US Linear Collider Studies

Have an entire recon+analysis package written in Java

• Using JAS as analysis interface

• Making use of remote data access using repository at University of Pennsylvania

CLEO

Using plot bean for online displays

Other smaller scale users

All giving very valuable feedback

Helping to produce more reliable solution

Page 31: Java Analysis Studio & Object Oriented Data Analysis (in Java)

OpenSource – Anyone can Contribute!

All source code now stored in CVS

Use any CVS client for anonymous (read-only) access

• We recommend jCVS (pure Java CVS client)

Source code all web browsable

• Implemented using jCVS servlet

Write access can be given to interested developers

Intend to put entire code under LGPL

Platform independent build system

Uses jmk, a pure Java make-like tool

• To build entire system on any platform with CVS and Java:

    cvs co jas
    cd jas
    java -jar jmk.jar

Page 32: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Documentation

LCD Tutorial exists

Nice step by step tutorial for beginners

Examples are all based on LCD but can be used by anyone

Starts from very beginning

Slowly adding information to Users Guide

Still nowhere near complete

How To documents being created to cover specific topics

• Servlets How To

• HTML How To

• XML How To

• Online API How To

• Working on Fitting How To

JavaDoc generated API documentation available

Documentation remains weak link

We are aware of this and are working on producing more documentation

Also need more design specs/internals documentation to make open source model more effective

Page 33: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Java for Reconstruction/Simulation

Dual Goals:

Contribute to Linear Collider Detector/Physics Studies

Experiment with using Java for full offline reconstruction and analysis package

Page 34: Java Analysis Studio & Object Oriented Data Analysis (in Java)

LC Detector studies in US

Goals:

Detailed study of physics processes in a variety of possible LC Detectors.

• Reference Small and Large detectors

Full simulation with GISMO

• Switch to Geant4, when ready

Analysis using

• Paw

• C++ & Root

• Java & JAS

Software Requirements

• Flexibly handle different detector geometries and technologies

• Rapid development of variety of reconstruction and analysis algorithms

Page 35: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Java package hep.lcd

Reconstruction Processors

Track finder + fitter written

Interface to Fortran fitter in progress

Several clustering algorithms

Parameterized MC Processors

Can read generator input or Gismo output

Track and Cluster smearing

Analysis Utilities

Event Shape + Thrust utilities

Jet finder [Jade, Durham]

Histogramming

Event Displays

Simple 2D Event display

Full 3D WIRED event display

Framework

Driver framework

Interactively control calling of processors

Debugging/histogramming

Parameter (Constant) access driven by detector geometry

MC event input (StdHEP format)

IO system based on Java IO random access files

Can be run inside JAS or standalone
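The two jet-finder schemes listed above differ only in the pairwise distance used to decide which particles to merge. The standard definitions are y_ij = 2 E_i E_j (1 − cos θ_ij) / E_vis² for JADE and y_ij = 2 min(E_i², E_j²) (1 − cos θ_ij) / E_vis² for Durham. The sketch below computes both; `JetMeasures` is a hypothetical helper written for this illustration and is not the `hep.lcd` jet finder.

```java
// Hypothetical helper computing the JADE and Durham pair distances.
class JetMeasures {
    // Cosine of the opening angle between two 3-momenta {px, py, pz}.
    static double cosTheta(double[] a, double[] b) {
        double dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
        double na = Math.sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]);
        double nb = Math.sqrt(b[0] * b[0] + b[1] * b[1] + b[2] * b[2]);
        return dot / (na * nb);
    }

    // JADE: y_ij = 2 E_i E_j (1 - cos theta_ij) / E_vis^2
    static double jade(double ei, double ej, double cos, double eVis) {
        return 2.0 * ei * ej * (1.0 - cos) / (eVis * eVis);
    }

    // Durham: y_ij = 2 min(E_i, E_j)^2 (1 - cos theta_ij) / E_vis^2
    static double durham(double ei, double ej, double cos, double eVis) {
        double eMin = Math.min(ei, ej);
        return 2.0 * eMin * eMin * (1.0 - cos) / (eVis * eVis);
    }
}
```

A clustering loop would repeatedly find the pair with the smallest y_ij, merge it if y_ij is below a chosen cut, and stop when no pair passes; that loop is omitted here. Because only the distance function changes between schemes, a jet finder can take it as a pluggable strategy.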

Page 36: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Event Display

Page 37: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Event Display

Page 38: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Event Display

Page 39: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Event Display

Page 40: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Java for Reconstruction/Simulation

Looks very promising

Have been able to develop framework very fast

People have no problem learning and using it

Performance looks good

Future

Java interface to Geant4?

Page 41: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Reconstruction Performance

[Chart: Cluster Finding. Seconds/event (axis 0 to 1.2) by virtual machine: JDK1.1.8 -nojit, JDK1.1.8, MS 5.00.3177, IBM1.1.7, IBM1.1.8, JDK 1.2.1 Classic, JDK 1.2.1 HotSpot]

[Chart: Track Finding + Fitting. Seconds/event (axis 0 to 40) by virtual machine: JDK1.1.8 -nojit, JDK1.1.8, MS 5.00.3177, IBM1.1.7, IBM1.1.8, JDK 1.2.1 Classic, JDK 1.2.1 HotSpot]

Page 42: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Java Performance Summary

Is Java Fast Enough for Physics Analysis?

Yes

• Time gained in development well worth runtime overhead

• Good design has more effect on final speed than language

Many tools available to help optimize code

Java will continue to get faster

More information

• ACM 1999 Java Grande Conference http://www.cs.ucsb.edu/conferences/java99/

• THE JAVA PERFORMANCE REPORT http://www.javalobby.org/features/jpr/

Page 43: Java Analysis Studio & Object Oriented Data Analysis (in Java)

HEP-wide Java libraries

FreeHEP Java library

Extract common code from JAS+WIRED

Add other utilities (not highly HEP specific)

• Encapsulated Postscript generator

• JACO: Java to C++ interface

Encourage others to look at what is there

• We welcome contributions from others

HEP library: more physics specific

3 and 4 vectors, jet finders, MC generators

Histogramming package (AIDA)

Page 44: Java Analysis Studio & Object Oriented Data Analysis (in Java)

HEP-wide Java libraries

FreeHEP library already has useful stuff in it, HEP library just getting started

Both libraries in CVS

• Read access available to anyone

• Write access to qualified developers

Web Site

http://java.freehep.org

Contributions welcome

Page 45: Java Analysis Studio & Object Oriented Data Analysis (in Java)

Conclusions

Java is a very useful language+environment that could be very beneficial to HEP in many areas.

Could Java be used for entire offline for major experiment?

Technically - Yes

Will Java survive long enough?

• Need ISO standard

• Need to see how market forces play out.

Programming in Java is Fun!!

Spend time architecting an elegant solution to problem to be solved

• Not reinventing the wheel, debugging someone else’s problem, porting to different platforms

Page 46: Java Analysis Studio & Object Oriented Data Analysis (in Java)

More Information…

Java Analysis Studio

http://jas.freehep.org

FreeHEP library

http://java.freehep.org

US Linear Collider Reconstruction

http://www-sldnt.slac.stanford.edu/nld

WIRED

http://wired.cern.ch

AIDA

http://wwwinfo.cern.ch/asd/lhc++/AIDA/index.html