February 2000CHEP2000 - Dario Menasce, I.N.F.N. Milano 1.


Motivations for the project

• Large collaborations in HEP have vast amounts of code maintained by developers scattered around the globe

• Tools exist to use a WEB browser to navigate (hyperlinks) through the code from remote locations (Light, LXR)

but...

none of them provides sophisticated search functions to locate specific instances of language tokens (variable names, class names, common blocks, etc.)

• Light covers FORTRAN and C++ but has no search function

• LXR has a search function, albeit limited to literal search; no regular expression support is provided. It only works for C


This unsettled situation prompted us to try a different approach to the problem of providing remote access to source code:

• we focus more on the search capabilities of the source code navigator than on its hyperlink connectivity

Powerful language parsers under the hood

• C and particularly C++ dominate current software efforts, but vast amounts of FORTRAN code still linger around (legacy code)

We aim to provide search functionality for all three languages

• Portability is an important issue, together with scalability

The navigator is based on PERL and JavaScript (open software)


Why a WEB based source code navigator? A typical scenario in HEP

[Diagram: Users and the Experiment’s official source code repository]

• access to repository usually requires remote login

Not always feasible

• users need to be knowledgeable of whereabouts such as location and structure of the repository

Need practice and a map

• users often need to locate the occurrences of a particular token (e.g. a variable) within the entire name-space of a reconstruction or simulation code

Hyperlinks are insufficient

• need an easy to use control panel as a navigation steering-wheel

WEB browsers are an optimal choice


General philosophy and guidelines for the WhereTheHeck project

• the only navigation interface required is a WEB browser

• languages supported are FORTRAN, C and C++

Provision for others to come

• the package is entirely written in PERL and JavaScript

Scales well and ensures portability

• the parsers are: f2c for FORTRAN and gcc (egcs) for C and C++

Piggy-back parser technology expertise

Not reinvent the wheel

• input of the navigation package is a directory tree with not yet preprocessed source code files (no provision to directly access any code management format)

• output of the navigation package is a set of HTML web pages created on the fly

No static HTML files exist: all information is thus up-to-date by definition

• output is produced very fast (there is a specialized database under the hood)


The project has four major components

An installation package

A UNIX tar file containing the whole WhereTheHeck source code. Installation and configuration scripts take care of the customization process on the user’s local machine (usually the official experiment’s repository computer with a WEB server)

A parser

A set of scripts capable of extracting the list of tokens from source code, together with their location (file, line, column, type, etc.)

A tokens database

Tokens extracted by the parser are stored in a database for fast access (slow for updates, extremely fast for retrieval)

A WEB interface

A control panel (query forms) and an output panel: HTML pages are created on the fly, on user demand, in real time


The structure of WhereTheHeck

[Diagram elements]

• Experiment’s official source code repository: /root /subdir1 /subdir2 …...

• The parser

• Token list: token name; token type (variable, class…); token qualifier (reference, modified location…); token position (file, line, column)

• The database manager

• The database files: linked lists; extremely fast access during read; reasonably slow during creation

• The browser’s client (Netscape)

• The CGI search engine (PERL)

• The HTML formatter

• Dynamically created HTML page

Off-line processing on remote repository (done at the Lab)


The parser

To avoid reinventing the wheel, we used as parsers our customized versions of two freely available, widely used compilers:

f2c for FORTRAN

gcc for C and C++

For both of them we modified those parts which perform the lexical analysis of the source code: compiling code with them gives, as a by-product, the full list of tokens for each language.

Example (f2c): from the source code repository to the database of all tokens

Source file: /disk1/menasce/analysis/FindTrack.f

Line 1238 of FindTrack.f:

... If ( Px .Gt. 100. ) Then Endif...

Token name: Px

Token type: real variable

Token qualifier: referenced

Line number: 1238

Start column: 12

End column: 13
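The information recorded for each token can be pictured as a small record. The following is only an illustrative sketch in Python (the project itself is written in PERL): the class name `TokenRecord` and its field names are assumptions made here, while the values are those of the Px example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRecord:
    """One token occurrence (hypothetical layout, not the project's actual format)."""
    name: str
    type: str
    qualifier: str
    source_file: str
    line: int
    start_column: int
    end_column: int

    def width(self) -> int:
        # Columns are inclusive, so a two-character token spans 12..13.
        return self.end_column - self.start_column + 1


px = TokenRecord(
    name="Px",
    type="real variable",
    qualifier="referenced",
    source_file="/disk1/menasce/analysis/FindTrack.f",
    line=1238,
    start_column=12,
    end_column=13,
)
print(px.name, px.source_file, px.line, px.width())
```

Keeping both start and end columns makes it possible to highlight exactly the token in the displayed source line.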


FORTRAN has a rather simple syntax and is correspondingly easy to parse (so to speak…). C, on the other hand, is much more complex and C++ is a real nightmare.

This has important consequences:

• The FORTRAN parser is already almost complete and working

• For C we are nearing completion (for a limited set of syntactic elements)

• For C++ we customized the parser for a very limited set of tokens (classes, methods)

There is a rather long list of technical difficulties which hampered a straightforward use of the f2c and gcc parsers

• The lexer part of those parsers is not documented anywhere

• We wanted a customization in the form of a patch, so that later versions of the official compiler could be easily patched to provide the functionality we need

• The code developers work with is usually filled with precompiler directives (#define ...). As a result, the token location found by a parser (which works on already preprocessed files) does not match the original position within the source code:


Macro definition

#define F(x,y,z) (((x)&(y)) | ((~x) &(z)))

Original source code snippet

………
   Var = F(a,beta,c) + zeta
123456789-123456789-123456789-123456789-12345

In the original source code the zeta variable is placed in position 24 / 27

After precompiler expansion has occurred

………
   Var = (((a)&(beta)) | ((~a) &(c))) + zeta
123456789-123456789-123456789-123456789-12345

Now zeta appears at location 41 / 44. This is what the compiler sees as source code

To solve the problem, we developed an inverse precompiler (uncpp)

Given the columns of a token as determined by the compiler, uncpp recovers the original columns within the source code: these quantities are then stored in the tokens database.
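The slides do not describe how uncpp works internally. The column arithmetic it has to perform can nevertheless be sketched in Python under simple assumptions: each macro call on a line is known by its original start column, its original length and its expanded length. The function name `original_column` is hypothetical, and three leading blanks are assumed in the source line so that the columns match the 24 / 27 and 41 / 44 quoted above.

```python
def original_column(expanded_col, expansions):
    """Map a 1-based column in the preprocessed line back to the original line.

    expansions: list of (orig_start, orig_len, expanded_len) tuples, one per
    macro call on the line, sorted left to right, with 1-based orig_start.
    """
    shift = 0  # how far the text has been pushed to the right so far
    for orig_start, orig_len, expanded_len in expansions:
        exp_start = orig_start + shift
        exp_end = exp_start + expanded_len - 1
        if expanded_col < exp_start:
            break  # the column lies before this macro call
        if expanded_col <= exp_end:
            return orig_start  # inside the expansion: report the macro call
        shift += expanded_len - orig_len
    return expanded_col - shift


call = "F(a,beta,c)"
body = "(((a)&(beta)) | ((~a) &(c)))"
source = "   Var = " + call + " + zeta"  # three leading blanks are assumed
expanded = source.replace(call, body)

expansions = [(source.index(call) + 1, len(call), len(body))]
start = expanded.index("zeta") + 1   # 41 in the preprocessed line
end = start + len("zeta") - 1        # 44 in the preprocessed line
print(original_column(start, expansions), original_column(end, expansions))
# prints "24 27"
```

A column that falls inside an expansion has no unique original position, so this sketch attributes it to the start of the macro call; a real tool may choose differently.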


The tokens database

We designed an ad hoc database, implemented as a PERL module

• Basically it’s a multiply sorted linked list, featuring a very fast retrieval time

Takes less than a second to retrieve the location of a token among 4 million entries

• Tokens can be searched for by specifying an arbitrarily complex regular expression (following PERL’s implementation of regexp)

Regexps have a concise yet powerful syntax

WEB input form can be made extremely simple

A short regexp conveys lots of information: accurate pinpointing of tokens
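The trade-off described here (slow to build, fast to read) can be illustrated with a sorted index queried by binary search. This is not the PERL module used by the project, whose internal layout is not given in these slides; the class name `TokenIndex`, the tuple layout and all sample records except the Px entry are assumptions made for this Python sketch.

```python
import bisect


class TokenIndex:
    """Read-optimised index: costly to build, quick to query (illustrative only)."""

    def __init__(self, records):
        # records: iterable of (name, file, line, column) tuples.
        # Sorting once at creation time is the "slow" part.
        self._by_name = sorted(records)
        self._names = [record[0] for record in self._by_name]

    def locate(self, name):
        """Return every occurrence of an exact token name via binary search."""
        low = bisect.bisect_left(self._names, name)
        high = bisect.bisect_right(self._names, name)
        return self._by_name[low:high]


index = TokenIndex([
    ("zmin", "geom.f", 10, 7),          # made-up sample data
    ("Px", "FindTrack.f", 1238, 12),
    ("zmax", "geom.f", 11, 7),          # made-up sample data
    ("Px", "FindTrack.f", 1300, 9),     # made-up sample data
])
print(index.locate("Px"))
```

Additional sort orders (for example by file or by token type) could be kept as further sorted lists over the same records, which is one way to read "multiply sorted".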


The WEB interface and the search engine (CGI)

The WEB interface consists of an HTML page containing a JavaScript input form

• It is essentially an abstraction layer to the token database:

Regardless of the language of the source code being browsed, the input form and the generated WEB output pages always have the same appearance

Future extensions to additional languages have little effect on the infrastructure (even the database has an abstraction layer to the parser)

• HTML output pages are created on demand by means of CGI scripts (PERL)

• Output consists of HTML formatted pages with the source code line-numbered and color coded, and the requested token highlighted in red

• Only two types of token have associated hyperlinks:

• Subroutine and function references

• Include files
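The kind of output page described above (numbered source lines with the requested token in red) can be sketched as follows. This is a Python illustration only, since the actual formatter is a PERL CGI script; the function name `format_source` and the HTML markup chosen are assumptions.

```python
import html
import re


def format_source(lines, token):
    """Return HTML with numbered lines and every whole-word `token` shown in red."""
    pattern = re.compile(r"\b" + re.escape(token) + r"\b")
    highlight = '<span style="color:red">' + html.escape(token) + "</span>"
    out = ["<pre>"]
    for number, text in enumerate(lines, start=1):
        # Split on the token first, then escape the pieces in between, so that
        # characters such as '<' in the source never break the page.
        pieces = [html.escape(piece) for piece in pattern.split(text)]
        out.append(f"{number:5d}  " + highlight.join(pieces))
    out.append("</pre>")
    return "\n".join(out)


page = format_source(["      If ( Px .Gt. 100. ) Then", "      Endif"], "Px")
print(page)
```

Because the page is built from the source file at request time, it reflects the current state of the repository.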


The WEB interface and the search engine (CGI)

Users are sometimes interested in finding out where a particular pattern of characters is located even if it’s not part of the language (like a comment line)

An input form accepts a regular expression from the user

A match is then attempted against every source file in the specified directory tree

Original files are scanned, no token database is used!

This option ensures full coverage of almost any possible request a user might have
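A free-text scan of this kind amounts to walking the directory tree and testing every line against the regular expression. The Python sketch below shows the idea; the function name `scan_tree`, the list of file extensions and the demo file are assumptions, and the real search engine is a PERL CGI script.

```python
import os
import re
import tempfile


def scan_tree(root, pattern, extensions=(".f", ".c", ".cc", ".h")):
    """Yield (path, line_number, line) for every source line matching `pattern`."""
    regex = re.compile(pattern)
    for directory, _subdirs, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(extensions):
                continue  # skip files that are not source code
            path = os.path.join(directory, name)
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if regex.search(line):
                        yield path, number, line.rstrip("\n")


# Small self-contained demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "demo.f"), "w") as handle:
        handle.write("C     beam pipe geometry\n      zmin = 0.\n")
    hits = list(scan_tree(root, r"beam\s+pipe"))
    print(hits)
```

Since no token database is involved, this search also finds text in comments, at the price of reading every file on each request.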


The installation and configuration package

This entire tool is available via download from the WEB

• To install and locally configure the system only two scripts are needed

INSTALL: takes care of locally compiling the customized versions of gcc and f2c (and additional PERL modules)

CONFIGURE: provides the local configuration of the tool via a user driven menu; takes care of adapting the tool to the local WEB server and other associated tasks

• To create the navigation structure for a particular project a single script is needed:

wth.pl: menu driven; given the path of the directory tree containing the code to hyperlink, it generates the tokens database needed for WEB navigation and any other ancillary file

Now, let’s see a working example

Demo available online at http://almifo1e.mi.infn.it/W_Main.html


This is the main entry point for the interface to project mcfast (a large simulation program).


WEB browser multiframe pop-up window


Find where a token containing the string zmin or the string zmax is located in the whole source code of the project mcfast, but only in places where its value gets modified

Use of a simple regular expression
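A query of this kind combines a regular expression on the token name with a filter on the token qualifier. The exact expression typed in the demo is not shown in the transcript; the Python sketch below assumes the alternation `zmin|zmax`, which reads the same way in PERL, and uses made-up sample records and a hypothetical `search` function.

```python
import re

# (name, qualifier, file, line) tuples; made-up sample data for illustration.
tokens = [
    ("zmin_pipe", "modified", "load_beampipe.f", 40),
    ("zmax_pipe", "referenced", "load_beampipe.f", 52),
    ("zmax", "modified", "geom.f", 17),
    ("rmin", "modified", "geom.f", 18),
]


def search(tokens, pattern, qualifier=None):
    """Return tokens whose name contains a match, optionally filtered by qualifier."""
    regex = re.compile(pattern)
    return [
        token for token in tokens
        if regex.search(token[0]) and (qualifier is None or token[1] == qualifier)
    ]


hits = search(tokens, r"zmin|zmax", qualifier="modified")
for name, _qualifier, path, line in hits:
    print(name, path, line)
```

Here `zmax_pipe` is skipped because it is only referenced, and `rmin` because its name does not match; the same selection could be written more compactly as `zm(in|ax)`.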


Even simple strings can be searched for, either as plain strings or as regular expressions


22) /vtx28/winner/btev/mcfast/v2_6_2/mcfast/src/geom/load_beampipe.f


Floating

Integer….


Conclusions

• We have developed a WEB based source code navigator using a novel approach

• Focus is on search-find capabilities rather than hyperlinked navigation

• FORTRAN browsing capabilities already fully implemented

• C on its way to completion. C++ with limited capabilities

• The possible connectivities that can be implemented once a database of token pointers has been made available are still all to be explored…

• This tool has recently been made available (beta-test, on a best-effort basis) to a limited set of experimental groups for evaluation

• For future developments, particularly in the C++ sector, we definitely envisage help from software professionals

The basic infrastructure (parsers, database manager, search engine) is all in place at this very moment