International Journal of Software Engineering (IJSE) Volume 2 Issue 1


INTERNATIONAL JOURNAL OF SOFTWARE

ENGINEERING (IJSE)

VOLUME 2, ISSUE 1, 2011

EDITED BY DR. NABEEL TAHIR

ISSN (Online): 2180-1320

International Journal of Software Engineering (IJSE) is published both in traditional paper form and on the Internet. This journal is published at the website http://www.cscjournals.org, maintained by Computer Science Journals (CSC Journals), Malaysia.

IJSE Journal is a part of CSC Publishers

Computer Science Journals

http://www.cscjournals.org



INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING

(IJSE)

Book: Volume 2, Issue 1, March 2011

Publishing Date: 04-04-2011

ISSN (Online): 2180-1320

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the copyright law 1965, in its current version, and permission for use must always be obtained from CSC Publishers.

IJSE Journal is a part of CSC Publishers

http://www.cscjournals.org

IJSE Journal

Published in Malaysia

Typesetting: Camera-ready by author, data conversion by CSC Publishing Services, CSC Journals, Malaysia

CSC Publishers, 2011



EDITORIAL PREFACE

The International Journal of Software Engineering (IJSE) provides a forum for software engineering research that publishes empirical results relevant to both researchers and practitioners. This is the first issue of the second volume of IJSE; the journal is published bi-monthly, with papers peer reviewed to high international standards.

The initial efforts helped to shape the editorial policy and to sharpen the focus of the journal. Starting with volume 2, 2011, IJSE appears in more focused issues. Besides normal publications, IJSE intends to organize special issues on more focused topics. Each special issue will have a designated editor (or editors), either a member of the editorial board or another recognized specialist in the respective field.

IJSE encourages researchers, practitioners, and developers to submit research papers reporting original research results; technology trend surveys reviewing an area of research in software engineering, software science, theoretical software engineering, computational intelligence, and knowledge engineering; survey articles covering a broad area in software engineering and knowledge engineering; tool reviews; and book reviews. Important topics covered by IJSE usually involve the collection and analysis of data and experience that can be used to characterize, evaluate, and reveal relationships between software development deliverables, practices, and technologies. IJSE is a refereed journal that promotes the publication of industry-relevant research to address the significant gap between research and practice.

IJSE gives researchers and practitioners the opportunity to present their research, technological advances, practical problems, and concerns to the software engineering community. IJSE is not limited to a specific aspect of software engineering; it covers all software engineering topics. To position IJSE among the highest-quality journals in the computer engineering sciences, a group of highly professional scholars serves on the editorial board. IJSE includes empirical studies, requirements engineering, software architecture, software testing, formal methods, and verification.

The International Editorial Board ensures that significant developments in software engineering from around the world are reflected in IJSE. The submission and publication of manuscripts are handled efficiently. Readers of IJSE will benefit from the papers presented in this issue by becoming aware of recent advances in software engineering. The international electronic editorial and reviewer system allows for the fast publication of accepted manuscripts in issues of IJSE, because we know how important it is for authors to have their work published with minimum delay after submission of their manuscript. For that reason we continue to strive for fast decision times and minimum delays in the publication process. Papers are indexed and abstracted by international indexers and abstractors.

Editorial Board Members

International Journal of Software Engineering (IJSE)



TABLE OF CONTENTS

Volume 2, Issue 1, March 2011

Pages

1 - 12 Next-Generation Search Engines for Information Retrieval

Ranjeet Devarakonda, Les Hook, Giri Palanisamy, Jim Green



1. INTRODUCTION

Scientific data, in its most general context across multiple disciplines, includes measurements and observations of natural phenomena for the purpose of explaining the behavior of, or testing hypotheses about, natural systems. This includes observational data captured in real time by sensors, surveys, and imaging devices; experimental data from laboratory instruments, for example, gene sequences, chromatograms, and characterization of samples; simulation data generated from models, where the model is as important as the input and output data, such as for climate and economic models; and derived or compiled data that are the result of text and data mining, and compiled and integrated databases from multiple sources.

Remote sensing, high-throughput scientific instruments, and model simulations with high spatial and temporal resolution are creating vast data repositories. As a result of increased investments in engineering research, disease research, and environmental monitoring, the volume of scientific data is approximately doubling each year [1]. This data volume requires new scientific methods to analyze and organize the data so that its value can be fully exploited, including beyond the original collection purpose.

Data content can be collected in any format, but most is digital, both by necessity and to be of greatest value for analyses. Formats specific to an instrument platform or science program are often integrated into the data processing stream. The data can be formatted as straightforward tabular ASCII files, spatially gridded files, or GeoTIFF images, or formatted to be consistent with programmatic repository needs such as Genomes-to-Life.

What is metadata and why is it so useful?

Digital data are not complete without descriptive information (metadata) to tell us what they mean. Metadata are the descriptive information about data that explains the measured attributes, their names, units, precision, accuracy, and data layout, and ideally a great deal more. Most importantly, metadata includes the data lineage, which describes how the data were measured, acquired, or computed [2]. Metadata, in its simplest form, consists of those elements that concisely identify the "who, what, where, when, why, and how" of the data.

Metadata enables data sharing and access. Sharing data with colleagues, the broader scientific community, and the public is highly desirable and will result in greater advancement of science [1].

We know data sharing is a good thing. Doesn't everyone else?

Open science and new research

Data longevity

Data reusability

Greater exposure to data

Generation of value-added products

Verification of published works

Possibility for future research collaborations

More value for the research investment

The volume of scientific data and the interconnectedness of the systems under study make integration of data a necessity. For example, life scientists must integrate data from across biology and chemistry to comprehend disease and discover cures, and climate change scientists must integrate data from wildly diverse disciplines to understand our current state and predict the impact of new policies [3].

It is critical to begin to document your data at the very beginning of your research project, even before data collection begins; doing so will make data documentation easier and reduce the likelihood that you will forget aspects of your data later in the research project.





FIGURE 1: Combination of Spatial Coordinates for indexing

The compact logic available in XPath allows us to construct the following, which would extract the minimum value from the embedded array of latitude values:

= ../preceding-sibling::Point/PointLatitude) and not(. >= ../following-sibling::Point/PointLatitude)]/>

Assignment of this as an indexed (searchable) value is then possible, so that we can provide exact or ranged searches within the collections. This is just one example of the type of operation possible with the combination of types defined in the compiled code and the use of injection to provide new behavior at run-time. This methodology allows for growth not just of the single-use index, but also extension of the tool to associated projects through multiple schema definitions, allocated on a per-source basis, to index and serve up related content.
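The same minimum-latitude extraction can be sketched procedurally. The Python example below is only an illustration, not Mercury's implementation: the Point and PointLatitude element names come from the fragment above, while the enclosing Boundary element, the PointLongitude element, and the sample values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment; only Point and PointLatitude appear in the text.
SAMPLE = """
<Boundary>
  <Point><PointLatitude>35.9</PointLatitude><PointLongitude>-84.3</PointLongitude></Point>
  <Point><PointLatitude>34.1</PointLatitude><PointLongitude>-83.7</PointLongitude></Point>
  <Point><PointLatitude>36.4</PointLatitude><PointLongitude>-85.0</PointLongitude></Point>
</Boundary>
"""


def min_latitude(xml_text):
    """Return the smallest PointLatitude value, or None if there are none."""
    root = ET.fromstring(xml_text)
    values = [
        float(el.text)
        for el in root.iter("PointLatitude")
        if el.text and el.text.strip()
    ]
    return min(values) if values else None


southernmost = min_latitude(SAMPLE)
```

A value computed this way could then be stored in a single-valued index field, which is what makes exact or ranged searches on it possible.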





2.2 Indexing Metadata

It is not uncommon to have millions of pages to index in corporate and government circles. Normally, a search engine creates the index and the database of documents that it accesses when processing a user query. The indexer engine sorts every word on every page and stores the resulting index in a huge database [6]; Google is an example. The query processor then compares the search query against the index and returns the documents that it considers most relevant.

In modern search indexers such as Apache SOLR, users can pass a number of optional query parameters to the request handler to control what information is returned. solrconfig.xml contains the default parameter values, which can be overridden at query time. schema.xml specifies all the fields that can be indexed from the metadata documents and how those fields should be handled when adding documents to the index or querying those fields (see the figure below).

FIGURE 2: SOLR Fields

Common SOLR Fields:

field name is the actual name of the field in the SOLR index

type is the data structure type the field is stored in

indexed is a Boolean that enables the field for indexing, and hence for searching and sorting

stored is a Boolean that specifies whether the field should be retrievable during a search

multiValued specifies whether the field can accept multiple values per document
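To make these attributes concrete, the sketch below parses a small schema.xml-style fragment in Python and lists which fields are searchable and which can be returned. The field names (title, abstract, keyword) and their types are invented for illustration and are not taken from Mercury's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical field definitions using the attributes described above.
SCHEMA_FRAGMENT = """
<fields>
  <field name="title" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="abstract" type="text" indexed="true" stored="false" multiValued="false"/>
  <field name="keyword" type="string" indexed="true" stored="true" multiValued="true"/>
</fields>
"""


def load_fields(schema_xml):
    """Return one dict per <field>, with Boolean attributes converted to bool."""
    fields = []
    for el in ET.fromstring(schema_xml).iter("field"):
        fields.append({
            "name": el.get("name"),
            "type": el.get("type"),
            "indexed": el.get("indexed") == "true",
            "stored": el.get("stored") == "true",
            "multi_valued": el.get("multiValued") == "true",
        })
    return fields


fields = load_fields(SCHEMA_FRAGMENT)
searchable = [f["name"] for f in fields if f["indexed"]]
retrievable = [f["name"] for f in fields if f["stored"]]
```

In this sample, abstract is indexed but not stored, so it can be searched but would not be returned in a result document.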

Depending on the data and required search capability, there could be a number of factors to consider while indexing.

2.3 Factors Affecting Indexing

The indexing process as implemented for Mercury comprises the steps necessary to first identify the content and then construct the controlling structures that allow this data to be processed for inclusion within a SOLR (Lucene) index file. Disparate sources and misapprehensions regarding content are the major issues affecting this indexing process. The impact of these issues on the accuracy and responsiveness of the search application is minimized by a proper design strategy.

An XML schema/DTD/XSD is commonly constructed and used to define the structure of a document, with editors and parsers dependent upon its completeness and applicability. Using this construct, the indexer process is built according to the defined structures and types. However, there is no universal enforcement via a standard editor or data logging mechanism. Metadata files might originate in the lab, at a field site, or be harvested from the web. As a result, varying degrees of adherence to the structures and content rules are found, with the standard treated as a loose guide. These variances in the XML content can result in apparent errors and confusion in interpretation. There is a tradeoff between strict validation standards, whose rules can produce a large rejection rate, and the volume of data processed.

International character sets, translation errors with HTTP formatting, obsolete web standards, and the general evolution of the art present challenges that are best addressed in the Java parsing code. This normalization is part of the design process for individual parser objects and includes handling of special or illegal characters, character set issues (ISO 8859 variants versus UTF-8), and foreign or extended character sets. Additionally, dealing with date format issues is a recurring problem.
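The kind of normalization described above might look like the following sketch. It is written in Python for brevity, although the text notes that Mercury handles this in Java parsing code; the fallback encoding (ISO 8859-1), the list of accepted date formats, and the function names are assumptions chosen for the example.

```python
import re
from datetime import datetime

# Control characters that are not allowed in XML 1.0 documents.
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

# Assumed set of date layouts seen in incoming metadata; extend as needed.
_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def normalize_text(raw):
    """Decode bytes as UTF-8, falling back to ISO 8859-1, and drop illegal characters."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("iso-8859-1")
    return _ILLEGAL_XML_CHARS.sub("", text)


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning None for an unrecognized date, rather than raising, reflects the tradeoff noted earlier: a record with one unparseable field can still be indexed instead of being rejected outright.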

Given that the parsing operation (extraction of the facet data) can be computationally intensive, it makes sense to offload this process to a scheduled operation (the harvested model). Such harvesting increases the apparent speed of the application, which is a critical concern for usability.

Generation of a flexible schema for indexing and queries at design time allows translation to a common data language. Having a fixed set of elements, which are then populated based upon logic customized per data source, minimizes the time to stand up a new instance and enhances content navigation.

FIGURE 3: Metadata Indexing Flow

3. OPERATING THE METADATA

3.1 Requirements

The goal of the metadata is to make it easy for users to learn about, locate, and retrieve the desired data. Metadata must also be extensible so that new fields can be added, provide user support





FIGURE 2: BestBuy¹ using faceted results

3.3 Querying Data

Querying is the third part of the search engine. Here, user-specified terms are searched and ranked against the millions of pages that are recorded in the index. SOLR provides many functions, including keyword searching, search term highlighting, parsing of the results, filtering, and facet-based browsing. Mercury's GUI interacts with SOLR's functions through HTTP, and the responses are retrieved in SOLR's own generic XML representation.

¹ http://www.bestbuy.com

FIGURE 3: SOLR search response

In the SOLR response example above, a string-typed element represents the Title of the metadata. The status element, of type Integer, indicates whether the query succeeded, with 0 meaning success. QTime is the time taken to process the entire query, in milliseconds [8].

The main search results are captured in the result element. Its numFound attribute gives the total number of documents that matched the search parameters, and its start attribute gives the offset of the returned results.
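As a rough illustration of reading these values, the Python sketch below parses a hand-written response in SOLR's XML format. The sample document, including its title text and timing value, is invented for the example and is not output from Mercury.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of a SOLR XML response (values are made up).
SAMPLE_RESPONSE = """
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc><str name="title">Soil temperature at site A</str></doc>
    <doc><str name="title">Soil temperature at site B</str></doc>
  </result>
</response>
"""


def summarize(response_xml):
    """Extract status, query time, match count, offset, and titles."""
    root = ET.fromstring(response_xml)
    header = root.find("lst[@name='responseHeader']")
    result = root.find("result")
    return {
        "status": int(header.find("int[@name='status']").text),
        "qtime_ms": int(header.find("int[@name='QTime']").text),
        "num_found": int(result.get("numFound")),
        "start": int(result.get("start")),
        "titles": [d.findtext("str[@name='title']") for d in result.findall("doc")],
    }


summary = summarize(SAMPLE_RESPONSE)
```

A client would typically check that status is 0 before trusting num_found or iterating over the returned documents.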

Parameters are passed to the SOLR web application just like a simple HTTP GET form submission. User search terms are embedded in the URL along with some required and other essential fields [8].

Example:

http://localhost:8080/project_solr/select/?q=soil+temperature&version=2.2&start=0&rows=10&indent=on&fl=*

/solr/ (project_solr in the example above) is the web application running under Apache Tomcat or another application server

q is the actual query string. In the example above, "soil temperature" is the search term. Boolean operators such as AND, OR, and NOT can be used in the query, and the default operator can be set with q.op

As mentioned earlier, start and rows define the offset and the number of results displayed on the resulting page. The defaults are start: 0 and rows: 10, meaning the first ten documents are returned from the index, starting from position zero

fl is the field list, separated by commas or spaces; it lists the fields to be returned in the response. [*] means all fields are returned
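A request like the example above could be assembled programmatically. The following Python sketch uses only the standard library; the host, port, and project_solr path are copied from the example URL, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Base address copied from the example URL in the text.
BASE_URL = "http://localhost:8080/project_solr/select/"


def build_select_url(query, start=0, rows=10, fields="*"):
    """Build a SOLR select URL with the parameters described above."""
    params = {
        "q": query,
        "version": "2.2",
        "start": start,
        "rows": rows,
        "indent": "on",
        "fl": fields,
    }
    # safe="*" keeps the wildcard readable instead of encoding it as %2A.
    return BASE_URL + "?" + urlencode(params, safe="*")


url = build_select_url("soil temperature")
```

Paging through results then only requires changing start, for example build_select_url("soil temperature", start=10) for the second page of ten.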





3.4 Fielded Searches

Many search engines categorize their searches into simple (or quick) search, advanced search, and command search. Simple search is like a free-text search: search terms are not limited to any specific field, and the entire document is scanned for their occurrence. An example is Google's simple search².

FIGURE 4: Google's simple search

Advanced search, also called fielded search, allows the user to limit the search to certain fields or elements in a document. This gives users more flexibility to narrow their results. Fielded searches are especially useful for scientific data because the data is extensively parameterized. For example, users can search for "Carbon Emissions" while limiting its occurrence to the Title field rather than the Abstract.
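In SOLR's query syntax, a fielded search is written as field:value. The short Python sketch below builds such a query string for the example above; the lowercase field names title and abstract are assumptions, since the actual field names depend on the project's schema.

```python
def fielded_query(field, phrase):
    """Return a Lucene-style query restricting a phrase to one field."""
    # Escape backslashes and double quotes so the phrase stays one quoted term.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


title_only = fielded_query("title", "Carbon Emissions")
```

The resulting string would be passed as the q parameter of a request; quoting the phrase keeps the two words together instead of matching them independently.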

Mercury's advanced search³ allows users to search by keyword, spatial coordinates, temporal date range, or data provider [9]. In keyword search, the searchable fields vary from project to project, but some common fields include Data set Title, Project, Site, Parameter, Sensor, Source, Term, Topic, Keyword, Investigator, and Abstract. Mercury indexes all these fields into the SOLR index.

Search Option              Comments

Keyword Search             Searches for words or phrases within a specific metadata field, restricting the results to only a handful.

Spatial Coordinate Search  Restricts data to a certain location or region. An area can be defined by selecting a bounding box; this area is compared with the coordinates found in the metadata records, and information for all datasets that cover, intersect, or are covered by the defined area is included in the results.

Temporal Search            Restricts data to certain time periods.
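The spatial comparison described in the table can be sketched as a simple bounding-box test. The Python example below is an assumption-laden illustration rather than Mercury's code: boxes are (west, south, east, north) tuples in decimal degrees, and boxes that cross the antimeridian are not handled.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Bounding box in decimal degrees (assumed layout, not Mercury's)."""
    west: float
    south: float
    east: float
    north: float


def overlaps(search, dataset):
    """True if the dataset box covers, intersects, or is covered by the search box."""
    return (dataset.west <= search.east and dataset.east >= search.west
            and dataset.south <= search.north and dataset.north >= search.south)


search_area = Box(west=-90.0, south=30.0, east=-80.0, north=40.0)
datasets = {
    "inside": Box(-85.0, 34.0, -84.0, 36.0),     # covered by the search area
    "partial": Box(-82.0, 38.0, -75.0, 45.0),    # intersects the search area
    "global": Box(-180.0, -90.0, 180.0, 90.0),   # covers the search area
    "elsewhere": Box(10.0, 45.0, 20.0, 55.0),    # no overlap
}
matches = [name for name, box in datasets.items() if overlaps(search_area, box)]
```

A single overlap test is enough here because covering, intersecting, and being covered are all cases of two rectangles sharing at least one point.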

Because indexed fields are searchable and sortable, you can also run a SOLR query on indexed fields, which can alter the content to improve or change results. Mercury also provides data sharing capabilities across data providers using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [10].

² http://www.google.com

³ http://mercury.ornl.gov/ornldaac

FIGURE 5: Mercury's advanced search

4. CONCLUSION

The volume of scientific data and the interconnectedness of the systems under study make integration of data a necessity. The file format in which data is kept is also a primary factor in data reusability. As technology continually changes, researchers should plan for both hardware and software obsolescence. SOLR is one of the popular open-source libraries for scientific data management and is gaining recognition for its scalability and extensibility in handling varied combinations of XML.

5. ACKNOWLEDGEMENTS

Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725.

6. REFERENCES

[1] Gray J. et al., Scientific Data Management in the Coming Decade, Technical Report, Microsoft Research, MSR-TR-2005-10, 2005

[2] Mayernik, M.S., Metadata Realities for Cyberinfrastructure: Data Authors as Metadata Creators. In Proceedings of the iConference 2010

[3] Creative Commons, Protocol for implementing open access data. http://sciencecommons.org/projects/publishing/open-access-data-protocol/ (last accessed February 2011)

[4] UC3: Data Management Guidelines, Organizing your data. http://www.cdlib.org/services/uc3/datamanagement/organizing.html (last accessed February 2011)

[5] Markus Scherer, UTF-16 for Processing, Technical Note #12, Unicode, Inc., 2004

[6] Blachman, N. and Peek, J., How Google Works. Retrieved February 2011 from Googleguide Web site: http://www.googleguide.com/google_works.html

[7] Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval, ACM, pp. 73-108, 1999

[8] David Smiley and Eric Pugh, Solr 1.4 Enterprise Search Server, PACKT Publishing Ltd., Birmingham, UK, ISBN 978-1-847195-88-3, pp. 89-156, 2009

[9] Devarakonda R, Palanisamy G, Wilson B, Green J, Mercury: reusable metadata management, data discovery and access system, Earth Science Informatics, 3(1):87-94, 2010

[10] Devarakonda R, Palanisamy G, Green J, Wilson B, Data sharing and retrieval using OAI-PMH, Earth Science Informatics, 4(1):1-5, 2010



INSTRUCTIONS TO CONTRIBUTORS

The International Journal of Software Engineering (IJSE) provides a forum for software engineering research that publishes empirical results relevant to both researchers and practitioners. IJSE encourages researchers, practitioners, and developers to submit research papers reporting original research results; technology trend surveys reviewing an area of research in software engineering and knowledge engineering; survey articles covering a broad area in software engineering and knowledge engineering; tool reviews; and book reviews. The general topics covered by IJSE usually involve the collection and analysis of data and experience that can be used to characterize, evaluate, and reveal relationships between software development deliverables, practices, and technologies. IJSE is a refereed journal that promotes the publication of industry-relevant research to address the significant gap between research and practice.

The initial efforts helped to shape the editorial policy and to sharpen the focus of the journal. Starting with volume 2, 2011, IJSE appears in more focused issues. Besides normal publications, IJSE intends to organize special issues on more focused topics. Each special issue will have a designated editor (or editors), either a member of the editorial board or another recognized specialist in the respective field.

We are open to contributions and proposals for any topic, as well as for editors and reviewers. We understand that it is through the effort of volunteers that CSC Journals continues to grow and flourish.

IJSE LIST OF TOPICS

The realm of the International Journal of Software Engineering (IJSE) extends, but is not limited, to the following:

Ambiguity in Software Development

Application of Object-Oriented Technology to Engin

Architecting an OO System for Size Clarity Reuse E

Composition and Extension

Computer-Based Engineering Techniques

Data Modeling Techniques

History of Software Engineering

IDEF

Impact of CASE on Software Development Life Cycle

Intellectual Property

Iterative Model

Knowledge Engineering Methods and Practices

Licensing

Modeling Languages

Object-Oriented Systems

Project Management

Quality Management

Rational Unified Processing

SDLC

Software Components

Software Deployment

Software Design and applications in Various Domain

Software Engineering Demographics

Software Engineering Economics

Software Engineering Methods and Practices

Software Engineering Professionalism

Software Ergonomics

Software Maintenance and Evaluation

Structured Analysis

Structuring (Large) OO Systems

Systems Engineering

Test Driven Development

UML



CALL FOR PAPERS

Volume: 2 - Issue: 3 - May 2011

i. Paper Submission: May 31, 2011

ii. Author Notification: July 01, 2011

iii. Issue Publication: July / August 2011

