International Journal of Software Engineering (IJSE) Volume 2 Issue 1

    INTERNATIONAL JOURNAL OF SOFTWARE

    ENGINEERING (IJSE)

    VOLUME 2, ISSUE 1, 2011

    EDITED BY DR. NABEEL TAHIR

    ISSN (Online): 2180-1320

    International Journal of Software Engineering (IJSE) is published both in traditional paper form and on the Internet. This journal is published at the website http://www.cscjournals.org, maintained by Computer Science Journals (CSC Journals), Malaysia.

    IJSE Journal is a part of CSC Publishers

    Computer Science Journals

    http://www.cscjournals.org

    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING

    (IJSE)

    Book: Volume 2, Issue 1, March 2011

    Publishing Date: 04-04-2011

    ISSN (Online): 2180-1320

    This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provision of the copyright law 1965, in its current version, and permission of use must always be obtained from CSC Publishers.

    IJSE Journal

    Published in Malaysia

    Typesetting: Camera-ready by author, data conversion by CSC Publishing Services, CSC Journals, Malaysia

    CSC Publishers, 2011

    EDITORIAL PREFACE

    The International Journal of Software Engineering (IJSE) provides a forum for software engineering research that publishes empirical results relevant to both researchers and practitioners. This is the first issue of the second volume of IJSE. The journal is published bi-monthly, with papers peer reviewed to high international standards.

    The initial efforts helped to shape the editorial policy and to sharpen the focus of the journal. Starting with volume 2, 2011, IJSE appears in more focused issues. Besides normal publications, IJSE intends to organize special issues on more focused topics. Each special issue will have a designated editor (or editors), either a member of the editorial board or another recognized specialist in the respective field.

    IJSE encourages researchers, practitioners, and developers to submit research papers reporting original research results; technology trend surveys reviewing an area of research in software engineering, software science, theoretical software engineering, computational intelligence, and knowledge engineering; survey articles covering a broad area in software engineering and knowledge engineering; tool reviews; and book reviews. Important topics covered by IJSE usually involve the collection and analysis of data and experience that can be used to characterize, evaluate, and reveal relationships between software development deliverables, practices, and technologies. IJSE is a refereed journal that promotes the publication of industry-relevant research to address the significant gap between research and practice.

    IJSE gives researchers and practitioners the opportunity to present their research, technological advances, practical problems, and concerns to the software engineering community. IJSE is not limited to a specific aspect of software engineering; it covers all software engineering topics. To position IJSE among the highest-quality journals in the computer engineering sciences, a group of highly professional scholars serves on the editorial board. IJSE includes empirical studies, requirements engineering, software architecture, software testing, formal methods, and verification.

    The International Editorial Board ensures that significant developments in software engineering from around the world are reflected in IJSE. The submission and publication process for manuscripts is handled efficiently. Readers of IJSE will benefit from the papers presented in this issue by learning about recent advances in software engineering. The international electronic editorial and reviewer system allows accepted manuscripts to be published quickly in an issue of IJSE, because we know how important it is for authors to have their work published with a minimum delay after submission of their manuscript. For that reason we continue to strive for fast decision times and minimum delays in the publication process. Papers are indexed and abstracted by international indexers and abstractors.

    Editorial Board Members

    International Journal of Software Engineering (IJSE)

    International Journal of Software Engineering (IJSE), Volume (2), Issue (1)

    TABLE OF CONTENTS

    Volume 2, Issue 1, March 2011

    Pages

    1 - 12 Next-Generation Search Engines for Information Retrieval

    Ranjeet Devarakonda, Les Hook, Giri Palanisamy, Jim Green

    Ranjeet Devarakonda, Les Hook, Giri Palanisamy & Jim Green

    International Journal of Software Engineering (IJSE), Volume (2) : Issue (1) : 2011 2

    1. INTRODUCTION

    Scientific data, in its most general context across multiple disciplines, includes measurements and observations of natural phenomena for the purpose of explaining the behavior of, or testing hypotheses about, natural systems. This includes observational data captured in real time by sensors, surveys, and imaging devices; experimental data from laboratory instruments, for example gene sequences, chromatograms, and characterization of samples; simulation data generated from models, where the model is as important as the input and output data, such as for climate and economic models; and derived or compiled data that are the result of text and data mining, and compiled and integrated databases from multiple sources.

    Remote sensing, high-throughput scientific instruments, and model simulations with high spatial and temporal resolution are creating vast data repositories. As a result of increased investments in engineering research, disease research, and environmental monitoring, the volume of scientific data is approximately doubling each year [1]. This data volume requires new scientific methods to analyze and organize the data so that its value can be fully exploited, including beyond the original collection purpose.

    Data content can be collected in any format, but most is digital, both by necessity and to be of greatest value for analyses. Formats specific to an instrument platform or science program are often integrated into the data processing stream. The data can be formatted as straightforward tabular ASCII files, spatially gridded files, or GeoTIFF image formats, or formatted to be consistent with programmatic repository needs such as Genomes-to-Life.

    What is metadata and why is it so useful?

    Digital data are not complete without descriptive information (metadata) to tell us what they mean. Metadata are the descriptive information about data that explains the measured attributes, their names, units, precision, accuracy, data layout, and ideally a great deal more. Most importantly, metadata include the data lineage, which describes how the data were measured, acquired, or computed [2]. Metadata, in their simplest form, are those elements that concisely identify the "who, what, where, when, why, and how" of the data.

    Metadata enables data sharing and access. Sharing data with colleagues, the broader scientific community, and the public is highly desirable and will result in greater advancement of science [1].

    We know data sharing is a good thing. Doesn't everyone else?

    • Open science and new research
    • Data longevity
    • Data reusability
    • Greater exposure to data
    • Generation of value-added products
    • Verification of published works
    • Possibility for future research collaborations
    • More value for the research investment

    The volume of scientific data and the interconnectedness of the systems under study make integration of data a necessity. For example, life scientists must integrate data from across biology and chemistry to comprehend disease and discover cures, and climate change scientists must integrate data from wildly diverse disciplines to understand our current state and predict the impact of new policies [3].

    It is critical to begin to document your data at the very beginning of your research project, even before data collection begins; doing so will make data documentation easier and reduce the likelihood that you will forget aspects of your data later in the research project.

    FIGURE 1: Combination of Spatial Coordinates for indexing

    The compact logic available in XPath allows us to construct the following, which would extract the minimum value from the embedded array of latitude values:

    = ../preceding-sibling::Point/PointLatitude) and not(. >= ../following-sibling::Point/PointLatitude)]/>

    Assignment of this as an indexed (searchable) value is then possible, with the result that we can provide exact or ranged searches within the collections. This is just one example of the type of operation possible by combining types defined in the compiled code with injection to provide new behavior at run time. This methodology allows for growth not just of the single-use index, but also extension of the tool to associated projects through multiple schema definitions, allocated on a per-source basis, to index and serve up related content.
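
The minimum-latitude extraction described above can be sketched in Python. This is only an illustration, not the Mercury implementation: Python's standard ElementTree does not support the sibling axes used in the XPath expression, so the sketch collects the values and takes the minimum in code instead. Only the Point and PointLatitude element names come from the text; the enclosing Boundary element, the PointLongitude element, and the sample values are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment. Only Point and PointLatitude appear in the
# paper; Boundary, PointLongitude, and the numbers are made up for the sketch.
SAMPLE = """
<Boundary>
  <Point><PointLatitude>35.9</PointLatitude><PointLongitude>-84.3</PointLongitude></Point>
  <Point><PointLatitude>34.2</PointLatitude><PointLongitude>-83.1</PointLongitude></Point>
  <Point><PointLatitude>36.5</PointLatitude><PointLongitude>-85.0</PointLongitude></Point>
</Boundary>
"""


def min_latitude(xml_text):
    """Return the smallest PointLatitude value, or None if there are none."""
    root = ET.fromstring(xml_text.strip())
    values = [float(node.text) for node in root.iter("PointLatitude")]
    return min(values) if values else None


if __name__ == "__main__":
    print(min_latitude(SAMPLE))
```

The resulting value could then be stored in an indexed field to support exact or ranged searches, as the text describes.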

    2.2 Indexing Metadata

    It is not uncommon to have millions of pages to index in corporate and government circles. Normally, a search engine creates the index and the database of documents that it accesses when processing a user query. The indexer engine sorts every word on every page and stores the resulting index in a huge database [6]; Google is one example. The query processor then compares the search query against the index and returns the documents that it considers most relevant.
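
The indexer and query processor roles described above can be illustrated with a toy inverted index. This is a minimal sketch under simplifying assumptions (lowercased alphanumeric tokens, all query words required, no relevance ranking); it is not how SOLR or Google is implemented, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Indexer: map every word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Query processor: return ids of pages containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


if __name__ == "__main__":
    pages = {
        "a": "Soil temperature data",
        "b": "Air temperature",
        "c": "Soil moisture",
    }
    idx = build_index(pages)
    print(search(idx, "soil temperature"))
```

A production engine would also score and rank the matching documents, which this sketch leaves out.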

    In modern search indexers such as Apache SOLR, users can pass a number of optional query parameters to the request handler to control what information is returned. solrconfig.xml contains the default parameter values, which can be overridden at query time in the request. schema.xml specifies all the fields that can be indexed from the metadata documents and how those fields should be handled when adding documents to the index or querying those fields (see the figure below).

    FIGURE 2: SOLR Fields

    Common SOLR Fields:

    • field name is the actual name of the field in the SOLR index
    • type is the data structure type the field is stored in
    • indexed is a Boolean that enables the field for indexing, and hence for searching and sorting
    • stored is a Boolean that specifies whether the field should be retrievable during a search
    • multiValued specifies whether the field can accept multiple values per document
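
To make the field attributes listed above concrete, the following sketch reads them from a small schema fragment. The attribute names (name, type, indexed, stored, multiValued) are the ones described in the list; the field names, type names, and values in the fragment are hypothetical and do not come from Mercury's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the style of schema.xml; not Mercury's real schema.
SCHEMA_FRAGMENT = """
<fields>
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="keyword" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="abstract" type="text" indexed="true" stored="false"/>
</fields>
"""


def describe_fields(schema_xml):
    """Return {field name: attribute summary} for each <field> element."""
    root = ET.fromstring(schema_xml.strip())
    fields = {}
    for field in root.iter("field"):
        # A missing attribute is treated as false in this sketch; the real
        # default depends on the field type configuration.
        fields[field.get("name")] = {
            "type": field.get("type"),
            "indexed": field.get("indexed") == "true",
            "stored": field.get("stored") == "true",
            "multiValued": field.get("multiValued") == "true",
        }
    return fields


if __name__ == "__main__":
    for name, attrs in describe_fields(SCHEMA_FRAGMENT).items():
        print(name, attrs)
```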

    Depending on the data and required search capability, there could be a number of factors to consider while indexing.

    2.3 Factors Affecting Indexing

    The indexing process as implemented for Mercury comprises the steps necessary to first identify the content and then construct the controlling structures that allow this data to be processed for inclusion within a SOLR (Lucene) index file. Disparate sources and misapprehensions regarding content are the major issues affecting this indexing process. The impact of these issues on the accuracy and responsiveness of the search application is minimized by a proper design strategy.

    An XML schema/DTD/XSD is commonly constructed for and used to define the structure of a document, with editors and parsers dependent upon its completeness and applicability. Using this construct, the indexer process is built according to the defined structures and types. However, there is no universal enforcement via a standard editor or data logging mechanism. Metadata files might originate in the lab, at a field site, or be harvested from the web. As a result, varying degrees of adherence to the structures and content rules are found, with the standard treated as a loose guide. These variances in the XML content can result in apparent errors and confusion in interpretation. There is a tradeoff between strict validation standards, whose rules can produce a large rejection rate, and the volume of data processed.

    International character sets, translation errors with HTTP formatting, obsolete web standards, and the general evolution of the art present challenges that are best addressed in the Java parsing code. This normalization is part of the design process for individual parser objects, and includes handling for special or illegal characters, character set issues (ISO 8859 variants versus UTF-8), and foreign or extended character sets. Additionally, dealing with date format issues is a recurring problem.
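
The kinds of normalization listed above can be sketched as follows. The paper handles them in Java parser objects; this Python version is only an illustration under stated assumptions: it tries UTF-8 first and falls back to ISO 8859-1, removes control characters that XML 1.0 does not allow, and tries a short, hypothetical list of date formats. None of the function names or format choices come from Mercury.

```python
import re
from datetime import datetime

# Control characters that are not legal in XML 1.0 text.
_ILLEGAL_XML = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

# Hypothetical list of date layouts a harvester might encounter.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def decode_text(raw):
    """Decode bytes as UTF-8, falling back to ISO 8859-1 on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


def strip_illegal(text):
    """Remove characters that would make the XML unparseable."""
    return _ILLEGAL_XML.sub("", text)


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    print(decode_text(b"caf\xe9"))
    print(normalize_date("03/15/2011"))
```

Returning None for an unrecognized date leaves the decision to reject or keep the record to the caller, which mirrors the validation tradeoff discussed earlier.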

    Given that the parsing operation (extraction of the facet data) can be computationally intensive, it makes sense to offload this process to a scheduled operation (the harvested model). Such harvesting serves to increase the apparent speed of the application, which is a critical concern for usability.

    Generation of a flexible schema for indexing and queries at design time allows translation to a common data language. Having a fixed set of elements, which are then populated based upon logic customized per data source, minimizes the time to stand up a new instance and enhances content navigation.
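
One way to picture a fixed set of elements populated by per-source logic is a mapping table from each source's native field names to the common ones. The sketch below is an assumption-laden illustration: the common field names, the source names, and the native field names are all hypothetical, and real per-source logic would usually be richer than a rename.

```python
# Fixed set of common elements (hypothetical names for this sketch).
COMMON_FIELDS = ("title", "abstract", "investigator")

# Per-source logic, reduced here to a rename table (hypothetical sources).
SOURCE_MAPPINGS = {
    "source_a": {
        "title": "dataset_title",
        "abstract": "description",
        "investigator": "pi_name",
    },
    "source_b": {
        "title": "Title",
        "abstract": "Abstract",
        "investigator": "Originator",
    },
}


def to_common_record(source, record):
    """Translate a source-specific record into the common field set."""
    mapping = SOURCE_MAPPINGS[source]
    return {field: record.get(mapping[field]) for field in COMMON_FIELDS}


if __name__ == "__main__":
    raw = {"dataset_title": "Soil temperature", "pi_name": "J. Doe"}
    print(to_common_record("source_a", raw))
```

Under this design, adding a new data source means adding one mapping entry rather than changing the index schema.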

    FIGURE 3: Metadata Indexing Flow

    3. OPERATING THE METADATA

    3.1 Requirements

    The goal of the metadata is to make it easy for users to learn about, locate, and retrieve the desired data. Metadata must also allow new fields to be added, provide user support

    FIGURE 2: BestBuy¹ using faceted results

    3.3 Querying Data

    Querying is the third part of the search engine. Here, user-specified terms are searched and ranked against the millions of pages recorded in the index. SOLR provides many functions, including keyword searching, search term highlighting, parsing of the results, filtering, and facet-based browsing. Mercury's GUI interacts with SOLR's functions through HTTP, and the responses are retrieved in SOLR's own generic XML representation.

    1 http://www.bestbuy.com

    FIGURE 3: SOLR search response

    In the SOLR response example above, an element represents the Title of the metadata, which is of type String. The Status element, of Integer type, reports the success of the query, with 0 meaning success. QTime, specified in milliseconds, is the time taken to process the entire query [8].

    The main search results are captured in the result element. Its numFound attribute gives the total number of documents that matched the search parameters, and its start attribute gives the offset of the returned results.
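
The response elements described above can be read with a few lines of Python. This is a sketch only: the XML below follows SOLR's generic XML response layout (a responseHeader with status and QTime, and a result element with numFound and start), but the title field name and the document values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Sample in SOLR's XML response layout; the documents are made up.
SAMPLE_RESPONSE = """
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc><str name="title">Soil temperature, site A</str></doc>
    <doc><str name="title">Soil temperature, site B</str></doc>
  </result>
</response>
"""


def summarize(response_xml):
    """Extract status, QTime, numFound, start, and titles from a response."""
    root = ET.fromstring(response_xml.strip())
    header = root.find("lst[@name='responseHeader']")
    result = root.find("result")
    return {
        "status": int(header.find("int[@name='status']").text),
        "qtime_ms": int(header.find("int[@name='QTime']").text),
        "num_found": int(result.get("numFound")),
        "start": int(result.get("start")),
        "titles": [doc.findtext("str[@name='title']") for doc in result.findall("doc")],
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_RESPONSE))
```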

    Parameters are passed to the SOLR web application just like a simple HTTP GET form submission. User search terms are embedded in the URL along with some required keys and other essential fields [8].

    Example:

    http://localhost:8080/project_solr/select/?q=soil+temperature&version=2.2&start=0&rows=10&indent=on&fl=*

    • /solr/ is the web application running under Apache Tomcat or another application server
    • q is the actual query string. In the example above, "soil temperature" is the search term. Operators such as and/or/andnot can be specified as q.op
    • start and rows, as mentioned earlier, define the offset and number of elements to be displayed on the resulting page. The default values are start: 0 and rows: 10, meaning return the first ten documents from the index, starting from position zero
    • fl is the field list, the list of fields to be returned in the response, separated by commas or spaces. [*] means all fields are returned
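
The example URL above can be assembled from its parameters with Python's standard library. This is a minimal sketch: the base URL and parameter values are taken from the example in the text, while the function name and its defaults are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base URL taken from the example in the text.
BASE_URL = "http://localhost:8080/project_solr/select/"


def build_select_url(query, start=0, rows=10, fields="*"):
    """Build a select URL with the parameters described in the list above."""
    params = {
        "q": query,        # the user's search terms
        "version": "2.2",  # value copied from the example URL
        "start": start,    # offset of the first returned document
        "rows": rows,      # number of documents to return
        "indent": "on",    # value copied from the example URL
        "fl": fields,      # field list, "*" for all fields
    }
    # safe="*" keeps the asterisk unescaped so the URL matches the example.
    return BASE_URL + "?" + urlencode(params, safe="*")


if __name__ == "__main__":
    print(build_select_url("soil temperature"))
```

Spaces in the query are encoded as plus signs, which is why the example URL shows soil+temperature.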

    3.4 Fielded Searches

    Many search engines categorize their searches into simple or quick search, advanced search, and command search. Simple search is like a free-text search: search terms are not limited to any specific field, and the entire document is scanned for their occurrence. Example: Google's simple search².

    FIGURE 4: Google's simple search

    Advanced search, also called fielded search, allows the user to limit the search to certain fields or elements in a document. This gives users more flexibility to narrow their results. Fielded searches can be especially useful for scientific data, which is extensively parameterized. Example: users can search for "Carbon Emissions" while limiting its occurrence to the Title field rather than the Abstract.
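
The difference between simple and fielded search can be shown on a tiny in-memory collection. This is a sketch under simplifying assumptions (case-insensitive substring matching, two hypothetical documents, hypothetical field names); it also shows the field:"phrase" form that Lucene-style query syntax uses to restrict a phrase to one field.

```python
# Hypothetical documents; field names and contents are made up.
DOCS = [
    {"id": 1, "title": "Carbon emissions inventory", "abstract": "Annual totals by region."},
    {"id": 2, "title": "Forest biomass survey", "abstract": "Includes estimates of carbon emissions."},
]


def simple_search(docs, phrase):
    """Free-text search: match the phrase anywhere in the document."""
    phrase = phrase.lower()
    return [d["id"] for d in docs if any(phrase in str(v).lower() for v in d.values())]


def fielded_search(docs, field, phrase):
    """Fielded search: match the phrase only within the named field."""
    phrase = phrase.lower()
    return [d["id"] for d in docs if phrase in d.get(field, "").lower()]


def fielded_query(field, phrase):
    """Format a Lucene-style fielded phrase query string."""
    return f'{field}:"{phrase}"'


if __name__ == "__main__":
    print(simple_search(DOCS, "Carbon Emissions"))
    print(fielded_search(DOCS, "title", "Carbon Emissions"))
    print(fielded_query("title", "carbon emissions"))
```

The simple search matches both documents, while restricting the phrase to the title field narrows the result to the first one.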

    Mercury's advanced search³ allows users to search by keyword, spatial coordinates, temporal date range, or data provider [9]. In keyword search, the searchable fields vary from project to project, but some common fields include Data Set Title, Project, Site, Parameter, Sensor, Source, Term, Topic, Keyword, Investigator, and Abstract. Mercury indexes all these fields into the SOLR index.

    Search Option                Comments
    Keyword Search               Searches for words or phrases within a specific metadata field, restricting the results to only a handful
    Spatial Coordinate Search    Restricts data to a certain location or region. An area can be defined by selecting a bounding box; this area is compared with coordinates found in the metadata records, and information for all datasets that cover, intersect, or are covered by the defined area is included in the results
    Temporal Search              Restricts data to certain time periods
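
The spatial comparison in the table can be sketched as a bounding-box overlap test. Because a box that covers, or is covered by, the search area also intersects it, a single intersection check captures all three cases. The sketch assumes boxes expressed in decimal degrees that do not cross the antimeridian; the class, function, and dataset names are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Bounding box in decimal degrees (assumed not to cross 180 degrees)."""
    west: float
    south: float
    east: float
    north: float


def intersects(a, b):
    """True if the boxes overlap, including full containment either way."""
    return (
        a.west <= b.east
        and b.west <= a.east
        and a.south <= b.north
        and b.south <= a.north
    )


def find_datasets(datasets, area):
    """Return names of datasets whose box intersects the search area."""
    return [name for name, box in datasets.items() if intersects(box, area)]


if __name__ == "__main__":
    datasets = {
        "regional": Box(-85.0, 34.0, -83.0, 36.0),
        "site": Box(-84.5, 34.5, -84.4, 34.6),
        "elsewhere": Box(10.0, 10.0, 20.0, 20.0),
    }
    print(find_datasets(datasets, Box(-84.6, 34.4, -84.0, 35.0)))
```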

    Indexed fields are searchable and sortable. You can also run a SOLR query on indexed fields, which can alter the content to improve or change results. Mercury also provides data sharing capabilities across data providers using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [10].

    2 http://www.google.com

    3 http://mercury.ornl.gov/ornldaac

    FIGURE 5: Mercury's advanced search
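
For the OAI-PMH data sharing mentioned above, a harvester typically issues HTTP requests whose query string names a protocol verb. The sketch below builds a ListRecords request using the protocol's standard verb, metadataPrefix, and from arguments; the base URL is a placeholder and the function name is hypothetical, so this is not Mercury's endpoint or code.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional lower bound for selective harvesting (YYYY-MM-DD).
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder endpoint used only for illustration.
    print(list_records_url("http://example.org/oai", from_date="2011-01-01"))
```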

    4. CONCLUSION

    The volume of scientific data and the interconnectedness of the systems under study make integration of data a necessity. Also, the file format in which we keep the data is a primary factor for data reusability. As technology continually changes, researchers should plan for both hardware and software obsolescence. SOLR is one of the popular open source libraries for scientific data management, and is gaining recognition for its scalability and extensibility in handling varied combinations in XML.

    5. ACKNOWLEDGEMENTS

    Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725.

    6. REFERENCES

    [1] Gray J. et al., Scientific Data Management in the Coming Decade, Technical Report MSR-TR-2005-10, Microsoft Research, 2005

    [2] Mayernik, M.S., Metadata Realities for Cyberinfrastructure: Data Authors as Metadata Creators. In Proceedings of the iConference 2010

    [3] Creative Commons, Protocol for implementing open access data. http://sciencecommons.org/projects/publishing/open-access-data-protocol/ (last accessed February 2011)

    [4] UC3: Data Management Guidelines, Organizing your data. http://www.cdlib.org/services/uc3/datamanagement/organizing.html (last accessed February 2011)

    [5] Markus Scherer, UTF-16 for Processing, Technical Note #12, Unicode, Inc., 2004

    [6] Blachman, N. and Peek, J., How Google Works. Retrieved February 2011 from Googleguide Web site: http://www.googleguide.com/google_works.html

    [7] Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval, ACM, pp. 73-108, 1999

    [8] David Smiley and Eric Pugh, Solr 1.4 Enterprise Search Server, Packt Publishing Ltd., Birmingham, UK, ISBN 978-1-847195-88-3, pp. 89-156, 2009

    [9] Devarakonda R, Palanisamy G, Wilson B, Green J, Mercury: reusable metadata management, data discovery and access system, Earth Science Informatics, 3(1):87-94, 2010

    [10] Devarakonda R, Palanisamy G, Green J, Wilson B, Data sharing and retrieval using OAI-PMH, Earth Science Informatics, 4(1):1-5, 2010

    INSTRUCTIONS TO CONTRIBUTORS

    The International Journal of Software Engineering (IJSE) provides a forum for software engineering research that publishes empirical results relevant to both researchers and practitioners. IJSE encourages researchers, practitioners, and developers to submit research papers reporting original research results; technology trend surveys reviewing an area of research in software engineering and knowledge engineering; survey articles covering a broad area in software engineering and knowledge engineering; tool reviews; and book reviews. The general topics covered by IJSE usually involve the collection and analysis of data and experience that can be used to characterize, evaluate, and reveal relationships between software development deliverables, practices, and technologies. IJSE is a refereed journal that promotes the publication of industry-relevant research to address the significant gap between research and practice.

    The initial efforts helped to shape the editorial policy and to sharpen the focus of the journal. Starting with volume 2, 2011, IJSE appears in more focused issues. Besides normal publications, IJSE intends to organize special issues on more focused topics. Each special issue will have a designated editor (or editors), either a member of the editorial board or another recognized specialist in the respective field.

    We are open to contributions and proposals for any topic, as well as for editors and reviewers. We understand that it is through the effort of volunteers that CSC Journals continues to grow and flourish.

    IJSE LIST OF TOPICS

    The realm of International Journal of Software Engineering (IJSE) extends, but is not limited, to the following:

    • Ambiguity in Software Development
    • Application of Object-Oriented Technology to Engin
    • Architecting an OO System for Size Clarity Reuse E
    • Composition and Extension
    • Computer-Based Engineering Techniques
    • Data Modeling Techniques
    • History of Software Engineering
    • IDEF
    • Impact of CASE on Software Development Life Cycle
    • Intellectual Property
    • Iterative Model
    • Knowledge Engineering Methods and Practices
    • Licensing
    • Modeling Languages
    • Object-Oriented Systems
    • Project Management
    • Quality Management
    • Rational Unified Processing
    • SDLC
    • Software Components
    • Software Deployment
    • Software Design and applications in Various Domain
    • Software Engineering Demographics
    • Software Engineering Economics
    • Software Engineering Methods and Practices
    • Software Engineering Professionalism
    • Software Ergonomics
    • Software Maintenance and Evaluation
    • Structured Analysis
    • Structuring (Large) OO Systems
    • Systems Engineering
    • Test Driven Development
    • UML

    CALL FOR PAPERS

    Volume: 2 - Issue: 3 - May 2011

    i. Paper Submission: May 31, 2011
    ii. Author Notification: July 01, 2011
    iii. Issue Publication: July/August 2011
