Date posted: 21-Dec-2015

Page 1: Searching uPortal with a third party Search Engine Katya Sadovsky University of California, Irvine Administrative Computing Services katya@uci.edu.

Searching uPortal with a third party Search Engine

Katya Sadovsky

University of California, Irvine

Administrative Computing Services

katya@uci.edu


Agenda

Our goals

Our current setup

Built-in vs. Third Party Search Engine

Dynamic vs. Static Content

Issues in combining uPortal with a search engine

Demonstration

Questions & Answers


Our goals

Use the portal as a “gateway” to information

Allow users to search for pertinent portal content

Present users with integrated search results (portal and non-portal content)

Aid the search engine in weighing the results (meaningful page title, metadata, etc.)


Our current setup

uPortal 2.0.3

Verity Ultraseek Search Engine (formerly Inktomi)

Tomcat 4.0.6


Built-in vs. Third Party Search Engine

Pros to using a built-in search engine:

• Ensure generation of correct links to content

• Present users with customized (user-specific) result sets

• Ability to fully utilize channel metadata

• Employ the portal’s authorization infrastructure


Built-in vs. Third Party Search Engine

Pros to using a third party search engine:

• Well-tested, mature functionality

• Well-developed dictionary and thesaurus

• Ability to search content beyond uPortal and present users with integrated search results

• URL filtering capabilities

• Useful but optional: nice administrative GUI, quick link definitions


Dynamic vs. Static Content

uPortal generates dynamic content that depends on the user's preferences, security level, browser and operating system

Most search engines are designed to work with static content:

• Search engines index content on a periodic basis and use a cached/stored index to present the user with search results

• Search results are not user-specific

• Only public content is indexed


Issues/Areas of difficulty

User Agent setting

Filtering out certain URLs

Deciding what to search:

• Search index/start page

• Searchable vs. non-searchable content

Generating links to channels using:

• global (published) vs. instance (subscribed) ID

• functional names

Page title used in search results


User Agent Issues:

• uPortal needs to know the mapping between a user agent and a MIME type/output type

• When a user agent is not recognized, uPortal displays a screen that lets the user choose a profile; a crawler cannot get past this screen

Solutions:

• If you know the user agent reported by the search engine, add a mapping to the UP_USER_UA_MAP table

• Choose a search engine that allows you to specify a user agent
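
The user agent lookup described above can be sketched roughly as follows. This is an illustration only, not uPortal's code: the substring matching rule, the `Ultraseek` token and the profile numbers are all assumptions, and the real mapping lives in the UP_USER_UA_MAP table rather than in a Python list.

```python
# Hypothetical mapping from user-agent substrings to portal profile IDs.
# The entries and profile numbers are illustrative assumptions only.
USER_AGENT_PROFILES = [
    ("Mozilla", 1),    # ordinary browsers
    ("Ultraseek", 1),  # assumed crawler token; check what your engine reports
]


def resolve_profile(user_agent, mappings=USER_AGENT_PROFILES):
    """Return the profile ID of the first mapping whose substring occurs in
    user_agent, or None when the agent is not recognized."""
    for fragment, profile_id in mappings:
        if fragment in user_agent:
            return profile_id
    return None


def handle_request(user_agent):
    """Describe what the portal would do for this agent under the sketch."""
    profile_id = resolve_profile(user_agent)
    if profile_id is None:
        # An unrecognized agent is sent to the profile-selection screen,
        # so a crawler would index that screen instead of channel content.
        return "profile-selection screen"
    return "render with profile %d" % profile_id
```

Either solution on the slide amounts to making sure `resolve_profile` never returns `None` for the crawler: add a mapping for the agent string it reports, or configure the crawler to report a string that is already mapped.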


Example: setting a search engine user agent


Filtering out certain URLs

Issues:

A search engine may follow a link that includes a channel option or command

uPortal URL tags:

• Dynamically generated for each URL hit

• Tags other than 'idempotent' make search results meaningless

• While indexing content, a search engine may enter a loop, referencing the same page with different tags


Filtering out certain URLs (cont’d)

Solutions:

• Acquire a search engine that allows URL filtering and filter out all “offending” URLs

• If available with the search engine, use advanced URL “de-duping”
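
A minimal sketch of both solutions, written as a standalone filter rather than as any search engine's actual configuration. The URL shape (`tag.<TAG>.render.userLayoutRootNode.uP`), the host name and the blocked parameter names are assumptions chosen for illustration; substitute whatever your portal really emits.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed URL shape: .../tag.<TAG>.render.userLayoutRootNode.uP?...
TAG_PATTERN = re.compile(r"/tag\.([^./]+)\.")

# Assumed names of query parameters that carry channel options or commands.
BLOCKED_PARAMS = ("uP_remove_target", "uP_help_target", "uP_edit_target")


def should_index(url):
    """Return True when the crawler may index this URL."""
    parts = urlsplit(url)
    match = TAG_PATTERN.search(parts.path)
    if match and match.group(1) != "idempotent":
        return False  # per-hit tag: the link is useless as a search result
    params = parse_qs(parts.query)
    return not any(name in params for name in BLOCKED_PARAMS)


def canonical(url):
    """De-duping helper: rewrite any tag to 'idempotent' so that the same
    page reached under different tags collapses to a single URL."""
    return TAG_PATTERN.sub("/tag.idempotent.", url, count=1)
```

Filtering keeps tagged URLs out of the index entirely, while canonicalizing before comparison is one way to break the crawl loop in which the same page keeps reappearing under a fresh tag.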


Example: Filtering out certain URLs


Example: using URL filters


What to search: index/start page

Issues:

• A user layout cannot serve as the starting point for a search engine: a typical layout doesn't contain all the channels

• Need a page with 'idempotent' links to all the searchable channels

Solution: a Searchable Channel Index channel
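
The idea of a start page with 'idempotent' links can be sketched as a small page generator. This is not the channel from the talk; the base URL, the `uP_root` parameter and the channel IDs and titles are assumptions used only to make the example concrete.

```python
from html import escape
from urllib.parse import quote

# Assumed portal URL carrying the 'idempotent' tag.
BASE = "http://portal.example.edu/uPortal/tag.idempotent.render.userLayoutRootNode.uP"


def build_index(channels):
    """Build a crawler start page from (channel_id, title) pairs, with one
    idempotent link per searchable channel."""
    items = []
    for channel_id, title in channels:
        href = "%s?uP_root=%s" % (BASE, quote(str(channel_id), safe=""))
        items.append('<li><a href="%s">%s</a></li>' % (escape(href), escape(title)))
    return (
        "<html><head><title>Searchable Channel Index</title></head>"
        "<body><ul>\n%s\n</ul></body></html>" % "\n".join(items)
    )
```

Pointing the search engine at a page like this gives it every searchable channel in one hop, independent of what any individual user has subscribed to.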


What to search: searchable vs. non-searchable content

Issue: not all channels need to be included in the search

Solution: added a 'searchable' attribute to all the channels
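
As a rough illustration of how a 'searchable' attribute might be consumed, the sketch below selects channels from a made-up registry document. The element and attribute layout is hypothetical, as is the choice to treat a missing attribute as non-searchable.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry; real channel definitions will look different.
REGISTRY_XML = """
<registry>
  <channel ID="chan42" title="Campus News" searchable="true"/>
  <channel ID="chan57" title="Payroll Admin" searchable="false"/>
  <channel ID="chan63" title="Library Hours"/>
</registry>
"""


def searchable_channels(registry_xml):
    """Return (ID, title) pairs for channels explicitly marked searchable.
    Channels without the attribute are left out (an assumption)."""
    root = ET.fromstring(registry_xml)
    return [
        (channel.get("ID"), channel.get("title"))
        for channel in root.iter("channel")
        if channel.get("searchable") == "true"
    ]
```

Defaulting to "not searchable" errs on the side of keeping content out of the index until someone opts it in.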


CSearchRegistry channel


CSearchRegistry: stylesheet


Generating links to channels

Problem: channel instance (subscribed) IDs vary from user to user, so the search result links are inconsistent

Solutions: link to channels using

• global (published) IDs -- involves code changes

• functional names (fname) -- this is new functionality, available in CVS (Concurrent Versions System)
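
The difference between the two kinds of link can be shown with a toy example. The per-user layout table, the node IDs and the `uP_root` / `uP_fname` parameter names are assumptions made for the sketch.

```python
from urllib.parse import urlencode

# Assumed portal URL carrying the 'idempotent' tag.
BASE = "http://portal.example.edu/uPortal/tag.idempotent.render.userLayoutRootNode.uP"

# Hypothetical per-user layouts: the same channel gets a different
# instance (subscribed) ID in each user's layout.
USER_LAYOUTS = {
    "alice": {"campus-news": "n12"},
    "bob": {"campus-news": "n47"},
}


def link_by_instance(user, fname):
    """Link built from the user's instance ID; differs from user to user."""
    return BASE + "?" + urlencode({"uP_root": USER_LAYOUTS[user][fname]})


def link_by_fname(fname):
    """Link built from the functional name; identical for every user."""
    return BASE + "?" + urlencode({"uP_fname": fname})
```

A link indexed from one user's instance ID is meaningless to anyone else, whereas the functional-name link resolves the same way for everyone, which is what a shared search index needs.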


Linking to channels via their published IDs: implementation plan

Modify org/jasig/portal/UserInstance.java to recognize that the user is asking for a published channel that may not be in the user’s layout

Create a temporary hidden folder in user’s layout to store “temporary” channels (make sure to delete this folder before layout is saved to the database)

Add XML channel definitions to this hidden folder

Proceed to render as usual
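
The plan above can be sketched on a toy layout document. This is a Python illustration of the steps, not the Java change to UserInstance.java; the folder ID, the `hidden` attribute and the layout schema are hypothetical.

```python
import xml.etree.ElementTree as ET

TEMP_FOLDER_ID = "temp_published"  # hypothetical ID for the hidden folder


def ensure_temp_folder(layout):
    """Return the hidden temporary folder, creating it on first use."""
    folder = layout.find("folder[@ID='%s']" % TEMP_FOLDER_ID)
    if folder is None:
        folder = ET.SubElement(
            layout, "folder", {"ID": TEMP_FOLDER_ID, "hidden": "true"}
        )
    return folder


def add_published_channel(layout, channel_def_xml):
    """Add a published channel's XML definition to the hidden folder so the
    normal rendering path can find it in the layout."""
    channel = ET.fromstring(channel_def_xml)
    ensure_temp_folder(layout).append(channel)
    return channel


def strip_temp_folder(layout):
    """Remove the hidden folder; call this before the layout is saved."""
    for folder in layout.findall("folder[@ID='%s']" % TEMP_FOLDER_ID):
        layout.remove(folder)
```

Keeping the temporary channels in one well-known folder makes the clean-up step a single removal, which is what protects the stored layout from accumulating channels the user never subscribed to.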


Page titles used in search results

Issues:

Out of the box, uPortal has a statically set page title (no matter what channel is viewed)

Search engines generally use page titles (or other metadata) for:

• search result titles

• result ranking

• de-duping

Users have to be trained to enter meaningful page titles when creating documents/channels (e.g. do not start each page title with UCIrvine)


Page titles used in search results

Solution:

When channels are rendered in 'focused' or 'detached' mode, add the channel title to the default page title. The following is a fragment of webpages/stylesheets/org/jasig/portal/layout/tab-column/nested-tables/nested-tables.xsl:

    <xsl:template match="layout_fragment">
      ...
      <title>
        <xsl:value-of select="$windowTitle"/>
        <xsl:value-of select="concat(': ',content//channel/@description)"/>
      </title>
      ...
    </xsl:template>

    <xsl:template match="layout">
      ...
      <title>
        <xsl:value-of select="$windowTitle"/>
        <xsl:if test="//focused">
          <xsl:value-of select="concat(': ',//focused/channel/@description)"/>
        </xsl:if>
      </title>
      ...
    </xsl:template>


Example: page titles


Conclusions

There are tradeoffs when using either a built-in or a third-party search engine

We have yet to address the following issues:

• searching restricted content

• creating META data tags to help the search engine with content ranking

Overall, our portal project could not succeed without a search function


Demo


Questions?