UIMA SHARP 4 - NLP May 25, 2010. Outline UIMA Terminology (not just TLAs) Parts of a UIMA pipeline...

UIMA

SHARP 4 - NLP

May 25, 2010

Outline

• UIMA Terminology (not just TLAs)

• Parts of a UIMA pipeline

• Running a pipeline

• Viewing annotations

• Creating a new annotator

UIMA terminology

• CAS, XCAS, JCas, View

• Analysis Engine (AE) / Annotator

– Aggregate Analysis Engine

• XML output: XCAS, XMI

• Type System, JCasGen

• CAS Visual Debugger (CVD)

• CPE (Collection Processing Engine)

UIMA and Eclipse

• UIMA plugin for Eclipse requires EMF

• UIMA plugin provides visual editors for descriptors

• An “Update site” exists for installing plugin

UIMA Pipeline Flow

• Collection Reader

• (CAS Initializer - deprecated)

• Analysis Engine (AE) / Annotator

• CAS Consumer

Pipeline Example

Example                    UIMA term

Read files from a dir      Collection Reader

Sentence annotator         Analysis Engine

Tokenizer annotator        Analysis Engine

Output tokens to a DB      CAS Consumer
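
The four roles in the example above can be sketched in plain Java. This is only a toy model, assuming an in-memory list in place of the directory and another list in place of the database. Every class name here (ToyPipeline, ToyDoc, ToyAnnotator, ToySentenceAnnotator, ToyTokenizer) is hypothetical and none of them is part of the UIMA API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical classes, not the UIMA API.
class ToyPipeline {
    // Collection Reader role: the 'collection' list supplies the documents.
    // Analysis Engine role: each annotator in 'engines' runs in list order.
    // CAS Consumer role: token strings are collected (a list stands in for the DB).
    static List<String> run(List<String> collection, List<ToyAnnotator> engines) {
        List<String> sink = new ArrayList<>();
        for (String text : collection) {
            ToyDoc doc = new ToyDoc(text);
            for (ToyAnnotator ae : engines) ae.process(doc);
            for (int[] t : doc.tokens) sink.add(text.substring(t[0], t[1]));
        }
        return sink;
    }

    public static void main(String[] args) {
        List<ToyAnnotator> engines =
                Arrays.asList(new ToySentenceAnnotator(), new ToyTokenizer());
        System.out.println(run(Arrays.asList("Family history of hyperlipidemia."), engines));
    }
}

// Stand-in for the data container passed between components.
class ToyDoc {
    final String text;
    final List<int[]> sentences = new ArrayList<>(); // {begin, end} offsets
    final List<int[]> tokens = new ArrayList<>();    // {begin, end} offsets
    ToyDoc(String text) { this.text = text; }
}

interface ToyAnnotator { void process(ToyDoc doc); }

class ToySentenceAnnotator implements ToyAnnotator {
    public void process(ToyDoc doc) {
        int start = 0;
        for (int i = 0; i < doc.text.length(); i++) {
            if (doc.text.charAt(i) == '.') { // naive rule: a period ends a sentence
                doc.sentences.add(new int[] {start, i + 1});
                start = i + 1;
                while (start < doc.text.length() && doc.text.charAt(start) == ' ') start++;
            }
        }
        // Trailing text without a final period is ignored in this sketch.
    }
}

class ToyTokenizer implements ToyAnnotator {
    public void process(ToyDoc doc) {
        for (int[] s : doc.sentences) { // depends on sentence spans already being present
            int i = s[0];
            while (i < s[1]) {
                while (i < s[1] && !Character.isLetterOrDigit(doc.text.charAt(i))) i++;
                int begin = i;
                while (i < s[1] && Character.isLetterOrDigit(doc.text.charAt(i))) i++;
                if (i > begin) doc.tokens.add(new int[] {begin, i});
            }
        }
    }
}
```

Because the toy tokenizer only looks inside sentence spans, running it before the sentence annotator yields no tokens. That is the reason the order of Analysis Engines matters in this sketch.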

Options for running UIMA tools

• Tools:

– CPE Configurator

– CVD

• Options:

– Command line scripts/.bat files

– Run within Eclipse

Tying together a UIMA pipeline

• Type System

– Defines the data types passed along

• CAS (Common Analysis Structure)

– Container for the data
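
As a rough illustration of how the two pieces relate, the plain-Java sketch below models a type system as a set of declared type names and a CAS as a container holding the document text plus typed begin/end spans. It assumes nothing about the real classes: SketchTypeSystem, SketchCas, and the type names example.Sentence and example.Token are hypothetical, and the real UIMA CAS is far richer (features, indexes, views).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy stand-ins for illustration; these are NOT the UIMA TypeSystem or CAS classes.
class SketchCasDemo {
    public static void main(String[] args) {
        SketchTypeSystem ts = new SketchTypeSystem()
                .declare("example.Sentence")   // hypothetical type names
                .declare("example.Token");
        SketchCas cas = new SketchCas(ts, "Family history of hyperlipidemia.");
        cas.add("example.Sentence", 0, 33);
        cas.add("example.Token", 0, 6);
        cas.add("example.Token", 7, 14);
        System.out.println(cas.coveredText("example.Token"));
    }
}

// The "type system": the set of data types components may pass along.
class SketchTypeSystem {
    private final Set<String> typeNames = new HashSet<>();

    SketchTypeSystem declare(String typeName) {
        typeNames.add(typeName);
        return this;
    }

    boolean has(String typeName) {
        return typeNames.contains(typeName);
    }
}

// The "CAS": holds the document text and the annotations made over it.
class SketchCas {
    private static final class Span {
        final String type;
        final int begin;
        final int end;
        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    private final SketchTypeSystem typeSystem;
    private final String documentText;
    private final List<Span> spans = new ArrayList<>();

    SketchCas(SketchTypeSystem typeSystem, String documentText) {
        this.typeSystem = typeSystem;
        this.documentText = documentText;
    }

    // Only types declared in the type system may be stored.
    void add(String type, int begin, int end) {
        if (!typeSystem.has(type)) {
            throw new IllegalArgumentException("Type not declared: " + type);
        }
        if (begin < 0 || begin > end || end > documentText.length()) {
            throw new IllegalArgumentException("Bad offsets: " + begin + "-" + end);
        }
        spans.add(new Span(type, begin, end));
    }

    // Returns the text covered by every stored span of the given type.
    List<String> coveredText(String type) {
        List<String> out = new ArrayList<>();
        for (Span s : spans) {
            if (s.type.equals(type)) out.add(documentText.substring(s.begin, s.end));
        }
        return out;
    }
}
```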

Tying together a UIMA pipeline

• CPE descriptor: selects the parts

– Collection Reader

– Analysis Engine(s)

– CAS Consumer

• Aggregate analysis engine

– Multiple Analysis Engines and their order

Options for running a pipeline

• CVD GUI

– Single Aggregate Analysis Engine

– No Collection Reader

• CPE GUI

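• From your own Java application: parse a CpeDescription, produce a CollectionProcessingEngine from it, and invoke its process() method (see section 2.3, “Running a CPE from Your Own Java Application”, in the UIMA documentation)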

Example: Running a pipeline

Running cTAKES within Eclipse using a CPE

Use run configuration UIMA_CPE_GUI--clinical_documents_pipeline

CPE descriptor: test1.xml from clinical documents pipeline\desc\collection_processing_engine

Options for viewing annotations

• CVD

• Annotation viewer

• XML viewer

• Text editor

Example: Viewing annotations

Viewing annotations using the CVD

• Load the Type System

• Load the XCAS or XMI

Example: Running an AE in CVD

Using CVD to run an Analysis Engine

– No Collection Reader

– Single Analysis Engine (can be an aggregate)

– No CAS Consumer

– Just paste/type in text to process, for example: Family history of hyperlipidemia.

Creating a New Annotator

• Create Java project

• Right click -> Add UIMA Nature

• Add UIMA jars to .classpath (Build Path)

• Create Analysis Engine (AE) descriptor

• Add types to AE descriptor, or optionally create separate Type System descriptor

• Write code!
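
For the last step, the core of a simple annotator is often just "find spans in the document text". The sketch below shows that logic in plain Java so it runs without the UIMA jars. In a real annotator this would typically sit in the process() method of a class extending UIMA's JCasAnnotator_ImplBase, with each match stored as an annotation in the CAS. The class name TermSpanFinder and the two-term list are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch of annotator logic; not tied to the UIMA API.
class TermSpanFinder {
    // Hypothetical term list for illustration only.
    private static final Pattern TERMS = Pattern.compile(
            "\\b(hyperlipidemia|hypertension)\\b", Pattern.CASE_INSENSITIVE);

    // Returns {begin, end} character offsets for each term found in the text.
    static List<int[]> findSpans(String documentText) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = TERMS.matcher(documentText);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Family history of hyperlipidemia.";
        for (int[] s : findSpans(text)) {
            // prints "18-32 hyperlipidemia"
            System.out.println(s[0] + "-" + s[1] + " " + text.substring(s[0], s[1]));
        }
    }
}
```

Begin/end offsets into the document text are what tie a span back to the words it covers, which is why the sketch returns offsets rather than the matched strings.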

Questions?

Supplemental slides follow

Example: Creating a PEAR file

• Right click -> Add UIMA Nature

• Right click -> Generate Pear

• Select Analysis Engine descriptor

• Select OS and JDK

• Modify Properties if needed

• Select what to include

Example: Modifying a parameter

UIMA’s descriptor editors allow you to modify most parameters without looking at the XML itself.

Links

• Getting started with UIMA: http://uima.apache.org/doc-uima-annotator.html

• UIMA Update site for use in Eclipse: http://www.apache.org/dist/incubator/uima/eclipse-update-site/