IAC (ACCESS INTERFACE CORPUS)
DEVELOPED BY BARCELONA MEDIA & UNIVERSITAT POMPEU FABRA
TONI BADIA (BARCELONA MEDIA - UNIVERSITAT POMPEU FABRA)
JUDITH DOMINGO (BARCELONA MEDIA)
CARME COLOMINAS (UNIVERSITAT POMPEU FABRA)
UCCTS, 2010 (Ormskirk)
IAC CORPORA USE: REQUIREMENTS
It is easy to build a corpus from the web, but difficult to search it
We need tools that provide frequency statistics, sorting of results, searches over linguistically annotated sequences, etc.
Concordance software (MonoConc, Concordance)
Databases
Corpus query systems (e.g. CQP, EMDROS)
Useful but tough to learn
Not useful for training, as students spend too much time learning the query system
IAC CORPORA: SEARCHING METHODS
IAC CORPORA: INTERFACES (SEARCHING METHODS)
DISADVANTAGES
From the user's point of view, more than one interface to learn
Programming and interface design background needed (external resources)
If different attribute types are added > new design of the interface > new funding needed
Usually more expensive than other options
ADVANTAGES
User-friendly
No training necessary
IAC (ACCESS INTERFACE CORPUS)
The Translation Department (UPF) had many corpora (constantly changing and growing)
IAC was born (developed by Barcelona Media and UPF)
GOALS
Monolingual and aligned corpora
Fast and easy creation of interfaces for corpora
One interface design for all the corpora
IAC INTERFACES
Simple : Key Words Out of Context
Advanced : Key Words In Context
Statistics: KWIC and frequency-based results
*** For corpus searching and indexing, IAC uses the Corpus Workbench (CWB), developed by IMS Stuttgart
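To make the idea of a Key Word In Context (KWIC) result concrete, here is a minimal sketch in Python. It is not how IAC or CWB work internally (CWB relies on a pre-built index); the function name kwic, the width parameter and the sample sentence are assumptions made only for illustration.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each match.

    Plain linear scan over a token list; a real corpus engine such as
    CWB would use an index instead. Names here are illustrative only.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample sentence, not taken from any IAC corpus.
tokens = "The boy buys pencils and the girl buys paper".split()

for left, key, right in kwic(tokens, "buys"):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>15} | {key} | {right}")
```

Each hit is shown with up to width tokens on either side, which is the layout a concordance view typically offers.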
IAC EXAMPLES
IAC CORPUS FORMAT
<metadata title="Demo" year="2010">
<func=subj>
The Det sg
boy Noun sg
</func>
buys Verb sg
<func=DO>
pencils Noun pl
</func>
</metadata>
Tabular and verticalized (one token per line)
XML for metadata
XML for structural data
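The format above can be read with a few lines of Python. The sketch below is only an illustration under stated assumptions: it works on a normalized copy of the sample (straight quotes, whitespace-separated columns), it assumes exactly three columns per token (word, part of speech, number), and the function name parse_vertical is hypothetical. It does not reproduce how IAC or CWB load a corpus.

```python
import re

# Normalized copy of the sample from the slide.
SAMPLE = """\
<metadata title="Demo" year="2010">
<func=subj>
The Det sg
boy Noun sg
</func>
buys Verb sg
<func=DO>
pencils Noun pl
</func>
</metadata>
"""

OPEN_TAG = re.compile(r"<(\w+)([^>]*)>")
CLOSE_TAG = re.compile(r"</(\w+)>")
ATTRIBUTE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_vertical(text):
    """Return (metadata dict, list of token dicts) from verticalized text."""
    metadata = {}
    tokens = []
    func_stack = []  # currently open <func=...> regions
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        closing = CLOSE_TAG.fullmatch(line)
        opening = OPEN_TAG.fullmatch(line)
        if closing:
            if closing.group(1) == "func" and func_stack:
                func_stack.pop()
        elif opening:
            name, rest = opening.groups()
            if name == "metadata":
                metadata = dict(ATTRIBUTE.findall(rest))
            elif name == "func":
                # "<func=subj>" leaves "=subj" in rest
                func_stack.append(rest.lstrip("=").strip())
        else:
            # Assumption: exactly three whitespace-separated columns.
            word, pos, number = line.split()
            tokens.append({
                "word": word,
                "pos": pos,
                "number": number,
                "func": func_stack[-1] if func_stack else None,
            })
    return metadata, tokens


metadata, tokens = parse_vertical(SAMPLE)
print(metadata)
for tok in tokens:
    print(tok)
```

Each token keeps its positional attributes plus the syntactic function of the region that encloses it, and the metadata attributes apply to the whole text.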
IAC CORPORA: INSERTING A CORPUS INTO IAC
Upload the corpus (txt file) to the server
Design the search interface with the graphical tool included in IAC, according to the corpus type and the linguistic annotation added
IAC CONCLUSIONS
IAC is a flexible and powerful tool that goes beyond the limitations of current corpus interfaces
User-friendly tool
Access to multiple corpora from the same platform
No need for an external developer or a programming background
Fast interface creation that can be modified easily
SOME EXAMPLES…
ADVANCED SEARCH
To demonstrate the advanced search, we use an annotated corpus with translations.
Let's look at examples of sequences of one or more words with syntax errors.
ALIGNED CORPORA WITH METADATA
As an example of aligned corpora, we use a Spanish > English corpus
Poder (verb) > can, could, may, might
Our goal is to get examples of poder (Verb) translated as may or might in Economics texts.
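The logic of such a query can be sketched in Python. This is a sentence-level approximation under stated assumptions, not IAC's actual search: the record layout (domain, source, target), the function name find_pairs and the sample sentences are all invented for illustration, and without word alignment the sketch only checks that both items occur in the same aligned pair.

```python
def find_pairs(pairs, src_lemma, src_pos, tgt_words, domain):
    """Return aligned pairs in `domain` whose source side contains
    src_lemma with part of speech src_pos and whose target side
    contains any of tgt_words."""
    wanted = {w.lower() for w in tgt_words}
    hits = []
    for pair in pairs:
        if pair["domain"] != domain:
            continue
        has_source = any(
            tok["lemma"] == src_lemma and tok["pos"] == src_pos
            for tok in pair["source"]
        )
        has_target = any(word.lower() in wanted for word in pair["target"])
        if has_source and has_target:
            hits.append(pair)
    return hits


# Invented sample data; the field names and tags are hypothetical.
pairs = [
    {"domain": "Economics",
     "source": [{"word": "Los", "lemma": "el", "pos": "Det"},
                {"word": "precios", "lemma": "precio", "pos": "Noun"},
                {"word": "pueden", "lemma": "poder", "pos": "Verb"},
                {"word": "subir", "lemma": "subir", "pos": "Verb"}],
     "target": ["Prices", "may", "rise"]},
    {"domain": "Economics",
     "source": [{"word": "El", "lemma": "el", "pos": "Det"},
                {"word": "poder", "lemma": "poder", "pos": "Noun"},
                {"word": "cambia", "lemma": "cambiar", "pos": "Verb"}],
     "target": ["Power", "might", "shift"]},
    {"domain": "Law",
     "source": [{"word": "Puede", "lemma": "poder", "pos": "Verb"},
                {"word": "apelar", "lemma": "apelar", "pos": "Verb"}],
     "target": ["He", "may", "appeal"]},
]

for hit in find_pairs(pairs, "poder", "Verb", ["may", "might"], "Economics"):
    print(" ".join(tok["word"] for tok in hit["source"]), ">", " ".join(hit["target"]))
```

The second pair is excluded because poder is a noun there, and the third because its metadata places it outside Economics, which shows how linguistic annotation and metadata combine in one query.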
STATISTICS
Statistics give quantitative results for sequences. Our goal in this case is to count the prepositions that follow the verb pensar (to think) in Spanish.
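A count of this kind can be sketched in Python as follows. The token layout (word, lemma, part of speech), the tag names Verb and Prep, the function name following_prepositions and the sample tokens are assumptions for illustration; IAC obtains such figures through CWB, not through code like this.

```python
from collections import Counter


def following_prepositions(tokens, verb_lemma):
    """Count prepositions that immediately follow a form of verb_lemma.

    tokens is a list of (word, lemma, pos) tuples.
    """
    counts = Counter()
    for current, following in zip(tokens, tokens[1:]):
        _, lemma, pos = current
        _, next_lemma, next_pos = following
        if lemma == verb_lemma and pos == "Verb" and next_pos == "Prep":
            counts[next_lemma] += 1
    return counts


# Invented sample tokens; tag names are hypothetical.
tokens = [
    ("Pienso", "pensar", "Verb"), ("en", "en", "Prep"), ("ti", "tú", "Pron"),
    ("Piensa", "pensar", "Verb"), ("en", "en", "Prep"), ("ello", "ello", "Pron"),
    ("Pensamos", "pensar", "Verb"), ("sobre", "sobre", "Prep"), ("eso", "eso", "Pron"),
    ("Pienso", "pensar", "Verb"), ("mucho", "mucho", "Adv"),
]

for preposition, frequency in following_prepositions(tokens, "pensar").most_common():
    print(preposition, frequency)
```

Matching on the lemma rather than the word form is what lets a single query cover pienso, piensa, pensamos and every other inflected form of the verb.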