ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and...
![Page 1: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/1.jpg)
ONDUX
On-Demand Unsupervised Learning for Information Extraction
Eli Cortez, Altigran da Silva and Edleno de Moura
Federal University of Amazonas (UFAM) - BRAZIL
Marcos Gonçalves
Federal University of Minas Gerais (UFMG) - BRAZIL
![Page 2: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/2.jpg)
Agenda
Introduction
Information Extraction by Text Segmentation
◦ Challenges
Related Work
ONDUX
Experiments
Conclusions and Future Work
![Page 3: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/3.jpg)
Introduction (I)
Abundance of on-line sources of text documents containing implicit semi-structured data records
◦ Addresses
◦ Bibliographic References
◦ Classified Ads
◦ Product Descriptions
![Page 4: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/4.jpg)
Introduction (II)
Classified Ad: Regent Square $228,900 1028 Mifflin Ave.; 6 Bedrooms; 2 Bathrooms. 412-638-7273
Address: Dr. Robert A. Jacobson, 8109 Harford Road, Baltimore, MD 21214
Bibliographic Reference: Pável Calado, Marco Cristo, Marcos André Gonçalves, Edleno S. de Moura, Berthier Ribeiro-Neto, Nivio Ziviani. Link-based similarity measures for the classification of Web documents. JASIST, v. 57 n.2, p. 208-221, January 2006
![Page 5: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/5.jpg)
Introduction (III)
Why extract information?
◦ Database Storage, Query…
◦ Data Mining
◦ Record Linkage
Classified Ad: Regent Square $228,900 1028 Mifflin Ave.; 6 Bedrooms; 2 Bathrooms. 412-638-7273
<Neighborhood> : Regent Square
<Price> : $228,900
<No.> : 1028
<Street> : Mifflin Ave.
<Bed.> : 6 Bedrooms
<Bath.> : 2 Bathrooms
<Phone> : 412-638-7273
![Page 6: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/6.jpg)
IETS – Challenges (I)
Information Extraction by Text Segmentation (IETS)
◦ Borkar@SIGMOD'01, McCallum@ICML'01, Agichtein@SIGKDD'04, Mansuri@ICDE'06, Zhao@SICDM'08, Cortez@JASIST'09
Diversity of templates and styles
◦ Attribute Ordering
◦ Capitalization
◦ Abbreviations
Different applications share similar domains
◦ Ex.: Addresses and Ads
◦ Records from both domains contain address information
![Page 7: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/7.jpg)
IETS – Challenges (II)
Diversity of templates and styles
◦ Attribute Ordering; Capitalization; Abbreviations.
The same reference as it appears in three sources (HomePage, DBLP, ACM):
Link-based similarity measures for the classification of Web documents. Pável Calado. Journal of the American Society for the Information Science and Technology – 57(2) 2006
Pável Calado, Marco Cristo, Marcos André Gonçalves, Edleno Silva de Moura, Berthier A. Ribeiro-Neto, Nivio Ziviani. Link-based similarity measures for the classification of Web documents. JASIST 57 (2) 208-221 (2006)
Pável Calado, Marco Cristo, Marcos André Gonçalves, Edleno S. de Moura, Berthier Ribeiro-Neto, Nivio Ziviani. Link-based similarity measures for the classification of Web documents. JASIST, v. 57 n.2, p. 208-221, January 2006
![Page 8: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/8.jpg)
IETS – Challenges (III)
Existing approaches that deal with this problem use Machine Learning techniques
◦ Hidden Markov Models (HMM)
◦ Conditional Random Fields (CRF)
◦ Structured Support Vector Machines (SSVM)
• (Semi-)supervised approaches require a hand-labeled training set created by an expert.
• Each generated model is particular to a given application
• High computational cost
![Page 9: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/9.jpg)
Related Work
[Borkar et al. @ SIGMOD 2001]
◦ Supervised extraction method based on Hidden Markov Models (HMM)
[McCallum et al. @ ICML 2001]
◦ Proposed the use of Conditional Random Fields (CRF), a supervised model (S-CRF)
[Mansuri et al. @ ICDE 2006]
◦ Semi-supervised approach based on CRF models
All of these approaches require an expert to create a hand-labeled training set for each application.
![Page 10: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/10.jpg)
Related Work (II)
[Agichtein et al. @ SIGKDD 2004]
◦ Use of reference tables to create an unsupervised model using Hidden Markov Models (HMM)
[Zhao et al. @ SIAM ICDM 2008]
◦ Use of reference tables to create unsupervised CRF models (U-CRF)
[Cortez et al. @ JASIST 2009]
◦ Unsupervised method to extract bibliographic information
◦ Domain-specific heuristics, not generally applicable
Both reference-table models assume a single positioning and ordering of attributes in all test instances. (Distinct orderings?)
![Page 11: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/11.jpg)
Contributions
Proposal of an extraction method based on information retrieval to perform IETS tasks
◦ Eliminates the need for a user to be involved in any source-specific training process
◦ Flexible, in the sense that it does not rely on any particular style to perform the extraction
◦ Unsupervised Reinforcement Phase: attribute ordering and positioning learned On-Demand
Experimental comparison with the state-of-the-art information extraction approach (CRF).
![Page 12: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/12.jpg)
Basic Concepts (I)
Given an input string I representing an implicit textual record (e.g. a classified ad), the IETS task consists of:
1. Segmenting I
2. Assigning to each segment a label corresponding to an attribute a
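As a rough illustration of what the task produces (a sketch, not the authors' data format), the result for the classified ad used throughout these slides can be written as a list of (segment, attribute) pairs. The variable names are assumptions made for this example; the segment boundaries follow the extraction shown on the Introduction (III) slide.

```python
# Input string I: an implicit textual record (the classified ad from the slides).
I = "Regent Square $228,900 1028 Mifflin Ave.; 6 Bedrooms; 2 Bathrooms. 412-638-7273"

# Hypothetical output representation: one (segment, attribute) pair per segment.
labeled_segments = [
    ("Regent Square", "Neighborhood"),
    ("$228,900", "Price"),
    ("1028", "No."),
    ("Mifflin Ave.;", "Street"),
    ("6 Bedrooms;", "Bed."),
    ("2 Bathrooms.", "Bath."),
    ("412-638-7273", "Phone"),
]

# Step 1 (segmenting): the segments, read in order, cover the whole input.
covers_input = " ".join(segment for segment, _ in labeled_segments) == I

# Step 2 (labeling): every segment carries exactly one attribute label.
attributes = [attribute for _, attribute in labeled_segments]
```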
![Page 13: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/13.jpg)
Basic Concepts (II)
Knowledge Base
◦ Set of pairs $KB = \{(m_1, O_1), \ldots, (m_n, O_n)\}$
◦ Easily built from pre-existing sources
  ◦ Bibliographic DBs, Freebase, Google Fusion Tables, etc.
$KB = \{(\text{Neighborhood}, O_{\text{Neigh.}}), (\text{Street}, O_{\text{Street}}), (\text{Phone}, O_{\text{Phone}})\}$
$O_{\text{Neigh.}}$ = { “Regent Square”, “Milenight Park” }
$O_{\text{Street}}$ = { “Regent St.”, “Morewood Ave.”, “Square Ave. Park” }
$O_{\text{Phone}}$ = { “323 462-6252”, “(171) 289-7527” }
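One way to picture the Knowledge Base is as a mapping from each attribute name to the set of values already known for it. The sketch below builds such a mapping from pre-existing structured records; `build_kb` and the record layout are assumptions for illustration, and only the attribute values come from the slide.

```python
from collections import defaultdict

def build_kb(records):
    """Collect, per attribute, the set of values seen in existing records."""
    kb = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            if value:  # skip missing or empty fields
                kb[attribute].add(value)
    return dict(kb)

# Hypothetical pre-existing records holding the values shown on the slide.
records = [
    {"Neighborhood": "Regent Square", "Street": "Regent St.", "Phone": "323 462-6252"},
    {"Neighborhood": "Milenight Park", "Street": "Morewood Ave.", "Phone": "(171) 289-7527"},
    {"Street": "Square Ave. Park"},
]

kb = build_kb(records)
```

Because the sets only accumulate values, the same function can be fed records from several sources without any labeling effort.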
![Page 14: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/14.jpg)
ONDUX (I)
Three main steps
◦ Blocking
◦ Matching
◦ Reinforcement
![Page 15: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/15.jpg)
ONDUX (II)
General View
[Figure: general view of ONDUX; step marker 1 shown]
![Page 16: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/16.jpg)
ONDUX (III)
Blocking
◦ Split the input text into substrings called blocks;
◦ Consider the co-occurrence of consecutive terms based on the KB
Regent Square $228,900 1028 Mifflin Ave.; 6 Bedrooms; 2 Bathrooms. 412-638-7273
[Figure callouts: "Co-occur in the KB (Neighborhood)"; "Left separated (no presence in the KB)"]
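The Blocking idea can be sketched as follows, under stated assumptions: terms are whitespace-separated tokens, compared after lowercasing and dropping punctuation, and two consecutive terms are kept in the same block when both occur in a single known value of the KB. The function names and this exact co-occurrence test are illustrative choices, not the authors' implementation.

```python
import re

def normalize(token):
    """Lowercase a token and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", token.lower())

def cooccur(kb, term_a, term_b):
    """True if both terms appear together in one known value of the KB."""
    if not term_a or not term_b:
        return False
    for values in kb.values():
        for value in values:
            value_terms = {normalize(t) for t in value.split()}
            if term_a in value_terms and term_b in value_terms:
                return True
    return False

def blocking(text, kb):
    """Split text into blocks, merging consecutive terms that co-occur in the KB."""
    blocks = []
    for token in text.split():
        if blocks and cooccur(kb, normalize(blocks[-1][-1]), normalize(token)):
            blocks[-1].append(token)   # extend the current block
        else:
            blocks.append([token])     # start a new block
    return [" ".join(block) for block in blocks]

kb = {
    "Neighborhood": {"Regent Square", "Milenight Park"},
    "Street": {"Regent St.", "Morewood Ave.", "Square Ave. Park"},
    "Phone": {"323 462-6252", "(171) 289-7527"},
}
ad = "Regent Square $228,900 1028 Mifflin Ave.; 6 Bedrooms; 2 Bathrooms. 412-638-7273"
blocks = blocking(ad, kb)
```

With this small KB, "Regent" and "Square" end up in one block because they co-occur in a Neighborhood value, while terms absent from the KB, such as "Mifflin", are left as separate blocks.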
![Page 17: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/17.jpg)
ONDUX (IV)
General View
[Figure: general view of ONDUX; step markers 1 and 2 shown]
![Page 18: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/18.jpg)
ONDUX (V)
Matching
◦ Associate each block generated in the previous phase with an attribute according to the Knowledge Base
◦ Use distinct functions to compute the similarity between a block and the known values of the attributes in the KB
![Page 19: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/19.jpg)
ONDUX (VI)
Matching
Textual Values: FF Function (Field Frequency)
◦ Similarity between the terms of the block and the terms of a given attribute of the KB
Numeric Values: NM Function (Numeric Matching) [Agrawal @ CIDR 2003]
◦ Similarity between the value in the block and the mean and standard deviation of a numeric attribute in the KB
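The slide does not give the formulas for FF and NM, so the sketch below uses simplified stand-ins that only follow the descriptions above: a term-overlap ratio for textual blocks and a Gaussian kernel around the attribute's mean for numeric blocks. Both formulas, the function names, and the price values are assumptions made for illustration.

```python
import math
import re

def terms(text):
    """Lowercased alphanumeric terms of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def text_similarity(block, known_values):
    """Simplified stand-in for FF: share of the block's terms that also
    occur among the attribute's known values (not the paper's formula)."""
    block_terms = terms(block)
    if not block_terms:
        return 0.0
    known_terms = set().union(*(terms(v) for v in known_values))
    return len(block_terms & known_terms) / len(block_terms)

def numeric_similarity(value, known_numbers):
    """Assumed NM-style score: Gaussian kernel built from the attribute's
    mean and standard deviation."""
    mean = sum(known_numbers) / len(known_numbers)
    variance = sum((x - mean) ** 2 for x in known_numbers) / len(known_numbers)
    if variance == 0:
        return 1.0 if value == mean else 0.0
    return math.exp(-((value - mean) ** 2) / (2 * variance))

street_values = ["Regent St.", "Morewood Ave.", "Square Ave. Park"]
neighborhood_values = ["Regent Square", "Milenight Park"]
prices = [200000, 250000, 300000]  # made-up numeric values for illustration

street_score = text_similarity("Regent Square", street_values)
neighborhood_score = text_similarity("Regent Square", neighborhood_values)
unknown_score = text_similarity("Mifflin", street_values)
price_score = numeric_similarity(250000, prices)
```

Under this simplified overlap score, "Regent Square" matches Street and Neighborhood equally well, and "Mifflin" matches nothing. Cases like these are what the Reinforcement step is meant to resolve.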
![Page 20: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/20.jpg)
ONDUX (VI)
Matching
Regent Square $228,900 1028 Mifflin Ave.; 6 Bedrooms; 2 Bathrooms. 412-638-7273
Labels assigned to the blocks, in order: Street, Price, No., ???, Street, Bed., Bath., Phone
![Page 21: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/21.jpg)
ONDUX (VII)
How can we deal with blocks that were incorrectly labeled or were not associated with any attribute?
Regent Square $228,900 1028 Mifflin Ave.; 6 Bedrooms; 2 Bathrooms. 412-638-7273
Labels assigned to the blocks, in order: Street, Price, No., ???, Street, Bed., Bath., Phone
![Page 22: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/22.jpg)
ONDUX (VIII)
Reinforcement
◦ Review the labeling task performed in the Matching step
  ◦ Unmatched blocks must receive a label of a given attribute
  ◦ Mismatched blocks must be correctly labeled
◦ How to handle these cases?
  ◦ Using positioning and sequencing information that is obtained On-Demand.
![Page 23: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/23.jpg)
ONDUX (IX)
General View
[Figure: general view of ONDUX; step markers 2 and 3 shown]
![Page 24: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/24.jpg)
ONDUX (X)
Reinforcement
◦ Given the extraction output of the Matching step, ONDUX automatically builds a graphical structure, the PSM.
PSM: Positioning and Sequencing Model.
![Page 25: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/25.jpg)
ONDUX (XI)
Reinforcement – PSM
Ordering and positioning probabilities are learned On-Demand from the test instances through the Matching Phase
In the PSM, each state represents an attribute of the KB, plus the special states start and end
Edges represent transition probabilities
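A minimal sketch of how such a model could be estimated from the Matching output, assuming each test instance is given as its sequence of block labels, with `None` for unmatched blocks. The normalizations used here (transitions divided by all transitions leaving a state, positions divided by all labeled blocks at that position) and the function name are assumptions for illustration rather than the authors' exact definitions.

```python
from collections import Counter, defaultdict

def build_psm(labeled_records):
    """Estimate transition and positioning probabilities from label sequences."""
    transition_counts = defaultdict(Counter)
    position_counts = defaultdict(Counter)
    for labels in labeled_records:
        path = ["start"] + list(labels) + ["end"]
        for previous, current in zip(path, path[1:]):
            # Transitions touching an unmatched block (None) are ignored.
            if previous is not None and current is not None:
                transition_counts[previous][current] += 1
        for k, label in enumerate(labels):
            if label is not None:
                position_counts[k][label] += 1
    transitions = {
        a: {b: n / sum(counts.values()) for b, n in counts.items()}
        for a, counts in transition_counts.items()
    }
    positions = {
        k: {a: n / sum(counts.values()) for a, n in counts.items()}
        for k, counts in position_counts.items()
    }
    return transitions, positions

# Hypothetical Matching output for three short ads.
matched = [
    ["Neighborhood", "Price", "Phone"],
    ["Neighborhood", "Price", "Phone"],
    ["Street", "Price", "Phone"],
]
transitions, positions = build_psm(matched)
```

Here `transitions["start"]["Neighborhood"]` is 2/3 and `positions[0]["Street"]` is 1/3, since two of the three sequences open with a Neighborhood block.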
![Page 26: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/26.jpg)
ONDUX (XII)
Reinforcement
◦ Remarks
  ◦ The PSM is automatically learned On-Demand from test instances
  ◦ No a priori training required
  ◦ No assumptions regarding a particular order of attribute values
  ◦ Relies on the very effective strategies deployed in the Matching Step
![Page 27: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/27.jpg)
ONDUX (XIII)
Reinforcement
◦ Once the PSM is built, we combine the matching, sequencing and positioning evidence using the Bayesian operator OR.
$$FS(B, a_i) = 1 - \big( (1 - M(B, a_i)) \times (1 - t_{j,i}) \times (1 - p_{i,k}) \big)$$
$M(B, a_i)$: Matching; $t_{j,i}$: Sequence; $p_{i,k}$: Positioning
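The combination FS = 1 - (1 - M)(1 - t)(1 - p) is a noisy-OR of the three evidence values, so a strong signal from any one of them is enough to push the final score up. The sketch below applies it to an unmatched block; the probability values, the dictionary layout, and the `relabel` helper are made up for illustration, and reading t as "probability of this attribute following the previous block's label" and p as "probability of this attribute at this block position" is an assumption.

```python
def final_score(matching, sequencing, positioning):
    """Noisy-OR combination of the three evidence values."""
    return 1 - (1 - matching) * (1 - sequencing) * (1 - positioning)

def relabel(attributes, match_scores, prev_label, position, transitions, positions):
    """Pick the attribute with the highest combined score for one block."""
    def score(attribute):
        return final_score(
            match_scores.get(attribute, 0.0),
            transitions.get(prev_label, {}).get(attribute, 0.0),
            positions.get(position, {}).get(attribute, 0.0),
        )
    return max(attributes, key=score)

# Illustrative values only: an unmatched block at position 3, preceded by a "No." block.
attributes = ["Price", "Street", "Bed."]
match_scores = {}  # the Matching step found no attribute for this block
transitions = {"No.": {"Street": 0.9, "Price": 0.1}}
positions = {3: {"Street": 0.8, "Bed.": 0.2}}

best = relabel(attributes, match_scores, "No.", 3, transitions, positions)
street_score = final_score(0.0, 0.9, 0.8)
```

In this example the block has no matching evidence at all, yet the sequencing and positioning evidence alone give Street the highest combined score.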
![Page 28: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/28.jpg)
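ONDUX (XIV)
Reinforcement
◦ Extraction Result
Regent Square $228,900 1028 Mifflin Ave.; 6 Bedrooms; 2 Bathrooms. 412-638-7273
[Figure: block labels after Reinforcement. Labels shown: Neighborhood, Price, No., Street, Street, Bed., Bath., Phone, together with the Matching-step labels "Street" and "???" that were revised]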
![Page 29: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/29.jpg)
ONDUX (XV)
Overview
[Figure: overview of ONDUX; step markers 1, 2 and 3 shown]
![Page 30: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/30.jpg)
Experiments (I)
Setup
◦ We tested our proposed approach with several sources from 3 distinct domains:
  ◦ Addresses: BigBook, Restaurants [RISE]
  ◦ Bibliographic Data: CORA [Peng@IPM'06], PersonalBib [Mansuri@ICDE'06]
  ◦ Classified Ads: 7 distinct newspaper sites [Oliveira@SBBD'06]
◦ We limited the presentation to one experiment per domain. More in the paper.
![Page 31: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/31.jpg)
Experiments (II)
Evaluation
◦ Metrics
  ◦ Precision, Recall and F-Measure
  ◦ T-Test for the statistical validation of the results
◦ Baselines: Conditional Random Fields (CRF)
  ◦ U-CRF (Unsupervised method) [Zhao@SICDM'08]
  ◦ S-CRF (Classical supervised method) [Peng@IPM'06]
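For reference, the three quality metrics can be computed as in the sketch below, assuming the evaluation compares the set of values extracted for one attribute against the set of correct values for that attribute. The function name and the sample values are hypothetical.

```python
def precision_recall_f(extracted, reference):
    """Precision, recall and F-measure of extracted values against a reference set."""
    extracted, reference = set(extracted), set(reference)
    correct = len(extracted & reference)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical Street values: four extracted, three correct ones in the reference.
extracted = {"Mifflin Ave.", "Morewood Ave.", "Regent Square", "1028"}
reference = {"Mifflin Ave.", "Morewood Ave.", "Regent St."}
precision, recall, f_measure = precision_recall_f(extracted, reference)
```

Two of the four extracted values are correct, so precision is 0.5, recall is 2/3, and the F-measure is 4/7.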
![Page 32: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/32.jpg)
Experiments (III)
Extraction Quality
◦ U-CRF results similar to Zhao@SICDM (validation)
◦ Dataset follows the single order assumption
◦ After Reinforcement, ONDUX achieved similar quality
![Page 33: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/33.jpg)
Experiments (IV)
Extraction Quality
◦ S-CRF achieved higher results than U-CRF due to the hand-labeled training
◦ CORA includes a variety of citation styles (conference, journal, books, etc.)
◦ In general, ONDUX outperformed the CRF models
![Page 34: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/34.jpg)
Experiments (V)
Extraction Quality
◦ Due to the Matching Phase and the PSM that is learned On-Demand, ONDUX achieves very high quality results
◦ U-CRF performed poorly (very heterogeneous dataset)
![Page 35: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/35.jpg)
Experiments (VI)
Varying the number of terms common to test instances and the KB
◦ Determine how dependent the quality of results is on the overlap between the previously known data and the text input.
These experiments were conducted with the BigBook dataset.
![Page 36: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/36.jpg)
Experiments (VII)
Varying the number of shared terms
◦ Even when the Matching Phase yields poor quality, the PSM is able to increase ONDUX’s quality in the Reinforcement Step
◦ Starting with a batch of 500 input strings, after having an overlap of 500 terms, ONDUX achieved high quality results
![Page 37: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/37.jpg)
Experiments (VIII)
Varying the number of shared terms
◦ As the number of shared terms increases, the quality achieved by the Matching Phase improves
![Page 38: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/38.jpg)
Conclusions and Future Work (I)
New approach for information extraction independent of the style of the data records
ONDUX
◦ Flexible: does not consider any particular style
◦ Unsupervised: does not require any human effort to create a training set
◦ On-Demand: ordering and positioning information are learned through the Matching Phase
![Page 39: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/39.jpg)
Conclusions and Future Work (II)
The proposed strategy achieves good precision and recall
◦ Small size of the Knowledge Base
◦ Comparison with the state of the art
Future Work
◦ Investigate different matching functions
◦ Nested structures?
![Page 40: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/40.jpg)
Acknowledgements
UFMG
![Page 41: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/41.jpg)
Questions?
![Page 42: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/42.jpg)
Experiments
Setup

| Experiment | Dataset (records) | # Source (records) |
|---|---|---|
| BigBook X BigBook | 2000 | 2000 |
| CORA X CORA | 150 | 350 |
| Folha X Web Ads | 500 | 125 |
![Page 43: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/43.jpg)
Experiments
![Page 44: ONDUX On-Demand Unsupervised Learning for Information Extraction Eli Cortez, Altigran da Silva and Edleno de Moura Federal University of Amazonas (UFAM)](https://reader036.fdocuments.in/reader036/viewer/2022062423/56649ddf5503460f94ad8f74/html5/thumbnails/44.jpg)
Experiments