TEXT MINING – MP1
Prepared by: Mohammad Al Boni
TASKS & IMPLEMENTATION STRATEGIES
Some implementation tips before you start!
1.1 Understand Zipf's Law.
1.2 Construct a controlled vocabulary.
1.3 Compute similarity between documents.
2.1 Maximum likelihood estimation for statistical language models with proper smoothing.
2.2 Generate text documents from a language model.
2.3 Language model evaluation.
IMPLEMENTATION TIPS
Use an IDE such as Eclipse or NetBeans.
Divide and conquer!
Parallel computing vs. multi-threading:
    ArrayList<Thread> threads = new ArrayList<Thread>();
    for (int j = 0; j + core < FilesSize; j += NumberOfProcessors)
        analyzer.analyzeDocumentDemo(analyzer.LoadJson(Files.get(j + core)), core);
Use separate code files for separate problems.
Save and load intermediate results.
Always test your code on a small data sample.
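The loop in the tips above gives worker `core` every `NumberOfProcessors`-th file. The slide does not show how the workers are started, so the following is only a minimal sketch of one way to wire that loop into threads. The class `StridedWorkers` and the method `processFile` are hypothetical; `processFile` stands in for `analyzer.analyzeDocumentDemo(analyzer.LoadJson(...), core)`, whose implementation is not given here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class StridedWorkers {
    // Hypothetical stand-in for analyzer.analyzeDocumentDemo(analyzer.LoadJson(file), core).
    // It only records which worker handled which file.
    static void processFile(String file, int core, ConcurrentMap<String, Integer> handledBy) {
        handledBy.put(file, core);
    }

    // Worker `core` handles files core, core + n, core + 2n, ... so no file is processed twice.
    static ConcurrentMap<String, Integer> runAll(List<String> files, int numberOfProcessors) {
        ConcurrentMap<String, Integer> handledBy = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (int c = 0; c < numberOfProcessors; c++) {
            final int core = c;
            Thread t = new Thread(() -> {
                for (int j = 0; j + core < files.size(); j += numberOfProcessors) {
                    processFile(files.get(j + core), core, handledBy);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait for every worker before using the results
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return handledBy;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.json", "b.json", "c.json", "d.json", "e.json");
        System.out.println(runAll(files, 2));
    }
}
```

Shared results go into a concurrent map here because several workers write at once; a per-worker result that is merged after `join()` would work as well.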
TASKS - 1.3 COMPUTE SIMILARITY BETWEEN DOCUMENTS
Approach:
Load the controlled vocabulary from part 1.2.
Load the test documents.
Load the reviews from query.json.
Compute similarities and get the top 3 similar reviews.
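The last step of this approach can be sketched as follows. The slides do not state which similarity measure or term weighting to use, so this sketch assumes cosine similarity over sparse term-weight vectors that have already been built from the controlled vocabulary. The class `TopSimilar`, its method names, and the toy vectors are illustrative only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopSimilar {
    // Cosine similarity between two sparse vectors (term -> weight).
    // Assumption: the weights were computed beforehand, e.g. from the controlled vocabulary.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        for (double w : b.values()) {
            normB += w * w;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty vector is treated as similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of the k documents most similar to the query, best first.
    static List<String> topK(Map<String, Double> query,
                             Map<String, Map<String, Double>> docs, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> d : docs.entrySet()) {
            scores.put(d.getKey(), cosine(query, d.getValue()));
        }
        List<String> ids = new ArrayList<>(docs.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        Map<String, Double> query = Map.of("good", 1.0, "food", 1.0);
        Map<String, Map<String, Double>> docs = Map.of(
                "d1", Map.of("good", 1.0, "food", 1.0),
                "d2", Map.of("good", 1.0),
                "d3", Map.of("bad", 1.0),
                "d4", Map.of("good", 1.0, "food", 1.0, "bad", 2.0));
        System.out.println(topK(query, docs, 3));
    }
}
```

Sorting all scores is the simplest option for a small test set; for a large collection a bounded priority queue of size 3 would avoid the full sort.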
TASK 2.1 LM SMOOTHING
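The slide content for this task did not survive in the transcript, so the following is a hedged sketch of the two smoothing methods that the figure captions in this deck name, linear interpolation and absolute discounting, applied to a bigram model. It assumes the textbook formulas p(w|prev) = λ·p_ML(w|prev) + (1−λ)·p(w) and p(w|prev) = max(c(prev,w) − δ, 0)/c(prev) + δ·S/c(prev)·p(w), where S is the number of distinct words seen after prev. It also assumes an unsmoothed unigram model as the reference p(w). The class `BigramSmoothing` and the parameter values are illustrative, not the assignment's required settings.

```java
import java.util.HashMap;
import java.util.Map;

class BigramSmoothing {
    final Map<String, Integer> unigram = new HashMap<>();             // c(w)
    final Map<String, Map<String, Integer>> bigram = new HashMap<>(); // c(prev, w)
    int total = 0;                                                    // number of tokens seen

    void train(String[] tokens) {
        for (int i = 0; i < tokens.length; i++) {
            unigram.merge(tokens[i], 1, Integer::sum);
            total++;
            if (i > 0) {
                bigram.computeIfAbsent(tokens[i - 1], k -> new HashMap<>())
                      .merge(tokens[i], 1, Integer::sum);
            }
        }
    }

    // Reference model p(w). Assumption: left unsmoothed here to keep the sketch short.
    double pUnigram(String w) {
        return unigram.getOrDefault(w, 0) / (double) total;
    }

    // c(prev), counted as the number of bigrams that start with prev.
    private static int contextCount(Map<String, Integer> followers) {
        int sum = 0;
        for (int c : followers.values()) {
            sum += c;
        }
        return sum;
    }

    // lambda * p_ML(w | prev) + (1 - lambda) * p(w)
    double linearInterpolation(String prev, String w, double lambda) {
        Map<String, Integer> followers = bigram.get(prev);
        if (followers == null) {
            return pUnigram(w); // unseen context: fall back to the unigram model
        }
        double ml = followers.getOrDefault(w, 0) / (double) contextCount(followers);
        return lambda * ml + (1.0 - lambda) * pUnigram(w);
    }

    // max(c(prev, w) - delta, 0) / c(prev) + delta * S / c(prev) * p(w)
    double absoluteDiscounting(String prev, String w, double delta) {
        Map<String, Integer> followers = bigram.get(prev);
        if (followers == null) {
            return pUnigram(w);
        }
        int cPrev = contextCount(followers);
        double discounted = Math.max(followers.getOrDefault(w, 0) - delta, 0.0) / cPrev;
        double reserved = delta * followers.size() / cPrev; // mass taken from seen bigrams
        return discounted + reserved * pUnigram(w);
    }

    public static void main(String[] args) {
        BigramSmoothing lm = new BigramSmoothing();
        lm.train("the food is good the food is bad".split(" "));
        System.out.println(lm.linearInterpolation("is", "good", 0.9));
        System.out.println(lm.absoluteDiscounting("is", "the", 0.1));
    }
}
```

A quick sanity check for either method is that the probabilities of all vocabulary words given a fixed `prev` sum to 1.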
TASK 2.1 MAXIMUM LIKELIHOOD ESTIMATION
Figure 2. Linear interpolation smoothing
Figure 3. Absolute discounting smoothing
Figure 4. Linear interpolation smoothing
Figure 5. Absolute discounting smoothing
TASK 2.2 GENERATE TEXT DOCUMENTS FROM A LANGUAGE MODEL.
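The slides for this task are not preserved in the transcript. As a rough illustration of what generating text from a language model can look like, the sketch below samples words from a unigram distribution by drawing a uniform random number and walking the cumulative probabilities. The class `UnigramSampler`, the toy distribution, and the fixed seed are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

class UnigramSampler {
    // Draws one word: pick u in [0, 1) and walk the cumulative distribution until it exceeds u.
    static String sampleWord(Map<String, Double> probs, Random rng) {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        String last = null;
        for (Map.Entry<String, Double> e : probs.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (u < cumulative) {
                return last;
            }
        }
        return last; // guards against rounding when the probabilities sum to slightly under 1
    }

    // Generates `length` words, each drawn independently from the same distribution.
    static String generate(Map<String, Double> probs, int length, Random rng) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(sampleWord(probs, rng));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> probs = new LinkedHashMap<>();
        probs.put("the", 0.4);
        probs.put("food", 0.3);
        probs.put("good", 0.3);
        System.out.println(generate(probs, 10, new Random(42)));
    }
}
```

For a bigram model the same walk applies, except that each draw uses the smoothed distribution p(w | prev) conditioned on the previously generated word.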
THANK YOU!