Lecture2notes: Summarizing Lecture Videos by Classifying ...

61
Lecture2notes: Summarizing Lecture Videos by Classifying Slides and Analyzing Text with Machine Learning Hayden Housen https://haydenhousen.com [email protected] GitHub: @HHousen Mentor: Dhiraj Joshi @ IMB All Images Without Citations Are My Own Background: https://unsplash.com/photos/ieic5Tq8YMk Natural Language Processing • Computer Vision • Machine Learning

Transcript of Lecture2notes: Summarizing Lecture Videos by Classifying ...

Page 1: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Lecture2notes: Summarizing Lecture

Videos by Classifying Slides and

Analyzing Text with Machine Learning

Hayden Housen

https://[email protected]: @HHousenMentor: Dhiraj Joshi @ IMB

All Images Without Citations Are My OwnBackground: https://unsplash.com/photos/ieic5Tq8YMk

Natural Language Processing • Computer Vision • Machine Learning

Page 2: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Phone

Presentation

Goal/Focus

Page 3: Lecture2notes: Summarizing Lecture Videos by Classifying ...

AI Model

Lecture Video Detailed Notes

Goal/Focus

Page 4: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Slide ClassifierAI Summarization

Models

Website

Gap/Hypothesis

End-to-End Process

Presenter SlideSlide

Page 5: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Lecture Video

Video

Audio

Summarization Algorithm

Detailed Notes

Brief Methodology

Combination Algorithm

StructuredContent

Transcript

Render

Page 6: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Computer Vision

“Teaching Computers To See”•Face

Recognition Emotion Analysis

Crowd Analytics

Sports: draw lines on field& highlights

Medical Imaging

Movie Special Effects

Self Driving Cars

Page 7: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Computer Vision

“Teaching Computers To See”

•Face Recognition

Emotion Analysis

Crowd Analytics

Sports: draw lines on field& highlights

Medical Imaging

Movie Special Effects

Self Driving Cars

75%

Humans generate massive amounts of video (Video data

accounted for 75% of the total internet traffic in 2017) [23]

Page 8: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Literature Review – Note Taking

Note taking is almost a universal activity among students.[27]

Students’ notes are generally incomplete, and thus not adequate for reviewing the material.[28]

Students prefer guided notes[29] and course final exam performance higher for guided notes.[30]

Page 9: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Estimated Effects

Previewing: [31] suggests that summarized lecture slides reduce the amount of previewing time required without impacting quiz scores

MOOCs: Summaries enable quick skimming of the main points.

Automated summaries will

Decrease time spent creating notes

Increase quiz scores (content knowledge)

Enable faster learning

Page 10: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Literature Review – Lecture Summarization

TalkMiner22 – Web scraping and OCR used to index online lecture videos. We implement the

duplicate alignment algorithm suggested by TalkMiner.

Summary of TalkMiner [22] Summary of [20]

“Lecture Video Indexing20 – Inspired our slide structure analysis algorithm. We use the same

title identification and text classification thresholds.

Page 11: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Literature Review – Lecture Summarization

BERT Text Summarization Lectures [23] is most related since it applies deep learning to lecture summarization.

Approach is limited because:

It only tests the BERT model

Does not consider the information on the slides

Does not fine-tune BERT and merely uses it for embeddings

Has no STT component

Page 12: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Gap/HypothesisMain Components

Slide ClassifierAI Summarization

Models

WebsiteEnd-to-End

Process

Presenter SlideSlide

Page 13: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Lecture Video DatasetAI

Summarization Models

WebsiteEnd-to-End

Process

Main Components

AI Slide Classifier Model

Page 14: Lecture2notes: Summarizing Lecture Videos by Classifying ...

1. Slide Classification Model

Classify frames of input video into 9 classes

Allows system to identify images containing slides for additional processing

Means system can ignore useless frames (containing only the presenter or audience)

Page 15: Lecture2notes: Summarizing Lecture Videos by Classifying ...

SortingAuto Sort (AI Classification)

Manual Sorting Repairs

1.1. Lecture Video Dataset – Diagram

WebsiteScraper

Video Downloader

Frame Extractor

YouTubeScraperHuman

Interaction

Compile Data

Videos CSV

Train Slide

Classifier

Bootstrapping

MITOpenCourseWare

Final dataset contains 15599 images extracted

from 78 videos manually classified into 9

categories.

Page 16: Lecture2notes: Summarizing Lecture Videos by Classifying ...

1.2. Slide Classifier – Dataset Modifications

Train-test Train-test-three CV

Page 17: Lecture2notes: Summarizing Lecture Videos by Classifying ...

1.2. Slide Classifier – Dataset Modifications

Train-test

• Split dataset evenly (by category) into train (80%) & test (20%) sets.

• Average deviation = 0.0929.• Random subset would cause

class imbalance.

Train-test-three

• Only the “slide” and “presenter_slide” classes used in end-to-end process, so create dataset with only those classes.

• “Other” class contains remaining7 classes

• Average deviation = 0.001104.

Page 18: Lecture2notes: Summarizing Lecture Videos by Classifying ...

1.3. Slide Classifier – Squished Images

After hyperparameters found (by training 262 models in CV using optimization framework), trainfinal-general and three-category models.

Frames from videos have 16:9 aspect ratio but ImageNet CNNs accept images with 1:1 aspect ratio.

Center cropping and scaling can remove important data.

Train a model by simply rescaling.Misclassified as “slide”

because presenter cropped out.

Page 19: Lecture2notes: Summarizing Lecture Videos by Classifying ...

1.4. Slide Classifier – Results

Model Dataset Accuracy Accuracy (train) F-score Precision Recall

Final-general train-test 82.62 98.58 87.44 97.73 82.62

Three-category train-test-three 89.97 99.72 93.82 99.95 89.97

train-test 83.86 97.16 88.16 97.72 83.86

train-test-three 87.21 100 91.57 99.8 87.21Squished-image

For each model configuration, trained 11 models and report the average metrics.

In the final pipeline, the three-category model is used.

Slide classifier can be used in other research to easily identify slides.

Page 20: Lecture2notes: Summarizing Lecture Videos by Classifying ...

1.4 Slide Classifier – Results

Final-generalMedian (by accuracy)

Three-categoryMedian (by accuracy)

Page 21: Lecture2notes: Summarizing Lecture Videos by Classifying ...

1.5 Slide Classifier – Discussion

Incorrectly classifying “slide” frames as “presenter_slide” will have minimal impact on final summary will be ignored during processing.

Incorrectly classifying “presenter_slide” frames as “slide” will impact final summary will not be cropped.

Page 22: Lecture2notes: Summarizing Lecture Videos by Classifying ...

2.1 End-to-End Approach – Overview

ffmpeg Slide Classifier that uses the ML model from

previous slide trained on

the dataset

Crop slides captured by

video camera

Extract Frames Classify Frames Perspective Crop Cluster Slides

Segment or Normal

Page 23: Lecture2notes: Summarizing Lecture Videos by Classifying ...

2.1 End-to-End Approach – Overview

Tesseract OCRStroke Width

Title IdentificationBold Identification

Segment or DeepSpeechCMU Sphinx

VoskGoogle Cloud Speech API

CombinationModifications

ExtractiveAbstractive

Slide Structure Analysis & OCR Figure Extraction Transcribe Audio SummarizationCluster Slides

Text Detection w/EAST

Color DetectionShannon Entropy

Page 24: Lecture2notes: Summarizing Lecture Videos by Classifying ...

2.2. Perspective Cropping Algorithm

Frames Unique SlidesAlgorithm

Page 25: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Feature Matching

No Yes

RANSAC Transform

Remove Duplicates

Match Features

Corner Crop Transform

2.2. Perspective Cropping Algorithm

Frames

SlidesScreen

Capture

Video Camera

Classifier

Cluster Together

Unique Slides

Camera Motion

Page 26: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Feature Matching

No Yes

RANSAC Transform

Remove Duplicates

Camera Motion

Match Features

Corner Crop

Transform

2.3. Perspective Cropping – CCT

CCT

Contours

Hough Lines

Page 27: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Feature Matching

No Yes

RANSAC Transform

Remove Duplicates

Camera Motion

Match Features

Corner Crop

Transform

2.3. Perspective Cropping – Feature Match

Iterates through the “screen capture” and “video camera” slides in chronological order.

When this category switches, begins matching using ORB & FLANN until number of matched features below threshold.

Goal: find “video camera” within “screen capture”

Page 28: Lecture2notes: Summarizing Lecture Videos by Classifying ...

2.3. Perspective Cropping – Feature Match

If number of matches is above a threshold then the slides are considered identical.

Page 29: Lecture2notes: Summarizing Lecture Videos by Classifying ...

2.3. Perspective Cropping – Feature Match

Page 30: Lecture2notes: Summarizing Lecture Videos by Classifying ...

2.4. Perspective Cropping – More Content?

124240 content units addedAlgorithmThe slide with the most content is kept.

Page 31: Lecture2notes: Summarizing Lecture Videos by Classifying ...

2.4. Slide Structure Analysis (SSA)

Title: 2.4. Perspective Cropping – Feature MatchBullet 1: **Iterates** through the “screen capture” and “video camera” slides in **chronological** order.Bullet 2: When this **category switches**, begins matching using ORB & FLANN until number of matched features below threshold.Bullet 3: **Goal:** find “video camera” within “screen capture”No footer text

Goal: Copy and paste formatted text from slides while not having access to the text, only an image of the text.

Page 32: Lecture2notes: Summarizing Lecture Videos by Classifying ...

2.4. Slide Structure Analysis – 4 Algorithms

Stroke Width1. Distance Transform2. Peak Local Max – Find peaks in color

Identify Title [20]

Criteria for line to be a title:1. In upper third of image2. In first block and first

paragraph3. Stroke width > 1

standard deviation from average stroke width

4. Line height > average5. x position of top-left

corner < image_width*0.77

6. > 3 characters

Categorize Text [20]

Based on stroke width and height of text bounding boxes.• Bold text• Small text• Normal text

Figure Extraction1. Detects objects that are

close together & meet a size and aspect ratiothreshold

2. Checks that each potential figure contains a minimal amount of text and is color

3. Removes overlappingfigures

Page 33: Lecture2notes: Summarizing Lecture Videos by Classifying ...

2.5. Figure Extraction

Page 34: Lecture2notes: Summarizing Lecture Videos by Classifying ...

2.5. Figure Extraction

Detects objects that are close together &

meet a size and aspect ratio

threshold

Checks that each potential figure

contains a minimal amount of text and is

color

Removes overlappingfigures

Page 35: Lecture2notes: Summarizing Lecture Videos by Classifying ...

2.6. Transcribe Audio – Speech-To-Text

Tested performance of STT models on 43 lecture videos from slide classifier dataset.

Chunking by Voice Activity: Uses WebRTC Voice Activity Detector (VAD) – increases the speed of STT by reducing the amount of audio without speech.

STT ModelsDeepSpeech

SphinxVosk small-0.3

Vosk daanzu-20200905Vosk aspire-0.2

Google’s paid STT API

Page 36: Lecture2notes: Summarizing Lecture Videos by Classifying ...

2.6. Transcribe Audio – Results

Model WER MER WIL LS TC WER Processing Time

DeepSpeech (chunking) 43.01 41.82 59.04 5.97 ≈4 hours

DeepSpeech 44.44 42.98 59.99 5.97 ≈20 hours

Google STT ("default" model) 34.43 33.14 49.05 12.23 ≈20 minutes

Vosk small-0.3 35.43 33.64 50.84 15.34 ≈8.5 hours

Vosk daanzu-20200905 33.67 31.87 48.28 7.08 ≈5.5 hours

Vosk aspire-0.2 41.38 38.44 56.35 13.64 ≈19 hours

Vosk aspire-0.2 (chunking) 41.45 38.65 56.56 13.64 ≈19 hours

"LS TC WER" stands for LibriSpeech test-clean WER (Word Error Rate)Sphinx was not tested because it is between 6x-18x slower than other models.

Page 37: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Extract Frames Classify Frames

Perspective Crop Cluster Slides

Slide Structure Analysis &

OCR

Figure Extraction

Transcribe AudioSummarization

3.1. End-to-End Approach – Slides/Videos

= Novel ML Models= Novel Approaches

Page 38: Lecture2notes: Summarizing Lecture Videos by Classifying ...

3.1. Text Summarization

Extractive SummarizationIdentifies important sections of the text and generates them verbatim producing a subset of the sentences from the originaltext

Abstractive SummarizationReproduces important material in a new way after interpretationand examination of the textusing advanced naturallanguage

Page 39: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Summarization Model

3.1. Text Summarization

Slides Transcript

Audio Transcript

Combination Algorithm

Page 40: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Combination Algorithm• only_asr – only uses the audio transcript (deletes the

slide transcript)• only_slides – reverse of only_asr• concat – appends audio transcript to slide transcript• full_sents – audio transcript appended to only the

complete sentences of the slide transcript• keyword_based (most advanced) – selects a certain

percentage of sentences from the audio transcript based on keywords found in the slides transcript

3.1. Text Summarization

Slides Transcript

Audio Transcript

Summarization Model

Combination Algorithm

Page 41: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Summarization Model

3.1. Text Summarization

Slides Transcript

Audio Transcript

Combination Algorithm

Page 42: Lecture2notes: Summarizing Lecture Videos by Classifying ...

3.1. Text Summarization

Slides Transcript

Audio Transcript

Combination Algorithm

Summarization Model

Summarization Model1. Extractive Summarization

• Cluster – groups the lecture transcript into categories by topic and summarizes each topic using “generic”

• Generic (non-neural) – standard summarization heuristics

2. Abstractive Summarization• PreSumm or BART or TransformerSum or

PEGASUS

Page 43: Lecture2notes: Summarizing Lecture Videos by Classifying ...

3.1. Text Summarization – SOTA Models

PreSumm: Applied BERT to text summarization

BART: standard seq2seq architecture with a bidirectional encoder (like BERT) and a left-to-right decoder (like GPT)

PEGASUS: pretraining task is intentionally similar to summarization

[24]

Page 44: Lecture2notes: Summarizing Lecture Videos by Classifying ...

3.2. TransformerSum – Overview

Summarization training and inference library created by Hayden Housen

Improves the architecture of BertSum(extractive component of PreSumm)

TransformerSum was used by a company that serves industry-leading enterprises, which engage with millions of customers daily.

Page 45: Lecture2notes: Summarizing Lecture Videos by Classifying ...

3.2. TransformerSum – Diagram

S = SentenceT = TokenE = EmbeddingR = Sentence Ranking

Tokenizer

T 1T 2T 3T 4…

Word EmbeddingModel

TE 1TE 2TE 3TE 4

Pooling

SE 1SE 2

Input Document

S 1S 2…

Classifier

R 1R 2…

Page 46: Lecture2notes: Summarizing Lecture Videos by Classifying ...

3.2. TransformerSum – Pooling Modes

Test all three pooling modes (mean_tokens, sentence_rep_tokens, and max_tokens) using DistilBERT and DistilRoBERTa

Across all datasets and models, pooling mode has no significant impacton the final ROUGE scores.

Model Pooling Method CNN/DM WikiHow ArXiv/PubMed

sent_rep 42.71/19.91/39.18 30.69/08.65/28.58 34.93/12.21/31.00

mean 42.70/19.88/39.16 30.48/08.56/28.42 34.48/11.89/30.61

max 42.74/19.90/39.17 30.51/08.62/28.43 34.50/11.91/30.62

sent_rep 42.87/20.02/39.31 31.07/08.96/28.95 34.70/12.16/30.82

mean 43.00/20.08/39.42 30.96/08.93/28.86 34.24/11.82/30.42

max 42.91/20.04/39.33 30.93/08.92/28.82 34.28/11.82/30.44

distilbert-base-uncased

distilroberta-base

Page 47: Lecture2notes: Summarizing Lecture Videos by Classifying ...

3.2. TransformerSum – Classifiers

Test all four classification layers.

No significant difference in ROUGE scores between classifiers.

Model Classifier CNN/DM WikiHow ArXiv/PubMed

simple_linear 42.71/19.91/39.18 30.69/08.65/28.58 34.93/12.21/31.00

linear 42.70/19.84/39.14 30.67/08.62/28.56 34.87/12.20/30.96

transformer 42.78/19.93/39.22 30.66/08.69/28.57 34.94/12.22/31.03

transformer_linear 42.78/19.93/39.22 30.64/08.64/28.54 34.97/12.22/31.02

simple_linear 42.87/20.02/39.31 31.07/08.96/28.95 34.70/12.16/30.82

linear 43.18/20.26/39.61 31.08/08.98/28.96 34.77/12.17/30.88

transformer 42.94/20.03/39.37 31.05/08.97/28.93 34.77/12.17/30.87

transformer_linear 42.90/20.00/39.34 31.13/09.01/29.02 34.77/12.18/30.88

distilbert-base-uncased

distilroberta-base

Page 48: Lecture2notes: Summarizing Lecture Videos by Classifying ...

3.2. TransformerSum – Results

R1/R2/RL-Sum CNN/DM WikiHow ArXiv/PubMed

distilbert-base-uncased 42.71/19.91/39.18 30.69/08.65/28.58 34.93/12.21/31.00

distilroberta-base 42.87/20.02/39.31 31.07/08.96/28.95 34.70/12.16/30.82

bert-base-uncased 42.78/19.83/39.18 30.68/08.67/28.59 34.80/12.26/30.92

roberta-base 43.24/20.36/39.65 31.26/09.09/29.14 34.81/12.26/30.91

mobilebert-uncased 42.01/19.31/38.53 30.72/08.78/28.59 33.97/11.74/30.19

BertSumExt 43.25/20.24/39.63 None None

BertSumExt-large 43.85/20.34/39.90 None None

MatchSum bert-base 44.22/20.62/40.38 31.85/08.98/29.58 None

MatchSum roberta-base 44.41/20.86/40.55 None None

PEGASUS-base 41.79/18.81/38.93 36.58/15.64/30.01 37.39/12.66/23.87

PEGASUS-large HN 44.17/21.47/41.11 41.35/18.51/33.42 44.88/18.37/26.58

PEGASUS-large C4 43.90/21.20/40.76 43.06/19.71/34.80 45.10/18.59/26.75

Smaller: Mobilebert-uncased-ext-sum model achieves 96.59% of the performance of BertSumwhile containing 4.45x fewer parameters.

Faster: Distilroberta-base-ext-sum model takes 74x less time to train than MatchSum.

My

Res

ults

Page 49: Lecture2notes: Summarizing Lecture Videos by Classifying ...

4. Lecture2Notes – Website

Page 50: Lecture2notes: Summarizing Lecture Videos by Classifying ...
Page 51: Lecture2notes: Summarizing Lecture Videos by Classifying ...
Page 52: Lecture2notes: Summarizing Lecture Videos by Classifying ...
Page 53: Lecture2notes: Summarizing Lecture Videos by Classifying ...
Page 54: Lecture2notes: Summarizing Lecture Videos by Classifying ...
Page 55: Lecture2notes: Summarizing Lecture Videos by Classifying ...
Page 56: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Summary – Five-Fold Contribution

Lecture Video

Dataset

• Appropriate categories

• Large variety

Slide Classifier

• Identify important frames in slide presentations

•Summarization Models

• Improve state-of-the-art in text summarization

• Novel approaches

•End-To-End

• Multimodal approach

• Convert lecture videos to notes

Online Web Service

• Allows people to easily use the created lecture summarizer to create notes based on their own lectures

Page 57: Lecture2notes: Summarizing Lecture Videos by Classifying ...

https://unsplash.com/photos/OyCl7Y4y0Bk https://www.advectas.com/en/blog/what-is-machine-learning/

Machine LearningAssess Impact on Education

Future Research

Summaries of lecture slide sets enhanced likelihood students would preview material & sometimes led

to better quiz scores [21]

Slide classifier (makes future slide analysis easier), dataset of lecture

videos, TransformerSum

Page 58: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Acknowledgements

Dr. Dhiraj Joshi for giving me advice on methodology

Gillian Rinaldo for guiding me

Classmates/Peers for supporting me

Parents for helpful suggestions

Page 59: Lecture2notes: Summarizing Lecture Videos by Classifying ...

References

[1] Szeliski, R. (2010). Computer Vision: Algorithms and Applications. Springer.

[2] Boden, M. (2008). Mind as machine (p. 781). Oxford: Clarendon Press.

[3] Dragonette, L. (2018, April 19). Artificial Intelligence - AI. Retrieved December 6, 2018, from https://www.investopedia.com/terms/a/artificial-intelligence-ai.asp

[4] Machine learning. (2018, December 04). Retrieved from https://en.wikipedia.org/wiki/Machine_learning

[5] Feature (computer vision). (2019). Retrieved from https://en.wikipedia.org/wiki/Feature_(computer_vision)

[6] Ripley, B. (2009). Pattern recognition and neural networks (p. 354). Cambridge: Cambridge University Press.

[7] What is the difference between labeled and unlabeled data?. (2019). Retrieved from https://stackoverflow.com/questions/19170603/what-is-the-difference-between-labeled-and-unlabeled-data/19172720#19172720

[8] Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297. doi: 10.1007/bf00994018

[9] A Beginner's Guide to Neural Networks and Deep Learning. (2019). Retrieved from https://skymind.ai/wiki/neural-network

[10] Le, J. (2018). A Tour of The Top 10 Algorithms for Machine Learning Newbies. Retrieved from https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11

[11] Brownlee, J. (2017). What is the Difference Between Test and Validation Datasets?. Retrieved from https://machinelearningmastery.com/difference-test-validation-datasets/

[12] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 1097-1105. Retrieved February 5, 2019.

[13] Image Classification. (n.d.). Retrieved February 5, 2019, from https://cs231n.github.io/classification/

[14] Seif, G. (2018, December 13). How to do everything in Computer Vision. Retrieved February 3, 2019, from https://towardsdatascience.com/how-to-do-everything-in-computer-vision-2b442c469928

[15] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/cvpr.2016.91

[16] Chablani, M. (2017, August 21). YOLO - You only look once, real time object detection explained. Retrieved February 5, 2019, from https://towardsdatascience.com/yolo-you-only-look-once-real-time-object-detection-explained-492dc9230006

[17] Gandhi, R. (2018, July 09). R-CNN, Fast R-CNN, Faster R-CNN, YOLO - Object Detection Algorithms. Retrieved February 5, 2019, from https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e

Page 60: Lecture2notes: Summarizing Lecture Videos by Classifying ...

References

[18] Krizhevsky, A. (n.d.). Google Scholar. Retrieved February 5, 2019, from https://scholar.google.com/citations?user=xegzhJcAAAAJ

[19] Mahasseni, B., Lam, M., & Todorovic, S. (2017). Unsupervised Video Summarization with Adversarial LSTM Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/cvpr.2017.318

[20] Yang, H., Siebert, M., Luhne, P., Sack, H., & Meinel, C. (2012). Lecture Video Indexing and Analysis Using Video OCR Technology. Journal of Multimedia Processing and Technologies, 2(4), 176-196. doi:10.1109/sitis.2011.20

[21] Shimada, A., Okubo, F., Yin, C., & Ogata, H. (2017). Automatic Summarization of Lecture Slides for Enhanced Student Preview Technical Report and User Study. IEEE Transactions on Learning Technologies, 11(2), 165-178. doi:10.1109/tlt.2017.2682086

[22] Adcock, J., Cooper, M., Denoue, L., Pirsiavash, H., & Rowe, L. A. (2010). TalkMiner. MM'10 - Proceedings of the ACM Multimedia 2010 International Conference, 241-250. doi:10.1145/1873951.1873986

[23] “Cisco Visual Networking Index: Forecast and Trends, 2017–2022 White Paper.” VNI Global Fixed and Mobile Internet Traffic Forecasts. Cisco, 27 February 2019, https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white-paper-c11-741490.html

[24] Liu, Y., & Lapata, M. (2019). Text Summarization with Pretrained Encoders. Retrieved from https://arxiv.org/abs/1908.08345

[25] Sciforce. (2019, January 23). Towards Automatic Text Summarization: Extractive Methods. Retrieved December 27, 2019, from https://medium.com/sciforce/towards-automatic-text-summarization-extractive-methods-e8439cd54715.

[26] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi: 10.1109/cvpr.2016.90

[27] Pauline A. Nye, Terence J. Crooks, Melanie Powley, and Gail Tripp. Student note-taking related to university examination performance. Higher Education, 13(1):85–97, February 1984. ISSN 0018-1560, 1573-174X. doi: 10.1007/BF00136532. URL http://link.springer.com/10.1007/BF00136532.

[28] Kenneth A. Kiewra. Providing the Instructor’s Notes: An Effective Addition to Student Notetaking. Educational Psychologist, 20(1):33–39, January 1985. ISSN 0046-1520, 1532-6985. doi: 10.1207/s15326985ep2001_5. URL https://www.tandfonline.com/doi/full/10.1207/s15326985ep2001_5.

[29] Jennifer L Austin, Matthew D Thibeault, and Jon S Bailey. Effects of Guided Notes on University Students’ Responding and Recall of Information. Journal of Behavioral Education, page 12, 2002.

[30] Anneh Mohammad Gharravi. Impact of instructor-provided notes on the learning and exam performance of medical students in an organ system-based medical curriculum. Advances in Medical Education and Practice, Volume 9:665–672, September 2018. ISSN 1179-7258. doi: 10.2147/AMEP.S172345. URL https://www.dovepress.com/impact-of-instructor-provided-notes-on-the-learning-and-exam-performan-peer-reviewed-article-AMEP.

[31] Atsushi Shimada, Fumiya Okubo, Chengjiu Yin, and Hiroaki Ogata. Automatic Summarization of Lecture Slides for Enhanced Student PreviewTechnical Report and User Study. IEEE Transactions on Learning Technologies, 11(2):165–178, April 2018. ISSN 1939-1382, 2372-0050. doi: 10.1109/TLT.2017.2682086. URL https://ieeexplore.ieee.org/document/7879355/.

Page 61: Lecture2notes: Summarizing Lecture Videos by Classifying ...

Lecture2notes: Summarizing Lecture

Videos by Classifying Slides and

Analyzing Text with Machine Learning

Hayden Housen

https://[email protected]: @HHousenMentor: Dhiraj Joshi @ IMB

All Images Without Citations Are My OwnBackground: https://unsplash.com/photos/ieic5Tq8YMk

Natural Language Processing • Computer Vision • Machine Learning