CS 4624 - Transcription Research

Quick Nav
- Speech to Text Candidates
  - Sphinx
  - Adobe's Speech Analysis
    - Helpful Topics
    - Scratch Workflow
    - Pros and Cons
    - Ways we could augment this
  - Macspeech Scribe
  - uSubtitle.tv
- Dictation
  - Dragon Dictation
  - Google WebSpeech API
- Transcription Suites
  - INQScribe
- Conversion of Subtitle Formats
  - Subtitle Edit
- Concerns on Adobe Premiere Workflow
- Manual Dictation
  - Amazon Mechanical Turk
  - CrowdFlower
  - Clickworker

Speech to Text Candidates

None of these is likely to be very successful on its own, but perhaps there is a way to use them to augment the transcription process.

Sphinx

Recognition is terrible; it only works well when given a set grammar. However, it works well on the backend. It may be possible to improve results through its settings, but there are a lot of them and I have no clue yet which ones matter. More research and testing will be needed, as I’ve only used it on one video.

Adobe's Speech Analysis

Helpful Topics
- http://forums.adobe.com/message/3122909
- http://forums.adobe.com/message/4831975
- http://forums.adobe.com/message/5202092
- http://forums.adobe.com/message/5986007

Comment (Tucker Legard, 02/18/14): Use headings to make things show up in the Quick Nav.

Scratch Workflow

// TODO : Figure out a way to automate this

1. Import the video into Premiere (File >> Import).
2. Right-click the clip that was imported and select Analyze Content.
3. Ensure Identify Speakers is checked.
4. Allow videos to be transcribed “overnight” (any time, really).
5. Clean up in the suite: jump around, fix words…
6. Copy the XML file with the perfected transcript to a better location. Right now I found it in C:\Users\Tucker\AppData\Local\Temp\(garbage id).sub.xml
   // TODO : See if there’s a better, less hacky way to do this
7. Convert the XML file to a VTT.
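A minimal Python sketch of the XML-to-VTT step. The input schema is an assumption: this hypothetical layout has one `<word>` element per recognized word with `text`, `start` and `duration` attributes in seconds. The real tag and attribute names in Premiere's `.sub.xml` have not been checked and will need to be swapped in. It writes one cue per word, since that is what a word-level transcript naturally gives.

```python
import xml.etree.ElementTree as ET


def format_timestamp(seconds):
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def words_from_xml(xml_text):
    """Read (text, start, end) tuples from the ASSUMED schema:
    <word text="..." start="seconds" duration="seconds"/>.
    Tag and attribute names are hypothetical placeholders."""
    root = ET.fromstring(xml_text)
    words = []
    for el in root.iter("word"):  # hypothetical tag name
        start = float(el.get("start"))
        end = start + float(el.get("duration"))
        words.append((el.get("text"), start, end))
    return words


def to_vtt(words):
    """Build a WebVTT document with one cue per word."""
    lines = ["WEBVTT", ""]
    for text, start, end in words:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # blank line ends the cue
    return "\n".join(lines)


# Hypothetical sample in the assumed schema.
SAMPLE = """<transcript>
  <word text="hello" start="0.5" duration="0.32"/>
  <word text="world" start="0.9" duration="0.4"/>
</transcript>"""

if __name__ == "__main__":
    print(to_vtt(words_from_xml(SAMPLE)))
```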

Pros and Cons

// TODO : Research how accurately it recognizes words

Pros:
- Marginally good speech recognition
- Gives a confidence value for how certain it is of each word
- Gives accurate time codes
- Provides an interface for changing and modifying words

Cons:
- The interface for changing words is not very easy to use or intuitive

Ways we could augment this

Definitely getting ahead of myself here… but since we’re given a confidence value in the XML, maybe we could work on a way of visualizing how confident the program is about a given word. Perhaps change the color or font size of the words it’s not confident about, and have a slider for which confidence values are acceptable.
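A rough sketch of the confidence idea, rendering the transcript as HTML so uncertain words stand out. It assumes the confidence values have already been pulled out of the XML as floats between 0 and 1 (the actual scale Adobe uses is unverified), `threshold` stands in for the slider, and the red, larger-font style is an arbitrary placeholder.

```python
from html import escape

# Hypothetical styling for uncertain words; any color or size would do.
LOW_CONFIDENCE_STYLE = "color: red; font-size: 120%;"


def highlight_low_confidence(words, threshold=0.6):
    """Render (text, confidence) pairs as HTML, styling words below `threshold`.

    `threshold` plays the role of the slider: raising it flags more words.
    Confidence is assumed to be a float in [0, 1].
    """
    parts = []
    for text, confidence in words:
        safe = escape(text)  # keep transcript text from being read as HTML
        if confidence < threshold:
            parts.append(
                f'<span style="{LOW_CONFIDENCE_STYLE}" '
                f'title="confidence {confidence:.2f}">{safe}</span>'
            )
        else:
            parts.append(safe)
    return " ".join(parts)


if __name__ == "__main__":
    sample = [("the", 0.95), ("quick", 0.40), ("fox", 0.70)]
    print(highlight_low_confidence(sample, threshold=0.6))
```

Hovering over a flagged word shows its confidence via the `title` attribute, so the editor can decide whether to fix it.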

Macspeech Scribe

Seems to have more of a front end.

uSubtitle.tv

Link: https://usubtitle.tv/


Pros:
- Appears to be fairly accurate (roughly 60% to 80%)
- Can automatically download a VTT file
- Pretty easy to use

Cons:
- Not free (plans seem fairly expensive)
- It would probably be cheaper to just have someone transcribe the video or use Adobe Premiere Pro

Screenshots:


Dictation

May require the person to wear headphones and orally recite what is being heard, or to have the software record live from some sort of played audio.

Dragon Dictation

Trained to the individual speaker.


Google WebSpeech API

Similar to Dragon Dictation.

Transcription Suites

These are media players that make it easy to seek through a video and modify captions inline.

INQScribe

http://www.inqscribe.com/

Transana

http://www.transana.org/index.htm

About:
- Transcribes video and audio files (in a wide variety of ways useful for different analytic purposes)
- Cross-platform
- A lot of features that are cool but that we do not need for this project
- Side-by-side transcribing with video

Download/Purchase:
- FREE demonstration version available
- Standard version (single-user): $75 per user, per computer
- Multi-user version: $795 per project

*Express Scribe Transcription Software

http://www.nch.com.au/scribe/

About:
- Professional audio player software for PC or Mac designed to assist the transcription of audio recordings
- A typist can install it on their computer and control audio playback using a transcription foot pedal or keyboard (with 'hot' keys)
- Foot pedal:
  - Increase your words per minute by giving your feet control of playback, leaving your fingers free to type
  - Three controls, which are usually set up for rewind, play/pause and fast-forward
- Works with Microsoft Word and all major word processors

Download/Purchase:
- Pro version is $40, but on sale in February for $19

Audiotranskription.de's f4 (Windows) and f5 (Mac)

http://www.audiotranskription.de/english/f4.htm

About:
- Can be controlled via the keyboard (instead of using the mouse)
- Automated short rewind upon pausing the recording
- f4 automatically inserts time stamps and speaker tokens, which saves time

Download/Purchase:
- Free version only plays the first 10 minutes of a file
- Can purchase a 6-month or full-time license
- Prices (including foot pedals): http://www.audiotranskription.de/shop/?___store=english&___from_store=default

ResearchWare, Inc.'s HyperTRANSCRIBE

http://www.researchware.com/products/hypertranscribe.html
- Opens and plays most popular audio and video formats, and provides both graphical and keyboard control to play, pause, and loop playback so your hands never have to leave the keyboard
- Uses QuickTime
- Free demo version
- Not sure about the full version… links on the website are broken

Conversion of Subtitle Formats

Subtitle Edit

Downloaded from http://www.nikse.dk/, with context from http://forums.adobe.com/message/5986007

Can easily convert the captions to a plain .txt file for searching, if needed.
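If the captions end up as VTT, a small script can also flatten them to searchable plain text. This is a sketch assuming well-formed WebVTT input: blocks without a `-->` timing line (the `WEBVTT` header, `NOTE` and `STYLE` blocks) are skipped, cue identifiers are dropped, inline tags such as `<v Speaker>` are stripped, and HTML entities like `&amp;` are decoded.

```python
import re
from html import unescape

# A cue timing line contains "-->", e.g. "00:00:01.000 --> 00:00:02.500 align:start".
TIMING_LINE = re.compile(r"^\S+\s+-->\s+\S+")
INLINE_TAG = re.compile(r"<[^>]+>")


def vtt_to_text(vtt):
    """Return just the caption text of a WebVTT document, one cue per line."""
    blocks = re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip())
    cues = []
    for block in blocks:
        lines = block.split("\n")
        timing = next(
            (i for i, line in enumerate(lines) if TIMING_LINE.match(line)), None
        )
        if timing is None:
            continue  # header, NOTE or STYLE block: no cue text
        # Everything after the timing line is payload; join multi-line cues.
        text = " ".join(line.strip() for line in lines[timing + 1:] if line.strip())
        text = unescape(INLINE_TAG.sub("", text))
        if text:
            cues.append(text)
    return "\n".join(cues)
```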

// TODO : See if VTT is an output option in Subtitle Edit
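If Subtitle Edit turns out not to offer VTT, an SRT export is close enough to convert with a small script: WebVTT needs a `WEBVTT` header line and uses a period instead of a comma before the milliseconds, and SRT's numeric cue counters can stay as VTT cue identifiers. A minimal sketch that handles only those differences (no styling or positioning), assuming the usual `HH:MM:SS,mmm` SRT timestamps:

```python
import re

# SRT timing: 00:00:01,500 --> 00:00:03,000 ; WebVTT wants 00:00:01.500
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt):
    """Convert SRT text to WebVTT by adding the header and fixing timestamps."""
    body = srt.replace("\r\n", "\n").lstrip("\ufeff").strip()  # drop BOM, CRLF
    out_lines = []
    for line in body.split("\n"):
        if "-->" in line:
            # Only touch timing lines, so commas in caption text survive.
            line = SRT_TIME.sub(r"\1.\2", line)
        out_lines.append(line)
    return "WEBVTT\n\n" + "\n".join(out_lines) + "\n"
```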

Concerns on Adobe Premiere Workflow

When converting the Adobe XML file to another subtitle format, the captions are displayed word for word (one word per caption). There are two possible options for fixing this:

Manually Grouping Words Together in Premiere

Someone manually goes in and merges each word with the ones around it…


Pros:
- Gives a cleaner, more artistic interpretation of the captions
- The tediousness might be mitigated by the fact that we have to fix all the errors anyway

Cons:
- Tedious manual labor

Programmatically Combining Words

We take a converted file (find a format that’ll be easy to parse) and examine both the times and the number of words/tokens. Group words up to a maximum number of words or a maximum amount of time, or start a new group when the next word is more than a maximum delay away.
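A sketch of the programmatic approach, assuming the converted file has already been parsed into `(text, start, end)` tuples in seconds. The limits `max_words`, `max_duration` and `max_gap` are hypothetical defaults to tune against real videos, not values from any captioning guideline.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _to_cue(group):
    """Collapse a list of (text, start, end) words into a single Cue."""
    return Cue(group[0][1], group[-1][2], " ".join(word for word, _, _ in group))


def group_words(words, max_words=7, max_duration=4.0, max_gap=0.75):
    """Merge word-level (text, start, end) tuples into caption cues.

    A new cue starts when the current one already holds `max_words` words,
    when adding the next word would stretch it past `max_duration` seconds,
    or when the silence before the next word exceeds `max_gap` seconds.
    """
    cues, current = [], []
    for text, start, end in words:
        if current:
            too_many = len(current) >= max_words
            too_long = end - current[0][1] > max_duration
            too_far = start - current[-1][2] > max_gap
            if too_many or too_long or too_far:
                cues.append(_to_cue(current))
                current = []
        current.append((text, start, end))
    if current:
        cues.append(_to_cue(current))
    return cues


if __name__ == "__main__":
    words = [("this", 0.0, 0.3), ("is", 0.35, 0.5), ("a", 0.55, 0.6),
             ("test", 0.65, 1.0), ("after", 2.5, 2.8), ("pause", 2.85, 3.2)]
    for cue in group_words(words):
        print(cue)
```

Breaking on long gaps tends to follow natural pauses in speech, while the word and duration caps keep any single caption short enough to read.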

Manual Dictation

Amazon Mechanical Turk http://www.mturk.com

● Leader in human transcription
● Extremely large user base
● Quick response rate
● Can choose not to pay if not properly transcribed
● Has an API we can use to integrate it into the project

CrowdFlower

http://crowdflower.com

● Takes complicated tasks and breaks them down for users to do (i.e. makes the tasks simpler)


● Crowdsources from various partners, Amazon Mechanical Turk being one of many; very, very large user base

● Has a system of peer reviewing (meaning high accuracy levels!)

Clickworker

http://www.clickworker.com/en

Not much positive to say about this one in comparison to the others...