Automated Transcription and Learn... · 2020. 8. 11. · Lunch & Learn Jake Surman Research Data...
Automated Transcription
Lunch & Learn
Jake Surman
Research Data Specialist, Research Technology Services
Automated Transcription
• Considerations
• Transcription options
• Automated Transcription Services
– Microsoft Stream/Teams
– Zoom
– AWS
– Google Cloud
• Quality comparison
UNSW RESEARCH INFRASTRUCTURE
Research Technology Services
Compute · Data · Community
About Us
The ResTech compute team provides support for those who have problems too big for the computer at their desk.
UNSW provides a number of platforms for storing, capturing and sharing your research data.
The ResTech Community team aims to build a strong and connected research network within UNSW.
www.restech.unsw.edu.au
Level 3, Chemical Science (F10)
Sign up to our mailing list
Compute Data Community
High Performance Computing (HPC)
• Free for researchers and HDR candidates
• As a service: NCI – Gadi (100 million compute hours)
• Katana – local HPC cluster (24 million compute hours)
Cloud Computing
• Cloud services: Amazon AWS, Microsoft Azure, NECTAR
• Seed money for exploring research in the cloud
Research Data
• Help with Data Management training, issues, information
• Assistance with data moves, storage, planning
Research Technology Services
Research technology training
• 40+ courses per year on campus and online
• Free to researchers and HDR candidates
Consulting
• Help with code and using HPC
• Data Classification, Management, and tools help
• Advising on, purchasing and configuring HPC equipment
Hacky Hour
• Casual meetup 3pm every Thursday in Penny Lane (currently on Teams)
• Bring your problems with code, HPC, data
• Presentations about research technologies
RDM Initiative
Division of Research
• Researcher Development
(Training + Engagement)
• Researcher Technology Services
(Data Team)
• PVC-RI
• IT
• Library
• Data Governance
• Research Integrity
PVC – Research Infrastructure
(Initiative Owner)
People
Tools
Policy
What data do you have?
In the Chat:
Do you have Audio? Video?
How many files, and how long?
Is your data sensitive? (Medical, Children, Identifying)
CONSIDERATIONS
Security
• What is the classification of your data?
• Where is your data being held?
• Who has access to your video/audio upload?
• Who has access to your transcript?
• How long do you need to keep these files?
• Is the data encrypted on disk and in transit?
• Will they use your data or share it with others?
Functionality
• How good is the transcription quality?
• How expensive is it?
• How easy is it to use?
• What format is the transcription in?
• What options do you have?
• When will you get the transcription?
TRANSCRIPTION OPTIONS
Human transcription
Positives:
• Generally good quality transcription
• Fast enough? Hours to weeks to get results
Negatives:
• Can be expensive ($60+/hour)
• Need to be careful about who you give your files to: where do they store your data, and what tools do they use?
• Confidentiality agreement needed - https://research.unsw.edu.au/forms-and-templates
• At least one human sees your data
Machine transcription
Positives:
• Very fast, results in minutes to hours
• No humans involved
• Cheap or free
Negatives:
• Quality highly variable
• Can be fiddly to use
• Need to be careful about security, locality, fine print
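To put the cost difference in numbers, here is a rough sketch using the rates quoted in this talk: about $60 per audio hour for human transcription and $1.5 per audio hour for the paid cloud services. The function name, the flat-rate model and the 20-hour workload are assumptions for illustration only; real pricing may include free tiers, minimum charges and currency differences.

```python
def transcription_cost(audio_hours, rate_per_hour, free_hours=0.0):
    """Estimate the cost of transcribing a batch of audio at a flat hourly rate.

    free_hours models a free allowance and is an assumption for illustration.
    """
    billable = max(audio_hours - free_hours, 0.0)
    return billable * rate_per_hour


# Rates are the ones quoted in the talk; 20 hours is a made-up workload.
hours = 20
human = transcription_cost(hours, 60.0)
machine = transcription_cost(hours, 1.5)
print(f"Human: ${human:.2f}, machine: ${machine:.2f}")
```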
AUTOMATED TRANSCRIPTION SERVICES
Microsoft Stream/Teams
Positives:
• Easy to get the transcript
• Free
• Fast
• Video and transcript are in a secure location
• Can upload your own files as well as recorded Teams meetings
Negatives:
• Transcript is in a strange format
• Quality OK
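The "strange format" is most likely a WebVTT caption file (.vtt), where every snippet of speech is wrapped in timing lines and metadata notes. A minimal sketch, assuming that layout, for flattening such a file into plain text. The helper `vtt_to_text` and the sample content are hypothetical and not part of Stream itself.

```python
import re

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.000".
TIMING = re.compile(
    r"^\s*(\d{2}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(\d{2}:)?\d{2}:\d{2}\.\d{3}"
)


def vtt_to_text(vtt):
    """Return only the spoken text from a WebVTT string."""
    spoken = []
    blocks = re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip())
    for block in blocks:
        block_lines = block.split("\n")
        first = block_lines[0].strip()
        # Skip the file header and metadata blocks.
        if first.startswith(("WEBVTT", "NOTE", "STYLE", "REGION")):
            continue
        # The caption text is everything after the timing line.
        for i, line in enumerate(block_lines):
            if TIMING.match(line):
                spoken.extend(l.strip() for l in block_lines[i + 1:] if l.strip())
                break
    return " ".join(spoken)


# Made-up sample in the assumed layout.
sample = """WEBVTT

NOTE Confidence: 0.87

1
00:00:00.000 --> 00:00:04.000
Use Unsw supported data
platforms for research data

2
00:00:04.000 --> 00:00:08.000
who data management is fundamental
"""
print(vtt_to_text(sample))
```

The result is a single paragraph that can be pasted into a document or analysis tool; timings are discarded, so keep the original file if you need them later.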
Zoom
Positives:
• Free
Negatives:
• Slow (3 days)
• Stored in the USA
• Quality worst of these three options
• Only for meetings in Zoom that you record (can’t upload files)
Amazon Web Services Transcribe
Positives:
• Cheap (1 hour free/month for 12 months, then $1.5/hour)
• Can upload lots of files at once
• Web interface
• Can upload custom dictionary and use “medical” version
Negatives:
• Quality OK
• Takes a bit of work to set up, needs a credit card.
• Need to send a request to opt out of re-using your data for training
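For lots of files, jobs can also be submitted from a script rather than the web interface. The sketch below only builds the request parameters for each file and does not contact AWS. The bucket, file names, job prefix and vocabulary name are hypothetical, and the parameter names follow boto3's `start_transcription_job` call as we understand it, so check them against the current AWS documentation before relying on them.

```python
import os


def build_transcribe_jobs(s3_uris, language_code="en-AU", vocabulary_name=None,
                          job_prefix="interview"):
    """Build one Amazon Transcribe request dict per audio file stored in S3."""
    jobs = []
    for index, uri in enumerate(s3_uris, start=1):
        # The media format is guessed from the file extension (e.g. "mp3", "wav").
        extension = os.path.splitext(uri)[1].lstrip(".").lower()
        params = {
            "TranscriptionJobName": f"{job_prefix}-{index:03d}",
            "LanguageCode": language_code,
            "MediaFormat": extension,
            "Media": {"MediaFileUri": uri},
        }
        if vocabulary_name:
            # The custom vocabulary ("custom dictionary") must already exist in AWS.
            params["Settings"] = {"VocabularyName": vocabulary_name}
        jobs.append(params)
    return jobs


# Hypothetical bucket and file names, for illustration only.
jobs = build_transcribe_jobs(
    ["s3://my-research-bucket/audio/p01.mp3",
     "s3://my-research-bucket/audio/p02.wav"],
    vocabulary_name="project-terms",
)
for job in jobs:
    print(job["TranscriptionJobName"], job["MediaFormat"])
    # With boto3 installed and credentials configured, each dict could be sent as:
    #   boto3.client("transcribe").start_transcription_job(**job)
```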
Google Cloud Speech to Text
Positives:
• Cheap ($1.5/hour)
• Lots of options
Negatives:
• Quality OK
• Takes a lot of work to set up, needs a credit card.
• No web interface, only command-line and programmer API
• Only for audio, mainly FLAC and WAV
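Because this route is audio-only, it can help to check a file's properties before uploading it. A minimal sketch using Python's standard `wave` module to report channels, sample rate and duration of a WAV file. `describe_wav` is a hypothetical helper, and the demo file is synthetic silence generated so the example runs on its own.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return channel count, sample rate (Hz) and duration (seconds) of a WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


# Write two seconds of 16-bit mono silence at 16 kHz as a stand-in recording.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000 * 2)

info = describe_wav(demo_path)
print(info)
```

Summing `duration_s` over a folder of recordings also answers "how many files, and how long?", which is what the per-hour pricing depends on.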
Quality Comparison
Amazon Transcribe:
U. N s W supported data platforms for research data. Boot data management is fundamental. When conducting research you NSW provides a number of approved data storage platforms for your research data. Different platforms are suitable for different classifications of data. Choosing a storage platform should depend on the classifications of your data to ensure your research data is secured. Rdm at u W is here to assist contact your friendly rdm at U. N s W team.
Stream auto-transcription:
Use Unsw supported data platforms for research data who data management is fundamental when conducting research. Unsw provides a number of approved data storage platforms for your research data. Different platforms are suitable for different classifications of data. Choosing a storage platform should depend on the classification of your data. To ensure your research data is secured. RDM at Unsw is here to assist. Contact your friendly RDM at Unsw team.
Zoom auto-transcription:
He's UFW supported data platforms for research data. Data Management is fundamental. When conducting research us who provides a number of approves data storage platforms for your research data. different platforms are suitable for different classifications of data choosing a storage platform should depend on the classification of your data to ensure your research data is secure our DM at us, who is he to assist contact your friendly our DM at the UN FW team.
Quality Comparison
Google speech to text:
Use unsw supported data platforms for research data good data management is fundamental when conducting research unsw provides a number of approved data storage platforms for your research data different platforms are suitable for different classifications of data choosing a storage platform should depend on the classification of your data to ensure your research data is secure.
Third RDM at unsw is here to assist contact your friendly RDM at unsw team.
Azure Transcription:
Use Unsw supported data platforms for research data. Good data management is fundamental when conducting research. Unsw provides a number of approved data storage platforms for your research data. Different platforms are suitable for different classifications of data. Choosing a storage platform should depend on the classification of your data to ensure your research data is secured. Rdm at Unsw. Is here to assist. Contact your friendly rdm at Unsw team.
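One common way to put a number on these differences is the word error rate (WER): the count of word substitutions, insertions and deletions needed to turn the transcript into the reference text, divided by the number of reference words. A minimal sketch follows. The reference sentence is an assumption inferred from the samples above (the original script was not part of the slides), and the function names are illustrative.

```python
import re


def words(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference text contains no words")
    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(ref)


# Assumed reference wording for one sentence of the test recording.
reference = "Good data management is fundamental when conducting research"
amazon = "Boot data management is fundamental. When conducting research"
google = "good data management is fundamental when conducting research"

print("Amazon:", word_error_rate(reference, amazon))
print("Google:", word_error_rate(reference, google))
```

This version ignores punctuation and capitalisation, so it measures only whether the right words were recognised. If sentence breaks matter for your analysis, they need to be judged separately.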
Conclusion, Q&A
In the chat:
Would you try automated transcription for your research? (Have you already?)
Question Time.
Contact us: [email protected]