MobileASL: Making Cell Phones Accessible to the Deaf Community Anna Cavender Richard Ladner, Eve...
-
date post
22-Dec-2015 -
Category
Documents
-
view
216 -
download
1
Transcript of MobileASL: Making Cell Phones Accessible to the Deaf Community Anna Cavender Richard Ladner, Eve...
MobileASL: Making Cell Phones Accessible to the
Deaf Community
Anna CavenderRichard Ladner, Eve Riskin
University of Washington
2
Two Themes
• MobileASL
• Cyber-Community for Advancing Deaf and Hard of Hearing in STEM (Science Technology Engineering and Math)
3
Challenges:• Limited network bandwidth• Limited processing power on cell phones
Our goal:• ASL communication using video cell phones over
current U.S. cell phone network
4
Cell Phone Network Constraints• Low bit rate goal
– GPRS (General Packet Radio Service)• Ranges from 30kbps to 80kbps (download)• Perhaps half that for upload• Unpredictable variation and packet loss
• 3G = 3rd Generation– Special service– Not yet widespread– Will still have congestion
• Service providers more likely to offer services if throughput can be minimized.
5
MobileASL Network Goals
• Sign language presents a unique challenge:– Not just appearance of video, intelligibility too!– If it works for sign language, other video applications
benefit too.
• MobileASL is about fair access to the current network– As soon as possible, no special accommodations– Not geographically limited– Lower bitrate + power savings = more accessible
6
Architecture
Camera
Encoder
Transmitter
Sender
Player
Decoder
Receiver
Receiver
Cell Phone Network
Cell phone
Encoder
7
Codec Used: x264
• Open source implementation of H.264 standard– Doubles compression ratio over MPEG2 – Replacing MPEG2 as industry standard
• x264 offers faster encoding
• Off-the-shelf H.264 decoder can be used– (speculation about H.264 on the iPhone)
8
Outline
• Motivation• Introduction• MobileASL Consumers• Eyetracking Motivation• Video Phone Study• Compression Challenges• Current Work• Conclusions
9
Discussions with Consumers
Open ended questions:– Physical Setup
• Camera, distance, …
– Features• Compatibility, text, …
– Scenarios• Lighting, driving, relay services, …
10
Consumer Response
• “I don’t foresee any limitations. I would use the phone anywhere: the grocery store, the bus, the car, a restaurant, … anywhere!”
• There is a need within the Deaf Community for mobile ASL conversations
• Existing video phone technology (with minor modifications) would be usable
11
Video Encoding for ASL
• Constraints of cell phone network create video compression challenges
• How do we compress ASL video to maximize intelligibility?
12
Outline
• Motivation• Introduction• MobileASL Consumers• Eyetracking Motivation• Video Phone Study • Compression Challenges• Current Work• Conclusions
13
Eyetracking Studies
• Participants watched ASL videos while eye movements were tracked
• Important regions of the video could be encoded differently
* Muir et al. (2005) and Agrafiotis et al. (2003)
14
Eyetracking Results
• 95% of eye movements within 2 degrees visual angle of the signer’s face (demo)
• Implications: Face region of video is most visually important– Detailed grammar in face requires foveal vision– Hands and arms can be viewed in peripheral vision
* Muir et al. (2005) and Agrafiotis et al. (2003)
15
Outline
• Motivation• Introduction• MobileASL Consumers• Eyetracking Motivation• Video Phone Study• Compression Challenges • Current Work• Conclusions
16
Mobile Video Phone Study
• 3 Region-of-Interest (ROI) values• 2 Frame rates, frames per second (FPS)• 3 different Bit rates
– 15 kbps, 20 kbps, 25 kbps
• 18 participants (7 women)– 10 Deaf, 5 hearing, 3 CODA*– All fluent in ASL
* CODA = (Hearing) Child of a Deaf Adult
17
Example of ROI
Varied quality in fixed-sized region around the face
• (demo)2x quality in face 4x quality in face
18
Examples of FPS• Varied frame rate: 10 fps and 15 fps• For a given bit rate:
Fewer frames = more bits per frame
• (demo)
20
User Preferences Results
1
2
3
4
5
15 kbps 20 kbps 25 kbps 10 fps 15 fps 0 roi 6 roi 12 roi
Type of Encoding
Av
era
ge
Pa
rtic
ipa
nt
Re
sp
on
se
Bit Rate Frame Rate Region of Interest
21
Implications of results
• A mid-range ROI was preferred– Optimal tradeoff between clarity in face and
distortion in rest of “sign-box”
• Lower frame rate preferred– Optimal tradeoff between clarity of frames and
number of frames per second
• Results independent of bit rate
22
Outline
• Motivation• Introduction• MobileASL Consumers• Eyetracking Motivation• Video Phone Study• Compression Challenges • Current Work• Conclusions
23
Rate, distortion and complexity optimization
H.264 H.264 encoderencoderH.264 H.264
encoderencoder
Inputparameters
Raw video
Compressed video
• H.264 standard provides 50% bit savings over MPEG 2, but with higher complexity.
• Objective: Achieve best possible quality for least encoding time at a given bitrate
25
Encoding/Decoding on the Cell Phone
• Implemented a command-line version of x264 on a cell phone using Windows Mobile Edition 5.0.
26
Encoding performance with and without code optimization
0
2
4
6
8
10
12
highsettings
mediumsettings
lowsettings
lowestsettings
fra
me
s p
er
se
co
nd
(a
ve
rag
e)
Wireless MMXOptimized
Unoptimized
QVGA 320x240
27
Outline
• Motivation• Introduction• MobileASL Consumers• Eyetracking Motivation• Video Phone Study• Compression Challenges• Current Work• Conclusions
28
Dynamic Region-of-Interest
• Skin detection algorithms
• Region-based metric for
bit allocation– Automatically determine
priority for face and hands based on currently available bitrate.
29
Activity Recognition• Can save data and power by detecting:
– Fingerspelling• Increase frame rate for better intelligibility
– Signing• Sign language-specific encoding
– “Just listening”• Less processing and transmission needed
• (demo)
30
User Interface• Leverages users’ prior experience with video
conferencing interfaces (such as Sorenson, HOVRS, etc.)
• Optimized for small screen space
Initial state user interface•Incoming Video Stream•Outgoing Video Stream•Control Toolbar•Toggle Privacy Mode•Toggle Chat View•Video Screen Layout•Toggle Status Bar
32
MobileASL Team
• Principal Investigators – Richard Ladner, Eve Riskin, and Sheila Hemami (Cornell)
• Graduate Students– Anna Cavender, Rahul Vanam, Neva Cherniavsky,
Jaehong Chon, Dane Barney, Frank Ciaramello (Cornell)
• Undergraduate Students– Omari Dennis, Jessica DeWitt, Loren Merritt
• National Science Foundation
Cyber infrastructure for Advancing Deaf & Hard of Hearing in STEM
Richard Ladner* Jorge Díaz-Herrera^ James J DeCaro^+ E William Clymer^+ Anna Cavender*
University of Washington* Rochester Institute of Technology+National Technical Institute for the Deaf^
34
• Advancing Deaf and Hard of Hearing people in STEM fields through better access to education.
Our Goal
35
Problems• Deaf students pursuing STEM fields need skilled
interpreters and captioners with specific domain knowledge.– The best interpreter may not be at the student’s
locale.
• Deaf students face challenging classroom environments: multiple sources of information are all visual– “Deaf Whiplash”
• Sign language is growing to include STEM vocabulary– Community consensus is required.
38
Summit to Create a Cyber-Community to Advance Deaf and Hard-of-Hearing Individuals in STEM
(DHH Cyber-Community)
• Scheduled for June 2008 – RIT/NTID• Discussion among the many stakeholders:
1. Deaf and hard of hearing students in STEM fields.
2. Faculty and administrators in colleges and universities with a commitment to deaf and hard of hearing students in STEM fields.
3. Interpreters and captioners.
4. Researchers who study sign vocabulary for STEM fields and interpreting and captioning for education.
5. Educational technology researchers.
6. Experts in multimedia and network services that use the national cyberinfrastructure (e.g., AccessGrid).
7. Companies already in the business of providing video relay interpreting (VRI) and real time captioning (RTC).
8. Leaders in organizations who have an interest in advancing deaf and hard of hearing students in STEM fields.
• Contact us if you have ideas for participation.
39
Questions?
Thanks!
MobileASL Webpage:www.cs.washington.edu/research/MobileASL
Richard Ladner:[email protected]
Anna Cavender:[email protected]