Detecting Activities of Daily Living in First-person Camera Views

Hamed Pirsiavash, Deva Ramanan

Computer Science Department, UC Irvine

Motivation: a sample video of Activities of Daily Living (ADL)

Applications: Tele-rehabilitation

•  Kopp et al., Archives of Physical Medicine and Rehabilitation, 1997.

•  Catz et al., Spinal Cord, 1997.

Long-term at-home monitoring

Applications: Life-logging

•  Gemmell et al., “MyLifeBits: a personal database for everything,” Communications of the ACM, 2006.

•  Hodges et al., “SenseCam: A retrospective memory aid,” UbiComp, 2006.

So far, mostly “write-only” memory!

This is the right time for the computer vision community to get involved.

Related work: action recognition

There are quite a few video benchmarks for action recognition.

Collecting interesting but natural video is surprisingly hard, and it is difficult to define action categories outside the “sports” domain.

UCF sports, CVPR’08

UCF Youtube, CVPR’08

KTH, ICPR’04

Hollywood, CVPR’09

Olympics sport, BMVC’10

VIRAT, CVPR’11

Wearable ADL detection

It is easy to collect natural data

ADL actions derived from medical literature on patient rehabilitation

Outline

•  Challenges
  –  What features to use?
  –  Appearance model
  –  Temporal model

•  Our model
  –  “Active” vs “passive” objects
  –  Temporal pyramid

•  Dataset

•  Experiments

Challenges: What features to use?

Features span a spectrum from low level (weak semantics) to high level (strong semantics): space-time interest points (Laptev, IJCV’05) at the low end, human pose at the high end, and object-centric features in between.

Difficulties of pose:
•  Detectors are not accurate enough
•  Not useful in first-person camera views

Challenges: Occlusion / functional state

(Example images contrasting “classic” data with wearable data)

Challenges: Long-scale temporal structure

“Classic” data: boxing

Wearable data: making tea (start boiling water, do other things while waiting, pour in cup, drink tea)

Difficult for HMMs to capture long-term temporal dependencies

Outline

•  Challenges
  –  What features to use?
  –  Appearance model
  –  Temporal model

•  Our model
  –  “Active” vs “passive” objects
  –  Temporal pyramid

•  Dataset

•  Experiments

“Passive” vs “active” objects

(Example images of passive vs. active object instances)

•  Better object detection (visual phrases, CVPR’11)
•  Better features for action classification (active vs passive)

Appearance feature: bag of objects

Video clip → per-frame object detections (fridge, TV, stove, …) → bag-of-detected-objects histogram → SVM classifier

The histogram keeps separate bins for active and passive instances of each category (e.g., active fridge, active stove, passive fridge).
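
The following is a minimal Python sketch (not the authors' code) of how such a bag-of-objects descriptor could be computed; the category list, score threshold, and detection format (category index, active flag, score) are illustrative assumptions.

```python
import numpy as np

# Illustrative settings; the paper uses 24 object categories and its own detectors.
OBJECT_CATEGORIES = ["fridge", "tv", "stove"]
SCORE_THRESHOLD = 0.5

def bag_of_objects(frames):
    """Bag-of-detected-objects histogram for a video clip.

    `frames` is a list with one entry per frame; each entry is a list of
    (category_index, is_active, score) detections. Active and passive
    instances of a category get separate bins.
    """
    n = len(OBJECT_CATEGORIES)
    hist = np.zeros(2 * n)                      # [passive bins | active bins]
    for detections in frames:
        for cat, is_active, score in detections:
            if score >= SCORE_THRESHOLD:
                hist[cat + (n if is_active else 0)] += 1
    total = hist.sum()
    return hist / total if total > 0 else hist  # normalize for clip length
```

The resulting fixed-length vector is the descriptor fed to the SVM classifier in the pipeline above.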

Temporal pyramid

Coarse-to-fine correspondence matching with a multi-layer pyramid over time, inspired by “Spatial Pyramid” (CVPR’06) and “Pyramid Match Kernels” (ICCV’05).

Video clip → temporal pyramid descriptor → SVM classifier
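
Continuing the sketch above (again a hypothetical illustration, reusing the `bag_of_objects` helper), the temporal pyramid concatenates histograms computed over the whole clip, its halves, its quarters, and so on; the number of levels and the absence of per-level weights are simplifying assumptions.

```python
import numpy as np

def temporal_pyramid(frames, levels=2):
    """Coarse-to-fine temporal pyramid descriptor for a clip.

    Level 0 is one histogram over the whole clip; level k splits the clip
    into 2**k equal segments and computes one histogram per segment.
    All histograms are concatenated into a single fixed-length vector.
    """
    descriptor = []
    for level in range(levels + 1):
        bounds = np.linspace(0, len(frames), 2 ** level + 1, dtype=int)
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            descriptor.append(bag_of_objects(frames[lo:hi]))
    return np.concatenate(descriptor)
```

This descriptor, like the flat bag of objects, is classified with an SVM; the pyramid lets coarse whole-clip statistics and finer segment-level statistics both contribute to the match.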

Outline

•  Challenges
  –  What features to use?
  –  Appearance model
  –  Temporal model

•  Our model
  –  “Active” vs “passive” objects
  –  Temporal pyramid

•  Dataset

•  Experiments

Sample video with annotations

Wearable ADL data collection

•  20 persons
•  20 different apartments
•  10 hours of HD video
•  170 degrees of viewing angle
•  Annotated
  –  Actions
  –  Object bounding boxes
  –  Active/passive objects
  –  Object IDs

Prior work:
•  Lee et al., CVPR’12
•  Fathi et al., CVPR’11, CVPR’12
•  Kitani et al., CVPR’11
•  Ren et al., CVPR’10

Average object locations

(Figure: average image locations of active vs. passive objects)

Active objects tend to appear on the right-hand side and closer:
–  Right-handed people are dominant
–  We cannot mirror-flip images in training

Outline

•  Challenges
  –  What features to use?
  –  Appearance model
  –  Temporal model

•  Our model
  –  “Active” vs “passive” objects
  –  Temporal pyramid

•  Dataset

•  Experiments

Experiments

•  Baseline (low-level features): space-time interest points (STIP), Laptev et al., BMVC’09
•  Our model (high-level features): object-centric features over 24 object categories

Accuracy on 18 action categories
•  Our model: 40.6%
•  STIP baseline: 22.8%

Classification accuracy

•  Temporal model helps
•  Our object-centric features outperform STIP
•  Visual phrases improve accuracy
•  Ideal object detectors double the performance

Results on temporally continuous video and taxonomy loss are included in the paper.

Summary

Data and code will be available soon!

Thanks!