arXiv:2106.07565v1 [cs.CV] 27 May 2021

Video-Based Inpatient Fall Risk Assessment: A Case Study

Ziqing Wang 1,2, Mohammad Ali Armin 1, Simon Denman 3, Lars Petersson 1, David Ahmedt-Aristizabal 1,3

Abstract— Inpatient falls are a serious safety issue in hospitals and healthcare facilities. Recent advances in video analytics for patient monitoring provide a non-intrusive avenue to reduce this risk through continuous activity monitoring. However, in-bed fall risk assessment systems have received less attention in the literature. The majority of prior studies have focused on fall event detection, and do not consider the circumstances that may indicate an imminent inpatient fall. Here, we propose a video-based system that can monitor the risk of a patient falling, and alert staff of unsafe behaviour to help prevent falls before they occur. We propose an approach that leverages recent advances in human localisation and skeleton pose estimation to extract spatial features from video frames recorded in a simulated environment. We demonstrate that body positions can be effectively recognised and provide useful evidence for fall risk assessment. This work highlights the benefits of video-based models for analysing behaviours of interest, and demonstrates how such a system could enable sufficient lead time for healthcare professionals to respond and address patient needs, which is necessary for the development of fall intervention programs.

I. INTRODUCTION

Falls in the ward, in particular those from the bed, are a persistent problem; they are commonly associated with injuries such as soreness and bone fractures, and often result in a prolonged hospital stay. In mental health hospitals and some psychogeriatric units, these events are of particular concern due to patient cognitive impairment, dizziness or vertigo [1], [2]. Such incidents are one of the main concerns for all staff involved in the care of patients, and can lead to anxiety or guilt, and potentially litigation. In most hospitals, medical staff follow well-defined protocols to prevent falls; however, systems capable of generating immediate alerts that enable staff to intervene before a fall occurs have received limited research attention.

Considering the importance of patient behaviour monitoring, several in-clinic patient monitoring systems using computer vision and deep learning have been introduced to provide an objective assessment of a patient's behaviour. These vision-based systems have attracted great attention due to their non-invasive nature and have shown promising results in analysing patient-specific pose [3] (for example, sleeping pose [4], [5]), epileptic patients [6], breathing disorders [7] and infant motions [8]. Furthermore, camera-based fall detection [9]–[11], which detects when a fall occurs rather than seeking to predict it before it happens, and fall prediction systems [12], [13] have also received considerable attention recently, and have achieved effective results using existing simulated datasets [14]–[16] or synthetic libraries [11], [17]. However, while human pose estimation has become the de-facto standard for inpatient analysis, its application to the prevention of falls from the bed remains limited.

1 CSIRO, DATA61, Canberra, Australia. Corresponding author: [email protected]
2 Australian National University, Canberra, Australia
3 SAIVT Research Lab, Queensland University of Technology, Brisbane, Australia.

Fig. 1. Overview of the proposed fall risk monitoring system. Given a single RGB image collected with a custom camera system placed in the ceiling, the system generates 2D human pose predictions. Next, the relative position of the human and the bed is computed to predict the risk of falling.

To prevent falls, some systems detect the position in which the patient is lying with respect to the edge of the bed, or detect the patient's bed-exit behaviour. In these scenarios, the system monitors a key human pose or human motion to predict the risk of falling. Some studies have used commercial pressure mat systems to detect the human off-bed position [18]; however, pressure pads have been shown to generate a high volume of false alarms, leading to alarm fatigue [19]. Camera-based systems, on the other hand, seek to detect a sitting posture [20] or the bed-exit action from a sequence of human images [21]. Although many hospital patients fall as they get out of bed, there are additional risk factors for falling, such as uncontrolled motions caused by agitation, restless sleep, and abnormal dreams that lead to a patient trying to climb out of the bed for protection, or "jumping" from the bed. Patients attempting to perform these activities unassisted account for a large proportion of inpatient falls, and they are the focus of this paper.

In this paper, we explore the feasibility of adapting pose-based frameworks to identify patients' behaviour and assess the risk of falling from the bed.

Our main contributions are summarized as follows:

1) We introduce a flexible vision-based fall risk detection system, evaluated in a novel simulated environment, that is capable of detecting actions which may indicate an imminent inpatient fall.

2) We propose a robust but simple non-obtrusive monitoring system to capture relative body position information to assess the risk of falling from a bed.


Fig. 2. Human and bed localisation. A dynamic and fast instance segmentation approach is used to localise the region of interest.

Fig. 3. Selected samples of human actions collected in the simulated dataset. Top: Not at risk of falling. Bottom: At risk of falling.

II. MATERIALS AND METHODS

In this paper, we propose a non-invasive landmark-based approach to capture in-bed human pose, and predict whether the human in an image will fall from the bed they are currently occupying.

A. Fall prevention simulated dataset and pre-processing

Existing vision-based fall detection datasets do not cover inpatient fall events, or examples of patients at risk of falling. Considering this limitation, we designed and collected a simulated dataset with the actions of interest represented by two classes: not at risk and at risk of falling. To generate this data, two main stages were undertaken: i) recruitment, experimental setup, and data collection; and ii) data annotation and pre-processing.

We collect data from participants in a simulated hospital environment. Volunteers lie down on the bed and simulate in-bed patient actions such as trying to climb out of the bed for protection, turning around, exiting the bed, and falling. All images and videos are collected with a custom recording device equipped with a Microsoft Azure Kinect camera, and are saved for a further pre-processing phase.

To estimate relative positions of the human and bed, we first define the region of interest as the location of the bed that contains the participant. This process helps to deal with different camera-bed viewing angles, and with changes in the inclination angle of the bed, which may impact the risk assessment. We perform object boundary detection using SOLO [22], a dynamic and fast state-of-the-art instance segmentation method. We use pre-trained weights, trained on the COCO dataset [23], to detect the human and the bed. Then, we crop and resize all images to a resolution of 1080×828 pixels as input to the system. Fig. 2 depicts selected examples of the participant and bed detection. Finally, each image is assigned to one of two classes to conduct the proposed analysis: at risk and not at risk of falling from the bed, as illustrated in Fig. 3. We address the class imbalance in our dataset by adopting data augmentation techniques, including oversampling and adding Gaussian noise to images, to obtain the same number of samples for both classes. We argue that this pre-processing step does not impact generalization.
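For concreteness, a minimal localise-and-crop sketch is shown below. The paper uses SOLO [22]; here a COCO-pretrained Mask R-CNN from Detectron2 [25] stands in for illustration, since any COCO instance segmentation model exposes the person and bed classes needed. The threshold and helper name are assumptions, not the authors' released code.

```python
# Minimal localise-and-crop sketch. A COCO-pretrained instance segmentation
# model detects the bed (and person); the frame is cropped to the bed and
# resized to the system's input resolution. Illustrative only.
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

BED = 59  # COCO class index for 'bed' (person is 0)

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # assumed confidence threshold
predictor = DefaultPredictor(cfg)

def crop_bed_region(frame):
    """Detect the bed, crop the frame around it, and resize to 1080x828."""
    inst = predictor(frame)["instances"].to("cpu")
    bed_idx = (inst.pred_classes == BED).nonzero().flatten()
    if len(bed_idx) == 0:
        return None  # no bed found in this frame
    best = bed_idx[inst.scores[bed_idx].argmax()]  # highest-scoring bed
    x1, y1, x2, y2 = inst.pred_boxes.tensor[best].int().tolist()
    return cv2.resize(frame[y1:y2, x1:x2], (1080, 828))
```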

B. Fall risk assessment system

Fig. 1 shows the overall architecture of our framework, which has three main modules: i) 2D key-point estimation, which takes an RGB image and produces body joint locations in 2D space; ii) human-bed relative position estimation and feature engineering; and iii) a fall risk classifier which combines human pose and relative position features to accurately discriminate between at risk and not at risk cases. In the following, we describe in detail the individual components of our framework; a high-level sketch of how they compose is given below.
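The three modules compose as in the following illustrative pseudocode; the function names are placeholders for the components described in the rest of this section, not actual APIs.

```python
# Illustrative end-to-end flow of the three modules for one frame.
def assess_frame(frame):
    keypoints = estimate_pose(frame)              # i) 2D key-point estimation
    features = relative_position_features(        # ii) human-bed relative
        keypoints, detect_bed_contour(frame))     #     position + features
    return fall_risk_classifier(features)         # iii) at risk / not at risk
```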

1) Human localisation and pose identification: Quantifying a person's posture and limb articulation is useful for understanding patient behaviour. Human pose estimation from static images has shown strong performance in detecting positions of interest for the analysis of seizure disorders [6] and bed-exit posture [20].

Fig. 4. Representation of the human key points detected in a selected image.

We aim to employ a robust 2D pose prediction technique to extract consistent poses from a hospital environment, where challenges such as self-occlusion and similarities between the background and foreground are present. We adopt the Mask-RCNN architecture [24], a lightweight yet highly effective approach implemented in Detectron2 [25], to predict 2D locations of body joints and their corresponding confidence scores. Here, each keypoint location is modeled as a one-hot mask, where Mask-RCNN predicts K masks, one for each of the K keypoints (17 keypoint coordinates in this study). We fine-tune a Mask-RCNN model pre-trained on the COCO dataset [23], enabling the model to detect keypoints reliably in our dataset. When participants perform activities such as turning around, their body parts may overlap and some keypoints cannot be detected clearly. Based on an analysis of the pose estimation results, we define the most stable coordinates for a later feature engineering step. Fig. 4 illustrates the keypoint layout and a detected human pose.
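A minimal keypoint-extraction sketch using Detectron2's COCO keypoint model follows; the paper fine-tunes such a model on its own data, so the off-the-shelf weights here are only a stand-in.

```python
# Keypoint extraction with Detectron2's COCO-pretrained keypoint R-CNN.
# Returns the 17 COCO keypoints as (x, y, confidence) triples.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)

def extract_keypoints(frame):
    """Return a (17, 3) array of (x, y, score) for the first detected person."""
    inst = predictor(frame)["instances"].to("cpu")
    if len(inst) == 0:
        return None  # no person detected
    return inst.pred_keypoints[0].numpy()
```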

Fig. 5. Representation of the distance feature estimation. Head and knees detected by the pose estimation algorithm are marked as green dots, and all lines represent the contour of the detected bed.

TABLE I
MULTI-FOLD CROSS-VALIDATION PERFORMANCE (10-TIME AVERAGE)

                                                   Test Accuracy (%)
Feature representation                             LightGBM    SVM
Dis<Knee,Bed>                                      94.44       91.11
Dis<Knee,Bed> + Dis<Head,Bed>                      96.11       92.89
17 Keypoints + Dis<Knee,Bed>                       96.67       93.89
17 Keypoints + Dis<Knee,Bed> + Dis<Head,Bed>       97.22       94.20

Fig. 6. Qualitative results of human pose estimation on patients not at risk and at risk of falling (trying to climb out of the bed for protection).

2) Relative position determination and feature engineering: Human body postures during events such as the patient climbing out of the bed are complex and varied, so it is essential to define a criterion for classifying whether a patient is at risk of falling from the bed or not. One criterion is to track and estimate how much of the knee is outside of the bed.

To calculate the distance between the knee and the bed, we need to determine which side of the bed a patient is most likely to exit from. As shown in Fig. 5, the left line, middle line and right line of the bed define the bed position, and the head and knees are marked as green dots. When the head and the two knees are all on the left of the middle line, the human body is defined as being on the left side of the bed; when they are all on the right of the middle line, the body is defined as being on the right side. If the human body is on the left side of the bed, we calculate the distance between the two knees and the left line of the bed; if it is on the right side, we calculate the distance to the right line. Determining where the human body is located also helps to remove unnecessary features and decrease feature dimensionality. A minimal sketch of this rule follows.
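The sketch below assumes COCO keypoint ordering (nose=0, left knee=13, right knee=14) and a bed whose middle line is roughly vertical in the image, so that side membership reduces to an x-coordinate comparison; both are assumptions for illustration.

```python
# Side-of-bed determination from head and knee keypoints (illustrative).
NOSE, L_KNEE, R_KNEE = 0, 13, 14  # COCO keypoint indices

def side_of_bed(kps, mid_x):
    """Return 'left', 'right', or None given (17, 3) keypoints and the
    x-coordinate of the bed's middle line."""
    xs = [kps[NOSE][0], kps[L_KNEE][0], kps[R_KNEE][0]]
    if all(x < mid_x for x in xs):
        return 'left'    # head and both knees left of the middle line
    if all(x > mid_x for x in xs):
        return 'right'
    return None          # body straddles the middle line
```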

The distance feature is the distance between the knees and the bed boundary. Based on the human body's location on the bed, we can determine which boundary should be used to calculate the distance between the knee and the bed. If the knee is outside of the bed, the distance value is negative; if the knee is inside the bed, the distance value is positive.
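A sketch of the signed distance computation is shown below, assuming the selected bed edge is available as two endpoints from the segmentation step; the bed centre is used here to fix the sign convention (positive inside the bed, negative outside), matching the text.

```python
# Signed perpendicular distance from a knee point to a bed-edge line.
import numpy as np

def signed_distance(point, a, b, bed_centre):
    """Distance from `point` to the line through edge endpoints a and b,
    signed so that the side containing `bed_centre` is positive."""
    a, b, p = map(np.asarray, (a, b, point))
    n = np.array([-(b - a)[1], (b - a)[0]])   # normal to the edge direction
    n = n / np.linalg.norm(n)
    d = float(np.dot(p - a, n))
    # flip the sign if the bed centre lies on the negative side of the line
    if np.dot(np.asarray(bed_centre) - a, n) < 0:
        d = -d
    return d
```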

3) Fall risk classification: Each output feature related to the human pose coordinates and the relative position between the body and the bed (i.e. the distance features) is fed to a classifier to learn probabilistic distributions with respect to the target class. We adopt the LightGBM classifier [26], a fast, distributed, high-performance implementation of gradient boosted trees for supervised classification with robustness to overfitting. The following settings are used in the experiments: boosting type (gradient boosting decision tree), boosting learning rate (0.1), number of boosted trees to fit (100), maximum tree leaves (31), maximum tree depth (no limit). We also use a traditional support vector machine (SVM) classifier with automatic Bayesian optimization: kernel (sigmoid), shrinking (true), cache size (200).
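The listed settings map directly onto the scikit-learn-style APIs of LightGBM and scikit-learn; a configuration sketch follows (training-data names are placeholders).

```python
# Classifier configurations matching the settings listed above.
from lightgbm import LGBMClassifier
from sklearn.svm import SVC

gbm = LGBMClassifier(
    boosting_type='gbdt',   # gradient boosting decision tree
    learning_rate=0.1,
    n_estimators=100,       # number of boosted trees to fit
    num_leaves=31,          # maximum tree leaves
    max_depth=-1,           # no limit on tree depth
)
svm = SVC(kernel='sigmoid', shrinking=True, cache_size=200)

# gbm.fit(X_train, y_train); gbm.predict(X_test)  # X_*: engineered features
```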

III. EVALUATION

A. Experimental setup

To evaluate and compare the most discriminative features from the landmark-based analysis, we adopt four feature sets: i) the distance between the knees and the bed; ii) set (i) plus the distance between the head and the bed; iii) set (i) plus the 17 body keypoint coordinates; and iv) set (iii) plus the distance between the head and the bed. These feature sets are listed in Table I.
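A sketch of how the four feature sets could be assembled from the pose and distance outputs; the array shapes are assumptions based on the description above (17 keypoints, two knee distances, one head distance).

```python
# Assembling the four feature sets compared in Table I (illustrative).
import numpy as np

def feature_sets(kps, d_knees, d_head):
    """kps: (17, 3) keypoints; d_knees: (2,) knee-bed distances; d_head: scalar."""
    coords = kps[:, :2].flatten()                 # 17 (x, y) coordinates
    return {
        'i':   np.asarray(d_knees),
        'ii':  np.concatenate([d_knees, [d_head]]),
        'iii': np.concatenate([coords, d_knees]),
        'iv':  np.concatenate([coords, d_knees, [d_head]]),
    }
```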

As an ablation study, we investigate a region-based approach by training a ResNet50 [27] architecture to extract a spatial representation directly from images, and perform classification using a fully connected layer with a sigmoid activation function. This model was implemented in Keras [28] and trained by optimizing the categorical cross-entropy loss with the Adam optimizer [29].
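A sketch of this region-based baseline in tf.keras follows; here a single sigmoid unit is paired with binary cross-entropy, the two-class form of the described setup, and the input size and ImageNet initialisation are assumptions rather than the paper's exact configuration.

```python
# Region-based baseline sketch: ResNet50 backbone + dense sigmoid head.
import tensorflow as tf

base = tf.keras.applications.ResNet50(
    weights='imagenet', include_top=False, pooling='avg',
    input_shape=(224, 224, 3))  # assumed input size
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation='sigmoid'),  # at risk vs. not at risk
])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='binary_crossentropy', metrics=['accuracy'])
```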

Both models (landmark-based and region-based) are assessed through 10-fold cross-validation to ensure that the training and test data are disjoint. For each fold, the data samples of each class are randomly split into 90% for training and 10% for testing.
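An illustrative evaluation loop is sketched below, reusing a classifier `clf` and a feature matrix `X`, `y` as above; stratified shuffling is used to approximate the per-class 90/10 splits described.

```python
# Ten random 90/10 splits, accuracy averaged over folds (illustrative).
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import accuracy_score

splitter = StratifiedShuffleSplit(n_splits=10, test_size=0.1, random_state=0)
scores = []
for train_idx, test_idx in splitter.split(X, y):
    clf.fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
print(f"mean accuracy: {np.mean(scores):.4f}")
```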

B. Experimental results and discussion

While general pose estimation frameworks are effective when subjects are located in uncluttered settings, they can be unreliable when applied to noisy environments such as patient monitoring rooms. This can cause even more confusion when multiple frames are fed into the framework, thus in our study we consider a static scene. From the pose estimation results on the dataset, we confirm that the knee points are the most stable keypoints, so we choose to use the knees as a reference to classify at-risk events, and use the head location to identify the human position on the bed. Selected samples showing estimated joint locations are presented in Fig. 6.

The region-based approach achieved an average accuracy of 65.8% on the test set, whereas the landmark-based approach using LightGBM achieved 97.22% accuracy. The cross-validation performance for each set of features and the proposed classifiers is shown in Table I. Our landmark-based results indicate that larger feature sets can improve system performance. The best accuracy is obtained from the fourth feature set, using the 17 keypoint coordinates, the distance between the knees and the bed, and the distance between the head and the bed. This performance gain is likely a result of the distance features indicating whether key body parts cross the boundary of the bed, a strong signal of an impending fall. This relative position information is hard to capture without feature engineering, and is the main reason that the region-based approach shows low performance.

In most hospitals, there are programs and policies for fall prevention, but there is limited research into systems capable of generating immediate alerts for medical assistance to prevent falls. A video-based alarm system for fall prevention similar to our proposed framework must detect the relative position of the patient in the bed with high accuracy. Such a system is able to issue an alert as early as possible once it detects a position where there is a high risk of falling. Further, it is envisioned that our approach is cost effective and low maintenance.

The majority of inpatient fall studies focus solely on fall risk factors, but may not identify potential causal factors for falls (e.g. what triggered the fall), which is necessary for fall intervention programs. An interesting direction for future research is the creation of libraries of behaviours to identify these factors and patients at high risk of falling in the early stages of monitoring.

IV. CONCLUSIONS

In this paper, we introduce a vision-based monitoring system that incorporates state-of-the-art computer vision techniques to assess the risk of falling from a bed. Considering the lack of datasets to assess fall risk and fall prevention, we introduce a simulated dataset that includes in-bed human actions such as trying to climb out of the bed for protection, turning around, and bed-exit events. Our results in this particular case study show a promising technology that can have a positive impact on monitoring inpatients at risk of falling. Our proposed system has a high accuracy, resulting in lower false alarm rates for medical staff and thus a reduction in the likelihood of alarm fatigue.

Ethics statement: The experimental procedures involving human subjects described in this paper were approved by the CSIRO Health and Medical Human Research Ethics Committee (CHMHREC).

REFERENCES

[1] D. Oliver, F. Healey, and T. P. Haines, "Preventing falls and fall-related injuries in hospitals," Clinics in Geriatric Medicine, vol. 26, no. 4, pp. 645–692, 2010.

[2] M. Vassallo, T. Azeem, M. Pirwani, J. C. Sharma, and S. C. Allen, "An epidemiological study of falls on integrated general medical wards," International Journal of Clinical Practice, vol. 54, no. 10, pp. 654–657, 2000.

[3] K. Chen, P. Gabriel, A. Alasfour, C. Gong, W. K. Doyle, O. Devinsky, D. Friedman, P. Dugan, L. Melloni, T. Thesen et al., "Patient-specific pose estimation in clinical environments," IEEE Journal of Translational Engineering in Health and Medicine, vol. 6, pp. 1–11, 2018.

[4] S. Liu and S. Ostadabbas, "Seeing under the cover: A physics guided learning approach for in-bed pose estimation," in MICCAI, 2019, pp. 236–245.

[5] S. Liu, Y. Yin, and S. Ostadabbas, "In-bed pose estimation: Deep learning with shallow dataset," IEEE Journal of Translational Engineering in Health and Medicine, vol. 7, pp. 1–12, 2019.

[6] D. Ahmedt-Aristizabal, S. Denman, K. Nguyen, S. Sridharan, S. Dionisio, and C. Fookes, "Understanding patients' behavior: Vision-based analysis of seizure disorders," IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 6, pp. 2583–2591, 2019.

[7] M. Martinez, D. Ahmedt-Aristizabal, T. Vath, C. Fookes, A. Benz, and R. Stiefelhagen, "A vision-based system for breathing disorder identification: A deep learning perspective," in EMBC, 2019, pp. 6529–6532.

[8] N. Hesse, S. Pujades, M. Black, M. Arens, U. Hofmann, and S. Schroeder, "Learning and tracking the 3D body shape of freely moving infants from RGB-D sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.

[9] N. Lu, Y. Wu, L. Feng, and J. Song, "Deep learning for fall detection: Three-dimensional CNN combined with LSTM on video kinematic data," IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 1, pp. 314–323, 2018.

[10] Y. Chen, W. Li, L. Wang, J. Hu, and M. Ye, "Vision-based fall event detection in complex background using attention guided bi-directional LSTM," IEEE Access, vol. 8, pp. 161337–161348, 2020.

[11] U. Asif, B. Mashford, S. Von Cavallar, S. Yohanandan, S. Roy, J. Tang, and S. Harrer, "Privacy preserving human fall detection using video data," in Machine Learning for Health Workshop, 2020, pp. 39–51.

[12] M. Hua, Y. Nan, and S. Lian, "Falls prediction based on body keypoints and seq2seq architecture," in ICCV Workshops, 2019.

[13] A. Masalha, N. Eichler, S. Raz, A. Toledano-Shubi, D. Niv, I. Shimshoni, and H. Hel-Or, "Predicting fall probability based on a validated balance scale," in CVPR Workshops, 2020, pp. 302–303.

[14] E. Auvinet, C. Rougier, J. Meunier, A. St-Arnaud, and J. Rousseau, "Multiple cameras fall dataset," DIRO-Université de Montréal, Tech. Rep. 1350, 2010.

[15] I. Charfi, J. Miteran, J. Dubois, M. Atri, and R. Tourki, "Optimized spatio-temporal descriptors for real-time fall detection: Comparison of support vector machine and Adaboost-based classification," Journal of Electronic Imaging, vol. 22, no. 4, p. 041106, 2013.

[16] B. Kwolek and M. Kepski, "Human fall detection on embedded platform using depth maps and wireless accelerometer," Computer Methods and Programs in Biomedicine, vol. 117, no. 3, pp. 489–501, 2014.

[17] U. Asif, S. Von Cavallar, J. Tang, and S. Harrer, "SSHFD: Single shot human fall detection with occluded joints resilience," arXiv preprint arXiv:2004.00797, 2020.

[18] W. Viriyavit and V. Sornlertlamvanich, "Bed position classification by a neural network and Bayesian network using noninvasive sensors for fall prevention," Journal of Sensors, vol. 2020, 2020.

[19] R. I. Shorr, A. M. Chandler, L. C. Mion, T. M. Waters, M. Liu, M. J. Daniels, L. A. Kessler, and S. T. Miller, "Effects of an intervention to increase bed alarm use to prevent falls in hospitalized patients: A cluster randomized trial," Annals of Internal Medicine, vol. 157, no. 10, pp. 692–699, 2012.

[20] M. Inoue, R. Taguchi, and T. Umezaki, "Bed-exit prediction applying neural network combining bed position detection and patient posture estimation," in EMBC, 2019, pp. 3208–3211.

[21] M. Inoue and R. Taguchi, "Bed exit action detection based on patient posture with long short-term memory," in EMBC, 2020, pp. 4390–4393.

[22] X. Wang, R. Zhang, T. Kong, L. Li, and C. Shen, "SOLOv2: Dynamic and fast instance segmentation," Advances in Neural Information Processing Systems, vol. 33, 2020.

[23] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in ECCV, 2014, pp. 740–755.

[24] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in ICCV, 2017, pp. 2961–2969.

[25] Y. Wu, A. Kirillov, F. Massa, W.-Y. Lo, and R. Girshick, "Detectron2," https://github.com/facebookresearch/detectron2, 2019.

[26] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, "LightGBM: A highly efficient gradient boosting decision tree," in Advances in Neural Information Processing Systems, 2017, pp. 3146–3154.

[27] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in CVPR, 2016, pp. 770–778.

[28] F. Chollet et al., "Keras," 2017.

[29] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.