Overview - Meetupfiles.meetup.com/...DataForGood_DataThon_May29-31-2015_Technic… · Overview: On...


Overview:

On May 29-31, 2015, Distress Centre participated in a DataThon organized by Data for Good. This 72-hour event brought together 50+ data volunteers to analyze over 1.25 million records of Distress Centre’s calls, 211 contacts, chats and texts dating back as far as 2003.

Our long list of objectives tasked the groups with exploring a range of analytics questions, in particular identifying any changes in call content and severity by time of day (daytime, evening and overnight). We hoped to discover which types of issues we are most successful at stabilizing or reducing in risk. We looked for correlations between our data and public data sets such as deaths by suicide in Alberta and weather records. Spatial mapping was used to present 211 data by the postal codes of callers, and text analysis was used to compare the language and topics covered in the narrative portions of call, chat and text records.

What we gained:

Distress Centre’s most important takeaways from this opportunity are as follows:

DataThon legacy: We now have our data in shape so that it can be analyzed. This opens us up to increased opportunities for academic research, and for more robust and regular analysis. Our experience with Data for Good has energized our curiosity about our data and what stories it can tell us.

We are convinced of the value and power of data. We’d like to pursue adding a data analyst role to our organization so that we have the internal competency to do this type of work and to compile this type of information. We’ll be looking at a strategy to attract funding to support a position like this.

We have been introduced to a variety of tools that could help us save time in analyzing data, to better present and show our findings and to look for correlations and trends that have previously gone unnoticed.

We were connected to an incredible group of knowledgeable volunteers who were passionate about our work and looked at our data with fresh eyes, prompting new ideas of how to approach some of our analysis, including what data we could access publicly to compare to.


What we learned:

A collection of some of the initial DataThon findings

Please note this list is not a full representation and these results need to be validated to ensure they are accurate. More information would need to be shared on which data sets were included and what limitations were considered to ensure reliability.

• Representation of call issues (% of calls related to suicide, DV, addictions, mental health etc.) remains quite consistent throughout the shifts of the day; however, there are more financial-related calls during the daytime.

• Severity of risk (% of semi urgent, urgent and emergent calls received) breakdown looked fairly consistent over all time periods.

• The calls appropriate for linking to MRT (mobile response team) remain a consistent percentage of our total call volume at all times of the day. So, if MRT were to start operating 24 hours, we are confident we would have the referrals to validate this.

• We didn’t find any indication that average call length changed considerably after hours. So the data doesn’t indicate that people are “more chatty” at night.

• Most calls remain at a consistent level of risk (i.e. no change from initial risk assessment to final risk assessment).

• Of the calls where risk level changes, the most common change is for a call to move from urgent to semi-urgent, showing some good de-escalation progress.

• We found that we are more likely to be able to de-escalate students’ calls than those of any other group of callers.

• No discernable weather correlation (high and low temperature) was found in 211 calls. However most of these calls are related to financial issues, where weather is less likely to impact.

• We’ve had nearly 500,000 interactions via chat and text since these modes were launched. The average conversation appears to include approximately 50 interactions. Urgent chats show an average of an additional 7 exchanges.

• No correlation between suicide-related calls and deaths by suicide was found – except maybe a weak correlation with calls from men related to suicide.

• We can put a myth to rest: no correlation was found between full moons and the number of calls or the severity of calls. So, there is no reason for staff or volunteers to avoid these shifts. They are no more demanding than any other day.

• We have established a baseline for the monthly and seasonal expected changes for caller issues – which we can use to compare data to in the future.


What it means to Distress Centre:

These results have validated that crisis happens at all times of the day; there is no time period that is more prone to higher-risk crisis. 24-hour support is vital to ensure people have the help they need, when they need it.

Some of our internal myths and narratives are not supported by the data. For example, there don’t appear to be more high-risk calls in months where deaths by suicide are highest. Days of full moons do not appear to have more intense shifts.

Where we’ll go now:

We have tons of ideas that have come out of the work this weekend. A few of the many areas where we’d like to dig further:

We’d like to recreate the weather analysis using crisis line data and look specifically at emotion-related issues (depression, loneliness/isolation).

We’d like to look at emergency room admissions for suicide attempts and see if this data might have more of a connection to our calls received.

We’d like to explore why we are better at de-escalating student calls. Is this because a higher percentage of our volunteers are students themselves? Are the issues presented by students “easier” to de-escalate? Do students have a tendency to present issues at a higher urgency than do other groups? Does the ability to de-escalate calls relate to the level of experience a volunteer has?

We’d like to map the unmet needs (calls where there was no referral available) by issue and by the area the 211 call was received from, to better illustrate where gaps in services exist so that new programs could be developed.

The datathon may be over, but our work is really just beginning. Our sincere thanks to Geoff Zakaib and Data for Good for partnering with us, for challenging us and for elevating the work of Distress Centre to a new level.


Distress Center Datathon Technical Summary

Prepared By: Tawheed (Tao) Al-towaitee

The data analysis completed by the participants of the Datathon considered 6 different research objectives, in addition to a miscellaneous section where a variety of questions were answered.

Analytics Team 1 – Afterhours focus

The objective of this group was to identify the relative value of the after-hours availability of the distress center services. To classify the calls and texts, three time slots were considered as follows:

o Daytime 9am-5pm

o Evenings 5-10pm

o Overnight 10pm-9am

The specific goals of this research objective were to:

Identify the issues discussed more frequently

Identify differences in average length of call

Identify difference in the severity of risk

Identify differences in demographic makeup of the clients

Tools and Technologies:

o Excel

o Splunk

o R programming language

Analysis techniques:

The group started by classifying call issues into single traits (e.g. suicide, financial, etc.). Then, pivot analysis was used to relate the frequency of calls by issue to different times during the course of 24 hours.
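The classification and pivot step described above can be sketched roughly as follows. This is a minimal illustration, not the team's actual Excel, Splunk or R workflow: the time-slot boundaries come from this summary, while the record layout (`hour`, `issues`) and the sample records are hypothetical.

```python
from collections import Counter

# Time-slot boundaries as defined in this summary; everything else is assumed.
def time_slot(hour):
    """Map an hour of the day (0-23) to Daytime, Evening or Overnight."""
    if 9 <= hour < 17:
        return "Daytime"
    if 17 <= hour < 22:
        return "Evening"
    return "Overnight"

def issue_by_slot(calls):
    """Count (issue, slot) pairs, treating each issue on a call as a separate call."""
    counts = Counter()
    for call in calls:
        slot = time_slot(call["hour"])
        for issue in call["issues"]:
            counts[(issue, slot)] += 1
    return counts

# Made-up records, for illustration only.
sample_calls = [
    {"hour": 10, "issues": ["financial"]},
    {"hour": 18, "issues": ["suicide", "relationship"]},
    {"hour": 2, "issues": ["mental health"]},
    {"hour": 23, "issues": ["suicide"]},
]
pivot = issue_by_slot(sample_calls)
```

Counting each issue on a call separately mirrors the choice, noted in this summary, to treat each trait as a separate call.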

The following chart shows the frequency of calls by issue for all three time slots. The chart considers each trait to be a separate call. It shows that, in general, the overnight period has the lowest frequency of calls for all issues.


When the percentage change of call frequency for the different issues over the three designated time slots was considered, the plot uncovered the following insights:

• Frequency of financial and family calls reduced slightly from daytime to evening and night.

• Relationship and Physical health calls increased during the evening and night.

• Mental health, abuse and addiction calls are consistent throughout the day.


When the average call length by issue was analyzed the results showed that the average call length for “Harm to Others” is higher during the day than after the 5PM hour. This type of call displays a different trend compared to other calls.

The team also analyzed the calls made during different times of the day on the basis of urgency and reached the following conclusions:

• A high proportion of calls are semi-urgent.

• 42% of urgent calls occurred during the day, 10% in the evening, and 48% at night.

• 44% of semi-urgent calls occurred during the day, 9% in the evening, and 47% at night.

• 32% of emergent calls occurred during the day, 13% in the evening, and 55% at night.

• Ratio of urgent and emergent calls does not significantly change during the day.

The team identified the top issue discussed to be Mental Health followed by Relationships. The following plot shows the change of call percentage by issue during the course of a day.

[Figure: change in call percentage by issue across the Daytime, Evening and Night time slots. Legend: ChildSafety/Abuse, Suicide, Domestic, General, Harm_To_Others, Alcohol, Drug, Gambling.]


Distress Centre discussion: What made these findings more relevant was layering them with the volume of calls by time frame (44% of all calls are received in the daytime hours, 25% in evenings, and 31% overnight). When you consider that 48% of urgent calls and 55% of emergent calls happen overnight, it becomes clear that the nighttime hours have a much higher concentration of high-risk calls. Similarly, the evening time frame is the shortest, just 5 hours or 21% of the day, but it is busy, taking 25% of the calls. While there is a higher frequency of higher-risk calls (more of them happen), they are, overall, a smaller percentage of the total calls received during that time frame.

The graph below was completed post-DataThon by a volunteer, Jan Topinka, who aimed to demonstrate the flow of calls by hour as well as to show which calls came in during business hours (green) and non-business hours (blue), when other agencies are closed (evenings, weekends, overnights and holidays).
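The layering described in this discussion amounts to dividing each time slot's share of high-risk calls by its share of all calls. The sketch below works through that arithmetic using only the percentages and time-slot lengths quoted in this summary; the function and variable names are illustrative and not part of the original analysis.

```python
# Percentages quoted in this summary, per time slot.
HOURS = {"Daytime": 8, "Evening": 5, "Overnight": 11}
ALL_CALLS = {"Daytime": 44, "Evening": 25, "Overnight": 31}
URGENT = {"Daytime": 42, "Evening": 10, "Overnight": 48}
EMERGENT = {"Daytime": 32, "Evening": 13, "Overnight": 55}

def concentration(risk_share, all_share):
    """Ratio of a slot's share of risk calls to its share of all calls.

    A value above 1.0 means the slot is over-represented for that risk level.
    """
    return {slot: risk_share[slot] / all_share[slot] for slot in all_share}

def share_of_day(hours):
    """Percentage of the 24-hour day covered by each slot."""
    total = sum(hours.values())
    return {slot: 100 * h / total for slot, h in hours.items()}

urgent_ratio = concentration(URGENT, ALL_CALLS)
emergent_ratio = concentration(EMERGENT, ALL_CALLS)
day_share = share_of_day(HOURS)
```

With these figures the overnight ratio comes out at roughly 1.55 for urgent calls and 1.77 for emergent calls, which is one way to read the statement that nighttime hours carry a higher concentration of high-risk calls. The evening slot covers about 21% of the day.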


Analytics Team 2

The second analytics team tackled the following two questions:

Identify which types of calls we are most often able to de-escalate (reduce level of risk). Which types of calls tend to escalate?

How does the percentage of calls related to suicide, domestic violence, addictions and financial issues change over time? Can we identify predictable seasonal or monthly change?

Tools and Technologies:

o Python

o R programming language

Analysis techniques:

To address these questions, the team implemented this approach:

1. Develop baselines

2. Break variables into groups

3. Compare groups to baselines

Risk Change:

The results of mapping the initial risk level to the final risk level of the calls are shown in the chart below.


The group also looked at the change of risk level away from emergent for specific caller groups. The results are summarized in the table below:

Risk level analysis results evaluation:

● The majority of calls showed no change in risk level between the initial and final risk assessment.

● The risk level change from Urgent to Semi-Urgent is the most common according to the results.

● Gender was not a significant factor in the de-escalation of risk level.

● Response for students is significant, showing the highest change away from emergent risk level as compared to other groups analyzed.
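The mapping of initial to final risk level described in this section can be tabulated as a simple transition count. The following is a hedged sketch: the four risk levels are the ones named in this summary, but the record fields (`initial`, `final`) and the sample calls are hypothetical, and the team's own Python or R code is not reproduced here.

```python
from collections import Counter

# Risk levels named in this summary, ordered from lowest to highest.
LEVELS = ["Non-Urgent", "Semi-Urgent", "Urgent", "Emergent"]

def risk_transitions(calls):
    """Count (initial, final) risk-level pairs across calls."""
    return Counter((c["initial"], c["final"]) for c in calls)

def summarize(transitions):
    """Collapse transitions into de-escalated, escalated and unchanged totals."""
    rank = {level: i for i, level in enumerate(LEVELS)}
    summary = Counter()
    for (initial, final), n in transitions.items():
        if rank[final] < rank[initial]:
            summary["de-escalated"] += n
        elif rank[final] > rank[initial]:
            summary["escalated"] += n
        else:
            summary["unchanged"] += n
    return summary

# Made-up records, for illustration only.
sample = [
    {"initial": "Urgent", "final": "Semi-Urgent"},
    {"initial": "Urgent", "final": "Urgent"},
    {"initial": "Semi-Urgent", "final": "Semi-Urgent"},
    {"initial": "Emergent", "final": "Urgent"},
    {"initial": "Semi-Urgent", "final": "Urgent"},
]
transitions = risk_transitions(sample)
summary = summarize(transitions)
```

Running the same count per caller group (for example students versus other callers) would give the group comparison against a baseline that the team's three-step approach describes.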

Seasonal and monthly trends

The following plots show the frequency of domestic, finance, addiction and suicide calls broken down monthly and weekly.

Consolidated:


Domestic and suicide calls frequency monthly and weekly


Finance and addiction monthly and weekly:

Discussion:

● Data used included calls from November 2011 to December 2014. Therefore, November and December had one additional year of data compared with all other months, so the notable bump seen here is almost certainly attributable to this. A more accurate approach would have been to exclude the 2 months of 2011 data and simply use the three years of complete data. However, a more modest increase in financial-related calls might still be seen, as calls related to utility disconnections and the costs of gifts over the holiday season are common at this time of year.

● The results show a relatively significant drop in finance and addiction related calls on the weekends.
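One way to correct for the uneven coverage noted in the first point of this discussion, as an alternative to dropping the 2011 data, is to divide each calendar month's total by the number of times that month occurs in the date range. The sketch below assumes the November 2011 to December 2014 range stated above; the raw counts are invented for illustration.

```python
from collections import Counter

def months_in_range(start_year, start_month, end_year, end_month):
    """Count how often each calendar month (1-12) occurs in an inclusive range."""
    occurrences = Counter()
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        occurrences[month] += 1
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return occurrences

def average_per_month(raw_counts, occurrences):
    """Average calls per occurrence of each calendar month."""
    return {m: raw_counts[m] / occurrences[m] for m in raw_counts}

occurrences = months_in_range(2011, 11, 2014, 12)

# Hypothetical raw totals for October, November and December.
raw = {10: 300, 11: 400, 12: 400}
averaged = average_per_month(raw, occurrences)
```

In this invented example the raw November and December totals look a third higher than October, yet the per-occurrence averages are identical, which is the kind of artificial bump the discussion describes.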


Analytics Team 3

The third analytics team tackled the following two questions:

1. Identify if there is any correlation between the months where the highest numbers of deaths by suicide have occurred in Alberta and the issues/severity of risk presented in Distress Centre calls and chats.

2. Identify how many calls could have been linked to our primary mobile partner if they operated 24 hours (calls between 9pm and 9:30am).

Suicide Calls Seasonality

The team started by looking at monthly suicide data in Alberta between 2005 and 2009.

Suicide Rates / Suicide Related Calls Analysis

• There was no meaningful connection between the calls that mentioned suicide as a topic and actual suicide rates

• There was an observable (but non-significant) correlation between callers whose condition was rated as suicidal and actual suicides within Alberta
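The summary does not say which statistic the team used to judge these correlations. As an assumption, the sketch below uses the Pearson correlation coefficient together with its t statistic, a common way to decide whether an observed correlation between two monthly series is significant. The twelve monthly values are invented and do not come from the Datathon data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical monthly series (January to December), for illustration only.
monthly_deaths = [40, 38, 45, 47, 50, 48, 46, 44, 43, 41, 39, 37]
monthly_calls = [310, 305, 300, 320, 315, 330, 325, 300, 310, 305, 295, 315]

r = pearson_r(monthly_deaths, monthly_calls)
t = t_statistic(r, len(monthly_deaths))
```

With 12 monthly points there are 10 degrees of freedom, and an absolute t value below about 2.23 would not be significant at the 5% level. A correlation can therefore be visible in a plot and still be reported as non-significant.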

Suicide Findings Observed

• Men might tend to under-call and are categorized as suicidal at a much more escalated point than women. It might also mean that women can be de-escalated from a suicidal state better than men. In general, it appears that the training is successful in identifying a suicidal state and escalating the situation. A takeaway could be that men should be escalated at an earlier point to run a more successful intervention.

• Future analyses could focus on demographics that are associated with suicide rates (men in a certain age range, which needs to be clarified) and pinpoint what times and months they are more likely to call in order to adjust staffing accordingly.


MRT after Hours Analysis

A population of 75,924 Distress Centre callers was considered. The availability interval for the Mobile Response Team (MRT) was taken to be 9:30am to 9:30pm, regardless of the actual availability interval, in order to homogenize the availability interval and split the data into day and night hours.

The final risk assessment made by the volunteers was categorized as Emergent, Urgent, Semi-urgent or Non-Urgent, if known.

Risk assessment was available for 53,398 callers.

The number and percentage of callers per risk between day and night was estimated as reported in Table 1:


MRT after Hours Finding Observed

• Summary of findings indicates that the after-hours call pattern by category remains the same.

o Emergent | Urgent | Semi-Urgent | Non-Urgent

• In the absence of direct MRT referral data, the extrapolation was made, based on call patterns, that ¼ of the calls that may benefit from MRT intervention are not served.

• Results support 24/7 MRT to address the key areas reviewed

o 8 PM peak time for suicide calls

o 10 PM peak time for high-risk callers

o Analysis not done: cost of CPS, EMS, AHS Emergency incurred via lack of MRT


Analytics Team 4

The fourth analytics team tackled the following two questions:

1. Identify if there is a correlation between poor weather and certain issues presented (i.e. depression, loneliness or suicide following a period of rain or a cold spell).

2. Obtain a count of how many calls the average volunteer takes in one year.

Tools and Technologies:

o Ruby

o R Programming language

Analysis techniques:

As per the analysis, this group employed descriptive statistical techniques such as normal distributions, confidence intervals, histograms, stacked bar plots and so on.

Weather Type and call issues:

For the purposes of this analysis, the group classified the weather type into 7 categories, as shown in the figure below.

Note 1: Due to time constraints, not all data for gambling calls could be entered in time for the material to be presented.


Average calls received by a volunteer:

As shown in the figures below, volunteers are able to take more calls for 211 compared to the Crisis Line. However, the distress center would need more volunteers for the crisis line at any given time.

Distress Centre Discussion: As a correction to the information presented, volunteers answer the crisis lines, but it is staff who respond to the 211 calls. These staff work either full-time or relief hours, but are on shift much more frequently than volunteers, who generally do between 2-4 shifts per month. That explains the large variance in calls answered per person. Furthermore, 211 calls are significantly shorter than crisis calls, at an average of 4-5 minutes per 211 call and 13 minutes per crisis call. What is interesting to note is that despite a considerable increase in the volunteer pool in recent years, the average # of calls per volunteer increased in 2014. This is a result of higher demand recently and validates Distress Centre’s addition of more volunteer shifts during peak hours to help reduce the burden of the increased call volume on volunteers.
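The average number of calls per volunteer per year discussed here can be computed by counting calls per person within each year and then averaging over the people active that year. This is a minimal sketch under assumed inputs: the `(year, volunteer_id)` pairs are hypothetical and this is not the team's Ruby or R code.

```python
from collections import defaultdict

def average_calls_per_volunteer(calls):
    """Return {year: mean calls per active volunteer}.

    `calls` is an iterable of (year, volunteer_id) pairs, one per call answered.
    """
    per_year = defaultdict(lambda: defaultdict(int))
    for year, volunteer in calls:
        per_year[year][volunteer] += 1
    return {
        year: sum(counts.values()) / len(counts)
        for year, counts in per_year.items()
    }

# Made-up call log, for illustration only.
sample_log = [
    (2013, "a"), (2013, "a"), (2013, "b"),
    (2014, "a"), (2014, "b"), (2014, "b"),
    (2014, "c"), (2014, "c"), (2014, "c"),
]
averages = average_calls_per_volunteer(sample_log)
```

In this invented log the pool grows from two to three volunteers while the average still rises from 1.5 to 2.0 calls each, the same pattern the discussion attributes to higher demand. Only people who answered at least one call in a year are counted, so staff and volunteers would need to be separated beforehand to avoid the mix-up corrected above.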


Spatial Mapping Team 5

The fifth analytics team completed the following tasks:

1. Create an interactive map that shows what types of 211 calls are received from different areas using

postal code data.

2. Create a specific map that demonstrates where flood related calls were received from.

Team 5 mapped spatial data to call postal code data and created a visual representation of the data using Tableau and PYXIS WorldView. Team 5 also created a video visualization of call frequency and issue type in the Calgary and High River area around the time of the flood, which showed the city’s spirit of volunteerism and cooperation.

211 calls around the flood .wmv


Text Analysis Team 6

The sixth analytics team’s efforts were focused on these tasks:

1. Identify how the language used by youth clients differs from that of adult crisis clients.

2. Identify words and phrases used by clients when talking about suicide.

3. Obtain a count of the total # of interactions we have had in chat and text (not conversations, but how many back and forth interactions total for each mode of service).

Total number of interactions

• 292,117 interactions with teens in total, with online and SMS included

• Additional 201,802 interactions with adults

• Grand total: 493,919 interactions

• Does not include email


Some of the common words that appeared more frequently in Teen Chat and Teen SMS:


Miscellaneous

For this category of data analysis, the volunteers looked at a number of ideas, including some suggestions from the Distress Center, such as the following:

1. Average call length per volunteer for Crisis Calls 2011-2014 (Darker bars indicate a higher number of calls taken)

2. Full Moon call frequency


3. Full Moon Call length:

4. Full Moon Risk Level:


5. Call frequency on rainy days (Darker bars represent higher levels of precipitation)

6. Call frequency during snowfalls (Red bars indicate snow on the ground)