ALISS Technical Reference Group
Agenda
• Introduction to the Health and Social Care Alliance Scotland (the ALLIANCE)
• Information management and quality issues with the outgoing product
  • Search
  • Data quality
  • Duplication
  • Incentives
  • Location
• New website (lessons learned)
  • Duplication solution
  • Data quality solution
  • Location solution
  • Relevance?
• Categorisation
• Roadmap
Introduction to the ALLIANCE
Our vision is for a Scotland where people of all ages who are
disabled or living with long-term conditions, and unpaid carers,
have a strong voice and enjoy their right to live well, as equal and
active citizens, free from discrimination, with support and services
that put them at the centre.
• National intermediary, 3rd Sector organisation and strategic
partner of the Scottish Government.
• Membership organisation, over 2,100 members.
• The ALLIANCE delivers its vision through many projects and
programmes.
Search on aliss.org
We allowed people to search “health and wellbeing
resources”.
Each resource had:
• Title
• Description
• URL
• Tags
• Location (address and latitude/longitude)
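A resource with these fields could be indexed in Elasticsearch with a mapping along the following lines. This is a minimal sketch, not the ALISS mapping: the field names `address` and `location`, and the choice of types, are assumptions, written in the typeless mapping format used by Elasticsearch 7 and later.

```python
# Hypothetical index mapping for a resource document. Field names and types
# are assumptions; the text fields are "text" so multi_match can search them.
RESOURCE_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "description": {"type": "text"},
            "url": {"type": "text"},
            "tags": {"type": "text"},
            "address": {"type": "text"},
            # A geo_point holds latitude/longitude for distance queries.
            "location": {"type": "geo_point"},
        }
    }
}
```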
Search
• Search was implemented with Elasticsearch.
• Users could search by keyword, location (latitude and longitude) and radius.
• Search used a ‘multi_match’ query of type ‘most_fields’ across the fields ‘title’, ‘description’, ‘tags’ and ‘url’.
• This runs a full-text query against each field, scores each field using TF/IDF relevance, then adds the field scores together to give a total score.
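A query combining the keyword, location and radius inputs described above might look like the following request body, built here as a Python dict. This is a sketch under assumptions: the geo_point field name `location`, the use of a `geo_distance` filter for the radius, and the function name `build_search_query` are illustrative, not taken from the ALISS code.

```python
import json


def build_search_query(keywords: str, lat: float, lon: float, radius_km: float) -> dict:
    """Build an Elasticsearch request body for a keyword + location search.

    Assumes (hypothetically) that each document has a geo_point field
    named "location".
    """
    return {
        "query": {
            "bool": {
                # Scored part: full-text match summed across the four fields.
                "must": {
                    "multi_match": {
                        "query": keywords,
                        "type": "most_fields",
                        "fields": ["title", "description", "tags", "url"],
                    }
                },
                # Non-scoring part: only keep resources within the radius.
                "filter": {
                    "geo_distance": {
                        "distance": f"{radius_km}km",
                        "location": {"lat": lat, "lon": lon},
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Approximate coordinates of Dingwall, 10 km radius.
    print(json.dumps(build_search_query("scouts", 57.596, -4.428, 10), indent=2))
```

Putting the radius in a `filter` clause means distance decides which resources match without changing their relevance score.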
• Previously we used ‘best_fields’, which scores each field individually and ranks each result by its single best-scoring field.
• However, this meant that resources with short descriptions, or a single tag that exactly matched the search terms, could rank above resources that users would consider more useful.
• This tied into our wider data quality problems.
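The difference between the two query types can be shown with a toy model. In Elasticsearch, `best_fields` takes the highest single field score (plus `tie_breaker` times the others, with `tie_breaker` defaulting to 0), while `most_fields` combines the scores of all matching fields, as described above. The per-field scores below are invented purely for illustration; real scores come from Elasticsearch's own relevance calculation.

```python
# Invented per-field scores for two resources matching the same search.
short_entry = {"title": 0.0, "description": 0.0, "tags": 3.0, "url": 0.0}
rich_entry = {"title": 1.5, "description": 1.2, "tags": 1.0, "url": 0.8}


def best_fields_score(field_scores: dict, tie_breaker: float = 0.0) -> float:
    """Best single field, plus tie_breaker times the remaining fields."""
    best = max(field_scores.values())
    others = sum(field_scores.values()) - best
    return best + tie_breaker * others


def most_fields_score(field_scores: dict) -> float:
    """Sum of every field's score."""
    return sum(field_scores.values())


def rank(entries: dict, scorer) -> list:
    """Entry names ordered from highest to lowest score."""
    return sorted(entries, key=lambda name: scorer(entries[name]), reverse=True)


entries = {"short_entry": short_entry, "rich_entry": rich_entry}
print(rank(entries, best_fields_score))  # ['short_entry', 'rich_entry']
print(rank(entries, most_fields_score))  # ['rich_entry', 'short_entry']
```

The entry whose single tag matches exactly wins under `best_fields`, while the entry that matches moderately across title, description, tags and URL wins under `most_fields`.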
Data quality
• ALISS also used a dataset of 33,000 resources that was never designed for public consumption. It was created as part of a reporting tool on charities’ activities, in which charities had to provide information about themselves and their programmes.
• Much of the information was out of date, had inappropriate (or missing) descriptions and was generally based on “organisations” rather than specific “resources”.
Duplication
• Many of the resources were also duplicates, but in subtle
ways that were hard to spot automatically: for example,
entries with identical titles but different descriptions,
locations and tags.
• Even when we could identify duplicates, they were hard
to remove, because many API users filtered results to show
only the resources they had added themselves. Removing one
user's resource in favour of another's therefore caused
complaints.
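Catching the “identical title, different details” case can start with grouping entries by a normalised title. The sketch below is a hypothetical first pass, not the ALISS tooling: the normalisation rules, record shape and sample records are assumptions, and matching groups are only candidates for a person to review, since entries with the same title may still be genuinely different resources.

```python
import re
from collections import defaultdict


def normalise_title(title: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def duplicate_candidates(resources: list[dict]) -> list[list[int]]:
    """Group resource ids whose normalised titles are identical.

    Only groups with more than one member are returned; they are
    candidates for human review, not confirmed duplicates.
    """
    groups = defaultdict(list)
    for resource in resources:
        groups[normalise_title(resource["title"])].append(resource["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Illustrative records only.
resources = [
    {"id": 1, "title": "1st Example Scouts", "tags": ["youth"]},
    {"id": 2, "title": "1st  Example Scouts.", "tags": ["outdoors"]},
    {"id": 3, "title": "Example Walking Group", "tags": ["walking"]},
]
print(duplicate_candidates(resources))  # [[1, 2]]
```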
Incentives
Locations
• The system stored locations only as latitude and longitude, and many
users tried to make their resource show up in more searches by
adding multiple, widely spread locations.
• Some excellent health and wellbeing resources also covered whole
geographical areas rather than a single address, because they were
delivered over the phone or to people's homes, such as telephone
befriending services or “meals-on-wheels” style services.
User’s perspective
1. Someone searched for “Scouts Dingwall” in the Chrome browser.
2. An ALISS entry was at the top of the results.
3. They clicked through to the ALISS page.
4. It was a Get Involved entry for “1st Dingwall Beaver Scouts” with no description; its location
was in London, and its URL led to Get Involved, where no information was held.
5. They then typed “Scouts” and “Dingwall” into ALISS and got 12 unhelpful
results.
6. They clicked on the 10th result, a LiU Argyll & Bute entry for 1st
Dingwall Scouts. Its URL linked to a non-existent Facebook page, its tags
contained two locations, and its location pointed to a mountain.
7. Unable to find useful information, the person reported the resource and
asked for advice: “Hi I have a 9 year old son who would like to join the
scouts…”.
8. There is no 1st Dingwall Scout group; it is the 1st Ross & Sutherland group.
New website
Duplication solution
• We changed our schema from generic “resources” to separate organisations and services, which makes duplication much easier to resolve.
• There is one canonical entry for each organisation, which can be “claimed” by a representative of the organisation.
• If a duplicate organisation is added, it is easy to mark the older, more complete or claimed entry as the canonical one and delete the other.
• Duplicate services can be handled in the same way.
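The canonical-entry rule can be sketched as a small data model plus a selection function. The class, its field names and, in particular, the priority order (claimed first, then most complete, then oldest) are assumptions for illustration; the criteria above are listed without a ranking.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Organisation:
    # Hypothetical fields, for illustration only.
    id: int
    name: str
    created: date
    claimed: bool = False
    description: str = ""
    services: list[str] = field(default_factory=list)


def completeness(org: Organisation) -> int:
    """Crude 'how developed is this entry' score: description plus services."""
    return int(bool(org.description)) + len(org.services)


def choose_canonical(duplicates: list[Organisation]) -> Organisation:
    """Prefer claimed entries, then more complete ones, then the oldest."""
    return min(
        duplicates,
        key=lambda org: (not org.claimed, -completeness(org), org.created),
    )


def resolve(duplicates: list[Organisation]) -> tuple[Organisation, list[int]]:
    """Return the canonical entry and the ids of the entries to delete."""
    canonical = choose_canonical(duplicates)
    to_delete = [org.id for org in duplicates if org.id != canonical.id]
    return canonical, to_delete


a = Organisation(1, "Example Trust", date(2016, 3, 1), description="Befriending")
b = Organisation(2, "Example Trust", date(2017, 5, 9), claimed=True)
canonical, to_delete = resolve([a, b])
print(canonical.id, to_delete)  # 2 [1]
```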
Poor quality information solution
• Only ALISS staff can now add organisations and services.
• We are training volunteers and staff as moderators to add and edit information.
• We are also encouraging organisations to claim their listings.
• We have created data standards that set minimum quality requirements and encourage users to enter good content.
• We will no longer mass-import datasets that lower the quality of the data.
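Part of a data standard can be enforced in code as a checklist shown to moderators before a listing is published. The rules, field names and 50-character threshold below are hypothetical placeholders, not the actual ALISS data standards.

```python
from urllib.parse import urlparse

# Hypothetical threshold; the real ALISS data standards may differ.
MIN_DESCRIPTION_LENGTH = 50


def quality_problems(service: dict) -> list[str]:
    """Return the reasons a service fails the (hypothetical) minimum standard."""
    problems = []
    if not service.get("name", "").strip():
        problems.append("missing name")
    if len(service.get("description", "").strip()) < MIN_DESCRIPTION_LENGTH:
        problems.append("description too short")
    url = service.get("url", "")
    if url:
        parsed = urlparse(url)
        # Require an absolute http(s) URL with a host name.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("invalid url")
    if not service.get("location") and not service.get("areas_served"):
        problems.append("no location or area served")
    return problems


print(quality_problems({"name": "Example befriending",
                        "description": "Weekly phone calls",
                        "url": "example.org"}))
# ['description too short', 'invalid url', 'no location or area served']
```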
Location solution
• We allow users to give services both a physical location and an
“Area Served” if required.
• Areas Served are official areas designated in the National Records
of Scotland Scottish Postcode Directory dataset.
• When a user searches with a postcode, we can match services that
are marked as serving that area and include them in the search
results. Telephone or delivery services can therefore be found in
the areas they cover without needing an inappropriate or inaccurate
physical location.
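Matching on Areas Served alongside physical distance can be sketched in plain Python. The postcode lookup is a hypothetical stand-in for data loaded from the Scottish Postcode Directory: the postcode, coordinates and area codes are placeholders, and the 10 km default radius is an arbitrary choice. A service is returned if it serves one of the postcode's areas or if its physical location lies within the radius (great-circle distance via the haversine formula).

```python
import math
from dataclasses import dataclass, field
from typing import Optional

# Placeholder lookup: postcode -> (latitude, longitude, area codes).
POSTCODE_LOOKUP = {
    "XX1 1XX": (57.60, -4.43, {"area-A"}),
}


@dataclass
class Service:
    name: str
    location: Optional[tuple[float, float]] = None  # physical (lat, lon), optional
    areas_served: set[str] = field(default_factory=set)


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def search_by_postcode(postcode: str, services: list[Service],
                       radius_km: float = 10.0) -> list[str]:
    """Names of services that serve the postcode's area or are nearby."""
    lat, lon, areas = POSTCODE_LOOKUP[postcode]
    results = []
    for service in services:
        serves_area = bool(service.areas_served & areas)
        nearby = (service.location is not None
                  and haversine_km((lat, lon), service.location) <= radius_km)
        if serves_area or nearby:
            results.append(service.name)
    return results


services = [
    Service("Telephone befriending", areas_served={"area-A"}),
    Service("Local walking group", location=(57.61, -4.42)),
    Service("Distant club", location=(55.95, -3.19)),
]
print(search_by_postcode("XX1 1XX", services))
# ['Telephone befriending', 'Local walking group']
```

The telephone befriending service is found without any physical location at all.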
Relevance
When a user searches ALISS we want to return the best services to help them with their specific problem.
But…
The only two factors we have for putting the right listing in front of the user are the postcode and the search term (free text or a category).
What other considerations could or should we make in this area?
• A “claimed organisation” relevance factor?
• An “I found this information useful” relevance factor?
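One possible way to trial these ideas in Elasticsearch is a `function_score` query wrapped around the existing search. This is a sketch, not a recommendation: the `claimed` and `useful_count` fields are hypothetical, and the 1.5 weight and `log2p` modifier are arbitrary starting points that would need tuning against real searches.

```python
def build_boosted_query(search_query: dict) -> dict:
    """Wrap an existing query so claimed listings and listings marked
    useful rank higher. 'claimed' and 'useful_count' are hypothetical fields.
    """
    return {
        "query": {
            "function_score": {
                "query": search_query,
                "functions": [
                    # Fixed boost, applied only to claimed listings.
                    {"filter": {"term": {"claimed": True}}, "weight": 1.5},
                    # log10(2 + votes): grows slowly, never zero.
                    {
                        "field_value_factor": {
                            "field": "useful_count",
                            "modifier": "log2p",
                            "missing": 0,
                        }
                    },
                ],
                # Multiply the function results together, then multiply
                # that product into the original relevance score.
                "score_mode": "multiply",
                "boost_mode": "multiply",
            }
        }
    }
```

With these settings a claimed listing's relevance score is scaled by 1.5 × log10(2 + useful_count) and an unclaimed one by log10(2 + useful_count); using `log2p` rather than `log1p` avoids multiplying listings with no votes down to zero.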
Roadmap
• Categorisation work – ongoing
• Content generation – ongoing
• Launch beta website - February 2018
• Launch API version 3 documentation & Terms of Service - March 2018
• Business development (marketing, business models?) - April 2018