While accurate and timely statistics have always been recognised as a key input to good quality decision and policy making within government, this past year has put data at the forefront of strategy and reporting in a new way due to the COVID-19 pandemic. The public appetite for data has risen and products such as data dashboards are being frequently used by people who may not have been actively engaged in data previously.
In April 2020, the Welsh Government set up a Data Science Unit to explore how new data sources, techniques and systems could be adopted to improve how we use data and how we make information more accessible to everyone. This blog is the first in a series that introduces some of the work that is underway.
Welsh Index of Multiple Deprivation – Clusters
The Welsh Index of Multiple Deprivation (WIMD) is an award-winning output from the Welsh Government that makes information about a broad range of deprivation types publicly available at very granular levels. Data is published for Lower Layer Super Output Areas (LSOAs), which are small areas of about 1,600 people each, and covers deprivation indicators for income, employment, health and education, among others.

Recently the WIMD team published a blog discussing analysis to better understand representation of specific groups in deprived areas, and how deprived areas have been more severely impacted by coronavirus in Wales.
The Data Science Unit has been exploring WIMD and applying data science techniques to group LSOAs based on their scores across all of the indicators. This process is sometimes referred to as segmentation or clustering. The purpose of the analysis is to group areas with similar deprivation characteristics, which can improve our understanding of how different types of deprivation commonly co-occur. From our groups we can develop descriptive summaries of the different deprivation types.
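To make the idea of grouping areas by their indicator scores more concrete, here is a minimal sketch. It is not the unit's actual method: the post does not name a clustering algorithm, so k-means is an assumption, the six areas and their scores are made up, and Python is used purely for illustration (the analysis itself is carried out in R).

```python
import math
import statistics

# Hypothetical indicator scores for six made-up areas (not real WIMD data).
# Columns: income, employment, health, education.
AREAS = {
    "area_a": [42.0, 38.0, 40.0, 45.0],
    "area_b": [40.0, 36.0, 43.0, 41.0],
    "area_c": [10.0, 12.0, 9.0, 11.0],
    "area_d": [12.0, 9.0, 11.0, 8.0],
    "area_e": [25.0, 11.0, 30.0, 10.0],
    "area_f": [27.0, 13.0, 28.0, 12.0],
}


def standardise(rows):
    """Scale each indicator column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    sds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, sds)]
        for row in rows
    ]


def k_means(points, k, iterations=50):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centres = [list(points[0])]
    while len(centres) < k:
        # Next seed: the point furthest from every centre chosen so far.
        furthest = max(
            points, key=lambda p: min(math.dist(p, c) for c in centres)
        )
        centres.append(list(furthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centre.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Move each centre to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels


names = list(AREAS)
labels = k_means(standardise([AREAS[n] for n in names]), k=3)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

Standardising first means that no single indicator dominates the distance calculation simply because it is measured on a wider scale. Each resulting cluster could then be summarised by the average score of its members on each indicator.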
The process we are exploring has first identified three groups of areas which seem to reflect the rural/urban split to some extent. In our analysis we have found that geography unsurprisingly has a strong relationship with the types of deprivation experienced by areas. Our next stage is to break these groups down into sub-groups to better understand how the deprivation might differ within those geographical areas. At the end of the clustering process we will have around 10 written descriptions of deprivation groups that capture our 1,909 LSOAs.
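The two-stage idea, where broad groups are found first and each is then split further, can be sketched as follows. Again this is only an illustration under stated assumptions: the areas, scores and top-level group names are invented, each group is simply split in two with a small k-means routine, the scores are assumed to be on a comparable scale already, and Python stands in for the R code the unit actually uses.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical first-stage output: each made-up area has a top-level group
# and two illustrative indicator scores (income, health). Not real WIMD data.
AREAS = [
    ("area_a", "group_1", [40.0, 42.0]),
    ("area_b", "group_1", [41.0, 44.0]),
    ("area_c", "group_1", [30.0, 55.0]),
    ("area_d", "group_1", [29.0, 57.0]),
    ("area_e", "group_2", [10.0, 12.0]),
    ("area_f", "group_2", [11.0, 10.0]),
    ("area_g", "group_2", [20.0, 5.0]),
    ("area_h", "group_2", [22.0, 6.0]),
]


def split_in_two(points, iterations=20):
    """2-means: seed with the two points furthest apart, then refine."""
    seed_a, seed_b = max(
        ((p, q) for p in points for q in points),
        key=lambda pair: math.dist(*pair),
    )
    centres = [list(seed_a), list(seed_b)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to the nearer of the two centres.
        labels = [
            0 if math.dist(p, centres[0]) <= math.dist(p, centres[1]) else 1
            for p in points
        ]
        # Recompute each centre as the mean of its members.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def sub_cluster(areas):
    """Split every top-level group into two sub-groups, e.g. 'group_1.0'."""
    by_group = defaultdict(list)
    for name, group, scores in areas:
        by_group[group].append((name, scores))
    result = {}
    for group, members in by_group.items():
        labels = split_in_two([scores for _, scores in members])
        for (name, _), label in zip(members, labels):
            result[name] = f"{group}.{label}"
    return result


sub_groups = sub_cluster(AREAS)
for name, label in sorted(sub_groups.items()):
    print(name, label)
```

Clustering within each top-level group, rather than across all areas at once, keeps the sub-groups nested inside the broad groups, so each written description can start from the broad group's profile and then note what sets the sub-group apart.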
We hope that this work will achieve two things. Firstly, that the clusters will give us new insight into the indicator data from a more overarching position, to support further analysis by those who want to explore the data. Secondly, that the clusters are relatable and interpretable through their descriptions so that a wider audience can engage with the information, for example to identify communities facing very similar types of deprivation in different areas of Wales. Through the use of data science we hope to make our outputs even more inclusive for members of the public and people across wider government who may not normally engage with data but can relate to the descriptions.
Three cluster groups for deprivation indicators in Wales

The Data Science Unit is making good progress on this work and will be sharing more in future. Also, look out for the next part of this blog series, where we will introduce how we are using internet speed test data to understand the quality of internet that people receive across Wales.
The analysis is being carried out in R, an open-source statistical programming language that is free to use and has a large online community. To find out more about R you can check out the R project website.
If you want to get in touch with us please email datascienceunit@gov.wales
Steven Hopkins, Lead Data Scientist