Why create Dexter?
Dexter was built to use trends in media coverage to help create a free and fair media environment by monitoring what is covered and how it is covered. We all know that the media has limited resources, not only in terms of time but also in the range of topics and issues it can cover. Therefore, it matters who the media gives a voice to and what those sources choose to speak about when given that opportunity. We also know that the news media doesn’t merely act as a mirror of society; it re-presents and reinforces certain power dynamics in society. News media have the power both to bolster existing areas of power and to challenge them by ensuring a diversity of views, voices and issues. At the same time, news media serve a critical role in holding the powerful to account, or as a media scholar phrased it, “media should give comfort to the afflicted and afflict the comfortable.” The immediate question is: how can we determine to what extent the media reinforces power dynamics or helps give voice to the marginalised? The short answer is through media monitoring.
Dexter’s main aim is to analyse voice, representation and diversity in media: to know who is given a voice and who is excluded, and who is spoken about and how, from the perspective of gender representation, marginalised voices, and race representation. It further helps determine the quality of the media, to drive policy and decision making that ensures good quality journalism that represents society fairly.
How does Dexter do this?
Open Cities Lab (OCL), the developers behind Dexter, use natural language processing (NLP) to break online media articles into pieces that we can analyse further, in effect turning qualitative data into quantitative data. Dexter does this by extracting entities from online media articles, such as people, places and companies, along with the utterances in the article that relate to those entities.
As an example, let us assume that Dexter is going to process the News24 article entitled “Several Lesotho foreign nationals sent to prison”. It will identify that the article refers to foreign nationals. It will also recognise that the word “arrest” was used in relation to them, and extract the quote from the Hawks’ spokesperson. All of this is done by an automated tagging system that attaches the relevant tags to the article being monitored.
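To make the tagging idea concrete, here is a minimal sketch in Python. It is not Dexter's actual pipeline: it assumes simple keyword lists and a regular expression where a real system would use a trained NLP model, and the sample text, the names `GROUP_TERMS`, `ACTION_TERMS`, `ArticleTags` and `tag_article`, and the speaker "Jane Doe" are all invented for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword lists; a real system would rely on a trained model.
GROUP_TERMS = ["foreign nationals"]
ACTION_TERMS = ["arrest", "arrested", "sentenced"]

# Matches the pattern: "utterance," said Speaker Name
# (straight or curly double quotes, capitalised speaker name).
QUOTE_PATTERN = re.compile(
    r'["“]([^"”]+)["”],?\s+said\s+([A-Z][\w-]*(?:\s+[A-Z][\w-]*)*)'
)


@dataclass
class ArticleTags:
    groups: list = field(default_factory=list)   # groups mentioned
    actions: list = field(default_factory=list)  # action words found
    quotes: list = field(default_factory=list)   # (speaker, utterance) pairs


def tag_article(text: str) -> ArticleTags:
    """Attach simple tags to an article using keyword and pattern matching."""
    lowered = text.lower()
    tags = ArticleTags()
    tags.groups = [term for term in GROUP_TERMS if term in lowered]
    # Whole-word matching, so "arrest" does not match inside "arrested".
    tags.actions = [
        term for term in ACTION_TERMS
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    ]
    tags.quotes = [
        (match.group(2), match.group(1).rstrip(","))
        for match in QUOTE_PATTERN.finditer(text)
    ]
    return tags


# Invented sample text, not taken from the News24 article.
SAMPLE = (
    "Five foreign nationals were arrested on Monday. "
    '"The suspects will appear in court," said Jane Doe, a police spokesperson.'
)
tags = tag_article(SAMPLE)
print(tags)
```

A rule-based approach like this breaks down quickly on real articles, which is one reason a trained model plus human review is the more realistic design.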
Dexter’s focal point is understanding the utterances of the sources quoted in an article and how these relate to the various entities. All this data is stored in a database that MMA can easily query to understand media coverage during a particular period of time.
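As a rough illustration of what querying such a database for a period of time could look like, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `utterances`, its columns, the helper `quotes_about` and the sample rows are all assumptions made for this example; Dexter's real schema and database engine are not described here.

```python
import sqlite3

# Hypothetical schema: one row per quoted utterance.
# "published" holds ISO dates (YYYY-MM-DD), which sort correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE utterances ("
    "published TEXT, outlet TEXT, speaker TEXT, entity TEXT, quote TEXT)"
)

# Invented sample rows.
rows = [
    ("2021-10-01", "Outlet A", "Jane Doe", "Party A", "We will fix the roads."),
    ("2021-10-15", "Outlet B", "John Roe", "Party B", "Water comes first."),
    ("2021-12-01", "Outlet A", "Jane Doe", "Party A", "The results are in."),
]
conn.executemany("INSERT INTO utterances VALUES (?, ?, ?, ?, ?)", rows)


def quotes_about(conn, entity, start, end):
    """Return (published, speaker, quote) rows for an entity within a date range."""
    cursor = conn.execute(
        "SELECT published, speaker, quote FROM utterances "
        "WHERE entity = ? AND published BETWEEN ? AND ? "
        "ORDER BY published",
        (entity, start, end),
    )
    return cursor.fetchall()


october = quotes_about(conn, "Party A", "2021-10-01", "2021-10-31")
print(october)
```

Passing the entity and dates as query parameters, rather than formatting them into the SQL string, keeps the query safe from injection.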
A key addition to the Dexter model is the ability for human monitors to access the data processed by the NLP. This provides an additional quality check that ensures that the information extracted is accurate. We know that NLP is not perfect. In fact, one could argue that even first-language speakers are not always able to infer what is being written or spoken about. But many current NLP models are trained on North American English, and so do not work seamlessly with Southern African media. They are not trained on how we write digitally, or on the entities that are more prominent in our spaces.
How do we define where Dexter can help?
In a nutshell, Dexter is one of the best tools, if not the best, for answering any question that relates to media coverage of a topic over a period of time. MMA, for instance, uses it as an evidence-based approach to answering various questions, sometimes in collaboration with other stakeholders, to drive research work and develop policy recommendations.
During the 2021 local government elections, for instance, Dexter helped establish how the media covered the elections (MMA has monitored the coverage of every South African national and local election since our first democratic elections in 1994). Of course, that is a difficult question to answer, but over the years MMA’s team has perfected a methodology that helps answer it. The dashboard developed for the elections helped answer various questions such as:
- Which party received the most coverage in the media space through quotations and interviews.
- The gender representation of the quoted media sources.
- The proportional fairness of the media’s coverage.
- The topics covered in the media space.
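Questions like these reduce to counting tagged sources and converting the counts into shares. The sketch below shows one way to do that in Python; the record layout, the `share` helper and the sample data are hypothetical, and it does not reproduce MMA's methodology for assessing fairness.

```python
from collections import Counter

# Invented records: one entry per quoted source in the monitored coverage.
SOURCES = [
    {"party": "Party A", "gender": "female"},
    {"party": "Party A", "gender": "male"},
    {"party": "Party B", "gender": "male"},
    {"party": "Party A", "gender": "male"},
]


def share(records, key):
    """Return each value's fraction of all records for the given key."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


party_share = share(SOURCES, "party")
gender_share = share(SOURCES, "gender")
print(party_share)
print(gender_share)
```

Comparing shares like these against a reference point, such as each party's share of the vote, is one possible way to reason about proportionality, though the choice of reference is itself a methodological decision.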
At the same time, Dexter houses a mammoth database of more than 2 500 000 articles, rich with textual data and tags, which is in itself useful for developing machine learning models and conducting further scientific research. A recent collaboration with the University of KwaZulu-Natal has helped develop a model that can distinguish speakers from each other and identify their opinions on a topic. In the political landscape, this can have a significant impact, for instance by allowing voters to distinguish between political parties and determine where they stand on a particular topic.
What are our learnings?
Whatever the need is, our OCL team has realised that engaging the end user is key at every step of the Design Thinking process. Over the years, MMA has received numerous requests to develop sub-tools from Dexter. All of these requests have been delivered by carefully running user design sessions in which we try to understand the features that are most valuable to those who will use the tool to make decisions or answer questions. Simply put, we have learnt the value of putting the user front and centre of the product and of designing with, rather than designing for, the user.
We have also realised how easily a platform like this can continue to grow in the digital age. As access to the internet grows and people’s reliance on online information increases, online media will also continue to expand. Dexter’s database has grown three-fold in the last 6 years. The key is balancing the development of systems that manage and search this data seamlessly with the resources available to make the changes.
What is next?
NLP, as used in Dexter’s case, has significant potential to help us train models that extract valuable information from textual data. It gives us the opportunity to reduce the time previously needed to work on reports or articles, while keeping the important touch of human intervention.
Everyone dreads reading long reports. However, we can’t run away from the fact that they are here to stay in our day-to-day lives. The government, for instance, uses them to communicate crucial information to its residents, and so do many businesses, private and public. Dexter gives us the ability to analyse these reports and extract the important pieces, making it easier to understand what is being said. The sky is the limit here: Dexter could potentially analyse the Zondo Reports or regular council minutes, turning them into bite-sized chunks of information that are easily consumable.
At the same time, there is further scope for NLP models to be trained to work with textual data that isn’t written in American English. Human language is complicated, and it is even more difficult to extract useful information when the models are trained mostly on one form of language. This is especially a challenge in South Africa, which has 11 official languages and even its own version of English. The same can be said about other African countries. We, at OCL, are quite positive about this and see huge potential in developing these models.
As Dexter grows from strength to strength, as technology continues to improve, and as further research is conducted into how NLP models can contribute to Dexter’s development, the potential for organisations to use this one-of-a-kind tool also grows.