How might we determine the impact of the Coronavirus Pandemic on global humanitarian efforts?

COVID in Pixels explores the impact that COVID-19 has had on humanitarian efforts using Natural Language Processing (NLP) techniques on global media discourse.

 

PROJECT DURATION

3 Months

MY ROLE

Researcher

Data Scientist

Designer

TEAM

Tamara Lottering

Tashfeen Ahmed

Xiaohang Xu

Minjia Zhao

Jin Mu

TOOLS & METHODS

Python for NLP

Topic Modelling

TF-IDF

Collocation Analysis

Highcharts.js

AmCharts

 

THE PROBLEM

The first case of the coronavirus (COVID-19) was reported to the World Health Organization (WHO) in December 2019, and WHO declared it a pandemic in March 2020. The pandemic has had an enormous impact on economies across the globe, as governments put their own citizens first and shift their efforts towards domestic priorities: healthcare interventions such as testing, hospitalisation, and vaccine development; reviving the economy; and distributing stimulus packages to unemployed citizens.

RESEARCH QUESTIONS

We explored the following research questions in this project:

  1. What is the impact of COVID-19 on the language around 'humanitarianism' in global media discourse?

  2. Who donates humanitarian aid and who are its recipients? What kinds of aid are donated, and what values are attached to them?

METHODOLOGY

We used text analysis and NLP to analyse a corpus of humanitarian media discourse from three categories of countries, predefined by our data holders at the University of Edinburgh and UNHCR. The corpus covered the period from December 2019 to August 2020.

  • Euro-Atlantic Countries: USA, UK, Germany, France

  • Gulf Donors: UAE, Kuwait, Qatar, Saudi Arabia

  • New Global Media Players: China, Russia, Iran and Turkey

 

This analysis gave us insight into:

  • The approaches global nations took in their humanitarian efforts.

  • Where humanitarian aid was flowing. 

  • The key topics discussed in global media during that period of time.

 

The direction of humanitarian aid flows globally.


Text Analysis and NLP

Initially, our cleaning and pre-processing pipeline involved corpus tokenisation, normalisation, stemming, and stopword removal. Thereafter, we analysed term collocations using n-grams. We found that trigrams (n=3) yielded better results than bigrams (n=2), since they revealed more of the context in which terms were used.
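The pipeline above can be sketched as follows. This is a minimal, standard-library-only illustration on a toy snippet of text; the project presumably used a full NLP toolkit (e.g. NLTK) with a proper stopword lexicon and stemmer, which the write-up does not name, and the crude suffix-stripping here merely stands in for real stemming.

```python
import re
from collections import Counter

# Toy stand-in for the humanitarian media discourse corpus
corpus = (
    "Gulf donors pledged humanitarian aid. European donors pledged "
    "humanitarian aid to refugee camps, and agencies delivered that "
    "humanitarian aid quickly."
)

# Tokenise and normalise: lowercase, alphabetic tokens only
tokens = re.findall(r"[a-z]+", corpus.lower())

# Stopword removal (a tiny illustrative list, not a full lexicon)
stopwords = {"the", "a", "an", "and", "to", "that", "of"}
tokens = [t for t in tokens if t not in stopwords]

# Crude suffix stripping as a stand-in for proper stemming
def stem(token: str) -> str:
    for suffix in ("ies", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

stems = [stem(t) for t in tokens]

# Count trigrams (n=3): frequent ones reveal the context terms appear in
trigrams = Counter(zip(stems, stems[1:], stems[2:]))
print(trigrams.most_common(2))
```

On this snippet, recurring trigrams such as ("donor", "pledg", "humanitarian") surface who is giving aid and how it is framed, which is exactly the contextual signal that made trigrams more useful than bigrams.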

To better understand the topics discussed in humanitarian media discourse over time, I performed topic modelling with Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), and explored the most distinctive terms using TF-IDF (term frequency-inverse document frequency). The term weights gave us a better idea of which topics we could compare.
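To show what TF-IDF weighting contributes here, the snippet below computes it from its definition on three toy documents; the project itself would have used a library implementation (e.g. scikit-learn or gensim, neither of which is named in the write-up) for both TF-IDF and the LSA/LDA topic models.

```python
import math

# Toy stand-in documents for the media discourse corpus
documents = [
    "humanitarian aid pledged for refugee camps",
    "vaccine trials and hospital capacity dominate headlines",
    "donors pledged vaccine doses as humanitarian aid",
]
tokenised = [doc.split() for doc in documents]

def tf_idf(term, doc_tokens, corpus_tokens):
    # Term frequency: share of this document made up of the term
    tf = doc_tokens.count(term) / len(doc_tokens)
    # Inverse document frequency: terms in fewer documents weigh more
    df = sum(1 for d in corpus_tokens if term in d)
    idf = math.log(len(corpus_tokens) / df)
    return tf * idf

# "refugee" appears in one of three documents, "humanitarian" in two,
# so "refugee" is weighted as more distinctive for the first document
print(tf_idf("refugee", tokenised[0], tokenised))
print(tf_idf("humanitarian", tokenised[0], tokenised))
```

This distinctiveness weighting is what lets frequent-but-generic words recede and topic-bearing terms stand out when comparing topics across time or country groups.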

Exploratory Data Analysis (EDA)

We used Highcharts.js and AmCharts to build the visualisations, with the goal of helping viewers quickly grasp key insights from the data.

 

END RESULT

We built a website called COVID-in-Pixels, hosted on GitHub Pages. It features a timeline of news articles alongside other informative data visualisations. The underlying data was generated in Python and exported to JSON for the JavaScript-based visualisations.
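The Python-to-JavaScript hand-off can be sketched as below. The field names and values are entirely hypothetical, since the site's actual data schema is not documented in this write-up; the point is simply that Python's standard `json` module serialises the analysis output into a form the charting libraries can load.

```python
import json

# Hypothetical aggregated output from the topic analysis
topic_frequencies = {
    "period": "Dec 2019 - Aug 2020",
    "topics": [
        {"label": "vaccination", "weight": 0.31},
        {"label": "refugee aid", "weight": 0.24},
        {"label": "stimulus packages", "weight": 0.18},
    ],
}

# Serialise to a JSON string for the Highcharts.js / AmCharts front end
# (in practice this would be written to a file served alongside the site)
payload = json.dumps(topic_frequencies, indent=2)
print(payload)
```

Keeping the analysis (Python) and presentation (JavaScript) decoupled through a JSON file means the charts can be regenerated just by re-exporting the data.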