How might we determine the impact of the Coronavirus Pandemic on global humanitarian efforts?
COVID in Pixels explores the impact that COVID-19 has had on humanitarian efforts using Natural Language Processing (NLP) techniques on global media discourse.
PROJECT DURATION
3 Months
MY ROLE
Researcher
Data Scientist
Designer
TEAM
Tamara Lottering
Tashfeen Ahmed
Xiaohang Xu
Minjia Zhao
Jin Mu
TOOLS & METHODS
Python for NLP
Topic Modelling
TF-IDF
Collocation Analysis
Highcharts.js
AmCharts
THE PROBLEM
The first case of the coronavirus (COVID-19) was reported to the World Health Organization (WHO) in December 2019, and the outbreak was declared a pandemic in March 2020. The pandemic has had an enormous impact on economies across the globe: governments have shifted their efforts towards healthcare interventions such as testing, hospitalization, and vaccine development, as well as reviving their economies and distributing stimulus packages to unemployed citizens, putting their own citizens first.
RESEARCH QUESTION
We explored the following research questions in this project:
What is the impact of COVID-19 on the language around 'humanitarianism' in global media discourse?
Who donates humanitarian aid, and who are its recipients? What kinds of aid are donated, and what values are attached to them?
METHODOLOGY
We used text analysis and NLP to analyse a corpus of humanitarian media discourse from three categories of countries, predefined by our data holders at the University of Edinburgh and UNHCR. The corpus covered December 2019 to August 2020.
Euro-Atlantic Countries: USA, UK, Germany, France
Gulf Donors: UAE, Kuwait, Qatar, Saudi Arabia
New Global Media Players: China, Russia, Iran, Turkey
This analysis gave us insight into:
The approaches global nations took in their humanitarian efforts.
Where humanitarian aid was flowing.
The key topics discussed in global media during that period.
Text Analysis and NLP
Our cleaning and pre-processing pipeline involved tokenisation, normalisation, stemming, and stopword removal. Thereafter, we analysed term collocations using n-grams, and found that trigrams (n = 3) yielded better results than bigrams (n = 2) because they revealed more of the context in which terms were used.
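As an illustration of this step, here is a bare-bones sketch of the pipeline in pure Python (the stopword list, helper names, and sample sentence are invented for the example; the real analysis ran over the full media corpus and used proper normalisation and stemming). It shows why a repeated trigram carries more context than its constituent bigrams.

```python
from collections import Counter

# Tiny illustrative stopword list; the real pipeline used a full one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "on"}

def preprocess(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = [t.strip(".,;:!?\"'()") for t in text.lower().split()]
    return [t for t in tokens if t and t not in STOPWORDS]

def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

doc = ("The UAE sent humanitarian aid supplies to refugees, "
       "and the UN praised the humanitarian aid supplies effort.")
tokens = preprocess(doc)

bigrams = Counter(ngrams(tokens, 2))
trigrams = Counter(ngrams(tokens, 3))

# The repeated trigram ('humanitarian', 'aid', 'supplies') keeps more
# context than the bigram ('humanitarian', 'aid') alone.
print(trigrams.most_common(1))  # → [(('humanitarian', 'aid', 'supplies'), 2)]
```

In practice a collocation library would also rank n-grams by an association measure rather than raw counts, but the counting step is the core idea.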
To better understand the topics discussed in humanitarian media discourse over time, I performed LSA and LDA topic modelling and explored the most frequently used terms using TF-IDF. The resulting term weights gave us a clearer picture of which topics we could compare across country groups and over time.
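For readers unfamiliar with TF-IDF, a minimal pure-Python sketch of the weighting is below (the toy documents are invented; the project computed these scores over the full corpus, and library implementations add refinements such as smoothing). The point is that terms appearing in many documents are down-weighted, so a corpus-wide word like "covid" scores lower than a term distinctive to one document.

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF for a list of tokenised documents.
    TF = term count / doc length; IDF = log(N / number of docs with term)."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({t: (c / total) * math.log(n / df[t])
                       for t, c in counts.items()})
    return scores

docs = [
    ["covid", "vaccine", "aid"],
    ["covid", "lockdown", "economy"],
    ["aid", "donation", "refugees"],
]
scores = tf_idf(docs)

# "covid" appears in two of three documents, so its weight in the first
# document is lower than "vaccine", which is unique to that document.
```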
Exploratory Data Analysis (EDA)
We used Highcharts.js and AmCharts to build the visualisations. The goal was to help viewers quickly grasp key insights from the data.
END RESULT
We built a website called COVID-in-Pixels, hosted on GitHub Pages. It features a timeline of news articles alongside other informative data visualisations. The data was generated in Python and exported to JSON for the JavaScript-based visualisations.
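The Python-to-JSON hand-off can be sketched as follows. The variable names and counts here are hypothetical, but the output shape, a list of objects with a name and a data array, is the kind of plain series structure that charting libraries such as Highcharts consume.

```python
import json
from collections import Counter

# Hypothetical monthly term counts standing in for the processed corpus.
monthly_counts = {
    "2019-12": Counter({"outbreak": 14, "aid": 3}),
    "2020-03": Counter({"pandemic": 96, "aid": 41}),
}

# Flatten the Python structures into JSON-serialisable series that the
# JavaScript charting code can load directly.
series = [
    {"name": month, "data": sorted(counts.items())}
    for month, counts in sorted(monthly_counts.items())
]

payload = json.dumps(series, indent=2)  # written out as a .json file for the site
```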