CO2 emissions in Europe — Mapping longitude and latitude to NUTS3
Combining geospatial data in python to visualize greenhouse gas emissions in Europe using geopandas
I recently wanted to aggregate data on CO2 emissions available from the Emissions Database for Global Atmospheric Research EDGAR (download it here) to NUTS3 level (NUTS1, NUT2, and NUTS3 are ever more granular territorial units for statistis, enabling cross-border statistical comparisons at various regional levels within the EU).
Let’s first setup the necessary environment:
import pandas as pd, numpy as np, seaborn as sns, geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point
and get some data:
# import CO2 dataframe
co2 = pd.read_csv("data/v6.0_CO2_excl_short-cycle_org_C_2018_TOTALS.txt",
sep=';', skiprows=2)
# first look
co2.head()
lat lon emission 2018 (tons)
0 89.9 -180.0 5.254660
1 89.9 -167.5 0.015244
2 89.9 -167.4 0.017199
3 89.9 -167.3 0.006027
4 89.9 -167.2 0.013846
This data, I wanted to map into Europe’s NUTS3 regions:
The challenge is to map tuples of longitude and latitude into NUTS regions. Luckliy, the “border” of
NUTS regions is contained in shape files, available at the Eurostat.
Shapefiles are polygons, available in different resolutions, that trace the border of a (NUTS) region. They are a stanard
object in spatial analysis and at the core of many applications (such as GIS
).
First, we need a compatible coordinate reference system. There are different encodings, make sure to get coordinate reference system 4326. Download the zip file, unzip and load it using:
# import NUTS 3 shape file
nuts3 = gpd.GeoDataFrame.from_file('NUTS_RG_01M_2021_4326.shp')
nuts3.head()
NUTS_ID LEVL_CODE ... FID geometry
0 CZ 0 ... CZ POLYGON ((14.49122 51.04353, 14.49945 51.04610...
1 DE 0 ... DE MULTIPOLYGON (((10.45444 47.55580, 10.43954 47...
2 DK 0 ... DK MULTIPOLYGON (((15.19308 55.32014, 15.19056 55...
3 AL 0 ... AL MULTIPOLYGON (((19.83100 42.46645, 19.83568 42...
4 CY 0 ... CY MULTIPOLYGON (((34.60609 35.70767, 34.60060 35...
Here, we can see that for each NUTS_ID
the colunm geometry
contains a polygon (or multipolygon) that desribes the borders of the region.
Next we prepare the co2
dataframe:
# turn two columsn in Point type
co2['coords'] = list(zip(co2['lon'], co2['lat']))
co2['coords'] = co2['coords'].apply(Point)
points = gpd.GeoDataFrame(co2, geometry='coords', crs=nuts3.crs)
points.head()
lat lon emission 2018 (tons) coords
0 89.9 -180.0 5.254660 POINT (-180.00000 89.90000)
1 89.9 -167.5 0.015244 POINT (-167.50000 89.90000)
2 89.9 -167.4 0.017199 POINT (-167.40000 89.90000)
3 89.9 -167.3 0.006027 POINT (-167.30000 89.90000)
4 89.9 -167.2 0.013846 POINT (-167.20000 89.90000)
For each point in points
above we want to check whether it is contained in any of the polygons of nuts3
.
This is not trivial. Luckily, geopandas
has built-in functionality to achieve just that. This is
called a spatial join:
# Perform spatial join to match points and polygons
merge = gpd.tools.sjoin(points, nuts3, op="within", how='left')
# drop non-merged obs
matched = merge[~merge['NUTS_NAME'].isna()]
# show result
matched.head()
lat lon emission 2018 (tons) NUTS_ID LEVL_CODE CNTR_CODE CO2
21922 80.7 20.8 0.004540 NO0B2 3.0 NO 4.540380
21923 80.7 20.9 0.003293 NO0B2 3.0 NO 3.293150
21924 80.7 21.0 0.001719 NO0B2 3.0 NO 1.719140
21927 80.7 21.3 0.000154 NO0B2 3.0 NO 0.154194
23104 80.5 19.9 0.019575 NO0B2 3.0 NO 19.574600
Our merge was successful! For each lat
/lon
combination we found the associated NUTS_ID
. geopandas
made our
life really easy (Note: It can be a pain to install it, in particular if you dont have admin priveleges on your machine.
I uploaded a copy of the final data here)
Multiple lat/long data points can be the same region (e.g. NO0B2
), so
we can aggreage those further:
agg = matched.groupby("NUTS_ID")["emission 2018 (tons)"].mean().reset_index()
Finally we can plot those:
using this:
def plot_map():
# create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(5, 5))
nuts.plot(column='emission 2018 (tons)', legend=True,
cmap='OrRd', scheme='quantiles',
linewidth=0.1,
legend_kwds={'loc':'lower center',
'interval': True,
'ncol':2,
'title':'CO2 emissions (kilo tons)'},
ax=ax, edgecolor='0.8')
# remove the axis
ax.set_axis_off()
# add a title
ax.set_title("European CO2 emissions by NUTS3", fontsize='x-large',fontweight="bold")
minx, miny, maxx, maxy = -20, 23, 40, 70
ax.set_xlim(minx, maxx)
ax.set_ylim(miny, maxy)
ax.margins(0)
plt.tight_layout()
# saving our map as .png file.
fig.savefig('map_export.png', dpi=300)
plot_map()