CO2 emissions in Europe — Mapping longitude and latitude to NUTS3

Combining geospatial data in python to visualize greenhouse gas emissions in Europe using geopandas

I recently wanted to aggregate data on CO2 emissions available from the Emissions Database for Global Atmospheric Research EDGAR (download it here) to NUTS3 level (NUTS1, NUT2, and NUTS3 are ever more granular territorial units for statistis, enabling cross-border statistical comparisons at various regional levels within the EU).

Let’s first setup the necessary environment:

import pandas as pd, numpy as np, seaborn as sns, geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point

and get some data:

# import CO2 dataframe
co2 = pd.read_csv("data/v6.0_CO2_excl_short-cycle_org_C_2018_TOTALS.txt",
                  sep=';', skiprows=2)
# first look
co2.head()

    lat    lon  emission 2018 (tons)
0  89.9 -180.0              5.254660
1  89.9 -167.5              0.015244
2  89.9 -167.4              0.017199
3  89.9 -167.3              0.006027
4  89.9 -167.2              0.013846

This data, I wanted to map into Europe’s NUTS3 regions:

<strong>Europe CO2 emissions</strong>

The challenge is to map tuples of longitude and latitude into NUTS regions. Luckliy, the “border” of NUTS regions is contained in shape files, available at the Eurostat. Shapefiles are polygons, available in different resolutions, that trace the border of a (NUTS) region. They are a stanard object in spatial analysis and at the core of many applications (such as GIS).

First, we need a compatible coordinate reference system. There are different encodings, make sure to get coordinate reference system 4326. Download the zip file, unzip and load it using:

# import NUTS 3 shape file
nuts3 = gpd.GeoDataFrame.from_file('NUTS_RG_01M_2021_4326.shp')

nuts3.head()

  NUTS_ID  LEVL_CODE  ... FID                                           geometry
0      CZ          0  ...  CZ  POLYGON ((14.49122 51.04353, 14.49945 51.04610...
1      DE          0  ...  DE  MULTIPOLYGON (((10.45444 47.55580, 10.43954 47...
2      DK          0  ...  DK  MULTIPOLYGON (((15.19308 55.32014, 15.19056 55...
3      AL          0  ...  AL  MULTIPOLYGON (((19.83100 42.46645, 19.83568 42...
4      CY          0  ...  CY  MULTIPOLYGON (((34.60609 35.70767, 34.60060 35...

Here, we can see that for each NUTS_ID the colunm geometry contains a polygon (or multipolygon) that desribes the borders of the region.

Next we prepare the co2 dataframe:

# turn two columsn in Point type
co2['coords'] = list(zip(co2['lon'], co2['lat']))
co2['coords'] = co2['coords'].apply(Point)

points = gpd.GeoDataFrame(co2, geometry='coords', crs=nuts3.crs)

points.head()

   lat    lon  emission 2018 (tons)                       coords
0  89.9 -180.0              5.254660  POINT (-180.00000 89.90000)
1  89.9 -167.5              0.015244  POINT (-167.50000 89.90000)
2  89.9 -167.4              0.017199  POINT (-167.40000 89.90000)
3  89.9 -167.3              0.006027  POINT (-167.30000 89.90000)
4  89.9 -167.2              0.013846  POINT (-167.20000 89.90000)

For each point in points above we want to check whether it is contained in any of the polygons of nuts3. This is not trivial. Luckily, geopandas has built-in functionality to achieve just that. This is called a spatial join:

# Perform spatial join to match points and polygons
merge = gpd.tools.sjoin(points, nuts3, op="within", how='left')

# drop non-merged obs
matched = merge[~merge['NUTS_NAME'].isna()]

# show result
matched.head()

        lat   lon  emission 2018 (tons) NUTS_ID  LEVL_CODE CNTR_CODE        CO2
21922  80.7  20.8              0.004540   NO0B2        3.0        NO   4.540380
21923  80.7  20.9              0.003293   NO0B2        3.0        NO   3.293150
21924  80.7  21.0              0.001719   NO0B2        3.0        NO   1.719140
21927  80.7  21.3              0.000154   NO0B2        3.0        NO   0.154194
23104  80.5  19.9              0.019575   NO0B2        3.0        NO  19.574600

Our merge was successful! For each lat/lon combination we found the associated NUTS_ID. geopandas made our life really easy (Note: It can be a pain to install it, in particular if you dont have admin priveleges on your machine. I uploaded a copy of the final data here) Multiple lat/long data points can be the same region (e.g. NO0B2), so we can aggreage those further:

agg = matched.groupby("NUTS_ID")["emission 2018 (tons)"].mean().reset_index()

Finally we can plot those:

<strong>Europe CO2 emissions</strong>

using this:

def plot_map():
    # create figure and axes for Matplotlib
    fig, ax = plt.subplots(1, figsize=(5, 5))

    nuts.plot(column='emission 2018 (tons)',  legend=True,
              cmap='OrRd',   scheme='quantiles',
              linewidth=0.1,
              legend_kwds={'loc':'lower center',
                           'interval': True,
                           'ncol':2,
                           'title':'CO2 emissions (kilo tons)'},
              ax=ax,  edgecolor='0.8')

    # remove the axis
    ax.set_axis_off()

    # add a title
    ax.set_title("European CO2 emissions by NUTS3", fontsize='x-large',fontweight="bold")

    minx, miny, maxx, maxy = -20, 23, 40, 70
    ax.set_xlim(minx, maxx)
    ax.set_ylim(miny, maxy)
    ax.margins(0)

    plt.tight_layout()
    # saving our map as .png file.
    fig.savefig('map_export.png', dpi=300)


plot_map()

Jannic Alexander Cutura
Jannic Alexander Cutura
Software ∪ Data Engineer

My interests include distributed computing and cloud as well as financial stability and regulation.

Related