Authors graph


This repo contains the code to build the fundmappeR project, an open-source tool that scrapes the portfolios of money market funds (MMFs) from the web. MMFs were at the center of the great financial crisis (Chernenko and Sunderam, 2014; Gorton and Metrick, 2012) and the sovereign debt crisis (Corre et al., 2012), and most recently experienced turmoil at the onset of the Covid-19 pandemic in March 2020 (Reuters, 2020).

Research on MMFs has received a lot of attention, yet the barrier to entering the field is high because no off-the-shelf data is available. The SEC collects and publishes MMF portfolios, but stores them on its servers in an inconvenient format. fundmappeR parses the SEC’s website for money market fund portfolio data and provides it in an easily accessible format. The data is updated every month and can be accessed here.


This project is built on Amazon Web Services (AWS), which makes it easy to automate downloading, parsing and cleaning the data. The data pipeline runs every month to fetch and add the latest data. First, a Lambda function checks the SEC website for new funds and sends a notification if any are found. Next, an EC2 instance runs R code that downloads the raw reports to S3. Every S3 object put triggers a Lambda function that picks up the raw filing, parses its XML structure and creates four tables for the given fund-month report. These are stored on S3, partitioned by year. An AWS Glue crawler populates the data catalog, which is used for ad hoc queries in Athena. A Glue ETL job runs a PySpark script that transforms the individual CSV files into Parquet tables and stores them in a public S3 bucket, making the data available to users.
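The S3-triggered parsing step might look roughly like the sketch below. It is not the code used in this repo: the XML element names (`holding`, `issuer`, `value`), the output bucket name and the `parsed/` prefix are all hypothetical, and only one of the four tables is shown. The layout of the S3 event payload and the boto3 calls are standard.

```python
import csv
import io
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus

OUTPUT_BUCKET = "example-parsed-bucket"  # hypothetical; not the project's bucket


def s3_objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def holdings_to_csv(xml_text):
    """Flatten hypothetical <holding> elements into CSV text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["issuer", "value"])
    for holding in root.iter("holding"):
        writer.writerow(
            [holding.findtext("issuer", ""), holding.findtext("value", "")]
        )
    return out.getvalue()


def handler(event, context):
    """Lambda entry point: parse each newly uploaded filing and store a CSV."""
    import boto3  # imported lazily so the helpers above run without AWS access

    s3 = boto3.client("s3")
    for bucket, key in s3_objects_from_event(event):
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        s3.put_object(
            Bucket=OUTPUT_BUCKET,
            Key=f"parsed/{key}.csv",  # hypothetical output layout
            Body=holdings_to_csv(raw).encode("utf-8"),
        )
```

Writing the parsed output to a separate bucket is a deliberate choice in this sketch: if every put on the raw bucket triggers the function, writing results back to the same bucket would trigger it again on its own output.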

Architecture Diagram


This project is implemented in Python and R and runs on AWS, leveraging several of its proprietary services. You can rebuild it using the code published in this repo, or you can download the tables directly:

import pandas as pd

# set the S3 path of the public bucket
# (reading s3:// paths with pandas requires the s3fs package)
s3_path = ""

# pick a table
tables = ["holdings_data","collateral_data","series_data","class_data"]
table = tables[0] ## 0-3

# pick a date 
date = 201906  ## 201112--today

# read to pandas
df = pd.read_parquet(f"{s3_path}/{table}/{date}/{table}_{date}.parquet")
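To load several months at once, the path pattern from the snippet above can be wrapped in small helpers. This is a minimal sketch: the function names `month_range`, `table_path` and `load_months` are made up for illustration, and it assumes every month in the requested range exists under the same path layout.

```python
TABLES = ("holdings_data", "collateral_data", "series_data", "class_data")


def month_range(start, end):
    """Yield YYYYMM integers from start to end, inclusive."""
    year, month = divmod(start, 100)
    while year * 100 + month <= end:
        yield year * 100 + month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def table_path(s3_path, table, date):
    """Build the parquet path for one table and one month."""
    if table not in TABLES:
        raise ValueError(f"unknown table: {table!r}")
    return f"{s3_path}/{table}/{date}/{table}_{date}.parquet"


def load_months(s3_path, table, start, end):
    """Read all monthly files in [start, end] and stack them into one DataFrame."""
    import pandas as pd  # imported here so the path helpers work without pandas

    frames = [
        pd.read_parquet(table_path(s3_path, table, date))
        for date in month_range(start, end)
    ]
    return pd.concat(frames, ignore_index=True)
```

For example, `load_months(s3_path, "holdings_data", 201901, 201912)` would read the twelve monthly holdings files for 2019, provided `s3_path` is set.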

A web app that visualizes the results can be viewed here.

There are four tables available, stored in monthly versions:

  • Class table: Contains data on the individual fund share classes. class_id and date together form the primary key.
  • Series table: Contains data on the individual funds (i.e. series). series_id and date together form the primary key.
  • Holdings table: Contains data on the individual holdings. class_id, date and issuer_number together form the primary key.
  • Collateral table: Contains data on the collateral posted for secured items. class_id, date and issuer_number together form the primary key.
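The keys listed above can be checked after loading a table. The sketch below is illustrative only: the helper `duplicate_keys` is hypothetical, and mapping the four descriptions onto the file names `class_data`, `series_data`, `holdings_data` and `collateral_data` is an assumption based on the download snippet.

```python
from collections import Counter

# Key columns as described in the table list; the mapping to file names is an assumption.
PRIMARY_KEYS = {
    "class_data": ("class_id", "date"),
    "series_data": ("series_id", "date"),
    "holdings_data": ("class_id", "date", "issuer_number"),
    "collateral_data": ("class_id", "date", "issuer_number"),
}


def duplicate_keys(rows, table):
    """Return the sorted key tuples that occur more than once in `rows`.

    `rows` is an iterable of dict-like records, one per table row.
    """
    columns = PRIMARY_KEYS[table]
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)
```

With a pandas DataFrame `df`, the records can be obtained via `df.to_dict("records")`. For large tables, `df.duplicated(subset=list(PRIMARY_KEYS[table])).any()` performs the same check without leaving pandas.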

A data dictionary is available here.

Database Schema
Jannic Alexander Cutura
Software ∪ Data Engineer

My interests include distributed computing and cloud as well as financial stability and regulation.