A tutorial for using the Mapineq API from Python

Author

Doug Leasure

February 12, 2025

Introduction

The Mapineq dashboard and data catalogue are interactive tools that allow you to explore the Mapineq database and discover data relationships on the map. The dashboard uses the Mapineq API in the background to pull data from the Mapineq database, providing a user-friendly way to explore the database. If you are interested in automating this process or exporting many data sources, then you may benefit from accessing the API directly from your local R or Python console.

This tutorial demonstrates how to query the Mapineq API using Python. In general terms, an API (Application Programming Interface) is a tool that allows your computer to communicate with a remote server. In this case, we will use it to automate data imports from the Mapineq database server directly into our local Python environment.

This tutorial covers the following topics:

  1. Fetching a list of data sources available from the Mapineq database for a given geographic level (i.e. NUTS level).
  2. Retrieving a list of years and geographic levels that are available for a data set.
  3. Identifying the data filters that are available for a data set.
  4. Querying data from a single data set.
  5. Querying data from two data sets joined by location.

For additional information about the API, please refer to our tutorial on using the API from R and the complete technical documentation for the API. You can also download the Quarto source code for this tutorial if you prefer to work with it directly in your Python development environment.

Setup

Install Required Libraries

We use the requests library for API requests, the built-in json module for JSON processing, and the pandas library for data wrangling. If requests and pandas are not already installed on your local machine, you can install them with the command line tool pip (json ships with Python, so no installation is needed):

pip install requests pandas

Import Libraries

Now let’s import those libraries into our Python session:

# Required for API queries
import requests
import json

# Required for data wrangling
import pandas as pd

API Base URL

We also need to identify the base URL for the Mapineq API:

url = "https://api.mapineq.org"

What follows is a series of steps to identify a data source and its characteristics and then query data from that source.
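Every step below follows the same request pattern: build an endpoint path, pass parameters, check the HTTP status code, and parse the JSON response. If you plan to make many queries, you could wrap this pattern in a small helper function. This is an optional sketch (the helper name fetch_json is our own); the steps below spell the pattern out explicitly instead:

# Optional helper that wraps the request pattern used throughout this tutorial
def fetch_json(endpoint, params):
    """Query a Mapineq API endpoint and return the parsed JSON response."""
    response = requests.get(f"{url}/{endpoint}", params=params)
    if response.status_code != 200:
        raise Exception(f"Error: {response.status_code}")
    return response.json()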

1. Fetch Sources by NUTS Level

Now, let’s get a list of the data sources that are available in the Mapineq database for a given geographic level.

We will use NUTS administrative units to define geographic levels. NUTS regions are standardized administrative boundaries for Europe. NUTS level-0 are country boundaries, NUTS level-1 are the largest subnational administrative units, and NUTS level-3 are the smallest subnational administrative units.

Let’s take a look at how to get a list of all data available in the Mapineq database for NUTS level-2. We will do this by querying the Mapineq API endpoint get_source_by_nuts_level.

The first step is to select the API endpoint that we want to query and to define our values for its parameters:

# Identify the API endpoint to query
endpoint = "functions/postgisftw.get_source_by_nuts_level/items.json"

# Provide parameters required by the endpoint
params = {
  "_level": 2,  # define the geographic level (i.e. NUTS level)
  "limit": 500  # limit response to this many data sets
}

Notice that we defined the parameters to retrieve the data catalogue for NUTS level-2 and to limit the response to include no more than 500 data sets.

Now, we are ready to submit our API request:

# Submit HTTP GET request to query the API
response = requests.get(f"{url}/{endpoint}", params=params)

if response.status_code == 200:
    print(response)
else:
    raise Exception(f"Error: {response.status_code}")
<Response [200]>

The API response should have an HTTP status code of 200, which indicates that our API request was successful. The response contains the data list that we want in JSON format, so we now need to do some data wrangling to convert this into a DataFrame object.

# Convert JSON to DataFrame
data = response.json()
df_content = pd.DataFrame(data)

# View data
df_content.head(10)
f_description f_resource f_short_description
0 All causes of death by NUTS 2 regions (ESTAT) TGS00057 All causes of death
1 Area by NUTS 3 region (ESTAT) DEMO_R_D3AREA Area of regions
2 At-risk-of-poverty rate by NUTS 2 regions (ESTAT) TGS00103 At-risk-of-poverty
3 Available beds in hospitals by NUTS 2 region (... HLTH_RS_BDSRG2 Hospital beds
4 Available beds in hospitals by NUTS 2 regions ... TGS00064 Hospital beds
5 Business demography and high growth enterprise... BD_HGNACE2_R3 Business demography
6 Business demography by size class and NUTS 3 r... BD_SIZE_R3 Business demography
7 Causes of death by NUTS 2 region - absolute nu... HLTH_CD_YNRF Number of deaths
8 Causes of death by NUTS 2 region - absolute nu... HLTH_CD_YNRM Number of deaths
9 Causes of death by NUTS 2 region - absolute nu... HLTH_CD_YNRT Number of deaths

We now have a nice data frame that we can use to explore the data that are available in the Mapineq database. Note: We are only showing the first 10 rows of a much larger data frame.
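Because the catalogue is now an ordinary pandas DataFrame, you can also search it directly. For example, here is a minimal sketch that filters the catalogue by a keyword in the description (the keyword "poverty" is just an illustration):

# Find catalogue entries whose description mentions poverty (case-insensitive)
poverty_sources = df_content[
    df_content["f_description"].str.contains("poverty", case=False)
]
print(poverty_sources[["f_resource", "f_short_description"]])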


2. Retrieve Years and NUTS Levels Available for a Data Source

For this step, we will select a single data source that we want to investigate further. Let’s look at At-risk-of-poverty rate by NUTS 2 regions from Eurostat (ESTAT). You will notice from the data catalogue that we retrieved in the previous step that the resource ID for this data set is TGS00103, which we will use in the next steps.

To retrieve years and NUTS levels that are available for this data source, we will use the API endpoint get_year_nuts_level_from_source:

# API endpoint
endpoint = "functions/postgisftw.get_year_nuts_level_from_source/items.json"

# Endpoint parameters
params = {"_resource": "TGS00103"}  # resource ID of the data set

# GET request to query API
response = requests.get(f"{url}/{endpoint}", params=params)

if response.status_code == 200:
    data = response.json()
    df_content = pd.DataFrame(data)
else:
    raise Exception(f"Error: {response.status_code}")

# Look at the data
df_content
f_level f_year
0 2 2012
1 2 2013
2 2 2014
3 2 2015
4 2 2016
5 2 2017
6 2 2018
7 2 2019
8 2 2020
9 2 2021

The API response provides all of the years and geographic levels covered by this data set.
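If you want to use this information programmatically, for example to loop a query over every available year, you can pull the values straight out of the DataFrame. A minimal sketch:

# Extract the available years and NUTS levels as plain Python lists
available_years = sorted(df_content["f_year"].unique())
available_levels = sorted(df_content["f_level"].unique())
print(available_years)   # 2012 through 2021
print(available_levels)  # level 2 only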


3. Identify Data Filters Available for a Data Source

Many data sets available in the Mapineq database contain various data filters that can be applied to extract the data you are specifically interested in. These filters could include age, sex, education level, etc. Our next step is to identify which filters are available for the At-risk-of-poverty rate by NUTS 2 regions data that we are investigating.

To identify the filters that are available for this data source, we will use the API endpoint get_column_values_source_json:

# API endpoint
endpoint = "functions/postgisftw.get_column_values_source_json/items.json"

# Endpoint parameters
params = {
  "_resource": "TGS00103",
  "source_selections": json.dumps({
      "year": "2020",
      "level": "2", 
      "selected": []
  })
}

This endpoint requires a source_selections parameter in JSON format. We implemented this in Python by creating a dictionary object and then converting it to a JSON string using the json.dumps function. Let’s take a quick look at what that final JSON parameter looks like:

print(params['source_selections'])
{"year": "2020", "level": "2", "selected": []}

Now we are ready to submit our API request:

# GET request to query API
response = requests.get(f"{url}/{endpoint}", params=params)

if response.status_code == 200:
    data = response.json()
    # Flatten the nested JSON: one row per filter value, with realigned indices
    df = pd.json_normalize(data).explode("field_values").reset_index(drop=True)
    field_values_expanded = pd.json_normalize(df["field_values"])
    df_content = pd.concat(
        [df.drop(columns="field_values"), field_values_expanded], axis=1
    )
else:
    raise Exception(f"Error: {response.status_code}")

The API response contains a more complex JSON object with nested values that our previous API responses did not include. So, we have done a bit of extra data wrangling to get it into a nice DataFrame format, making use of the json_normalize and explode functions from pandas.

Here is the original nested JSON which contains a named list within each element of the field_values column:

print(json.dumps(response.json(), indent=4))
[
    {
        "field": "freq",
        "field_label": "Time frequency",
        "field_values": [
            {
                "label": "Annual",
                "value": "A"
            }
        ]
    },
    {
        "field": "unit",
        "field_label": "Unit of measure",
        "field_values": [
            {
                "label": "Percentage of total population",
                "value": "PC_POP"
            }
        ]
    }
]

And, here is our nicely formatted data frame which splits those elements into two columns:

# Look at the data
df_content
field field_label label value
0 freq Time frequency Annual A
1 unit Unit of measure Percentage of total population PC_POP

This response shows us the filters available for this data source and their respective values.
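In the next step, each filter will be passed to the API as a pair of the form {"field": ..., "value": ...}. When a source offers only one value per filter, as here, you could assemble that list directly from df_content instead of typing it by hand. A minimal sketch, assuming exactly one value per field:

# Build the conditions list for the data query from the available filters
conditions = [
    {"field": row["field"], "value": row["value"]}
    for _, row in df_content.iterrows()
]
print(conditions)
# [{'field': 'freq', 'value': 'A'}, {'field': 'unit', 'value': 'PC_POP'}]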


4. Query Data for One Data Source

In the previous steps, we (1) identified a data source to query, (2) checked which years and geographic levels are available, and (3) identified the filters available. We will now use this information to retrieve the data that we want from our selected data source, TGS00103.

To query data from this data source, we will use the API endpoint get_x_data:

# API endpoint
endpoint = "functions/postgisftw.get_x_data/items.json"

# Endpoint parameters
params = {
    "_level": 2,
    "_year": 2018,
    "X_JSON": json.dumps(
        {
            "source": "TGS00103",
            "conditions": [
                {"field": "unit", "value": "PC_POP"},
                {"field": "freq", "value": "A"},
            ],
        }
    ),
    "limit": 1500,
}

# GET request to query API
response = requests.get(f"{url}/{endpoint}", params=params)

if response.status_code == 200:
    data = response.json()
    df_content = pd.DataFrame(data)
else:
    raise Exception(f"Error: {response.status_code}")

# View data
df_content.head(10)
data_year geo geo_name geo_source geo_year x
0 2018 AL01 Veri NUTS 2018 25.1
1 2018 AL02 Qender NUTS 2018 23.6
2 2018 AL03 Jug NUTS 2018 21.6
3 2018 AT11 Burgenland NUTS 2018 NaN
4 2018 AT12 Niederösterreich NUTS 2018 NaN
5 2018 AT13 Wien NUTS 2018 NaN
6 2018 AT21 Kärnten NUTS 2018 NaN
7 2018 AT22 Steiermark NUTS 2018 NaN
8 2018 AT31 Oberösterreich NUTS 2018 NaN
9 2018 AT32 Salzburg NUTS 2018 NaN

The response includes the data that we have requested, including identifiers like geographic unit name (geo_name), NUTS ID (geo), and year (data_year). Note: We are only showing the first 10 rows of a much larger data frame.
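From here, the response behaves like any other pandas DataFrame, so you can summarise it or save it for later use. A minimal sketch (the output file name is our own choice):

# Summarise the at-risk-of-poverty rate and export the data
print(df_content["x"].describe())  # basic summary statistics, ignoring NaN
df_content.to_csv("tgs00103_nuts2_2018.csv", index=False)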


5. Query Data from Two Sources Joined by Location

Now we want to repeat the same query from our previous step, but this time we will include a second variable joined to the output by geographic location. We will use Life expectancy by age, sex, and NUTS 2 region from Eurostat as our second variable. This data set has resource ID DEMO_R_MLIFEXP, and we will fill in the correct filters following the steps outlined above. To query both sources together, we will use the API endpoint get_xy_data:

# API endpoint
endpoint = "functions/postgisftw.get_xy_data/items.json"

# Endpoint parameters
params = {
    "_level": 2,
    "_predictor_year": 2018,
    "_outcome_year": 2018,
    "X_JSON": json.dumps(
        {
            "source": "TGS00103",
            "conditions": [
                {"field": "unit", "value": "PC_POP"},
                {"field": "freq", "value": "A"},
            ],
        }
    ),
    "Y_JSON": json.dumps(
        {
            "source": "DEMO_R_MLIFEXP",
            "conditions": [
                {"field": "unit", "value": "YR"},
                {"field": "age", "value": "Y_LT1"},
                {"field": "sex", "value": "T"},
                {"field": "freq", "value": "A"},
            ],
        }
    ),
    "limit": 1500,
}

# GET request to query API
response = requests.get(f"{url}/{endpoint}", params=params)

if response.status_code == 200:
    data = response.json()
    df_content = pd.DataFrame(data)
else:
    raise Exception(f"Error: {response.status_code}")

# View data
df_content.head(10)
geo geo_name geo_source geo_year outcome_year predictor_year x y
0 AL01 Veri NUTS 2018 2018 2018 25.1 NaN
1 AL02 Qender NUTS 2018 2018 2018 23.6 NaN
2 AL03 Jug NUTS 2018 2018 2018 21.6 NaN
3 AT11 Burgenland NUTS 2018 2018 2018 NaN 81.5
4 AT12 Niederösterreich NUTS 2018 2018 2018 NaN 81.5
5 AT13 Wien NUTS 2018 2018 2018 NaN 80.6
6 AT21 Kärnten NUTS 2018 2018 2018 NaN 81.6
7 AT22 Steiermark NUTS 2018 2018 2018 NaN 82.2
8 AT31 Oberösterreich NUTS 2018 2018 2018 NaN 82.2
9 AT32 Salzburg NUTS 2018 2018 2018 NaN 82.9

The x column contains the variable defined by the parameter X_JSON, and the y column contains the variable defined by the Y_JSON parameter. Note: We are only showing the first 10 rows of a much larger data frame.
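Because the two variables are joined by location, the result is ready for simple bivariate analysis. For example, you could correlate the at-risk-of-poverty rate (x) with life expectancy at birth (y), keeping only the regions where both values are present. A minimal sketch:

# Correlate x and y across regions, dropping rows with missing values
paired = df_content[["x", "y"]].apply(pd.to_numeric, errors="coerce").dropna()
print(f"{len(paired)} regions with both values")
print(paired["x"].corr(paired["y"]))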


Conclusion

This tutorial provided a step-by-step guide to querying the Mapineq API using Python. The code examples demonstrate how to retrieve data for various endpoints and how to process the responses into usable formats. For further details, consult the Mapineq API documentation available from https://mapineq.org/data-users.