February 12, 2025
Introduction
The Mapineq dashboard and data catalogue are interactive tools that allow you to explore the Mapineq database and discover data relationships on the map. The dashboard uses the Mapineq API in the background to pull data from the Mapineq database, providing a user-friendly way to explore the database. If you are interested in automating this process or exporting many data sources, then you may benefit from accessing the API directly from your local R or Python console.
This tutorial demonstrates how to query the Mapineq API using Python. In general terms, an API (Application Programming Interface) is a tool that allows your computer to communicate with a remote server. In this case, we will use it to automate data imports from the Mapineq database server directly into our local Python environment.
This tutorial covers the following topics:
- Fetching a list of data sources available from the Mapineq database for a given geographic level (i.e. NUTS level).
- Retrieving a list of years and geographic levels that are available for a data set.
- Identifying the data filters that are available for a data set.
- Querying data from a single data set.
- Querying data from two data sets joined by location.
For additional information about the API, please refer to our tutorial on using the API from R and the complete technical documentation for the API. You can also download the Quarto source code for this tutorial if you prefer to work with it directly in your Python development environment.
Setup
Install Required Libraries
We begin by loading the Python libraries that we need for the tutorial. We use the requests library for API requests, the built-in json module for JSON processing, and the pandas library for data wrangling. If requests and pandas are not already installed on your local machine, then you can use the command line tool pip to install them:
pip install requests pandas
Import Libraries
Now let’s import those libraries into our Python session:
# Required for API queries
import requests
import json
# Required for data wrangling
import pandas as pd
API Base URL
We also need to identify the base URL for the Mapineq API:
= "https://api.mapineq.org" url
What follows is a series of steps to identify a data source and its characteristics and then query data from that source.
1. Fetch Sources by NUTS Level
Now, let’s get a list of data that are available in the Mapineq database for a given geographic level.
We will use NUTS administrative units to define geographic levels. NUTS regions are standardized administrative boundaries for Europe. NUTS level-0 regions are country boundaries, NUTS level-1 regions are the largest subnational administrative units, and NUTS level-3 regions are the smallest, with NUTS level-2 in between.
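To keep these levels straight when reading outputs later in the tutorial, it can help to have a small lookup of the level definitions at hand. This is purely an illustrative convenience of our own, not part of the Mapineq API:

# Illustrative lookup of NUTS level definitions (for labeling outputs only)
nuts_levels = {
    0: "Country boundaries",
    1: "Largest subnational administrative units",
    2: "Intermediate administrative units",
    3: "Smallest subnational administrative units",
}
print(nuts_levels[2])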
Let’s take a look at how to get a list of all data available in the Mapineq database for NUTS level-2. We will do this by querying the Mapineq API endpoint get_source_by_nuts_level.
The first step is to select the API endpoint that we want to query and to define our values for its parameters:
# Identify the API endpoint to query
= "functions/postgisftw.get_source_by_nuts_level/items.json"
endpoint
# Provide parameters required by the endpoint
= {
params "_level": 2, # define the geographic level (i.e. NUTS level)
"limit": 500 # limit response to this many data sets
}
Notice that we defined the parameters to retrieve the data catalogue for NUTS level-2 and to limit the response to include no more than 500 data sets.
Now, we are ready to submit our API request:
# Submit HTTP GET request to query the API
response = requests.get(f"{url}/{endpoint}", params=params)

if response.status_code == 200:
    print(response)
else:
    raise Exception(f"Error: {response.status_code}")
<Response [200]>
The API response should have an HTTP status code of 200, which indicates that our API request was successful. The response contains the data list that we want in JSON format, so we now need to do some data wrangling to convert this into a DataFrame object.
# Convert JSON to DataFrame
data = response.json()
df_content = pd.DataFrame(data)

# View data
df_content.head(10)
|   | f_description | f_resource | f_short_description |
|---|---------------|------------|---------------------|
| 0 | All causes of death by NUTS 2 regions (ESTAT) | TGS00057 | All causes of death |
| 1 | Area by NUTS 3 region (ESTAT) | DEMO_R_D3AREA | Area of regions |
| 2 | At-risk-of-poverty rate by NUTS 2 regions (ESTAT) | TGS00103 | At-risk-of-poverty |
| 3 | Available beds in hospitals by NUTS 2 region (... | HLTH_RS_BDSRG2 | Hospital beds |
| 4 | Available beds in hospitals by NUTS 2 regions ... | TGS00064 | Hospital beds |
| 5 | Business demography and high growth enterprise... | BD_HGNACE2_R3 | Business demography |
| 6 | Business demography by size class and NUTS 3 r... | BD_SIZE_R3 | Business demography |
| 7 | Causes of death by NUTS 2 region - absolute nu... | HLTH_CD_YNRF | Number of deaths |
| 8 | Causes of death by NUTS 2 region - absolute nu... | HLTH_CD_YNRM | Number of deaths |
| 9 | Causes of death by NUTS 2 region - absolute nu... | HLTH_CD_YNRT | Number of deaths |
We now have a nice data frame that we can use to explore the data that are available in the Mapineq database. Note: We are only showing the first 10 rows of a much larger data frame.
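Because the catalogue is now an ordinary pandas DataFrame, you can also search it programmatically rather than scanning it by eye. For example, here is a minimal sketch that filters the catalogue on a keyword in the f_short_description column returned above:

# Find data sets whose short description mentions "poverty"
poverty_sources = df_content[
    df_content["f_short_description"].str.contains("poverty", case=False, na=False)
]
print(poverty_sources[["f_resource", "f_short_description"]])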
2. Retrieve Years and NUTS Levels Available for a Data Source
For this step, we will select a single data source that we want to investigate further. Let’s look at At-risk-of-poverty rate by NUTS 2 regions from Eurostat (ESTAT). You will notice from the data catalogue that we retrieved in the previous step that the resource ID for this data set is TGS00103, which we will use for our next steps.
To retrieve years and NUTS levels that are available for this data source, we will use the API endpoint get_year_nuts_level_from_source:
# API endpoint
= "functions/postgisftw.get_year_nuts_level_from_source/items.json"
endpoint
# Endpoint parameters
= {"_resource": "TGS00103"} # resource ID of the data set
params
# GET request to query API
response = requests.get(f"{url}/{endpoint}", params=params)

if response.status_code == 200:
    data = response.json()
    df_content = pd.DataFrame(data)
else:
    raise Exception(f"Error: {response.status_code}")

# Look at the data
df_content
|   | f_level | f_year |
|---|---------|--------|
| 0 | 2 | 2012 |
| 1 | 2 | 2013 |
| 2 | 2 | 2014 |
| 3 | 2 | 2015 |
| 4 | 2 | 2016 |
| 5 | 2 | 2017 |
| 6 | 2 | 2018 |
| 7 | 2 | 2019 |
| 8 | 2 | 2020 |
| 9 | 2 | 2021 |
The API response provides all of the years and geographic levels covered by this data set.
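When a source spans several NUTS levels, it can be easier to read this response as a per-level summary. Here is a minimal sketch using the f_level and f_year columns shown above; we cast f_year to int as a precaution in case the API returns years as strings:

# Summarize the year coverage available at each NUTS level
coverage = (
    df_content.assign(f_year=df_content["f_year"].astype(int))
              .groupby("f_level")["f_year"]
              .agg(["min", "max", "count"])
)
print(coverage)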
3. Identify Data Filters Available for a Data Source
Many data sets available in the Mapineq database contain various data filters that can be applied to extract the data you are specifically interested in. These filters could include age, sex, education level, etc. Our next step is to identify which filters are available for the At-risk-of-poverty rate by NUTS 2 regions data that we are investigating.
To identify the filters that are available for this data source, we will use the API endpoint get_column_values_source_json:
# API endpoint
= "functions/postgisftw.get_column_values_source_json/items.json"
endpoint
# Endpoint parameters
= {
params "_resource": "TGS00103",
"source_selections": json.dumps({
"year": "2020",
"level": "2",
"selected": []
}) }
This endpoint requires a source_selections parameter in JSON format. We implemented this in Python by creating a dictionary object and then converting it to a JSON string using the json.dumps function. Let’s take a quick look at what that final JSON parameter looks like:
print(params['source_selections'])
{"year": "2020", "level": "2", "selected": []}
Now we are ready to submit our API request:
# GET request to query API
response = requests.get(f"{url}/{endpoint}", params=params)

if response.status_code == 200:
    data = response.json()
    # Expand the nested field_values column into one row per value;
    # reset the index so the concat below aligns even when a field
    # has more than one value
    df = pd.json_normalize(data).explode("field_values").reset_index(drop=True)
    field_values_expanded = pd.json_normalize(df["field_values"])
    df_content = pd.concat(
        [df.drop(columns="field_values"), field_values_expanded], axis=1
    )
else:
    raise Exception(f"Error: {response.status_code}")
The API response contains a more complex JSON object with nested values that our previous API responses did not include. So, we have done a bit of extra data wrangling to get it into a nice DataFrame format, making use of the explode method and the json_normalize function from pandas.
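As an aside, the same flattening can be done in a single call using json_normalize with its record_path and meta arguments. This is an equivalent alternative to the explode/concat approach above, although the resulting column order differs slightly:

# One-step alternative: expand field_values and carry the parent fields along
df_content_alt = pd.json_normalize(
    response.json(),
    record_path="field_values",    # one row per entry in field_values
    meta=["field", "field_label"]  # keep the parent field identifiers
)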
Here is the original nested JSON, which contains a dictionary within each element of the field_values array:
print(json.dumps(response.json(), indent=4))
[
{
"field": "freq",
"field_label": "Time frequency",
"field_values": [
{
"label": "Annual",
"value": "A"
}
]
},
{
"field": "unit",
"field_label": "Unit of measure",
"field_values": [
{
"label": "Percentage of total population",
"value": "PC_POP"
}
]
}
]
And here is our nicely formatted data frame, which splits those nested elements into separate label and value columns:
# Look at the data
df_content
|   | field | field_label | label | value |
|---|-------|-------------|-------|-------|
| 0 | freq | Time frequency | Annual | A |
| 1 | unit | Unit of measure | Percentage of total population | PC_POP |
This response shows us the filters available for this data source and their respective values.
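Before building query conditions in the next step, it can help to reshape this table into a mapping from each filter field to its valid values. Here is a small sketch using the columns shown above:

# Map each filter field to the list of values it accepts
filter_values = df_content.groupby("field")["value"].apply(list).to_dict()
print(filter_values)  # e.g. {'freq': ['A'], 'unit': ['PC_POP']}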
4. Query Data for One Data Source
In the previous steps, we (1) identified a data source to query, (2) checked which years and geographic levels are available, and (3) identified the filters available. We will now use this information to retrieve the data that we want from our selected data source, TGS00103.
To query data from this data source, we will use the API endpoint get_x_data:
# API endpoint
= "functions/postgisftw.get_x_data/items.json"
endpoint
# Endpoint parameters
= {
params "_level": 2,
"_year": 2018,
"X_JSON": json.dumps(
{"source": "TGS00103",
"conditions": [
"field": "unit", "value": "PC_POP"},
{"field": "freq", "value": "A"},
{
],
}
),"limit": 1500,
}
# GET request to query API
response = requests.get(f"{url}/{endpoint}", params=params)

if response.status_code == 200:
    data = response.json()
    df_content = pd.DataFrame(data)
else:
    raise Exception(f"Error: {response.status_code}")

# View data
df_content.head(10)
|   | data_year | geo | geo_name | geo_source | geo_year | x |
|---|-----------|-----|----------|------------|----------|---|
| 0 | 2018 | AL01 | Veri | NUTS | 2018 | 25.1 |
| 1 | 2018 | AL02 | Qender | NUTS | 2018 | 23.6 |
| 2 | 2018 | AL03 | Jug | NUTS | 2018 | 21.6 |
| 3 | 2018 | AT11 | Burgenland | NUTS | 2018 | NaN |
| 4 | 2018 | AT12 | Niederösterreich | NUTS | 2018 | NaN |
| 5 | 2018 | AT13 | Wien | NUTS | 2018 | NaN |
| 6 | 2018 | AT21 | Kärnten | NUTS | 2018 | NaN |
| 7 | 2018 | AT22 | Steiermark | NUTS | 2018 | NaN |
| 8 | 2018 | AT31 | Oberösterreich | NUTS | 2018 | NaN |
| 9 | 2018 | AT32 | Salzburg | NUTS | 2018 | NaN |
The response includes the data that we have requested, including identifiers like geographic unit name (geo_name), NUTS ID (geo), and data year (data_year). Note: We are only showing the first 10 rows of a much larger data frame.
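Note from the output that regions without an observation for the selected year are returned with NaN in the x column. If you only want regions with data, here is a short sketch that drops those rows and summarizes what remains:

# Keep only regions with an observed value for x, then summarize
df_observed = df_content.dropna(subset=["x"])
print(f"Regions with data: {len(df_observed)} of {len(df_content)}")
print(df_observed["x"].describe())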
5. Query Data from Two Sources Joined by Location
Now we want to repeat the same query from our previous step, but this time we will include a second variable joined to the output by geographic location. We will use Life expectancy by age, sex, and NUTS 2 region from Eurostat as our second variable. This data set has the resource ID DEMO_R_MLIFEXP, and we will fill in the correct filters following the steps outlined above:
# API endpoint
= "functions/postgisftw.get_xy_data/items.json"
endpoint
# Endpoint parameters
= {
params "_level": 2,
"_predictor_year": 2018,
"_outcome_year": 2018,
"X_JSON": json.dumps(
{"source": "TGS00103",
"conditions": [
"field": "unit", "value": "PC_POP"},
{"field": "freq", "value": "A"},
{
],
}
),"Y_JSON": json.dumps(
{"source": "DEMO_R_MLIFEXP",
"conditions": [
"field": "unit", "value": "YR"},
{"field": "age", "value": "Y_LT1"},
{"field": "sex", "value": "T"},
{"field": "freq", "value": "A"},
{
],
}
),"limit": 1500,
}
# GET request to query API
response = requests.get(f"{url}/{endpoint}", params=params)

if response.status_code == 200:
    data = response.json()
    df_content = pd.DataFrame(data)
else:
    raise Exception(f"Error: {response.status_code}")

# View data
df_content.head(10)
|   | geo | geo_name | geo_source | geo_year | outcome_year | predictor_year | x | y |
|---|-----|----------|------------|----------|--------------|----------------|---|---|
| 0 | AL01 | Veri | NUTS | 2018 | 2018 | 2018 | 25.1 | NaN |
| 1 | AL02 | Qender | NUTS | 2018 | 2018 | 2018 | 23.6 | NaN |
| 2 | AL03 | Jug | NUTS | 2018 | 2018 | 2018 | 21.6 | NaN |
| 3 | AT11 | Burgenland | NUTS | 2018 | 2018 | 2018 | NaN | 81.5 |
| 4 | AT12 | Niederösterreich | NUTS | 2018 | 2018 | 2018 | NaN | 81.5 |
| 5 | AT13 | Wien | NUTS | 2018 | 2018 | 2018 | NaN | 80.6 |
| 6 | AT21 | Kärnten | NUTS | 2018 | 2018 | 2018 | NaN | 81.6 |
| 7 | AT22 | Steiermark | NUTS | 2018 | 2018 | 2018 | NaN | 82.2 |
| 8 | AT31 | Oberösterreich | NUTS | 2018 | 2018 | 2018 | NaN | 82.2 |
| 9 | AT32 | Salzburg | NUTS | 2018 | 2018 | 2018 | NaN | 82.9 |
The x column contains the variable defined by the parameter X_JSON, and the y column contains the variable defined by the Y_JSON parameter. Note: We are only showing the first 10 rows of a much larger data frame.
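With both variables in a single DataFrame, exploring their relationship is straightforward. For example, here is a quick sketch that computes the correlation between the at-risk-of-poverty rate (x) and life expectancy (y) across regions where both values are observed:

# Correlate x and y across regions where both values are present
df_complete = df_content[["x", "y"]].dropna()
print(f"Regions with both variables: {len(df_complete)}")
print(df_complete["x"].corr(df_complete["y"]))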
Conclusion
This tutorial provided a step-by-step guide to querying the Mapineq API using Python. The code examples demonstrate how to retrieve data for various endpoints and how to process the responses into usable formats. For further details, consult the Mapineq API documentation available from https://mapineq.org/data-users.