Accessing PostGIS with Python

Lindsey Smith

In this code, the psycopg2 library is used to access spatial and non-spatial data from a PostgreSQL database.

In order to establish a connection with the database and successfully access data using psycopg2, the execution environment of the Python script or Jupyter Notebook must align with the hosting environment of the target database. For example, if the database is hosted locally (e.g., on your machine), ensure that the Python code using psycopg2 is executed within this local environment. I recommend installing and using tools such as Anaconda Navigator or Jupyter Labs to maintain this consistency.

Note that you will not be able to establish a connection with a database on your local machine if working in The University of Toronto’s Jupyter Hub.

# If not already installed
!pip install psycopg2-binary
pip install geopandas
# Import required libraries
import psycopg2
import pandas as pd  
import geopandas as gpd

1. Establish a connection and access the data from the database

# Create variables for database connection parameters
dbname = "libraries"
user = "postgres"
#password = ""
host = "localhost"
port = "5432"
# Establish a connection to the database
# Add paramter password=password if needed
conn = psycopg2.connect(dbname=dbname, user=user, host=host, port=port)
# Create a cursor object to interact with the database
cur = conn.cursor()
# Example query: get library names and ST_AsText of their geometries
query = """
SELECT branchcode, ST_AsText(geom) AS geom_wkt
FROM librarypnts
LIMIT 5;
"""

# Execute the query
cur.execute(query)

# Fetch the results and store in a new variable
results = cur.fetchall()
# Take a quick look at the results
for row in results:
    print(row)

2. Store the queried data in a dataframe

For further analysis, it’s helpful to store data in a dataframe. If the data are non-spatial, a pandas dataframe is appropriate, but if the data include a geometry field then a geopandas dataframe must be used.

## Store non-spatial data in pandas dataframe

# Extract column names from the cursor description (cur.description) and store them in a list 
column_names = [desc[0] for desc in cur.description]

# Create a pandas DataFrame with the fetched results and column names
df = pd.DataFrame(results, columns=column_names)

# View the dataframe
df

Note that if you store a geometry field within a pandas dataframe, the data type of the field (accessible using df.dtype) will be an object for storing strings (text values).

In order to store spatial data in a geopandas dataframe, data can be converted from WKB or WKT into a format compatible with geopandas. To check whether data are in WKB or WKT, look at the values in the geometry field in the pandas dataframe above.

WKB is a binary format representing geometry data as a sequence of hexadecimal values (e.g., 0101000000…) WKT data often starts with a specific geometry type, such as POINT, LINESTRING, POLYGON, etc., followed by the coordinates that define the geometry (e.g., POINT (30 10))

## Store spatial data in geopandas dataframe

# Convert geometry data into a format compatible with geopandas (e.g., WKB or WKT. Adjust depending on your data)
df['geometry'] = gpd.GeoSeries.from_wkt(df['geom_wkt'])

# Create new geodpandas dataframe from the pandas dataframe, specifying the geometry column
gdf = gpd.GeoDataFrame(df, geometry='geometry')

# View the geodataframe with additional new field 'geometry'
gdf
# Let's take a quick look at those points
gdf.plot()

3. Save the geodataframe

If you plan to access the queried data from another notebook, you may wish to save the data as a shapefile for future use.

# Save geodataframe to shapefile
gdf.to_file('output.shp') # stored in same folder as python notebook