6. Projecting Data

Setting up the conda env:

conda create -n sql python
conda activate sql
conda install ipython-sql sqlalchemy psycopg2 notebook pandas -c conda-forge

Sample dataset:


6.1. Connecting to the database

%load_ext sql
import os
host = "localhost"
database = "nyc"
user = os.getenv('SQL_USER')
password = os.getenv('SQL_PASSWORD')
connection_string = f"postgresql://{user}:{password}@{host}/{database}"
%sql $connection_string

SELECT * from nyc_subway_stations LIMIT 5
 * postgresql://postgres:***@localhost/nyc
5 rows affected.
id geom objectid name alt_name cross_st long_name label borough nghbhd routes transfers color express closed
376 010100002026690000371775B5C3CE2141CBD2347771315141 1 Cortlandt St None Church St Cortlandt St (R,W) Manhattan Cortlandt St (R,W) Manhattan None R,W R,W YELLOW None None
2 010100002026690000CBE327F938CD21415EDBE1572D315141 2 Rector St None None Rector St (1) Manhattan Rector St (1) Manhattan None 1 1 RED None None
1 010100002026690000C676635D10CD2141A0ECDB6975305141 3 South Ferry None None South Ferry (1) Manhattan South Ferry (1) Manhattan None 1 1 RED None None
125 010100002026690000F4CF3E3654032241B5704681A73C5141 4 138th St Grand Concourse Grand Concourse 138th St / Grand Concourse (4,5) Bronx 138th St / Grand Concourse (4,5) Bronx None 4,5 4,5 GREEN None None
126 01010000202669000084DADF7AED0422410C380E6E3A3D5141 5 149th St Grand Concourse Grand Concourse 149th St / Grand Concourse (4) Bronx 149th St / Grand Concourse (4) Bronx None 4 2,4,5 GREEN express None

6.2. Checking SRID

The earth is not flat, and there is no simple way of putting it down on a flat paper map (or computer screen), so people have come up with all sorts of ingenious solutions, each with pros and cons. Some projections preserve area, so all objects have a relative size to each other; other projections preserve angles (conformal) like the Mercator projection; some projections try to find a good intermediate mix with only little distortion on several parameters. Common to all projections is that they transform the (spherical) world onto a flat Cartesian coordinate system, and which projection to choose depends on how you will be using the data.

We’ve already encountered projections when we loaded our nyc data. (Recall that pesky SRID 26918). Sometimes, however, you need to transform and re-project between spatial reference systems. PostGIS includes built-in support for changing the projection of data, using the ST_Transform(geometry, srid) function. For managing the spatial reference identifiers on geometries, PostGIS provides the ST_SRID(geometry) and ST_SetSRID(geometry, srid) functions.

We can confirm the SRID of our data with the ST_SRID function:


SELECT ST_SRID(geom) FROM nyc_streets LIMIT 1;
 * postgresql://postgres:***@localhost/nyc
1 rows affected.

And what is definition of “26918”? As we saw in loading data section, the definition is contained in the spatial_ref_sys table. In fact, two definitions are there. The “well-known text” (WKT) definition is in the srtext column, and there is a second definition in “proj.4” format in the proj4text column.


SELECT * FROM spatial_ref_sys WHERE srid = 26918
 * postgresql://postgres:***@localhost/nyc
1 rows affected.
srid auth_name auth_srid srtext proj4text
26918 EPSG 26918 PROJCS["NAD83 / UTM zone 18N",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4269"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-75],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","26918"]] +proj=utm +zone=18 +datum=NAD83 +units=m +no_defs

In fact, for the internal PostGIS re-projection calculations, it is the contents of the proj4text column that are used. For our 26918 projection, here is the proj.4 text:


SELECT proj4text FROM spatial_ref_sys WHERE srid = 26918
 * postgresql://postgres:***@localhost/nyc
1 rows affected.
+proj=utm +zone=18 +datum=NAD83 +units=m +no_defs

In practice, both the srtext and the proj4text columns are important: the srtext column is used by external programs like GeoServer, QGIS, and FME and others; the proj4text column is used internally.

6.3. Comparing Data

Taken together, a coordinate and an SRID define a location on the globe. Without an SRID, a coordinate is just an abstract notion. A “Cartesian” coordinate plane is defined as a “flat” coordinate system placed on the surface of Earth. Because PostGIS functions work on such a plane, comparison operations require that both geometries be represented in the same SRID.

If you feed in geometries with differing SRIDs you will just get an error:

# %%sql

# SELECT ST_Equals(
#          ST_GeomFromText('POINT(0 0)', 4326),
#          ST_GeomFromText('POINT(0 0)', 26918)
#          )

Be careful of getting too happy with using ST_Transform for on-the-fly conversion. Spatial indexes are built using SRID of the stored geometries. If comparison are done in a different SRID, spatial indexes are (often) not used. It is best practice to choose one SRID for all the tables in your database. Only use the transformation function when you are reading or writing data to external applications.

6.4. Transforming Data

If we return to our proj4 definition for SRID 26918, we can see that our working projection is UTM (Universal Transverse Mercator) of zone 18, with meters as the unit of measurement.

+proj=utm +zone=18 +ellps=GRS80 +datum=NAD83 +units=m +no_defs 

Let’s convert some data from our working projection to geographic coordinates – also known as “longitude/latitude”.

To convert data from one SRID to another, you must first verify that your geometry has a valid SRID. Since we have already confirmed a valid SRID, we next need the SRID of the projection to transform into. In other words, what is the SRID of geographic coordinates?

The most common SRID for geographic coordinates is 4326, which corresponds to “longitude/latitude on the WGS84 spheroid”. You can see the definition at the spatialreference.org site:

You can also pull the definitions from the spatial_ref_sys table:


SELECT srtext FROM spatial_ref_sys WHERE srid = 4326;
 * postgresql://postgres:***@localhost/nyc
1 rows affected.
GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]

Let’s convert the coordinates of the ‘Broad St’ subway station into geographics:


SELECT ST_AsText(geom)
FROM nyc_subway_stations 
WHERE name = 'Broad St';
 * postgresql://postgres:***@localhost/nyc
1 rows affected.
POINT(583571.905921312 4506714.34119218)

SELECT ST_AsText(ST_Transform(geom,4326)) 
FROM nyc_subway_stations 
WHERE name = 'Broad St';
 * postgresql://postgres:***@localhost/nyc
1 rows affected.
POINT(-74.0106714688734 40.7071048155876)

If you load data or create a new geometry without specifying an SRID, the SRID value will be 0. Recall in geometries, that when we created our geometries table we didn’t specify an SRID. If we query our database, we should expect all the nyc_ tables to have an SRID of 26918, while the geometries table defaulted to an SRID of 0.

To view a table’s SRID assignment, query the database’s geometry_columns table.


SELECT f_table_name AS name, srid 
FROM geometry_columns;
 * postgresql://postgres:***@localhost/nyc
6 rows affected.
name srid
nyc_homicides 26918
nyc_census_blocks 26918
nyc_neighborhoods 26918
nyc_streets 26918
nyc_subway_stations 26918
geometries 0

However, if you know what the SRID of the coordinates is supposed to be, you can set it post-facto, using ST_SetSRID on the geometry. Then you will be able to transform the geometry into other systems.


FROM geometries;
 * postgresql://postgres:***@localhost/nyc
5 rows affected.
POINT(-79.488743884387 0)
LINESTRING(-79.488743884387 0,-79.4887349253904 0.000009019375921,-79.4887259663935 0.000009019376033,-79.4887259663937 0.000018038752065)
POLYGON((-79.488743884387 0,-79.4887349253903 0,-79.4887349253904 0.000009019375921,-79.4887438843871 0.00000901937581,-79.488743884387 0))
POLYGON((-79.488743884387 0,-79.4886542944147 0,-79.4886542944202 0.000090193769243,-79.4887438843926 0.000090193758097,-79.488743884387 0),(-79.4887349253904 0.000009019375921,-79.4887349253905 0.000018038751842,-79.4887259663937 0.000018038752065,-79.4887259663935 0.000009019376033,-79.4887349253904 0.000009019375921))
GEOMETRYCOLLECTION(POINT(-79.4887259663935 0),POLYGON((-79.488743884387 0,-79.4887349253903 0,-79.4887349253904 0.000009019375921,-79.4887438843871 0.00000901937581,-79.488743884387 0)))

6.5. Function List

ST_AsText: Returns the Well-Known Text (WKT) representation of the geometry/geography without SRID metadata.

ST_SetSRID(geometry, srid): Sets the SRID on a geometry to a particular integer value.

ST_SRID(geometry): Returns the spatial reference identifier for the ST_Geometry as defined in spatial_ref_sys table.

ST_Transform(geometry, srid): Returns a new geometry with its coordinates transformed to the SRID referenced by the integer parameter.