3. Introduction to PostGIS¶
Setting up the conda env:
conda create -n sql python
conda activate sql
conda install ipython-sql sqlalchemy psycopg2 notebook pandas -c conda-forge
Sample dataset:
nyc_data.zip (Watch this video to load data into PostGIS)
References:
3.1. Connecting to the database¶
%load_ext sql
import os
host = "localhost"
database = "nyc"
user = os.getenv('SQL_USER')
password = os.getenv('SQL_PASSWORD')
connection_string = f"postgresql://{user}:{password}@{host}/{database}"
%sql $connection_string
%%sql
SELECT * FROM nyc_neighborhoods WHERE FALSE
* postgresql://postgres:***@localhost/nyc
0 rows affected.
| id | geom | boroname | name |
|---|
%%sql
SELECT id, boroname, name from nyc_neighborhoods LIMIT 10
* postgresql://postgres:***@localhost/nyc
10 rows affected.
| id | boroname | name |
|---|---|---|
| 1 | Brooklyn | Bensonhurst |
| 2 | Manhattan | East Village |
| 3 | Manhattan | West Village |
| 4 | The Bronx | Throggs Neck |
| 5 | The Bronx | Wakefield-Williamsbridge |
| 6 | Queens | Auburndale |
| 7 | Manhattan | Battery Park |
| 8 | Manhattan | Carnegie Hill |
| 9 | Staten Island | Mariners Harbor |
| 10 | Staten Island | Rossville |
3.2. Simple SQL¶
%%sql
SELECT postgis_full_version()
* postgresql://postgres:***@localhost/nyc
1 rows affected.
| postgis_full_version |
|---|
| POSTGIS="3.0.3 6660953" [EXTENSION] PGSQL="130" GEOS="3.8.1-CAPI-1.13.3" PROJ="6.3.2" LIBXML="2.9.10" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.4.3 (Internal)" |
3.2.1. NYC Neighborhoods¶

What are the names of all the neighborhoods in New York City?
%%sql
SELECT name FROM nyc_neighborhoods
* postgresql://postgres:***@localhost/nyc
129 rows affected.
| name |
|---|
| Bensonhurst |
| East Village |
| West Village |
| Throggs Neck |
| Wakefield-Williamsbridge |
| Auburndale |
| Battery Park |
| Carnegie Hill |
| Mariners Harbor |
| Rossville |
| Harlem |
| Gramercy |
| Queens Village |
| Middle Village |
| Ettingville |
| Morris Park |
| Baychester |
| Great Kills |
| New Brighton |
| Fordham |
| Nkew Gardens |
| Soho |
| Spuyten Duyvil |
| Woodside |
| Bay Ridge |
| Mott Haven |
| Rosedale |
| Boerum Hill |
| Cobble Hill |
| Morningside Heights |
| Murray Hill |
| Port Richmond |
| Williams Bridge |
| Downtown |
| Woodrow |
| Union Port |
| Woodhaven-Richmond Hill |
| Soundview |
| Jamaica |
| Laurelton |
| Sunset Park |
| Borough Park |
| East Brooklyn |
| Midland Beach |
| Jackson Heights |
| Maspeth |
| Flatbush |
| Central Park |
| Charlestown-Richmond Valley |
| Glendale |
| Morris Heights |
| Kings Bridge |
| Country Club |
| Park Slope |
| Greenwich Village |
| Midtown |
| Bloomfield-Chelsea-Travis |
| Richmondtown |
| Bedford Park |
| Williamsburg |
| Flushing |
| Canarsie |
| Greenwood |
| Annandale |
| City Island |
| Saintalbans |
| University Heights |
| Gravesend-Sheepshead Bay |
| Tribeca |
| North Sutton Area |
| Queensboro Hill |
| Springfield Gardens |
| Dyker Heights |
| Upper East Side |
| Financial District |
| Inwood |
| Sunny Side |
| Lower East Side |
| Chelsea |
| Oakwood |
| South Beach |
| Tottensville |
| Hunts Point |
| Ridgewood |
| Forest Hills |
| Clearview |
| Brownsville |
| Bushwick |
| Washington Heights |
| Upper West Side |
| The Rockaways |
| Howland Hook |
| Ardon Heights |
| Fort Green |
| Clinton |
| Prince's Bay |
| Fresh Kills |
| High Bridge |
| Eastchester |
| Riverdale |
| Woodlawn-Nordwood |
| Huguenot |
| Clifton |
| Bayside |
| Howard Beach |
| Tremont |
| Utopia |
| Garment District |
| East Harlem |
| Todt Hill |
| Parkchester |
| South Bronx |
| Westerleigh-Castleton |
| College Point |
| Mapleton-Flatlands |
| Little Italy |
| Bedford-Stuyvesant |
| Hamilton Heights |
| Carroll Gardens |
| Astoria-Long Island City |
| Yorkville |
| Chinatown |
| Coney Island |
| Corona |
| Red Hook |
| Douglastown-Little Neck |
| Whitestone |
| Steinway |
| Rosebank |
What are the names of all the neighborhoods in Brooklyn?
%%sql
SELECT name
FROM nyc_neighborhoods
WHERE boroname = 'Brooklyn'
* postgresql://postgres:***@localhost/nyc
23 rows affected.
| name |
|---|
| Bensonhurst |
| Bay Ridge |
| Boerum Hill |
| Cobble Hill |
| Downtown |
| Sunset Park |
| Borough Park |
| East Brooklyn |
| Flatbush |
| Park Slope |
| Williamsburg |
| Canarsie |
| Greenwood |
| Gravesend-Sheepshead Bay |
| Dyker Heights |
| Brownsville |
| Bushwick |
| Fort Green |
| Mapleton-Flatlands |
| Bedford-Stuyvesant |
| Carroll Gardens |
| Coney Island |
| Red Hook |
What is the number of letters in the names of all the neighborhoods in Brooklyn?
%%sql
SELECT char_length(name)
FROM nyc_neighborhoods
WHERE boroname = 'Brooklyn'
* postgresql://postgres:***@localhost/nyc
23 rows affected.
| char_length |
|---|
| 11 |
| 9 |
| 11 |
| 11 |
| 8 |
| 11 |
| 12 |
| 13 |
| 8 |
| 10 |
| 12 |
| 8 |
| 9 |
| 24 |
| 13 |
| 11 |
| 8 |
| 10 |
| 18 |
| 18 |
| 15 |
| 12 |
| 8 |
What is the average number of letters and standard deviation of number of letters in the names of all the neighborhoods in Brooklyn?
%%sql
SELECT avg(char_length(name)), stddev(char_length(name))
FROM nyc_neighborhoods
WHERE boroname = 'Brooklyn'
* postgresql://postgres:***@localhost/nyc
1 rows affected.
| avg | stddev |
|---|---|
| 11.7391304347826087 | 3.9105613559407395 |
What is the average number of letters in the names of all the neighborhoods in New York City, reported by borough?
%%sql
SELECT boroname, avg(char_length(name)), stddev(char_length(name))
FROM nyc_neighborhoods
GROUP BY boroname
* postgresql://postgres:***@localhost/nyc
5 rows affected.
| boroname | avg | stddev |
|---|---|---|
| Queens | 11.6666666666666667 | 5.0057438272815975 |
| Brooklyn | 11.7391304347826087 | 3.9105613559407395 |
| Staten Island | 12.2916666666666667 | 5.2043390480959474 |
| The Bronx | 12.0416666666666667 | 3.6651017740975152 |
| Manhattan | 11.8214285714285714 | 4.3123729948325257 |
3.2.2. NYC Census Blocks¶

%%sql
SELECT * FROM nyc_census_blocks WHERE FALSE
* postgresql://postgres:***@localhost/nyc
0 rows affected.
| id | geom | blkid | popn_total | popn_white | popn_black | popn_nativ | popn_asian | popn_other | boroname |
|---|
What is the population of the City of New York?
%%sql
SELECT Sum(popn_total) AS population
FROM nyc_census_blocks
* postgresql://postgres:***@localhost/nyc
1 rows affected.
| population |
|---|
| 8175032 |
What is the population of the Bronx?
%%sql
SELECT SUM(popn_total) AS population
FROM nyc_census_blocks
WHERE boroname = 'The Bronx'
* postgresql://postgres:***@localhost/nyc
1 rows affected.
| population |
|---|
| 1385108 |
For each borough, what percentage of the population is white?
%%sql
SELECT
boroname,
100 * SUM(popn_white)/SUM(popn_total) AS white_pct
FROM nyc_census_blocks
GROUP BY boroname
* postgresql://postgres:***@localhost/nyc
5 rows affected.
| boroname | white_pct |
|---|---|
| Queens | 39.7220773945910130 |
| Brooklyn | 42.8011737932686549 |
| The Bronx | 27.9037446899447552 |
| Manhattan | 57.4493039480462811 |
| Staten Island | 72.8942034860154033 |