3. Introduction to PostGIS

Setting up the conda env:

conda create -n sql python
conda activate sql
conda install ipython-sql sqlalchemy psycopg2 notebook pandas -c conda-forge

Sample dataset:

References:

3.1. Connecting to the database

%load_ext sql
import os
host = "localhost"
database = "nyc"
user = os.getenv('SQL_USER')
password = os.getenv('SQL_PASSWORD')
connection_string = f"postgresql://{user}:{password}@{host}/{database}"
%sql $connection_string
%%sql 

SELECT * FROM nyc_neighborhoods WHERE FALSE
 * postgresql://postgres:***@localhost/nyc
0 rows affected.
id geom boroname name
%%sql 

SELECT id, boroname, name from nyc_neighborhoods LIMIT 10
 * postgresql://postgres:***@localhost/nyc
10 rows affected.
id boroname name
1 Brooklyn Bensonhurst
2 Manhattan East Village
3 Manhattan West Village
4 The Bronx Throggs Neck
5 The Bronx Wakefield-Williamsbridge
6 Queens Auburndale
7 Manhattan Battery Park
8 Manhattan Carnegie Hill
9 Staten Island Mariners Harbor
10 Staten Island Rossville

3.2. Simple SQL

%%sql

SELECT postgis_full_version()
 * postgresql://postgres:***@localhost/nyc
1 rows affected.
postgis_full_version
POSTGIS="3.0.3 6660953" [EXTENSION] PGSQL="130" GEOS="3.8.1-CAPI-1.13.3" PROJ="6.3.2" LIBXML="2.9.10" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.4.3 (Internal)"

3.2.1. NYC Neighborhoods

What are the names of all the neighborhoods in New York City?

%%sql

SELECT name FROM nyc_neighborhoods
 * postgresql://postgres:***@localhost/nyc
129 rows affected.
name
Bensonhurst
East Village
West Village
Throggs Neck
Wakefield-Williamsbridge
Auburndale
Battery Park
Carnegie Hill
Mariners Harbor
Rossville
Harlem
Gramercy
Queens Village
Middle Village
Ettingville
Morris Park
Baychester
Great Kills
New Brighton
Fordham
Nkew Gardens
Soho
Spuyten Duyvil
Woodside
Bay Ridge
Mott Haven
Rosedale
Boerum Hill
Cobble Hill
Morningside Heights
Murray Hill
Port Richmond
Williams Bridge
Downtown
Woodrow
Union Port
Woodhaven-Richmond Hill
Soundview
Jamaica
Laurelton
Sunset Park
Borough Park
East Brooklyn
Midland Beach
Jackson Heights
Maspeth
Flatbush
Central Park
Charlestown-Richmond Valley
Glendale
Morris Heights
Kings Bridge
Country Club
Park Slope
Greenwich Village
Midtown
Bloomfield-Chelsea-Travis
Richmondtown
Bedford Park
Williamsburg
Flushing
Canarsie
Greenwood
Annandale
City Island
Saintalbans
University Heights
Gravesend-Sheepshead Bay
Tribeca
North Sutton Area
Queensboro Hill
Springfield Gardens
Dyker Heights
Upper East Side
Financial District
Inwood
Sunny Side
Lower East Side
Chelsea
Oakwood
South Beach
Tottensville
Hunts Point
Ridgewood
Forest Hills
Clearview
Brownsville
Bushwick
Washington Heights
Upper West Side
The Rockaways
Howland Hook
Ardon Heights
Fort Green
Clinton
Prince's Bay
Fresh Kills
High Bridge
Eastchester
Riverdale
Woodlawn-Nordwood
Huguenot
Clifton
Bayside
Howard Beach
Tremont
Utopia
Garment District
East Harlem
Todt Hill
Parkchester
South Bronx
Westerleigh-Castleton
College Point
Mapleton-Flatlands
Little Italy
Bedford-Stuyvesant
Hamilton Heights
Carroll Gardens
Astoria-Long Island City
Yorkville
Chinatown
Coney Island
Corona
Red Hook
Douglastown-Little Neck
Whitestone
Steinway
Rosebank

What are the names of all the neighborhoods in Brooklyn?

%%sql

SELECT name
FROM nyc_neighborhoods
WHERE boroname = 'Brooklyn'
 * postgresql://postgres:***@localhost/nyc
23 rows affected.
name
Bensonhurst
Bay Ridge
Boerum Hill
Cobble Hill
Downtown
Sunset Park
Borough Park
East Brooklyn
Flatbush
Park Slope
Williamsburg
Canarsie
Greenwood
Gravesend-Sheepshead Bay
Dyker Heights
Brownsville
Bushwick
Fort Green
Mapleton-Flatlands
Bedford-Stuyvesant
Carroll Gardens
Coney Island
Red Hook

What is the number of letters in the names of all the neighborhoods in Brooklyn?

%%sql

SELECT char_length(name)
FROM nyc_neighborhoods
WHERE boroname = 'Brooklyn'
 * postgresql://postgres:***@localhost/nyc
23 rows affected.
char_length
11
9
11
11
8
11
12
13
8
10
12
8
9
24
13
11
8
10
18
18
15
12
8

What is the average number of letters and standard deviation of number of letters in the names of all the neighborhoods in Brooklyn?

%%sql

SELECT avg(char_length(name)), stddev(char_length(name))
FROM nyc_neighborhoods
WHERE boroname = 'Brooklyn'
 * postgresql://postgres:***@localhost/nyc
1 rows affected.
avg stddev
11.7391304347826087 3.9105613559407395

What is the average number of letters in the names of all the neighborhoods in New York City, reported by borough?

%%sql

SELECT boroname, avg(char_length(name)), stddev(char_length(name))
FROM nyc_neighborhoods
GROUP BY boroname
 * postgresql://postgres:***@localhost/nyc
5 rows affected.
boroname avg stddev
Queens 11.6666666666666667 5.0057438272815975
Brooklyn 11.7391304347826087 3.9105613559407395
Staten Island 12.2916666666666667 5.2043390480959474
The Bronx 12.0416666666666667 3.6651017740975152
Manhattan 11.8214285714285714 4.3123729948325257

3.2.2. NYC Census Blocks

%%sql

SELECT * FROM nyc_census_blocks WHERE FALSE
 * postgresql://postgres:***@localhost/nyc
0 rows affected.
id geom blkid popn_total popn_white popn_black popn_nativ popn_asian popn_other boroname

What is the population of the City of New York?

%%sql 

SELECT Sum(popn_total) AS population
FROM nyc_census_blocks
 * postgresql://postgres:***@localhost/nyc
1 rows affected.
population
8175032

What is the population of the Bronx?

%%sql 

SELECT SUM(popn_total) AS population
FROM nyc_census_blocks
WHERE boroname = 'The Bronx'
 * postgresql://postgres:***@localhost/nyc
1 rows affected.
population
1385108

For each borough, what percentage of the population is white?

%%sql

SELECT
boroname,
100 * SUM(popn_white)/SUM(popn_total) AS white_pct
FROM nyc_census_blocks
GROUP BY boroname
 * postgresql://postgres:***@localhost/nyc
5 rows affected.
boroname white_pct
Queens 39.7220773945910130
Brooklyn 42.8011737932686549
The Bronx 27.9037446899447552
Manhattan 57.4493039480462811
Staten Island 72.8942034860154033