3. Introduction to PostGIS¶
Setting up the conda env:
conda create -n sql python
conda activate sql
conda install ipython-sql sqlalchemy psycopg2 notebook pandas -c conda-forge
Sample dataset:
nyc_data.zip (Watch this video to load data into PostGIS)
References:
3.1. Connecting to the database¶
%load_ext sql
import os
host = "localhost"
database = "nyc"
user = os.getenv('SQL_USER')
password = os.getenv('SQL_PASSWORD')
connection_string = f"postgresql://{user}:{password}@{host}/{database}"
%sql $connection_string
%%sql
SELECT * FROM nyc_neighborhoods WHERE FALSE
* postgresql://postgres:***@localhost/nyc
0 rows affected.
id | geom | boroname | name |
---|
%%sql
SELECT id, boroname, name from nyc_neighborhoods LIMIT 10
* postgresql://postgres:***@localhost/nyc
10 rows affected.
id | boroname | name |
---|---|---|
1 | Brooklyn | Bensonhurst |
2 | Manhattan | East Village |
3 | Manhattan | West Village |
4 | The Bronx | Throggs Neck |
5 | The Bronx | Wakefield-Williamsbridge |
6 | Queens | Auburndale |
7 | Manhattan | Battery Park |
8 | Manhattan | Carnegie Hill |
9 | Staten Island | Mariners Harbor |
10 | Staten Island | Rossville |
3.2. Simple SQL¶
%%sql
SELECT postgis_full_version()
* postgresql://postgres:***@localhost/nyc
1 rows affected.
postgis_full_version |
---|
POSTGIS="3.0.3 6660953" [EXTENSION] PGSQL="130" GEOS="3.8.1-CAPI-1.13.3" PROJ="6.3.2" LIBXML="2.9.10" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.4.3 (Internal)" |
3.2.1. NYC Neighborhoods¶
What are the names of all the neighborhoods in New York City?
%%sql
SELECT name FROM nyc_neighborhoods
* postgresql://postgres:***@localhost/nyc
129 rows affected.
name |
---|
Bensonhurst |
East Village |
West Village |
Throggs Neck |
Wakefield-Williamsbridge |
Auburndale |
Battery Park |
Carnegie Hill |
Mariners Harbor |
Rossville |
Harlem |
Gramercy |
Queens Village |
Middle Village |
Ettingville |
Morris Park |
Baychester |
Great Kills |
New Brighton |
Fordham |
Nkew Gardens |
Soho |
Spuyten Duyvil |
Woodside |
Bay Ridge |
Mott Haven |
Rosedale |
Boerum Hill |
Cobble Hill |
Morningside Heights |
Murray Hill |
Port Richmond |
Williams Bridge |
Downtown |
Woodrow |
Union Port |
Woodhaven-Richmond Hill |
Soundview |
Jamaica |
Laurelton |
Sunset Park |
Borough Park |
East Brooklyn |
Midland Beach |
Jackson Heights |
Maspeth |
Flatbush |
Central Park |
Charlestown-Richmond Valley |
Glendale |
Morris Heights |
Kings Bridge |
Country Club |
Park Slope |
Greenwich Village |
Midtown |
Bloomfield-Chelsea-Travis |
Richmondtown |
Bedford Park |
Williamsburg |
Flushing |
Canarsie |
Greenwood |
Annandale |
City Island |
Saintalbans |
University Heights |
Gravesend-Sheepshead Bay |
Tribeca |
North Sutton Area |
Queensboro Hill |
Springfield Gardens |
Dyker Heights |
Upper East Side |
Financial District |
Inwood |
Sunny Side |
Lower East Side |
Chelsea |
Oakwood |
South Beach |
Tottensville |
Hunts Point |
Ridgewood |
Forest Hills |
Clearview |
Brownsville |
Bushwick |
Washington Heights |
Upper West Side |
The Rockaways |
Howland Hook |
Ardon Heights |
Fort Green |
Clinton |
Prince's Bay |
Fresh Kills |
High Bridge |
Eastchester |
Riverdale |
Woodlawn-Nordwood |
Huguenot |
Clifton |
Bayside |
Howard Beach |
Tremont |
Utopia |
Garment District |
East Harlem |
Todt Hill |
Parkchester |
South Bronx |
Westerleigh-Castleton |
College Point |
Mapleton-Flatlands |
Little Italy |
Bedford-Stuyvesant |
Hamilton Heights |
Carroll Gardens |
Astoria-Long Island City |
Yorkville |
Chinatown |
Coney Island |
Corona |
Red Hook |
Douglastown-Little Neck |
Whitestone |
Steinway |
Rosebank |
What are the names of all the neighborhoods in Brooklyn?
%%sql
SELECT name
FROM nyc_neighborhoods
WHERE boroname = 'Brooklyn'
* postgresql://postgres:***@localhost/nyc
23 rows affected.
name |
---|
Bensonhurst |
Bay Ridge |
Boerum Hill |
Cobble Hill |
Downtown |
Sunset Park |
Borough Park |
East Brooklyn |
Flatbush |
Park Slope |
Williamsburg |
Canarsie |
Greenwood |
Gravesend-Sheepshead Bay |
Dyker Heights |
Brownsville |
Bushwick |
Fort Green |
Mapleton-Flatlands |
Bedford-Stuyvesant |
Carroll Gardens |
Coney Island |
Red Hook |
What is the number of letters in the names of all the neighborhoods in Brooklyn?
%%sql
SELECT char_length(name)
FROM nyc_neighborhoods
WHERE boroname = 'Brooklyn'
* postgresql://postgres:***@localhost/nyc
23 rows affected.
char_length |
---|
11 |
9 |
11 |
11 |
8 |
11 |
12 |
13 |
8 |
10 |
12 |
8 |
9 |
24 |
13 |
11 |
8 |
10 |
18 |
18 |
15 |
12 |
8 |
What is the average number of letters and standard deviation of number of letters in the names of all the neighborhoods in Brooklyn?
%%sql
SELECT avg(char_length(name)), stddev(char_length(name))
FROM nyc_neighborhoods
WHERE boroname = 'Brooklyn'
* postgresql://postgres:***@localhost/nyc
1 rows affected.
avg | stddev |
---|---|
11.7391304347826087 | 3.9105613559407395 |
What is the average number of letters in the names of all the neighborhoods in New York City, reported by borough?
%%sql
SELECT boroname, avg(char_length(name)), stddev(char_length(name))
FROM nyc_neighborhoods
GROUP BY boroname
* postgresql://postgres:***@localhost/nyc
5 rows affected.
boroname | avg | stddev |
---|---|---|
Queens | 11.6666666666666667 | 5.0057438272815975 |
Brooklyn | 11.7391304347826087 | 3.9105613559407395 |
Staten Island | 12.2916666666666667 | 5.2043390480959474 |
The Bronx | 12.0416666666666667 | 3.6651017740975152 |
Manhattan | 11.8214285714285714 | 4.3123729948325257 |
3.2.2. NYC Census Blocks¶
%%sql
SELECT * FROM nyc_census_blocks WHERE FALSE
* postgresql://postgres:***@localhost/nyc
0 rows affected.
id | geom | blkid | popn_total | popn_white | popn_black | popn_nativ | popn_asian | popn_other | boroname |
---|
What is the population of the City of New York?
%%sql
SELECT Sum(popn_total) AS population
FROM nyc_census_blocks
* postgresql://postgres:***@localhost/nyc
1 rows affected.
population |
---|
8175032 |
What is the population of the Bronx?
%%sql
SELECT SUM(popn_total) AS population
FROM nyc_census_blocks
WHERE boroname = 'The Bronx'
* postgresql://postgres:***@localhost/nyc
1 rows affected.
population |
---|
1385108 |
For each borough, what percentage of the population is white?
%%sql
SELECT
boroname,
100 * SUM(popn_white)/SUM(popn_total) AS white_pct
FROM nyc_census_blocks
GROUP BY boroname
* postgresql://postgres:***@localhost/nyc
5 rows affected.
boroname | white_pct |
---|---|
Queens | 39.7220773945910130 |
Brooklyn | 42.8011737932686549 |
The Bronx | 27.9037446899447552 |
Manhattan | 57.4493039480462811 |
Staten Island | 72.8942034860154033 |