Databases
Databases are used by web apps to store, use and manipulate data (see: CRUD). Relational databases are particularly useful. Database relationships are the logical connections between tables, based on the interactions between tables.
SQL (Structured Query Language) is commonly used language to interact with relational databases. We are using PostgreSQL in CS50W, but there are other versions with different features.
Using SQL

Other data types: SERIAL (automatically incrementing integer), BOOLEAN, ENUM (one of a discrete number of possible values)
Constraints: NOT NULL (field must have value), UNIQUE, PRIMARY KEY (main way of indexing), DEFAULT (set default value), CHECK (bound values)
PSQL command line
To get a database running in psql (Postgres), use command psql .
\d: prints out current database
Basic operations
Creating a table:
CREATE TABLE flights (
id SERIAL PRIMARY KEY,
origin VARCHAR NOT NULL,
destination VARCHAR NOT NULL,
duration INTEGER NOT NULL
);
Inserting data into a table:
INSERT INTO flights
(origin, destination, duration)
VALUES ('New York', 'London', 415);
- Note that there is no
idfield. Becauseidis of typeSERIAL, it will increment and be set automatically. - The order of values in
VALUESmust match the order listed earlier in the command. - This command could also be entered all in one line.
Reading data from a table:
SELECT * FROM flights;
SELECT origin, destination FROM flights;
SELECT * FROM flights WHERE id = 3;
SELECT * FROM flights WHERE origin = 'New York';
SELECT * FROM flights WHERE duration > 500;
SELECT * FROM flights WHERE destination = 'Paris' AND duration > 500;
SELECT * FROM flights WHERE destination = 'Paris' OR duration > 500;
SELECT AVG(duration) FROM flights WHERE origin = 'New York';
SELECT * FROM flights WHERE origin LIKE '%a%'; /* % is a wildcard, eg will match results with 'a' included. */
SELECT * FROM flights LIMIT 2;
SELECT * FROM flights ORDER BY duration ASC;
SELECT * FROM flights ORDER BY duration ASC LIMIT 3;
SELECT origin, COUNT(*) FROM flights GROUP BY origin;
SELECT origin, COUNT(*) FROM flights GROUP BY origin HAVING COUNT(*) > 1;
If a SQL function is passed as a column selector, a column with the return value of that function will be returned. Useful functions include:
AVG(column): returns the average valueCOUNT(*): returns the number of rows returned by the databaseMIN(column): returns the minimum valueMAX(column): returns the maximum value
LIKE is a keyword that takes a template string and returns all rows where the column fits that template.
GROUP BY organizes rows by grouping the same values in a given column together.
HAVING is an optional specifier for GROUP BY which limits what rows are going to be returned, similar to WHERE.
Updating data in a table:
UPDATE flights
SET duration = 430
WHERE origin = 'New York'
AND destination = 'London';
SEToverwrites a column in all the rows that match theWHEREquery.
Deleting data from a table:
DELETE FROM flights
WHERE destination = 'Tokyo'
Relating tables & Compound queries
To relate tables with one another, we can reference one column (eg: id) of table A in table B via foreign key.
Create table with foreign key:
CREATE TABLE passengers (
id SERIAL PRIMARY KEY,
name VARCHAR NOT NULL,
flight_id INTEGER REFERENCES flights
);
flight_idis marked as being a foreign key for the tableflightswithREFERENCES flights. Sinceidis thePRIMARY KEYforflights, that is the column that isflights_idwill map to by default.
- Once these two tables are created, they can be queried simultaneously:
SELECT origin, destination, name FROM flights JOIN passengers ON passengers.flight_id = flights.id; SELECT origin, destination, name FROM flights JOIN passengers ON passengers.flight_id = flights.id WHERE name = 'Alice'; SELECT origin, destination, name FROM flights LEFT JOIN passengers ON passengers.flight_id = flights.id;JOINindicates that tablesflightsandpassengersare being queried together.JOINperforms an ‘inner join’: only rows where both tables match the query will be returned. In this example, only flights with passengers will be returned.ONindicates how the two tables are related. In this example, the columnflight_idinpassengersreflects values in the columnidinflights.- As before, queries can be constrained with
WHERE. LEFT JOINincludes rows from the first table listed even if there is no match (e.g. there are no passengers on that flight).RIGHT JOINis analogous (e.g. passengers with no flights).
- When databases get large, it is often useful to ‘index’ them, which makes it faster to quickly reference a given column in a table any time a
SELECTquery is made. Note, however, that this takes extra space, as well as time. When updating the table, the index must be updated as well. It is therefore unwise to index every column of every table unnecessarily. - Nested queries are yet another way to make more complex selections:
SELECT * FROM flights WHERE id IN (SELECT flight_id FROM passengers GROUP BY flight_id HAVING COUNT(*) > 1);- First, in the inner query, a table containing
flight_idfor flights with more than 1 passenger will be returned. - Then, in the outer query, all rows from
flightswill be selected that have anidin the table returned by the inner query. - In other words, this nested query returns flight info for flights with more than 1 passenger.
- First, in the inner query, a table containing
Preventing SQL injections
("INSERT INTO users (email, hash) \
VALUES (:email, :hash)",
{"email": email, "hash": hash})
SQLAlchemy
SQLAlchemy is a Python library that integrates SQL commands with the Python code running the web server.
Simple Python code for printing outside of web context:
import os
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
engine = create_engine(os.getenv("DATABASE_URL")) # database engine object from SQLAlchemy that manages connections to the database
# DATABASE_URL is an environment variable that indicates where the database lives
db = scoped_session(sessionmaker(bind=engine)) # create a 'scoped session' that ensures different users' interactions with the
# database are kept separate
flights = db.execute("SELECT origin, destination, duration FROM flights").fetchall() # execute this SQL command and return all of the results
for flight in flights
print(f"{flight.origin} to {flight.destination}, {flight.duration} minutes.") # for every flight, print out the flight info
flightsis a list of the rows the came back from the SQL query. The individual columns in each row can be accessed with dot notation.
Data can also be inserted into a database with Python. In this example, the raw data is coming from a CSV (comma-separated values) file:
import csv
# same import and setup statements as above
f = open("flights.csv")
reader = csv.reader(f)
for origin, destination, duration in reader: # loop gives each column a name
db.execute("INSERT INTO flights (origin, destination, duration) VALUES (:origin, :destination, :duration)",
{"origin": origin, "destination": destination, "duration": duration}) # substitute values from CSV line into SQL command, as per this dict
print(f"Added flight from {origin} to {destination} lasting {duration} minutes.")
db.commit() # transactions are assumed, so close the transaction finished
- The colon notation used in
db.execute()call is Postgres’ placeholder notation for values. This allows for the substitution of Python variables into SQL commands. Additionally, SQLAlchemy automatically takes care of sanitizing the values passed in.
Using Flask
- Everything discussed so far can be implemented in the exact same way inside a Flask application. Some of the code to add to
application.py(along with the necessary import and set up statements) could look like this:@app.route("/") def index(): flights = db.execute("SELECT * FROM flights").fetchall() return render_template("index.html", flights=flights) @app.route("/book", methods=["POST"]) def book(): # Get form information. name = request.form.get("name") try: flight_id = int(request.form.get("flight_id")) except ValueError: return render_template("error.html", message="Invalid flight number.") # Make sure the flight exists. if db.execute("SELECT * FROM flights WHERE id = :id", {"id": flight_id}).rowcount == 0: return render_template("error.html", message="No such flight with that id.") db.execute("INSERT INTO passengers (name, flight_id) VALUES (:name, :flight_id)", {"name": name, "flight_id": flight_id}) db.commit() return render_template("success.html")- The
tryblock of code is always run. If there is an error, and in particular, aValueError, the code in theexceptblock is run. The program’s flow then continues as normal. rowcountis a SQLAlchemy feature that is a property of db.execute(), which is equal the number of rows returned by the query.error.htmlandsuccess.htmlcould be generic templates that render the errormessageand some success statement, respectively.
- The
- The corresponding
index.html:<form action="{{ url_for('book') }}" method="post">class="form-group"> class="form-control" name="flight_id"> {% for flight in flights %} value="{{ flight.id }}">{{ flight.origin }} to {{ flight.destination }} {% endfor %}class="form-group"> class="form-control" name="name" placeholder="Passenger Name">class="form-group"> class="btn btn-primary">Book Flight</form>- Note that some elements, such as the
form-controlclass, are Bootstrap components. nameattributes are relevant for referencing them in Python code.- As is shown, the same dot notation that can be used in Python can also be used in Jinja2 templating.
- Note that some elements, such as the
- Taking this example one step further, it is possible to set up individual web pages for each flight that display some information about that flight. Here’s some Python code that would take care of the routing for these new pages:
@app.route("/flights") def flights(): flights = db.execute("SELECT * FROM flights").fetchall() return render_template("flights.html", flights=flights) @app.route("/flights/<int:flight_id>") def flight(flight_id): # Make sure flight exists. flight = db.execute("SELECT * FROM flights WHERE id = :id", {"id": flight_id}).fetchone() if flight is None: return render_template("error.html", message="No such flight.") # Get all passengers. passengers = db.execute("SELECT name FROM passengers WHERE flight_id = :flight_id", {"flight_id": flight_id}).fetchall() return render_template("flight.html", flight=flight, passengers=passengers)/flightsis a going to be a generic route to simply display a list of all flights.- Additionally,
/flights/<int:flight_id>provides for any individual flight’s info page.<int:flight_id>is a variable that is going to passed to Flask by the HTML inflights.html. This variable is then passed to theflightfunction, which passes the id into a SQL query to get all the info about the flight, including all of the passengers on that flight.
flights.html:<ul> {% for flight in flights %} <li> <a href="{{ url_for('flight', flight_id=flight.id) }}"> {{ flight.origin }} to {{ flight.destination }} </a> </li> {% endfor %} </ul>- It’s in the link here that
flight.id, which is an column from the rowflight, which comes from looping throughflights, which in turn was passed in from the Python code for/flights. It’s given the variable nameflight_id, which is what the python route for/flights/<int:flight_id>expects.
- It’s in the link here that
flight.html:<h1>Flight Details</h1> <ul> <li>Origin: {{ flight.origin }}</li> <li>Destination: {{ flight.destination }}</li> <li>Duration: {{ flight.duration}} minutes</li> </ul> <h2>Passengers</h2> <ul> {% for passenger in passengers %} <li>{{ passenger.name }}</li> {% else %} <li>No passengers.</li> {% endfor %} </ul>- The only new piece here is using
{% else %}with a for-loop to account for the case wherepassengersis empty.
- The only new piece here is using