SQL Alchemy tutorial


This describes the basic overview of SQL Alchemy. Notice the color theme. It will be related to the next onion-ring overview of SQL Alchemy.

SQL Alchemy is divided into Core and ORM. You can use Core and not use ORM. ORM is more geared towards mapping model objects to Python data types. Core on the other hand focuses on transacting with the database.

Level 1: Engine, Connections, Transactions

This is the level that talks directly to Python DBAPI.

Engine

Engine is what creates and maintains connection to the database.

Here we first create an engine to talk to the database. Engine object is lazy in that the actual connection to the database is not created until the first execute command is run (or some other command that makes it to actually access database).

Here, the third line in the code will actually cause the database connection to be created and actually create a sqlite database named tutorial.db.

The result object also has methods like fetchone(), fetchall().
NOTE: engine.echo = True will enable logging to console. You will then be able to see all the SQL generated by SQLAlchemy when you run these in console.
If we do something like:
row = result.fetchone()
the row object can behave like a tuple :
>>>row
(5, u’hello’)
but can also act like a dictionary:
>>>row['greeting_message']
u’hello’
where ‘greeting_message’ is a column in the database table. i.e. columns in select query become dictionary keys.
>>>result.close()
results object will close automatically when all rows are exhausted, but we can also close explicitly. results object contains the cursor object inside it, as well as the connection. The cursor object needs to be closed and connection needs to be returned to the pool of the connections to the database. result.close() does all this.

result object can also be iterated:

Note: We will create the toy database of employee in the context manager section

Connection Scoping

We can control the scope of the connection using connect()

Note, conn is not the DBAPI connection object. It is a SQLAlchemy connection object that wraps the DBAPI connection object. The same way that the result object wraps the cursor object from DBAPI.

Transactions

To run several statements inside a transaction, Connection features a begin() method that returns a Transaction.

The transaction gets committed when commit() is called. Note that between begin() and commit(), we have executed multiple sql statements. If commit is not successful, they will be rolled back.

Note that if we don’t provide begin() and commit(), then SQL Alchemy will auto commit each insert, update, delete, select statement. So, the moment you say begin(), it turns off the auto-commit. Also, note that begin() does not exist in python DBAPI. DBAPI only has COMMIT and ROLLBACK. SQLAlchemy generates a BEGIN(implicit) where you say begin() just to give a visual indication that begin() was called here.

Context Manager

We can use a context manager to streamline the process of being and commit.

context manager takes care of committing. For the select statement, an autocommit happened.

Engine Facts

  • Executing via the Engine directly is called connectionless execution – the Engine connects and disconnects for us.
  • Using a Connection is called explicit execution. We control the span of a connection in use.
  • Engine usually uses a connection pool, which means “disconnecting” often means the connection is just returned to the pool.
  • The SQL we send to engine.execute () as a string is not modified, is consumed by the DBAPI verbatim.

Level 2: Table Metadata, Reflection, DDL

DDL

DDL stands for Create, drop, alter table (among other things). This is a sql subset of commands. DDL is the language that we use to alter the schema of the database.
Describes the structure of the database, i.e. tables, columns, constraints, in terms of data structures in Python

Metadata

• Serves as the basis for SQL generation and object relational mapping. This is a python object that closely mirrors the database object.
• Can generate to a schema i.e. we can use metadate to generate a create table statement, or drop table statement or alter table statement. It will do so based on the info it has.
• Can be generated from a schema i.e. we can generate metadata from a sql schema (like create table statement can be used to generate metadata). This is called Reflection. If you have an existing database, you can just connect SQLAlchemy to this databse and it will generate all the python objects representing the database.

Schema and MetaData

The structure of a relational schema is represented in Python using MetaData, Table and other objects.

Here, we created a metadata for a table. Most of the time, metadata is just an container for tables. And does nothing else.

Also, there is no connection to the database. So user_table is just a python object, created from Table class in sqlalchemy, at this time. We just constructed this table and gave it 5 arguments. The first argument is the name of the table. The second argument is the metadata (the container in which this table exists). Note: every table we create that has this “metadata” will populate itself as a collection of tables that has this metadata. A table of a given name can only exist once within that metadata object. And we can make tables that refer to each-other using foreign key constraints by name. And metadata serves as a construct so we lookup those tables by name. Metadata is also a place where we can commit commands like create all tables, drop all tables.

So, the python Table is represented as a Table object, Column object is represented as Column here. Each column has a name. You should also give each column a datatype. On the database, String is VARCHAR. Nothing has happened with the database yet. This is just a Python object.

Table provides a single point of information regarding the structure of a table in a schema. So, we can check what the name of table is:
>>>user_table.name
There is a .c attribute of Table which is an associative array of Columns objects (its not an array but like a dictionary), keyed on name. Here I think c is short for collection or container of column objects. c is like a collection object. It acts like a dictionary, but is not a dictionary.
>>>user_table.c.name
Output:>>>Column('name', String(), table=user)

If you do:
>>>user_table.c.keys()
you will get the list of all column names in this table object.

The Column object itself has info about column like name and type.
>>>user_table.c.id.name
Output:>>>'id'
>>>user_table.c.id.type
Output:>>>String()

Table has other information available, such as collection of columns which comprise the table’s primary key.
>>>user_table.primary_key

The Table object is at the core of the SQL expression system.

You can already see how SQL expressions are generated from Table metadata.

So, now lets create a whole table from metadata we wrote above.

This time, we actually created a table schema and committed to sqlite. sqlalchemy always generates COMMIT after every transaction. If the metadata had more tables, they would all have been created as one transaction. This way, if there is a problem anywhere, while creating all tables and inserting data to them, then everything will get rolled back and no commit will happen.

Lets create another table with some different data-types in columns:

the create will generate this SQL and commit it:

Let us now make another table with a foreign key constraint.

Note: This is how you create a table with one foreign key. If you have composite foreign keys, its slightly different.

One interesting thing to note here is that in actual SQL, we would be creating the user table first before creating any other table that has a foreign key constraint to it. In sqlalchemy, so long as all tables exist in the metadata, it does not matter in which order they occur. The user table and the user_details table just need to be present in the metadata. The order in which they were added to the metadata does not matter.

Now let’s look at an example of composite primary key. Composite primary key is when there are multiple columns in a table that are primary keys.
Note: ForeignKey is shortcut for ForeignKeyContraint, which should be used for composite references.

Note: In ForeignKeyConstraint field, we didn’t add a type. SqlAlchemy is now able to infer the type from the field it is referring to.
Note: here, if you do metadata.create_all(), it will check which tables in the metadata already exist in the database and skip those and only create schema and commit for those that don’t exist.

Reflection

Reflection refers to loading Table objects based on reading from an existing database.
This is how we do it. Remember we have already created a table called ‘user’.

metadata_new is an empty metadata. We now do reflection, and load an empty Table object with reflected data from database for table ‘user’

user_reflected now has all the schema information related to ‘user’ in the database. This is a very powerful feature of sqlalchemy.

Inspector

Information about a database at a more specific level is available using the inspector object. What you saw before in Reflection, Inspector gives you the same ability with many more convenience methods.

>>>from sqlalchemy import inspect
>>>inspector = inspect(engine)

Noe, the inspector object can be queried to do reflection on all types of stuff in the database. Here inpector is an object that is specific to engines and knows how to reflect them.

For example inspector can give you the names of the all the tables in the database like so:
>>>inspector.get_table_names()

Inspector object will give you all the columns of a table if you give it a table name.
>>>inspector.get_columns('user')

Inspector object will give you all the foreign keys of a table if you give it a table name.
>>>inspector.get_foreign_keys('published')

Basic Types

• Integer () – basic integer type, generates INT
• String () – ASCII strings, generates VARCHAR
• Unicode () – Unicode strings – generates VARCHAR, NVARCHAR depending on database
• Boolean () – generates BOOLEAN, INT, TINYINT
• DateTime () – generates DATETIME or TIMESTAMP, returns Python datetime () objects
• Float () – floating point values
• Numeric () – precision numerics using Python Decimal()

CREATE and DROP

Here are different ways to issue create and drops
• metadata.create_all (engine, checkfirst = ) emits CREATE statements for all tables.
• table.create (engine, checkfirst = ) emits CREATE for a single table.
• metadata.drop_all (engine, checkfirst = ) emits DROP statements for all tables.
• table.drop (engine, checkfirst = ) emits DROP for a single table.

Level 3: SQL Expressions

• The SQL Expression system builds upon Table Metadata in order to compose SQL statements in Python.
• We will build Python objects that represent individual SQL strings (statements) we’d send to the database.
• These objects are composed of other objects that each represent some unit of SQL, like a comparison, a SELECT statement, a conjunction such as AND or OR.
• We work with these objects in Python, which are then converted to strings when we “execute” them (as well as if we print them).

Tables

Let’s start with tables. We will first create a table using metadata and then we will start building SQL Expressions (you’ll be mostly dealing with DML in these SQL Expressions.. as in select, insert, update).

Note: If the table already exists, metadata will not generate the create table sql schema. But there will be no error.

As we saw before, we can access the collection of all columns in the table via the .c attribute. Individual columns can be accessed as .c. >>>user_table.c.fullname
Output:Column(‘fullname’, String(length=50), table=)

The “Column” object you see in the output is part of a class known as ColumnElement, which exhibits custom Python expression behavior. Meaning Column is a subclass of ColumnElement and inherits some custom Python expression behavior as we will see below.

What this means is that Column object will have a bunch of python magic methods like __eq__, __lt__, __gt__ . These methods give us the ability to overload operators.

The trick thing to note here is this:A statement likeuser_table.c.fullname == 'john galt' when executed, does not return a bool but an expression object like so:

These expression objects become SQL when evaluated as a string.

Here, you can see the generated SQL. In it, the string ‘john galt’ has vanished and replaced by a bound parameter :fullname_1

The ‘john galt’ string is still there, you just can’t see it.

ColumnElements are the basic building blocks of SQL Expressions. ColumnElements can be further combined to produce more ColumnElements.

for example, for the above code, we use a python bitwise OR operator to generate an SQL expression with OR

OR and AND are available with |, &, or or_() and and_()

Compare to None produces IS NULL

“+” meaning vaies with context. If used with integers, it will add, with strings it will concatenate

To generate IN:

bound parameters in SQL Expressions

The “bound” parameters take place of the literal values, which are stored away for use when the statement is executed.

The values of our “bound” parameters are still there, however, stored away within a structure called “compiled” object.

So, when we execute an SQL expression, the bound parameters are extracted. On execution, it goes through the process of compiling the expression and extracting the params.

SQL Expressions produce different strings according to dialect objects. Sqlalchemy can take multiple objects, like below is a comparison between default sqlite dialect (called qmark) and postgresql dialect

postgresql dialect:

sqlite dialect:

same bound parameter in prostgres dialect is %(name_1)s and is :name_1 in sqlite dialect.

Some more examples:
user.fullname == ‘john’ AND user.id > 5

user.fullname == ‘john’ OR (user.fullname == ‘john’ AND user.id > 5)

DML statements: insert

We can do an insert into a table like so. We create the SQL expression with insert object. We then modify the insert object to add values. Then create the connection object because we want to use explicit execution. Then execute the SQL expression.

Also, note that because we didn’t say begin() and commit(), an autocommit happened.

insert and other DML can run multiple parameters at once.

For example, lets insert multiple values:

DML statements: select

select() is used to produce any SELECT statement

You can also say user_table.select() and it will do the same thing as above.
To select all columns from a table

Notice that in this case, sqlalchemy does not use select(*) but just select with table name inside. By default, sqlalchemy fetches all columns, unless specific columns are specified.

A more complex select query:

If we use WHERE multiple times, it will be joined by AND

above the additional where clauses are added using method chaining but for some weird reason method chaining is referred to as “generative” in documentation for sqlalchemy.

DML Statements: UPDATE

The only difference from SQL here is that in SQL Expression, update() is used with values() whereas in actual SQL, UPDATE is used with SET

Note: if you write something like r = conn.execute(update_stmt), then you can also do this r.rowcount to get a row-count of how many rows got modified with that update.
UPDATE can also use expressions based on other columns

that ? you see is because of the space in the concatenation. Any literal becomes a bound parameter. (? is as in sqlite dialect. It will be different if using postgres dialect)

DML Statements: Delete

Level 4: Object Relational Mapping

Object Relational Mapping or ORM is the process of associating object oriented classes with database tables. Till the SQLAlchemy core level, the data fetched from database is still in the form of tuples. What ORM usually does is convert this into a domain object. In contrast to the SQL Expression language, which presents a schema-centric view of data, SQLAlchemy-ORM presents a domain-model centric view of data.

The set of object oriented classes is referred to as the domain model.


Other things ORMs do:
• provide a means of querying the database in terms of the domain model structure
• Some can represent class inheritance hierarchies using a variety of schemes
• Some can handle “sharding” of data, i.e. storing a domain model across multiple schemas or databases
• Provide various patterns for concurrency, including row versioning
• Provide patterns for data validation and coercion

There are two flavors of ORMs. One like Django, which is an Active Record pattern. It has a .save() method in the domain object that will take care of persisting the data to the database(by creating the data). When you wish to update, you again call .save() and this will now do an update.

SQLAlchemy follows Data Record pattern. Here, persistence and domain objects are kept separate. What this means is that domain objects do not have a .save() method. Instead, there is as session object. You add the domain objects to the session and this takes care of persistence. This is also a unit of work pattern. Java hibernate follows the same pattern as sqlalchemy.

Key ORM Patterns

Unit of Work – objects are maintained by a system that tracks changes over the course of a transaction, and flushes pending changes periodically, in a transparent or semitransparent manner
Identity Map – objects are tracked by their primary key within the unit of work, and are kept unique on that primary key identity.
Lazy Loading – Some attributes of an object may emit additional SQL queries when they are accessed.
Eager Loading – Multiple tables are queried at once in order to load related objects and collections.
Method Chaining – queries are composed using a string of method calls which each return a new query object.

Declarative

The declarative system is normally used to configure object relational mappings.
You use the Base as the base class using which you build your domain class. Something like models.Model in Django. The only difference is that models.Model is a class. But in sqlalchemy, you create a base class object by calling a function declarative_base.

This seems to resemble the user table we have in the database. But this is just a class. Because it inherits from Base, and has the attribute __tablename__ set, when this class is executed, because of underlying metaclasses, Table object (the SQL Alchemy core Table) gets created internally. You can see this Table like so:
User.__table__

Note that there is no __init__ method. declarative Base will give you a convenience __init__ method (the constructor method) that takes as keyword arguments all names in the User class like id, name and fullname

The Mapper object mediates the relationship between User and the “user” Table object. User.__mapper__
The mapper is a sqlalchemy object that represents the linkage of a Class to a Table. It has all kinds of info stored in it about how this linkage is configured. It has a lot of informational methods. Normally we don’t need to deal with the mapper directly.

User has a default constructor, accepting field names as arguments.
>>>my_user = User(name='hello', fullname='World World')

Note that above we didn’t set the id field explicitly. The “id” field is the primary key, which gets automatically assigned to None if we didn’t set it explicitly. This makes sense because this object still does not represent a database row and does not exist yet in the database. Its still just a Python only object.
>>>print(my_user.name, my_user.fullname)
(‘hello’, ‘World World’)
>>>print(my_user.id)
None

“id” will have a value attached to it when the object actually gets flushed to database and database assigns the id to it. This relates to a concept called Identity Map discussed below in Identity Mapping. Until an object goes into persisted state (due to a flush to database), the identity map of that object is empty. On flush, its identity map gets filled to whatever id was generated by database.

Metadata object is here too, available from the Base.

Session

To persist and load User Objects from the database, we use a Session object.
>>>from sqlalchemy.orm import Session
>>>session = Session(bind=engine)
In the core, we dealt a lot with the engine. In the ORM, we don’t deal directly with the engine. We create it and then put it in the Session object. And we don’t deal with enigne anymore. We don’t say engine.execute() or engine.connect(). We now work with a new interface called Session which is an ORM object. If you are not using the ORM, you don’t need to use a Session. So, like in core, the engine is a facade over the Python DBAPI, in the ORM the Session is a facade over the engine.

New objects are placed into the Session using add()
>>>my_user = User(name='hello', fullname='World World')
The object my_user is now called a transient object as it has not yet been added to a session.
>>>session.add(my_user)
The object my_user is now called a pending object as it has been added to a session and awaiting a flush to the database.

Note that lazy loading pattern is extensively used in ORM. Because of this, the Session will flush pending objects to the database before each Query.

Here, when you added a new user to session, there was no connection made to the database. my_user was still sitting in ORM as pending object. The first time a connection was made was when the first query was made. This caused a flush of all pending objects, before the execution of the query. Also, by default, this entire process of flushing and executing the query happens inside transaction. It can be changed, but this is the default.

Cool fact: query = session.query(User).filter_by(name='john') is just a query object. To make it execute, you need to put a .all(), .first() etc after it. The problem with this is that it will put the entire result set in memory. You can also lazy execute it like so:

here the for-in acts as a context manager and gets one row at a time.

Identity Mapping

The Session maintains a unique object per identity. So, in the code below, my_user and existing_user are the same object.

l
Note: we used python is for equality check. We avoid == because __eq__ operator can be overloaded where is cannot be overloaded. So this gives you the answer beyond doubt that both are indeed the same object.

This concept of my_user and existing_user being same is called Identity Mapping. This is a major design pattern and a standout feature of SQL Alchemy. Sql Alchemy maintains an identity map of objects flushed to the database and when you try to get a row from database, the ORM checks if the identity key of the requested object is already present in the identity map and gives you the same object instead of creating a new object and returning.

Note, if we create two objects with same primary key, we will only receive error when they are flushed, because it is at that point all constraint checking is done.

Also, if you notice, flush happens in one transaction. So, if a failure happens anywhere, whole transaction will be rolledback. If we wish to avoid this, we can create Safe Points and transaction will only be rolled back to that point (given error happened after safe-point).

Here we add multiple objects to be pending for flush. Just call add_all().

Now let’s modify my_user from previous step. This is an existing persistent object (meaning it has been flushed to the database). So now my_user is marked as dirty. Objects that are dirty means those objects have had some state changed.
>>>my_user.fullname = "mongo spam IV"

At this point nothing has happened. No flush has been made. This is a good point to look at the session. We list all session objects that are in dirty state. Objects that are dirty means those objects have had some state changed. These objects mostly trigger an UPDATE statement. However, if there is no net change in values, then there will be no UPDATE statement.

Note: If we had this instead:
>>>my_user.fullname = "mongo spam"
this would have still put the my_user object in dirty state, but there would have been no UPDATE issues during flush because ORM checks if there has been a net change in value before issuing an UPDATE statement.

So, objects that are pending for UPDATE, will be in the dirty list.

We can also get session to tell us objects which are pending (meaning they are not dirty, because these are new objects for which there is no identity map yet) through session.new().

Then we do this:
session.commit()

The whole transaction is now committed. Commit always triggers a final flush of remaining changes.

We saw before that a flush occurs before a query. Flush will also occur right before we commit (session.commit()). This flush will be within the scope of the commit. So, it will first issue UPDATE, then INSERT the three rows and then COMMIT the transaction.

This is the first part of unit of work.

We can also use something called bulk_save_objects like so:
c1 = User(name="spam", fullname="mongo spam I")
c2 = User(name="spam", fullname="mongo spam II")
session.bulk_save_objects([c1,c2])
session.commit()

This will also flush a group of models objects to database. But the difference here is that neither c11 nor c2 were added to the session. So, if after committing, you tried to check any attributes of c1 or c2, you will get nothing. Because they were not added to the Session, their state was invalidated upon commit and was never refreshed as they are not in the session. Doing like this does give a major speed boost if objects are many million, but downside is that you cannot further work with these objects as they are not in session. Also, their state is not refreshed if there are changes to the database. Their related data is also not altered since foreign key checks are only signaled if they are in session. Also, no events will be triggered. Sqlalchemy has many hooks like commit, pre_commit hooks, but bulk_save_objects will not trigger any of them.

Also, note that when it is committing multiple rows, it will also try to sort out the dependencies among the rows. It will first check if there are foreign key dependencies among the committed rows and will create a topological sort based on dependencies and then commit based on that order. It will also group commits based on tables, so all commits in one table happen one after another. Hence the order in which the rows actually get inserted into the database might be different from the way they are inserted via session.add().

Note: INSERT and UPDATE happen before DELETE. The exception is when you want to delete and then insert a row in the same primary key location of existing row. Then the DELETE for that row will execute before the INSERT. This default behavior can be modified though.

After a commit, there is no transaction. Once, the session.commit() is called, the Session forgets all about the current transaction that it just committed. The Session invalidates all data, so that accessing them will automatically start a new transaction and reload from the database.

This makes sense as the transaction just committed would have changed the state of database and only by running a transaction and fetching values again can we know what the current data is. So, what sqlalchemy does is between two transactions, it expires/invalidates all the data. Please note that we still have all the Python objects that we made, but their guts are emptied out. Basically, the context of their attributes are emptied and they have all been affixed with a pattern called expired, i.e. their guts have all been labelled as expired.

If you try to access any attribute of a persistent object, like my_user above, the session will awake again, begin a new transaction and then resuscitate the my_user object and load its contents again.

The print of my_user initiated a select query to load its contents before printing them. This is because we did a session.commit() just before this print, so contents of my_user were emptied and invalidated. They had to be loaded again from database.

Note: This expire on commit default behavior can be turned off. So, now the data will not be invalidated on a commit. This might be useful in some scenarios as it can decrease hits to database but increases risk of getting stale data.

Let’s drive this concept futher.

We make another “dirty” change and then one “pending” change, that we might change our minds about by issuing a rollback

NOTE: There has been a flush to the database but there has been no commit. You will not see any COMMIT anywhere in the generated SQL. Hence, if we so wished, we can issue a rollback() to undo these changes. This is still a pending transaction. So, let’s undo by doing rollback()

This ROLLBACK will also end the current transaction, exactly like session.commit() also ends the current transaction.

If we try to now do my_user.fullname, it will again initiate a query to database. This makes sense because we just did a rollback and this changed the state of database again (rollback discarded all the database updates/writes done by the transaction.. this undoing is also a state change). So we fetch data again from database for the query.

We see that the fullname is back to what it was. The change we did has been discarded. The pending object another_user has been evicted from session.

In other words, objects like my_user are really a proxy for a row in database. This proxy object only has valid object when it is inside of a transaction. When a transaction ends, the contents of this proxy object are invalidated and sql alchemy has no idea what that data is. If we tried to query the proxy object again, it will have to start a new transaction and fetch the row it corresponds to, to populate itself.

ORM Querying

The attributes on our mapped class act like Column objects, and produce SQL expressions.

Caveat here is that the object User.name is not actually a Column but acts like a column (its actually InstrumentedAttribute):

Also, this “pseudo Column” acts like a ClauseElement. What this means is that it can be used in pattern matching in .like() clause like so:
query = session.query(User).filter(User.fullname.like('%john%')).first()
This will return the first user whose full name has john in it.

There are lot of other ClauseElement methods like:between(cleft,cright), distinct(), in_([list]), is_(None), contains('string'), endswith('string'), startswith('string'), ilike('string')

Another note about .query() is that it acts a lot like the core .select() except that .query() gives a lot more functionality. Like here we are selecting the User entity

Note: instead of saying conn.execute(), the query has all the execution methods in it. So, saying query.all() actually executes the query and fetches all the required items. Also, unlike in core.select(), where we got tuple objects back, we now get User objects back.

Query can also return individual columns.

Note that each row is returned as a sort of a named-tuple. It is actually a KeyedTuple but it acts like a named-tuple in the sense that tuple values can be accessed by their column names.

You can also mix entities and columns together

So, here we queried for an entity and a column at the same time and we get them back as a list of named-tuples like objects.

This feature of mixed entity and column helps us do this cool thing:

Here we got a dict of all users with their id as keys for dict.

Also, array indexes will OFFSET to that index and LIMIT by one

Notice that LIMIT is 1 and OFFSET is 2 (meaning it returns the third element)

For array slice too, its similar

This won’t work if array indices are -ve because -ve array slice has no meaning in this context

the WHERE clause is either filter_by(), which is convenient (note that you don’t need to say User.fullname):

where .filter() is the full-blown expression:

conjunctions can be passed to filter() as well:

you can method chain multiple filter to do AND in the WHERE clause

.query() has some variations like:
session.query(User).all() which will return all the rows in User. If we use session.query(User).first(), it will generate the SQL with LIMIT set to 1. This is an optimization so that the DBAPI does not load the entire table. There is one more variant of .first() called .one()

The specialty of query.one() is that it will return the first row and verifies that there’s one and only one.

This is really hand when we are querying for something unique, for which there should be only one value returned. If multiple values are returned, it will throw an exception.

If you query.one() for a row that does not exist, you will get an exception that NoResultFound.

If you query.one() for a row and multiple rows are found, you will get an exception that MultipleResultsFound.

So, for all the cases where you know that you need to get one and only one row, use query.one()

Advanced ORM

In the above code, User class is familiar. We have added another class called Address. This class is dependent on User via foreign-key relationship. The logic is that every user can have 0 or more addresses associated to him. That is every user in our database will have a list of addresses associated to him.

Notice, the verbose way in which this relationship is described.
user_id = sa.Column(sa.Integer, sa.ForeignKey('user.id')) is the foreign key column in Address. It specifies that Address will have a foreign key column called user_id which will reference id field in user table of the database. This is the forward reference.

The other side of the foreign key reference, the backward-reference is also specified in the same dependent Address class. This is in this line:
user = relationship('User', backref='addresses')
Notice, this is not a Column. This is instead an attribute Address class. This user attribute in Address class specifies the relationship to the User class. It has a directive called backref. What this means is that User class will have a collection of addresses attached to it. So, you’ll have Address.user and User.addresses

Hence Address.user will point to the User entity related to the address object. User.addresses will be the list of Address entities related to the User.

So, we have here, in a very verbose way, described:

  • How the two classes are related (via the user = relationship('User', backref='addresses'))
  • separately from

  • How the two database tables are related (via user_id = sa.Column(sa.Integer, sa.ForeignKey('user.id'))
  • SqlAlchemy also does not assume that a database model will have a primary key called id and it will be integer. This makes it verbose, but by default, that is how it is. You can automate primary-key creation using mixins but by default, it is verbose and you have to explicitly specify your primary key.

    So, lets create all tables by calling Base.metadata.create_all(engine). This will check if a table exists. If it does, creation is ignored, if it does not, it is created.

    If you will list any user in the User table, you will see that now that user has gained an empty “addresses” collection.

    Lets add some addresses to ‘john galt’:

    .addresses is a python list. So, you can use the regular python methods like .append() to add items to the list.

    Also, whenever you add an item to the list addresses associated with a User entity, an append event gets fired and that newly added Address entity automatically also gets set a foreign key reference to the User entity. You can actually catch these events. There are similar events for all sorts of features.

    We will now see a feature of sqlalchemy called Save-Update cascade.

    We create a transient User entity and attach addresses to it. In this case new_user. We only add this entity to the session. When we do session.new, we see that related Address entities have been added to the session as well. This is save-update cascade. Meaning that you have this object that you added to the Session. Then sqlalchemy will traverse all other objects that are mapped to the newly added object and add them to the session. This is done recursively so if this map is multiple hierarchies with object related to another which is related to another, they all get added to the session. So, if you have a bunch of objects, with one lead object, like the document object, you just add this document object and the rest of the mapped objects just cascade in. All these cascaded objects will be in pending state and upon a flush, they will all be inserted into the database.

    Finally do session.commit() to commit everything in one shot.

    Note that here, we never specified primary keys for any of the Address entites that got added. ORM took care of auto-incrementing them when adding to the database.
    So, you can see the time saved here. ORMs really shine on the persistence side. You just added one lead object to the session and called session.commit() and ORM took care of the rest. It really saved you a lot of time otherwise spent in manually saving everything to database, manually setting the foreign-key relationships for each new update etc.

    Now, since the transaction got committed, new_user.addresses will actually start a new transaction and read from the database again. But until his transaction ends, these addresses are cached in memory. If you did new_user.addresses again within the same transaction, the cached values will be returned.

    In a web application, within a request, its generally good to call session.close(). This will automatically result in a rollback, so you don’t have to. This is when during the request, we did only select queries. There is nothing to commit. So, rollback won’t cause any state change in the database. This is because connections are pooled and when we put the connection back in the pool, we want to ensure rollback was called before because connection can have state in it. If the connection wasn’t committed, it can have locks on tables that you want to rollback. So, when you send the connection back to pool, at the very least it has to roll back or you commit it.

    Collections and references are updated by manipulating objects, not primary/foreign keys. Let’s see an example.

    Here we have two user objects with their collections of addresses. We now want to assign one of the addresses of john_galt to new_user. We can do this by directly changing the foreign key field (user_id) but then sqlalchemy won’t be able to update the addresses collection of both the users. Instead we do it like this:

    The key part is this:>>> john_galt.addresses[2].user = new_user
    This is what causes the change and now the objects are updated properly.

    Now we do commit and all the foreign keys are automatically set.

    So, you could have worked both ways but sqlalchemy is opinionated and it wants you to work in that direction.

    Joins

    Implicit Joins

    Query can select from multiple tables at once. Below is an implicit join. (implicit means that term JOIN is not present in the generated SQL)

    Notice the WHERE user.id = address.user_id line in the generated SQL. This is what creates an implicit join. If we didn’t have this, we would have simply received a cartesian product of the two tables.

    Explicit Joins

    join() is used to create an explicit JOIN.

    BAD WAY first. This is very error prone:

    This approach will only work for simple joins. Very error prone. Avoid. This will only give the right result when there is no ambiguity in joining the two tables with just table names provided. If there is any ambiguity involved, and requires further detail specification on how to join, it will return bad results.

    BETTER WAY:

    Note, there is a convention here: join will by default look at the first entry in this list session.query(User,Address), which is User. And this will be the left side of the JOIN. Notice the line FROM user JOIN address in generated SQL. user is on left side of JOIN.

    BEST WAY:
    The most succinct and accurate way to join() is to use the relationship()-bound attribute to specify ON.

    Inside a join query, either User or Address may be referred to anywhere in the query.

    If we want to break the convention of .query(User,Address), User being the left side as it is first element. We can break this convention by specifying an explicit FROM using select_from()

    Notice FROM address JOIN user in generated SQL. address is on left side of JOIN.

    A query that refers to the same entity more than once in the FROM clause requires aliasing

    Here only those users will be chosen who have both addresses ‘m.reese@r.com’ and ‘john_galt@galt.com’

    Note that we didn’t have to create aliased names like address_1 and address_2 as we see in the generated SQL. Its taken care of. We just have to create aliased objects.

    We can also join with subqueries. subquery() returns an alias construct for us to use.

    Here we have an example where we count the number of addresses each user has:

    Note, we need to use COALESCE because we are using OUTERJOIN and we want to avoid having null.

    Now lets load up the query in python shell and execute:

    Notice: .func() is a core function that lets us run any SQL function like SUM, COUNT, COALESCE etc

    func is not a module. Its is a module generator. In sqlalchemy, func is aware what database it is connected to and will call the SQL functions native to that database.

    .subquery() object when executed, returns a set of rows and can be used inside a query like a table. Note that query object subq is turned into a subquery by calling subquery() at the end in subq = session.query(func.count(Address.id).label('count'), User.id.label('user_id')).join(Address.user).group_by(User.id).subquery()

    Eager Loading

    The N plus one problem refers to the many SELECT statements emitted when loading collections against a parent result.




    No Comments


    You can leave the first : )



    Leave a Reply

    Your email address will not be published. Required fields are marked *