- Exactly like namespaces in SQL Server (dbo in SQL Server is equivalent to public in PostgreSQL. The default namespace.)
- Logical container of tables and other database objects
- You can have multiple schemas in a database
Schemas (or namespaces) are useful in production environment in that we can assign users to them and give different permissions to schemas to restrict access to objects inside schema. For example we can make a “finance” schema and add all database objects concerning finance there, as well as everyone in the finance department. You would not want this kind public because this contains confidential information. Schemas are good in that they act as a logical container to hold database related objects. Inside a schema, you can further lock down specific table to even smaller group of users within that schema.
A tablespace is a location on disk where PostgreSQL stores data files containing database objects e.g., indexes., and tables. PostgreSQL uses a tablespace to map a logical name to a physical location on disk.
PostgreSQL comes with two default tablespaces:
pg_defaulttablespace stores all user data (i.e. any data related to the users).
pg_globaltablespace stores all global data (i.e. system related data).
They are virtual tables that are used to help simplify complex tables. We can also use views as a kind of security layer to protect certain records from others.
Views are treated just like a table. You can execute queries on a view, just like you would on a table.
You can use views to hide information or protect certain parts of the table. We usually create view from a table.
A function is a reusable block of SQL code. The code can return the scalar value of a set of rows. Function can also return composite objects.
So, if there are repetitive tasks that you perform, build them into a function. Then just execute that function to perform those tasks.
Cast and Operators
Use cast to convert one data type into another data type. Cast are generally used with functions to perform conversion. Postgres has custom cast but you can create custom casts to override default casts.
Operators in postgres are essentially a symbolic function. Postgres allows you to define your own custom operators. Custom casts are listed under “Casts” in the gui client. Custom operators are created inside the schema and will be listed there in the gui client.
Sequences in postgres are used to manage columns that have autoincrement on them, like the id column. When you create a sequence, the naming convention is that you have _seq at the end in the sequence name. Sequences are listed inside schema in gui client.
Extensions in postgres are containers to wrap other objects to manage them. They are listed under Schema in the GUI client. The default extension is called plpgsql.
char(n) : fixed length string of upto n characters. If the stored string is smaller, padding is applied to fill the remaining spaces.
varchar(n): variable length string of upto n characters. No padding applied for smaller strings.
text : unlimited length string.
There are two kinds of numbers in postgres.
- small integer (smallint) : 2 bytes in length
- Integer (int) : 4 bytes in length
- Serial : like int except that the values are directly populated into the column.
So, if you have an id column and you want its values to be auto-populated, use serial datatype.
- float(n) : normal float with 4 bytes of precision
- real or float8: double precision with 8 bytes of precision
- Numeric or numeric(p,s) : real number with p digits and s digits after decimal point.
Temporal Data Type
This kind of data type stores date and time values. 5 kinds of these:
- date : stores date values only
- time : stores time of day values
- timestamp : stores date and time
- interval : stores periods of time
- timestamptz : store both timestamp and time zone data
Array Data Type
Whereas a variable like int or float can contain only one value, and array can contain multiple values. You can store an array of string, an array of integers, etc.
Postgres executes queries with subqueries like so: It first executes the subquery. Then the result of the subquery is used to run the outer query. So, a query with an inner query causes database to be hit twice.
Also known as analytic functions.
- Windows functions compute an aggregated value based on a group of rows
- Key difference between analytic functions and aggregate functions is that unlike aggregate functions, they can return multiple rows for each group. While aggregate functions return a single result.
The term “window” basically describes the set of rows on which a window function operates. So the window function will return a value from the rows in a window. The window function also performs a calculation across a set of table rows that are somehow related to the correct window.
PARTITION BY expression
ORDER BY expression)
You can use any built-in or user defined aggregate function as a window function. In addition, PostgreSQL has some extra built-in window functions that can be used here as well.
Next we have the OVER clause. This clause is very important and it is this that makes the whole thing analytic (or a window function). What the OVER does is that it determines the partitioning and the ordering of a row-set before the associated window function is applied. So, the OVER clause defines a window or a user-specified set of rows within a query result-set. And then the window function will compute a value for each row in the window. The PARTITION BY is used to divide the rows into groups or partitions. If you omit the PARTITION BY clause, the whole result-set is then treated as a single partition. The we have ORDER BY clause. This ORDER BY clause is used to sort the rows within a partition. So, what happens is that the window function will process the rows in the order specified by the ORDER BY clause. This is particularly important for the window functions that are sensitive to the order like FIRST_VALUE and LAST_VALUE.
Lets create some tables with sample data to play with window functions.
create table product_groups ( group_id serial primary key, group_name varchar(255) not null ); create table products ( product_id serial primary key, product_name varchar(255) not null, price decimal(11,2), group_id int not null, foreign key (group_id) references product_groups (group_id) ); insert into product_groups (group_name) values ('Smartphone'), ('Laptop'), ('Tablet'); insert into products (product_name, group_id, price) values ('Microsoft Lumia', 1, 200), ('HTC One', 1, 400), ('iPhone', 1, 500), ('Nexus', 1, 900), ('HP Elite', 2, 1200), ('Lenovo Thinkpad', 2, 700), ('Sony VAIO', 2, 700), ('Dell Vostro', 2, 800), ('iPad', 3, 700), ('Kindle Fire', 3, 150), ('Samsung Galaxy Tab', 3, 200);
Before going forward, lets see the key difference between simple aggregate function and window function:
select avg(price) from products;
Here, the aggregate function can only return one value. It performs the operation on all the rows within a table but returns a single value.
Now, lets see how to use exactly the same AVG aggregate function as a window function. Aggregate function behaves in the opposite direction. It can return multiple values on multiple rows.
select avg(price) from products; select product_name, price, group_name, avg(price) over (partition by group_name) from products inner join product_groups using (group_id);
here, AVG aggregate function is working as a window function that operates on a set of rows specified by the OVER clause. So this means that each set of rows is called a window. So this gives the query an analytic edge. When you run the above query, you will see multiple values for each group.
So what happens here behind the scenes is that postgresql will sort the rows by the values in the group_name column and then PARTITION BY will partition them into groups. Then the AVG function calculates the average price for each product group.
A window function will perform calculation on the result set after the JOIN (or in cases you have WHERE or GROUP BY or HAVING) is complete. So, it will do all this but this will be done before the final ORDER BY clause. So, if you have several ORDER BY clauses, the window function will kick in before the final ORDER BY clause. But it would do so after a JOIN (i.e. once a result has been calculated after a join) or where there is a WHERE clause or where there is a GROUP BY clause or where there is a HAVING clause. Once all that is completed, it will then apply the window function.
Row Number: Assign integer values to rows in a result set
Note: to not complicate things, I’ve not used CTE here. But the last query can be done with CTE as well.
-- simple query with row_number select product_id, product_name, group_id, row_number () over (order by product_id) as num from products; -- simple query that also uses partition by select product_id, product_name, group_id, row_number () over (partition by group_id order by product_id) as row_num from products; -- we use a subquery to select only distinct prices and then give them row numbers select price, row_number () over (order by price desc) as nth_price from (select distinct price from products) as prices -- get nth most expensive product. select price from (select price, row_number () over (order by price desc) as nth_price from (select distinct price from products) as prices) as sorted_price where nth_price=3
Also, please don’t get distracted by prices, sorted_price. Postgres just requires that all subqueries be aliased to some name. They are not used in the query.
Rank function: Assign ranking within an ordered partition
If the values of two rows are the same, they are given the same rank, with the next ranking skipped.
select pr.product_name, pg.group_name, pr.price, rank() over (partition by pr.group_id order by pr.price) from products as pr inner join product_groups as pg using (group_id)
Dense_Rank function: Assign ranking within an ordered partition
If the values of two rows are the same, they are given the same rank, with the but the next ranking is not skipped. In other words, ranks are consecutive.
select pr.product_name, pg.group_name, pr.price, dense_rank() over (partition by pr.group_id order by pr.price) from products as pr inner join product_groups as pg using (group_id)
First_Value : Returns the first value from the first row of the ordered set
For example, if you wanted to return the lowest price per product group, you can use first_value() function to achieve this
select product_name, group_name, price, first_value(price) over(partition by group_id order by price) from products inner join product_groups using (group_id);
Last_Value : Returns the last value from the last row of the ordered set
For example, if you wanted to return the highest price per product group, you can use last_value() function to achieve this
select product_name, group_name, price, last_value(price) over(partition by group_id order by price range between unbounded preceding and unbounded following) from products inner join product_groups using (group_id);
Note: Since we want the last entry in each result set partition, its important to include
range between unbounded preceding and unbounded following.
This extends the search space from first row to last row in that partition. If you don’t specify this, the window function will implicitly end at the current row. Unbounded preceding refers to the first row. Unbounded following refers to the last row. So, the rows preceding and following the current row are the rows that will be retrieved.
This function has the ability to access data from the previous row.
expression can be a column or an expression to compute the return value.
offset refers to the number of rows preceding the lag function of the current row.
default value is null. The default refers to the return value if the offset is outside of the scope of the window. Default is used if offset is outside the scope of the window. The normal value of default is null. Null means value is not known.
Here is display the previous price as well as the difference of price between current and previous row.
select product_name, group_name, price, lag(price,1) over (partition by group_name order by price) as prev_price, price - lag(price,1) over (partition by group_name order by price) as cur_price_diff from products inner join product_groups using (group_id);
Lead analytic function allows you to access data from the next row.
expression can be a column or an expression to compute the return value.
offset refers to the number of rows following the lead function of the current row.
The default refers to the return value if the offset is outside of the scope of the window. Default is used if offset is outside the scope of the window. The normal value of default is null. Null means value is not known.
Here is display the next price as well as the difference of price between current and next row.
select product_name, group_name, price, lead(price,1) over (partition by group_name order by price) as next_price, price - lead(price,1) over (partition by group_name order by price) as cur_next_diff from products inner join product_groups using (group_id);
PostgreSQL Backup Tools
pg_dump (single database backup – psql based)
pg_dumpall (all database backup – psql based)
pgadmin (gui based)
We will backup single database, called test_database. Use the terminal to first create a directory where the backed up dump file will goto. Let’s say that this is called backupdir and located here:
pg_dump -U postgres -W -F t test_database > /home/helios/backups/backupdir/test_database.tar
-U : user used to connect to postgres server
-W : forces pg_dump to prompt for the password before connecting to the postgres server, before allowing the backup to proceed
-F : specified the format of the output file (we can have it in .tar or .sql or even a custom format). NOTE: .sql is also called Plain or Plain-text format.
t : represents the .tar format of the backup
Restoring a database from a dump
The easiest way is to use the pg_restore tool. Note that restore also depends on how the dump was created. In this case, lets restore the database into a database called example_database.
I have already uncompressed the dump into a tarball. Its here
I also do
Log into postgres:
sudo su postgres
then enter cli:
first create a database.
CREATE DATABASE example_database;
you can do
\l to check the created database.
Exit from cli:
pg_restore -U postgres -d example_database example_database.rar
Note that I am in the same directory as example_database.rar. It should be extracted and restored.