PostgreSQL Basics

Schema

PostgreSQL Schema:

  • Exactly like namespaces in SQL Server (dbo in SQL Server is equivalent to public in PostgreSQL. The default namespace.)
  • Logical container of tables and other database objects
  • You can have multiple schemas in a database

Schemas (or namespaces) are useful in production environment in that we can assign users to them and give different permissions to schemas to restrict access to objects inside schema. For example we can make a “finance” schema and add all database objects concerning finance there, as well as everyone in the finance department. You would not want this kind public because this contains confidential information. Schemas are good in that they act as a logical container to hold database related objects. Inside a schema, you can further lock down specific table to even smaller group of users within that schema.

Tablespace

A tablespace is a location on disk where PostgreSQL stores data files containing database objects e.g., indexes., and tables. PostgreSQL uses a tablespace to map a logical name to a physical location on disk.

PostgreSQL comes with two default tablespaces:

  • pg_default tablespace stores all user data (i.e. any data related to the users).
  • pg_global tablespace stores all global data (i.e. system related data).

Views

They are virtual tables that are used to help simplify complex tables. We can also use views as a kind of security layer to protect certain records from others.

Views are treated just like a table. You can execute queries on a view, just like you would on a table.

You can use views to hide information or protect certain parts of the table. We usually create view from a table.

Function

A function is a reusable block of SQL code. The code can return the scalar value of a set of rows. Function can also return composite objects.

So, if there are repetitive tasks that you perform, build them into a function. Then just execute that function to perform those tasks.

Cast and Operators

Use cast to convert one data type into another data type. Cast are generally used with functions to perform conversion. Postgres has custom cast but you can create custom casts to override default casts.

Operators in postgres are essentially a symbolic function. Postgres allows you to define your own custom operators. Custom casts are listed under “Casts” in the gui client. Custom operators are created inside the schema and will be listed there in the gui client.

Sequences

Sequences in postgres are used to manage columns that have autoincrement on them, like the id column. When you create a sequence, the naming convention is that you have _seq at the end in the sequence name. Sequences are listed inside schema in gui client.

Extensions

Extensions in postgres are containers to wrap other objects to manage them. They are listed under Schema in the GUI client. The default extension is called plpgsql.

Char

char(n) : fixed length string of upto n characters. If the stored string is smaller, padding is applied to fill the remaining spaces.

varchar(n): variable length string of upto n characters. No padding applied for smaller strings.

text : unlimited length string.

Number

There are two kinds of numbers in postgres.

Integers:

  • small integer (smallint) : 2 bytes in length
  • Integer (int) : 4 bytes in length
  • Serial : like int except that the values are directly populated into the column.

So, if you have an id column and you want its values to be auto-populated, use serial datatype.

Floats:

  • float(n) : normal float with 4 bytes of precision
  • real or float8: double precision with 8 bytes of precision
  • Numeric or numeric(p,s) : real number with p digits and s digits after decimal point.

Temporal Data Type

This kind of data type stores date and time values. 5 kinds of these:

  1. date : stores date values only
  2. time : stores time of day values
  3. timestamp : stores date and time
  4. interval : stores periods of time
  5. timestamptz : store both timestamp and time zone data

Array Data Type

Whereas a variable like int or float can contain only one value, and array can contain multiple values. You can store an array of string, an array of integers, etc.

ADVANCED BASICS


Subqueries

Postgres executes queries with subqueries like so: It first executes the subquery. Then the result of the subquery is used to run the outer query. So, a query with an inner query causes database to be hit twice.

Window Functions

Also known as analytic functions.

  • Windows functions compute an aggregated value based on a group of rows
  • Key difference between analytic functions and aggregate functions is that unlike aggregate functions, they can return multiple rows for each group. While aggregate functions return a single result.

The term “window” basically describes the set of rows on which a window function operates. So the window function will return a value from the rows in a window. The window function also performs a calculation across a set of table rows that are somehow related to the correct window.

syntax: window_function(arg1, arg2) OVER (PARTITION BY expression ORDER BY expression)

You can use any built-in or user defined aggregate function as a window function. In addition, PostgreSQL has some extra built-in window functions that can be used here as well.

Next we have the OVER clause. This clause is very important and it is this that makes the whole thing analytic (or a window function). What the OVER does is that it determines the partitioning and the ordering of a row-set before the associated window function is applied. So, the OVER clause defines a window or a user-specified set of rows within a query result-set. And then the window function will compute a value for each row in the window. The PARTITION BY is used to divide the rows into groups or partitions. If you omit the PARTITION BY clause, the whole result-set is then treated as a single partition. The we have ORDER BY clause. This ORDER BY clause is used to sort the rows within a partition. So, what happens is that the window function will process the rows in the order specified by the ORDER BY clause. This is particularly important for the window functions that are sensitive to the order like FIRST_VALUE and LAST_VALUE.

Lets create some tables with sample data to play with window functions.

Before going forward, lets see the key difference between simple aggregate function and window function:

Here, the aggregate function can only return one value. It performs the operation on all the rows within a table but returns a single value.

Now, lets see how to use exactly the same AVG aggregate function as a window function. Aggregate function behaves in the opposite direction. It can return multiple values on multiple rows.

here, AVG aggregate function is working as a window function that operates on a set of rows specified by the OVER clause. So this means that each set of rows is called a window. So this gives the query an analytic edge. When you run the above query, you will see multiple values for each group.

So what happens here behind the scenes is that postgresql will sort the rows by the values in the group_name column and then PARTITION BY will partition them into groups. Then the AVG function calculates the average price for each product group.

A window function will perform calculation on the result set after the JOIN (or in cases you have WHERE or GROUP BY or HAVING) is complete. So, it will do all this but this will be done before the final ORDER BY clause. So, if you have several ORDER BY clauses, the window function will kick in before the final ORDER BY clause. But it would do so after a JOIN (i.e. once a result has been calculated after a join) or where there is a WHERE clause or where there is a GROUP BY clause or where there is a HAVING clause. Once all that is completed, it will then apply the window function.

Row Number: Assign integer values to rows in a result set

Note: to not complicate things, I’ve not used CTE here. But the last query can be done with CTE as well.

Also, please don’t get distracted by prices, sorted_price. Postgres just requires that all subqueries be aliased to some name. They are not used in the query.

Rank function: Assign ranking within an ordered partition

If the values of two rows are the same, they are given the same rank, with the next ranking skipped.

Dense_Rank function: Assign ranking within an ordered partition

If the values of two rows are the same, they are given the same rank, with the but the next ranking is not skipped. In other words, ranks are consecutive.

First_Value : Returns the first value from the first row of the ordered set

For example, if you wanted to return the lowest price per product group, you can use first_value() function to achieve this

Last_Value : Returns the last value from the last row of the ordered set

For example, if you wanted to return the highest price per product group, you can use last_value() function to achieve this

Note: Since we want the last entry in each result set partition, its important to include range between unbounded preceding and unbounded following.
This extends the search space from first row to last row in that partition. If you don’t specify this, the window function will implicitly end at the current row. Unbounded preceding refers to the first row. Unbounded following refers to the last row. So, the rows preceding and following the current row are the rows that will be retrieved.

Lag Function

This function has the ability to access data from the previous row.

LAG(expression [,offset][,default])

expression can be a column or an expression to compute the return value.

offset refers to the number of rows preceding the lag function of the current row.

default value is null. The default refers to the return value if the offset is outside of the scope of the window. Default is used if offset is outside the scope of the window. The normal value of default is null. Null means value is not known.

Here is display the previous price as well as the difference of price between current and previous row.

LEAD Function

Lead analytic function allows you to access data from the next row.

LEAD(expression [,offset][,default])

expression can be a column or an expression to compute the return value.

offset refers to the number of rows following the lead function of the current row.

The default refers to the return value if the offset is outside of the scope of the window. Default is used if offset is outside the scope of the window. The normal value of default is null. Null means value is not known.

Here is display the next price as well as the difference of price between current and next row.

PostgreSQL Backup Tools

pg_dump (single database backup – psql based)
pg_dumpall (all database backup – psql based)
pgadmin (gui based)

pg_dump

We will backup single database, called test_database. Use the terminal to first create a directory where the backed up dump file will goto. Let’s say that this is called backupdir and located here: /home/helios/backups/backupdir

-U : user used to connect to postgres server

-W : forces pg_dump to prompt for the password before connecting to the postgres server, before allowing the backup to proceed

-F : specified the format of the output file (we can have it in .tar or .sql or even a custom format). NOTE: .sql is also called Plain or Plain-text format.

t : represents the .tar format of the backup

Restoring a database from a dump

The easiest way is to use the pg_restore tool. Note that restore also depends on how the dump was created. In this case, lets restore the database into a database called example_database.

I have already uncompressed the dump into a tarball. Its here ~/postgres/temp/example_database.tar

I also do cd ~/postgres/temp

Log into postgres: sudo su postgres

then enter cli: psql

first create a database. CREATE DATABASE example_database;

you can do \l to check the created database.

Exit from cli: \q

pg_restore -U postgres -d example_database example_database.rar

Note that I am in the same directory as example_database.rar. It should be extracted and restored.

To check:psql




No Comments


You can leave the first : )



Leave a Reply

Your email address will not be published. Required fields are marked *