Database notes


In Postgres, cluster means multiple databases, not multiple instances. One cluster is one instance.

Each database can have multiple schemas. Schemas are containers of objects. We use schemas to divide our objects into groups with logical meaning. We have the default public schema. Then we can have domain specific schemas wile HR etc.

You cannot do koins between tables in two separate databases, even though they are in same instance. Databases are really isolated from eachother. You only do a join through a database link.

Internally, in the host, each database is a directory.

For bulk import and exports, use: COPY

Query Execution plan. Command EXPLAIN allows us to observe the Plan Tree. EXPLAIN ANALYZE gets real data after a real execution.

Heap implements the sequential scan in postgresql.

*******POSTGRES**********
– ORDER BY is usually used at the end of a statement

– GROUP BY : used to divide rows returned from a SELECT statement into groups. You can also apply aggregate functions like SUM and COUNT

– HAVING clause is used to eliminate groups of rows that do not satisfy a specific condition. HAVING is used after GROUP BY.

– TRUNCATE : remove all data from the table

– DROP : delete the table and the data.

– SUBQUERY : query nested inside another query. Used in SELECT, UPDATE and INSERT statements. To construct a subquery, the second query is placed in brackets. You also use WHERE clause as expression in the subquery.You can also use the subquery with IN operator and HAVING operator

So subquery is a query statement where you have another query embedded in the WHERE or HAVING clauses

*************************
Windows Frame – it is a filtered portion of partition
units available for frame are rows and range.
*************************
***Within OVER() we are required to give an ORDER BY clause.
***In scenarios where we don’t care about the ordering, we can do like this: OVER (ORDER BY (SELECT NULL))
***See Example

*************************
Window Functions can only be included within SELECT or ORDER BY clauses.

Functions Available:
Aggregate – COUNT, SUM, MIN, MAX, AVG
Ranking – ROW_NUMBER, RANK, DENSE_RANK, NTILE
Offset – FIRST_VALUE, LAST_VALUE, LEAD, LAG
Statistical – PERCENT_RANK, CUME_DIST, PERCENTILE_CONT, PERCENTILE_DIST

Windows Functions also have FRAMES
ROWS
RANGE
*************************

************************

************************

*************************
*** window funcs with group by ***

*************************
*** Window funs – Calculate running totals/ averages ***
*** Scenario: We want to calculate the running total and average of customers spend ***

**************************
***Lag and Lead are useful for performing trend analysis, in the example I show how we can display a customer spending trend.

***Lag will show the previous value.
***Lead will show the next value.

***Lag and Lead accept multiple parameters as demonstrated in the video:

***LAG([Column], [Offset], [Value if NULL])

***The example of LAG and LEAD in the video can be shown by executing the below SQL query:

**************************
***Window Functions – Calculate Running Differences

******************************
***Show Totals with GROUP BY using OVER