
ADVANCED QUERIES
Joining Tables in SQL
A join allows data to be selected from two or more tables by one query. A join returns all possible combinations of rows from the specified tables that satisfy the WHERE clause.
Indicate the tables to be joined in the FROM clause:
table-name join-type JOIN table-name ON join-condition
where join-type is one of:
INNER, LEFT OUTER, RIGHT OUTER or FULL OUTER
The "join-condition" is a condition that specifies a relationship
between the tables (usually, a column in one table equal to a column in
another
table).
Examples:
In an INNER join, only matched rows are returned. The following SELECT command uses an inner join, with the ON clause, to display manager number, name, portfolio number and cash from the MANAGER and PORTFOLIO tables:
select manager.managerno,managername,portfoliono,cash
from manager inner join portfolio
on manager.managerno = portfolio.managerno;
The result of this SELECT is:
MA
MANAGERNAME
PORT
CASH
-- --------------- ---- ----------
M2
Brown
P001 25000
M3
Jones
P004 10000
M2
Brown
P005 50000
M3
Jones
P008 37500
The SELECT statement can include a WHERE clause to restrict the rows
returned.
To display the stock symbol and current value (numbershares x
current
price) in portfolio P008:
select holding.stocksymbol,numbershares *
currentprice
from holding join stock
on holding.stocksymbol = stock.stocksymbol
where portfoliono = 'P008';
Note that the keyword INNER is optional; an inner join is the default.
select managername,
p.portfoliono,
numbershares, currentprice
from holding h join stock s on h.stocksymbol =
s.stocksymbol
join portfolio p on
p.portfoliono=h.portfoliono
join manager m on
m.managerno=p.managerno
order by managername;
The first join can reference columns in
holding and
stock. The second join can reference
columns in all 3 tables and the third, in all 4 tables.
Natural Join
The
joining column(s) don’t need to be
named. The columns with the same names,
in both tables, are used for the join.
If more than one column matches, in the joined tables, the using clause could be used to specify
the column which should be used for the join.
Outer Joins
The join performed in the examples above is an inner join. In an inner join, rows are only included in the result if there is a match. In the first example, manager M1 would not appear in the result as there are no portfolios for manager M1 in the Portfolio table. To include all rows from the Manager table even if they have no match in the Portfolio table, an outer join must be used.
The following SELECT statement uses a left outer join to include all rows from the Manager table (table on the left in the from clause) in the display; rows not containing any match in the Portfolio table will be assigned a null value in the join.
select manager.managerno,managername,portfoliono,cash
from manager left outer join portfolio
on manager.managerno = portfolio.managerno;
The result of this SELECT would be:
MA
MANAGERNAME
PORT
CASH
-- --------------- ---- ----------
M2
Brown
P001 25000
M3
Jones
P004 10000
M2
Brown
P005 50000
M3
Jones
P008 37500
M1 Smith
A right outer join will include all rows from the table on the right even if there is no match.
A full outer join includes all rows from both tables even if there is no match.
Cross Join
A cross join produces the "product" of two tables - all rows from one
table are joined to all rows of the second table. For example:
select
manager.managerno,managername,portfoliono,cash
from manager cross join portfolio;
would produce the following output:
MA
MANAGERNAME
PORT
CASH
-- --------------- ---- ----------
M1
Smith
P001 25000
M2
Brown
P001 25000
M3
Jones
P001
25000
M1
Smith
P004 10000
M2
Brown
P004 10000
M3
Jones
P004 10000
M1
Smith
P005 50000
M2
Brown
P005 50000
M3
Jones
P005 50000
M1
Smith
P008 37500
M2
Brown
P008 37500
M3
Jones
P008 37500
M1
Smith
P011 8000
M2
Brown
P011 8000
M3
Jones
P011 8000
Subqueries
The value or set of values in the search condition of a SELECT, UPDATE or DELETE command can be obtained from another query called a "subquery"; a subquery may also appear in a HAVING clause. The subquery must be enclosed in parentheses.
The following rules apply:
- subqueries must be
placed on the
right of the search-condition
- subqueries can't be used with the BETWEEN or LIKE functions
- a subquery can have only a single item in the SELECT clause
- a subquery can not contain an ORDER BY clause
Examples:
To list the stock number and number of numbershares for holdings managed by manager M2:
select stocksymbol,numbershares
from holding
where portfoliono in
(select portfoliono
from portfolio
where managerno = 'M2');
To display the portfolio number of the holdings with the largest number
of
select portfoliono
from holding
where numbershares =
(select max(numbershares)
from holding
where stocksymbol = 'COS');
Subqueries can be nested: To display the manager name of managers who manage holdings containing stock symbol NTL:
select managername
from manager
where managerno in
(select managerno
from
portfolio
where
portfoliono in
(select portfoliono
from
holding
where
stocksymbol = 'NTL'));
To display the portfolio number of portfolios that contain JDS stock but not NTL stock:
select portfoliono
from portfolio
where portfoliono in
(select portfoliono
from
holding
where
stocksymbol = 'JDS')
and portfoliono not in
(select portfoliono
from
holding
where
stocksymbol = 'NTL');
Multiple-Row Subquery Operators
IN
–
equal to any member in the list
ANY -
<
ANY – less than maximum
> ANY – greater than
minimum
ALL -
<
ALL – less than minimum
> ALL – greater than
maximum
Set Operators: Union, Minus and Intersect
These three operators correspond to the relational algebra operators - union, difference and intersection.
They can be used to combine the results of two queries.
Format:
select-statement {UNION| UNION ALL | MINUS| INTERSECT} select-statement
MINUS: the result table contains all rows from the first select that do
not
have a corresponding row returned by the second select.
INTERSECT: the result table contains rows that are returned by both
select
statements
Example:
Display the stock symbol of all stock that are in a holding with more than 200 shares and have a current price of more than $50.
select stocksymbol from holding where numbershares >200
intersect
select stocksymbol from stock where currentprice > 50;
select portfoliono,
purchaseDate, stocksymbol, numbershares
from holding
union
select portfoliono, to_date(null),
stocksymbol, 0
from holding_history
order by portfoliono;
Correlated Queries
When a subquery is coded as part of a query, the subquery is evaluated only once and the resulting values returned to the outer query. Correlation allows a subquery to be repeated more than once.
For example, suppose the following is required:
Display the portfolio number of all portfolios that contain stock with the number of numbershares greater than the average number of numbershares for that stock.
To evaluate this query, it will be necessary to fix a row in the holding table and then to loop through the rows in the holding table with the same portfolio number and calculate the average numbershares; this value must then be compared to the numbershares in the current row. The subquery to evaluate the average must be repeated for each row in the holding table.
A correlated query is the method used by SQL to code this type of loop. To code a correlated query:
- place a correlation
variable (any
name up to 18 characters beginning with a letter) following the table
name in
the FROM clause
- in the subquery, qualify a variable using the correlation variable
when you
wish to reference a variable in the outer query
Logically, the subquery is repeated for each row in the outer query. The following correlated query would solve the problem above.
select portfoliono
from holding x
where numbershares >
(select avg(numbershares)
from
holding
where
stocksymbol=x.stocksymbol)
SQL performs a correlated subquery when the subquery references a column from a table in the "outer" query. The presence of the alias or correlation variable, X, indicates that this is a correlated query. In evaluating the query, instead of evaluating the subquery once, SQL will fix a row in the HOLDING table (outer query), and proceed to evaluate the subquery. The variable, X.STOCKSYMBOL, represents the value of STOCKSYMBOL in the current row in the outer query. Thus the average is calculated for all holdings with that value for STOCKSYMBOL.
The query will be evaluated as follows:
- SQL examines the
first row of the
HOLDING table; the value for STOCKSYMBOL is ST2
- SQL now evaluates the subquery with that value of STOCKSYMBOL
(X.STOCKSYMBOL); thus the subquery will calculate the average
numbershares for
all rows that have an STOCKSYMBOL value of ST2
- the current row will be returned if the share value is greater than
the
average calculated by the subquery
- SQL then proceeds to the second row of the HOLDING table etc. until
the end
of the HOLDING table is reached
The Exists Operator
The Exists Operator
tests for the
existence of rows in the results from a subquery. For
example:
Select stockname
From stock s
Where not exists
(Select
'x'
From
holding
Where
s.stocksymbol = stocksymbol);
Top-N
Analysis
Top-N Analysis
provides the n
largest or smallest values of a column.
Select [column list],
Rownum
From (Select [column
list]
From
tablename
Order by Top_N_Column)
Where rownum <= N;
Rownum
is a pseudocolumn which assigns a sequential
value starting with 1 to each of the rows returned from a subquery.
Example:
Display the two top earning managers from the
manager table:
Select rownum as
Rank, managername,
salary
From (Select
managername, salary
From manager
Order by salary DESC)
Where Rownum <=2;
Extensions
of Group By
Rollup
– extension
of Group By to produce cumulative aggregates.
Select [column,]
group-function
(column)
From table
[Where condition]
[Group by [Rollup]
group-by-expression]
[Having
having-expression];
Example:
Select portfoliono,
stocksymbol,
sum(numbershares)
From holding
Group by
Rollup(portfoliono,
stocksymbol);
Cube
– extension of
Group By to produce cross-tabulation values with a single select
statement.
Select [column,]
group-function
(column)
From table
[Where condition]
[Group by [Cube]
group-by-expression]
[Having
having-expression];
Example:
Select portfoliono,
stocksymbol,
sum(numbershares)
From
holding
Group by
Cube(portfoliono,
stocksymbol);
Grouping
Function –
differentiate stored Null values from null values created by Rollup or
Cube. The Grouping function returns 0
(stored null value) or 1 (null result of rollup/cube).
Select [column,]
group-function
(column), Grouping (expr)
From table
[Where condition]
[Group by
[Rollup/Cube]
group-by-expression]
[Having
having-expression];
Example:
Select portfoliono,
stocksymbol,
sum(numbershares), grouping(portfoliono) GRP_Port,
grouping(stocksymbol)
GRP_Stock
From
holding
Group by
Rollup(portfoliono,
stocksymbol);
Grouping
Sets –
define multiple queries in same query
Select portfoliono,
stocksymbol,
purchaseDate, sum(numbershares)
From
holding
Group by Grouping
Sets
((portfoliono, stocksymbol), (stocksymbol, purchaseDate));