Normalization

Atomicity

An attribute is atomic if and only if one of the following holds:

Either its major uses within a database and associated applications are such that it need not be divided further,
Or a type exists for that attribute within the database engine such that the component parts of the attribute can be treated as distinct and indivisible attributes.

Examples

The attribute courseID, with values like ECE208, ECE250, ECE356, is not atomic in the context of a university database, since there are other departmental prefixes, such as SE, CS and MATH. However, in the context of a departmental database, where the only departmental prefix is ECE, the attribute courseID is atomic, since the division into course prefixes and course codes does not serve a major purpose given the context of the database.

Additionally, a datetime object in the MySQL database engine can be described as atomic. While it is divisible into second, minute, hour, day, month and year, this object type is defined in the MySQL database and is made up of distinct, indivisible attributes. Hence it can be considered atomic.

Simple Functions

A function is a simple function if and only if it is computable without the use of explicit mapping.

Examples

$\large f(x) = x^2$ is a simple function.

$\large f(x)$ as defined by the table

$\Large f(x)$	$\Large x$
0	0
1	1
2	4
3	9

is not a simple function.

Non-Redundancy

An attribute is non-redundant if and only if it is neither an input to, nor the result of, a simple function.

In contrast, an attribute is redundant if and only if it is either an input to, or the result of, a simple function.

Examples

Consider a section relation defined with attributes as follows:


section(course_id, sec_id, semester, year, building, room_number)

The attribute semester is a term code combined with the last two digits of the year. For example, W21 represents "winter 2021".

Hence, there exists a simple function that can extract the year out of the semester.

This means that neither semester nor year is non-redundant: semester is an input to a simple function, and year is its result.
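As a minimal sketch of such a simple function, the snippet below derives the year from a term code. It assumes the code is a single term letter followed by two digits, and that every year falls in the 2000s; the function name `year_from_semester` and the `century` parameter are hypothetical.

```python
def year_from_semester(semester: str, century: int = 2000) -> int:
    """Derive the four-digit year from a term code such as 'W21'.

    Assumption: one term letter followed by the last two digits of the year.
    """
    two_digit_year = int(semester[1:])  # drop the term letter
    return century + two_digit_year


print(year_from_semester("W21"))  # 2021
```

Because the year can be computed this way, storing it as a separate attribute adds no information.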

Lossless Decomposition

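A decomposition of a relation $\large r$ into relations $\large r_1$ and $\large r_2$ is lossless if and only if

$$r = r_1 \bowtie r_2$$

That is, the natural join of $\large r_1$ and $\large r_2$ reproduces exactly $\large r$.

Consider a relation $\large r(R)$ with attributes $\large A_i$, where $\large i = 1, 2, 3, ..., n$:

$$R = (A_1, A_2, A_3, \ldots, A_n)$$

The relations $\large r_1(R_1)$ and $\large r_2(R_2)$ that decompose $\large r(R)$ are defined as:

$$\begin{matrix} R_1 = (A_i, A_j, \ldots, A_k, A_l) \\ R_2 = (A_i, A_j, \ldots, A_p, A_q) \end{matrix}$$

Here, $\large R_1$ and $\large R_2$ share the attributes $\large A_i, A_j$, while only $\large R_1$ contains $\large A_k, A_l$ and only $\large R_2$ contains $\large A_p, A_q$. The shared attributes are the ones used to join $\large R_1$ and $\large R_2$.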
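The check can be sketched for one concrete instance as follows. This is an illustration only: it models a relation as a list of dicts, the helper names and sample rows are hypothetical, and passing on one instance does not prove that a decomposition is lossless for every instance of the schema.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, dropping duplicates."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(r1, r2):
    """Natural join of two relations given as sets of (attr, value) tuples."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            common = d1.keys() & d2.keys()
            # Tuples combine only if they agree on every shared attribute.
            if all(d1[a] == d2[a] for a in common):
                joined.add(tuple(sorted({**d1, **d2}.items())))
    return joined


def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections reproduces exactly the rows."""
    original = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original


rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 2, "B": "x", "C": 20},
]
print(is_lossless(rows, ["A", "B"], ["A", "C"]))  # True
print(is_lossless(rows, ["A", "B"], ["B", "C"]))  # False
```

In the second call the join on B produces four tuples instead of two, so the decomposition loses the original pairing of A and C.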

Functional Dependencies

Given attributes in a relation, a functional dependency is a function that uniquely maps the values of a set of attributes to the values of another set of attributes. The notation for functional dependencies is as follows:

$$a_1, a_2, \ldots, a_n \to b_1, b_2, \ldots, b_m$$

Formal Definition

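Let $\large R$ be a relation schema, and let $\large \alpha$ and $\large \beta$ be sets of attributes of $\large R$:

$$\alpha \subseteq R, \quad \beta \subseteq R$$

Then, the functional dependency,

$$\alpha \to \beta$$

holds on a relation $\large r(R)$ if and only if any two tuples $\large t_1$ and $\large t_2$ of the relation that agree on $\large \alpha$ also agree on $\large \beta$. That is,

$$t_1[\alpha] = t_2[\alpha] \implies t_1[\beta] = t_2[\beta]$$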

Examples

SIN $\large \to$ Name. This is because a person's social insurance number functionally determines their name. In other words, if we know the SIN, we know their name.
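The formal definition can be checked against a concrete instance with a short sketch like the one below. The rows and the function name `fd_holds` are hypothetical, and a dependency that holds on one instance is not thereby guaranteed by the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds on the given rows (list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


people = [
    {"sin": 111, "name": "Ana"},
    {"sin": 222, "name": "Bo"},
    {"sin": 333, "name": "Ana"},
]
print(fd_holds(people, ["sin"], ["name"]))  # True
print(fd_holds(people, ["name"], ["sin"]))  # False
```

Two people may share a name, so the name does not determine the SIN, while the SIN determines the name.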

Properties

Identity

$\Large \alpha \to \alpha$

Reflexivity

$\Large \beta \sube \alpha \implies \alpha \to \beta$

Augmentation

$\Large \alpha \to \beta \implies \gamma\alpha \to \beta$

Transitivity

$\Large \alpha \to \beta, \beta \to \gamma \implies \alpha \to \gamma$

Closures

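Given a set of functional dependencies $\large F$, the closure of $\large F$, denoted $\large F^+$, is the set of all functional dependencies that can be inferred from $\large F$. Note that $\large F \sube F^+$.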

Attribute-Set Closures

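Given a relation $\large r$ with a set of functional dependencies $\large F$ and a set of attributes $\large \alpha$, the attribute-set closure, denoted $\large \alpha^+$, is the set of all attributes that are functionally determined by $\large \alpha$.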
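A common way to compute an attribute-set closure is a fixed-point iteration: keep adding the right-hand side of any dependency whose left-hand side is already covered. The sketch below assumes dependencies are given as `(lhs, rhs)` pairs of attribute sets; the function name is hypothetical.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs.

    fds is an iterable of (lhs, rhs) pairs, each side a set of attributes.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is known, the right-hand side follows.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(sorted(attribute_closure({"A"}, F)))  # ['A', 'B', 'C']
print(sorted(attribute_closure({"B"}, F)))  # ['B', 'C']
```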

Extraneous Attributes

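Given a set of functional dependencies $\large F$ and a functional dependency $\large (\alpha \to \beta) \in F$, an attribute $\large A \in \alpha$ is extraneous in $\large \alpha \to \beta$ if and only if,

$$F \iff (F - \{\alpha \to \beta\}) \cup \{(\alpha - A) \to \beta\}$$

In other words, $\large A$ can be removed from the left-hand side of $\large \alpha \to \beta$ without changing what the set of dependencies implies.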

Examples

Consider the set of functional dependencies $\large F$:

$$F = \{A \to C, AB \to C\}$$

Here, $\large B$ is extraneous in $\large AB \to C$: since $\large A \to C$ already holds, $\large B$ is not needed in that functional dependency.
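One standard way to test a left-hand-side attribute is to drop it and check whether the remaining attributes still determine the right-hand side under the original set. The sketch below applies that test to the example above; the function names are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_extraneous_lhs(attr, lhs, rhs, fds):
    """True if attr can be dropped from lhs in lhs -> rhs without changing fds."""
    reduced = set(lhs) - {attr}
    # attr is extraneous when the reduced left side still determines rhs.
    return set(rhs) <= attribute_closure(reduced, fds)


F = [(frozenset("A"), frozenset("C")), (frozenset("AB"), frozenset("C"))]
print(is_extraneous_lhs("B", "AB", "C", F))  # True
print(is_extraneous_lhs("A", "AB", "C", F))  # False
```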

Canonical Covers

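Given a set of functional dependencies $\large F$, a canonical cover of $\large F$ is a set of functional dependencies $\large F_c$, equivalent to $\large F$, such that

there is no $\large f \in F_c$ such that $\large f$ can be inferred from $\large F_c - f$
there exists no functional dependency with extraneous attributes.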

Examples

Consider the set of functional dependencies $\large F$:

$$F = \{A \to B, B \to C, A \to C\}$$

We know that,

$$A \to B, B \to C \implies A \to C$$

Hence $\large A \to C$ is redundant in $\large F$, and $\large F_c$ defined as,

$$F_c = \{A \to B, B \to C\}$$

is a canonical cover.
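The redundancy step of this example can be sketched in code: a dependency is dropped when the remaining ones still imply it. This is a partial sketch only, since it does not remove extraneous attributes, and the function names are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def remove_redundant(fds):
    """Drop every dependency that the remaining dependencies already imply."""
    result = list(fds)
    for fd in list(result):
        others = [g for g in result if g is not fd]
        lhs, rhs = fd
        # fd is redundant if its rhs is reachable from its lhs without fd.
        if set(rhs) <= attribute_closure(lhs, others):
            result = others
    return result


F = [
    (frozenset("A"), frozenset("B")),
    (frozenset("B"), frozenset("C")),
    (frozenset("A"), frozenset("C")),
]
for lhs, rhs in remove_redundant(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# B -> C
```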

Keys

Super Keys

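Given a schema $\large R$ for a relation $\large r$ and a set of attributes $\large \alpha \sube R$, $\large \alpha$ is a super key of $\large r$ if and only if $\large \alpha^+ \supe R$.

That is, $\large \alpha$ is a super key if every attribute of $\large r$ is functionally determined by $\large \alpha$.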

Candidate Keys

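If $\large \alpha$ is a super key of $\large r$ and no proper subset $\large \beta \sub \alpha$ is also a super key, then $\large \alpha$ is a candidate key of $\large r$.

In other words, $\large \alpha$ is minimal: removing any attribute from $\large \alpha$ will result in it losing its super key status.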
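For small schemas, the two definitions can be applied directly by brute force: try attribute sets in order of increasing size and keep the super keys that contain no smaller key. The sketch below is exponential in the number of attributes and meant for illustration only; the function names and the example dependencies are hypothetical.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(schema, fds):
    """Return all minimal super keys of schema, smallest first."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if attribute_closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys


F = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("B"))]
print([sorted(k) for k in candidate_keys("ABC", F)])  # [['A', 'B'], ['A', 'C']]
```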

Determining Candidate Key Attributes

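Consider a schema $\large R$ for a relation $\large r$, and let $\large F_c$ be a canonical cover of the functional dependencies that hold on $\large r$.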

Required Present

Attributes that do not appear on the right-hand side of any $\large f \in F_c$ must be present in every candidate key of $\large r$.

Required Absent 1

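Attributes that appear on the right-hand side of some $\large f \in F_c$, but not on the left-hand side of any $\large f \in F_c$, must be absent from every candidate key of $\large r$.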

Required Absent 2

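Attributes that are not required present, but are functionally determined by the required-present attributes through some $\large f \in F_c$, must be absent from every candidate key of $\large r$.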

Maybe Present

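Attributes that appear on the left-hand side of some $\large f \in F_c$ and on the right-hand side of some $\large f \in F_c$ may be present in some candidate keys, but must be absent in at least one candidate key of $\large r$.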
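Rules of this kind are often applied by sorting attributes according to the side of the dependencies on which they appear. The sketch below assumes that reading: attributes never on a right-hand side are required, attributes only on a right-hand side are excluded, and attributes on both sides are undecided. It also assumes each dependency has disjoint sides. The function name and the category labels in the returned dict are hypothetical.

```python
def classify_attributes(schema, fds):
    """Classify attributes by where they occur in the dependencies.

    Assumption: fds is a canonical cover with disjoint left and right sides.
    """
    schema = set(schema)
    on_lhs = set().union(*(set(lhs) for lhs, _ in fds))
    on_rhs = set().union(*(set(rhs) for _, rhs in fds))
    return {
        # Never determined by anything, so every candidate key needs them.
        "required_present": schema - on_rhs,
        # Determined, and never used to determine anything else.
        "required_absent": on_rhs - on_lhs,
        # On both sides: may appear in some candidate keys but not all.
        "maybe_present": on_rhs & on_lhs,
    }


F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
result = classify_attributes("ABCD", F)
print({name: sorted(attrs) for name, attrs in result.items()})
# {'required_present': ['A', 'D'], 'required_absent': ['C'], 'maybe_present': ['B']}
```

Here the only candidate key is (A, D), so the undecided attribute B ends up in no key at all; the classification narrows the search but does not finish it.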

First Normal Form

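Given a relation $\large r$ with schema $\large R$, $\large r$ is in first normal form if and only if every attribute in $\large R$ is atomic. If any attribute is not atomic, then $\large r$ is not in first normal form.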

Third Normal Form

Given a relation $\large r$ with schema $\large R$ and a set of functional dependencies $\large F$, $\large r$ is in third normal form if and only if, for every functional dependency $\large \alpha \to \beta$ in $\large F$, at least one of the following is true:

$\Large \beta \sube \alpha$
$\Large \alpha$ is a super key of $\Large R$
$\Large \forall_{B \in \beta - \alpha} \:\: B$ is an attribute that exists in some candidate key.

By requiring either that $\large \alpha$ be a super key or that the attributes of $\large \beta$ appear in at least one candidate key, the definition is enforcing that every attribute in this relation is a part of the same entity and cannot be neatly broken down into multiple entities.
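The three conditions can be checked mechanically for a small schema. The sketch below tests only the dependencies listed in the given set, as in the definition above, and finds candidate keys by brute force; all function names and the example dependencies are hypothetical.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(schema, fds):
    """All minimal super keys, found by brute force (small schemas only)."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue
            if attribute_closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys


def is_3nf(schema, fds):
    """Check the three third-normal-form conditions for each dependency."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))  # attributes in some key
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue  # trivial dependency
        if attribute_closure(lhs, fds) >= schema:
            continue  # left-hand side is a super key
        if (rhs - lhs) <= prime:
            continue  # every new attribute is part of some candidate key
        return False
    return True


F1 = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("B"))]
F2 = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(is_3nf("ABC", F1))  # True
print(is_3nf("ABC", F2))  # False
```

In the second case, B is not a super key and C belongs to no candidate key, so the dependency from B to C violates all three conditions.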

Third Normal Form Decomposition

The steps to decompose a relation $r$ into multiple relations that are in third normal form are as follows:

For each functional dependency $\large (\alpha \to \beta) \in F$, create a relation with schema $\large (\alpha, \beta)$ and add it to the schema
If any attributes are left out, add them in a relation with a candidate key
Optimize the relations to get rid of relations that are redundant
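The steps above can be sketched as follows. Two assumptions are made: the dependencies passed in already form a canonical cover, and the second step uses the common textbook variant that adds a candidate key whenever no relation already contains one, which also picks up any attributes left out of the dependencies. The function names are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def find_candidate_key(schema, fds):
    """Shrink the full schema until no attribute can be removed."""
    key = set(schema)
    for attr in sorted(schema):
        if attribute_closure(key - {attr}, fds) >= set(schema):
            key.discard(attr)
    return key


def decompose_3nf(schema, fds):
    """Sketch of third-normal-form synthesis; fds is assumed canonical."""
    schema = set(schema)
    # Step 1: one relation per functional dependency.
    relations = []
    for lhs, rhs in fds:
        rel = set(lhs) | set(rhs)
        if rel not in relations:
            relations.append(rel)
    # Step 2: make sure some relation contains a candidate key.
    if not any(attribute_closure(rel, fds) >= schema for rel in relations):
        relations.append(find_candidate_key(schema, fds))
    # Step 3: drop relations contained in another relation.
    return [r for r in relations if not any(r < other for other in relations)]


F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print([sorted(r) for r in decompose_3nf("ABCD", F)])
# [['A', 'B'], ['B', 'C'], ['A', 'D']]
```

Attribute D appears in no dependency, so it is carried by the added key relation (A, D).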

Boyce-Codd Normal Form

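Given a relation $\large r$ with schema $\large R$ and a set of functional dependencies $\large F$, $\large r$ is in Boyce-Codd normal form if and only if, for every functional dependency $\large \alpha \to \beta$ in $\large F$, at least one of the following is true:

$\Large \beta \sube \alpha$
$\Large \alpha$ is a super key of $\large R$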
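A minimal sketch of this check is shown below. It tests only the dependencies listed in the given set, and the function names and example dependencies are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_bcnf(schema, fds):
    """Every dependency must be trivial or have a super key on its left side."""
    schema = set(schema)
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency
        if attribute_closure(lhs, fds) >= schema:
            continue  # left-hand side is a super key
        return False
    return True


F1 = [(frozenset("A"), frozenset("BC"))]
F2 = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("B"))]
print(is_bcnf("ABC", F1))  # True
print(is_bcnf("ABC", F2))  # False
```

The second set fails because C determines B while C alone is not a super key, even though B belongs to a candidate key.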

Boyce-Codd Normal Form Decomposition

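The steps to decompose a relation $r$ into multiple relations that are in Boyce-Codd normal form are as follows:

For a functional dependency $\large (\alpha \to \beta) \in F$ that violates Boyce-Codd normal form, decompose $\large R$ into $\large R_1 = \alpha \cup \beta$ and $\large R_2 = R - \beta$
Repeat the process on $\large R_1$ and $\large R_2$.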
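The decomposition can be sketched with a work list of relations still to be examined. This is a simplification under stated assumptions: it only looks for violations among the dependencies listed in the given set, restricted to the attributes of each relation, so it can miss violations that are only implied. The function names are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def decompose_bcnf(schema, fds):
    """Split relations on violating dependencies until none remain."""
    todo = [set(schema)]
    done = []
    while todo:
        rel = todo.pop()
        for lhs, rhs in fds:
            lhs = set(lhs)
            beta = (set(rhs) & rel) - lhs  # non-trivial part inside rel
            is_super_key = attribute_closure(lhs, fds) >= rel
            if lhs <= rel and beta and not is_super_key:
                todo.append(lhs | beta)   # R1 = alpha union beta
                todo.append(rel - beta)   # R2 = R minus beta
                break
        else:
            done.append(rel)  # no violation found in this relation
    return done


F = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("B"))]
print(sorted(sorted(r) for r in decompose_bcnf("ABC", F)))
# [['A', 'C'], ['B', 'C']]
```

In this example the dependency from (A, B) to C can no longer be checked within a single relation, which is the usual cost of Boyce-Codd normal form compared with third normal form.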