Learning Group - 19 Jun., 2017

Chaoran Huang, UNSW Sydney.


An Introduction to Tensor Decomposition

 

Presenter: Chaoran Huang
          chaoran.huang@unsw.edu.au

What is a tensor?

 

An $n$th-rank tensor in $m$-dimensional space is a mathematical object
that has $n$ indices and $m^n$ components and obeys certain transformation
rules.

Each index of a tensor ranges over the number of dimensions of space.

What is a tensor?

$0$-rank tensor: Scalar

Magnitude
e.g. length: $a$; volume: $a^3$

$1$st-rank tensor: Vector

Magnitude, direction

Example in Euclidean space:
$\mathbf{a} = a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}$

What is a tensor?

$2$nd-rank tensor: Dyadic (Dyad)

$\begin{align} \mathbf{a} &= a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}\\ \mathbf{b} &= b_x\mathbf{i} + b_y\mathbf{j} + b_z\mathbf{k}\\ \mathbf{A} & \equiv \mathbf{a}\mathbf{b}\\ &= a_x b_x\mathbf{i}\mathbf{i} + a_x b_y\mathbf{i} \mathbf{j} + a_x b_z\mathbf{i}\mathbf{k}\\ &+ a_y b_x\mathbf{j}\mathbf{i} + a_y b_y\mathbf{j} \mathbf{j} + a_y b_z\mathbf{j}\mathbf{k}\\ &+ a_z b_x\mathbf{k}\mathbf{i} + a_z b_y\mathbf{k} \mathbf{j} + a_z b_z\mathbf{k}\mathbf{k}\\\\ \longrightarrow & \quad \begin{bmatrix} a_x b_x & a_x b_y & a_x b_z \\ a_y b_x & a_y b_y & a_y b_z \\ a_z b_x & a_z b_y & a_z b_z \end{bmatrix} \end{align}$

$3$rd-rank tensor: Triad

$\mathcal{T} \equiv \mathbf{A}\mathbf{c} = \mathbf{a}\mathbf{b}\mathbf{c}$

$4$th-rank tensor: Tetrad

$\mathcal{Q} \equiv \mathcal{T}\mathbf{d} = \mathbf{a}\mathbf{b}\mathbf{c}\mathbf{d}$

$n$th-rank tensor: Polyad ($n$-ad)

$\mathcal{N} \equiv \mathcal{Q}\dots\mathbf{n} = \mathbf{a}\mathbf{b}\mathbf{c}\mathbf{d}\dots\mathbf{n}$
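In NumPy, these dyads and triads are just iterated outer products; a minimal sketch (the example vectors are arbitrary illustrations):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
c = np.array([7.0, 8.0])

# Dyad (2nd-rank tensor): outer product ab -> 3x3 matrix, A[i, j] = a_i * b_j
A = np.outer(a, b)

# Triad (3rd-rank tensor): one more outer product -> 3x3x2 array,
# T[i, j, k] = a_i * b_j * c_k
T = np.multiply.outer(A, c)
```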

Basic Operations


For $\mathbf{A} \in \mathbb{R}^{I\times{J}}$ and $\mathbf{B} \in \mathbb{R}^{K\times{L}}$

Kronecker Product

$\mathbf{A}\otimes\mathbf{B} = \begin{bmatrix} a_{11}\mathbf{B} & a_{12}\mathbf{B} & \cdots & a_{1J}\mathbf{B}\\ \vdots & \vdots & \ddots & \vdots \\ a_{I1}\mathbf{B} & a_{I2}\mathbf{B} & \cdots & a_{IJ}\mathbf{B} \end{bmatrix}$

Khatri-Rao Product (a.k.a. matching columnwise Kronecker product)

Assuming $J = L$,

$\mathbf{A}\odot\mathbf{B} = \begin{bmatrix} \mathbf{a}_{1}\otimes\mathbf{b}_{1} & \mathbf{a}_{2}\otimes\mathbf{b}_{2} & \cdots & \mathbf{a}_{J}\otimes\mathbf{b}_{J}\\ \end{bmatrix}$

Basic Operations

Hadamard Product (elementwise product)

Assuming $I = K$, $J = L$,

$\mathbf{A}\circledast\mathbf{B} = \begin{bmatrix} a_{11}b_{11} & a_{12}b_{12} & \cdots & a_{1J}b_{1J}\\ \vdots & \vdots & \ddots & \vdots \\ a_{I1}b_{I1} & a_{I2}b_{I2} & \cdots & a_{IJ}b_{IJ}\\ \end{bmatrix}$

Interesting Properties

$(\mathbf{A}\odot\mathbf{B})^\top(\mathbf{A}\odot\mathbf{B})=\mathbf{A}^\top\mathbf{A}\circledast\mathbf{B}^\top\mathbf{B}$

$(\mathbf{A}\odot\mathbf{B})^\dagger= ((\mathbf{A}^\top\mathbf{A})\circledast(\mathbf{B}^\top\mathbf{B}))^{\dagger}(\mathbf{A}\odot\mathbf{B})^\top$

where $^\dagger$ denotes the Moore-Penrose pseudoinverse.
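These identities are easy to sanity-check numerically; a small NumPy sketch (matrix sizes are arbitrary) verifying the first one:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((5, 3))

# Khatri-Rao product: column r of A (.) B is kron(a_r, b_r)
AkB = np.stack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])], axis=1)

# (A (.) B)^T (A (.) B) should equal the Hadamard product (A^T A) * (B^T B)
lhs = AkB.T @ AkB
rhs = (A.T @ A) * (B.T @ B)
print(np.allclose(lhs, rhs))  # True
```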

Matricization

Fibers and Slices

Matricization

Matricization(unfolding)

Let $\mathcal{X} \in \mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ be a tensor of size $(I_1\times I_2\times\cdots\times I_N)$. Its mode-$n$ unfolding $\mathbf{X}_{(n)}$ maps element $(i_1, i_2, \dots, i_N)$ of $\mathcal{X}$ to element $(i_n, j)$, where
$j = 1 + \sum_{\substack{k=1\\k\neq{n}}}^N\left[(i_k-1)\prod_{\substack{m=1\\m\neq{n}}}^{k-1}I_m\right]$
 

Example

Matricization

Example (continued)

$\mathcal{X}$ of size $(3, 4, 2)$:
$\mathcal{X}=\begin{bmatrix} \begin{bmatrix} 0 & 2 & 4 & 6\\ 8 & 10 & 12 & 14\\ 16 & 18 & 20 & 22 \end{bmatrix} ,\\ \begin{bmatrix} 1 & 3 & 5 & 7\\ 9 & 11 & 13 & 15\\ 17 & 19 & 21 & 23 \end{bmatrix} \end{bmatrix}$
 

The mode-$n$ unfoldings:

$\mathbf{X}_{(0)}=\begin{bmatrix} 0 & 2 & 4 & 6 & 1 & 3 & 5 & 7\\ 8 & 10 & 12 & 14 & 9 & 11 & 13 & 15\\ 16 & 18 & 20 & 22 & 17 & 19 & 21 & 23 \end{bmatrix}\\ \mathbf{X}_{(1)}=\begin{bmatrix} 0 & 8 & 16 & 1 & 9 & 17\\ 2 & 10 & 18 & 3 & 11 & 19\\ 4 & 12 & 20 & 5 & 13 & 21\\ 6 & 14 & 22 & 7 & 15 & 23\\ \end{bmatrix}\\ \mathbf{X}_{(2)}=\begin{bmatrix} 0 & 8 & 16 & 2 & 10 & 18 & 4 & 12 & 20 & 6 & 14 & 22\\ 1 & 9 & 17 & 3 & 11 & 19 & 5 & 13 & 21 & 7 & 15 & 23\\ \end{bmatrix}$
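One way to reproduce these unfoldings in NumPy (a sketch; the column-major reshape makes the earliest remaining index vary fastest, matching the $j$ formula above, with modes numbered from $0$ as in the example):

```python
import numpy as np

def unfold(X, mode):
    # Mode-n unfolding: bring the chosen axis to the front, then flatten the
    # remaining axes in Fortran (column-major) order.
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order='F')

X = np.arange(24).reshape(3, 4, 2)  # the (3, 4, 2) tensor above: X[i,j,k] = 8i + 2j + k
X0 = unfold(X, 0)   # 3 x 8
X1 = unfold(X, 1)   # 4 x 6
X2 = unfold(X, 2)   # 2 x 12
```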

CP Decomposition

Review of Matrix Factorization

$\mathbf{R}\approx\hat{\mathbf{R}}=\mathbf{P}\mathbf{Q}^\top$
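As a quick refresher, one standard way to obtain a rank-$k$ factorization $\mathbf{R}\approx\mathbf{P}\mathbf{Q}^\top$ is truncated SVD; a minimal sketch (the matrix and rank here are arbitrary illustrations, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((6, 5))   # a toy matrix to factorize
k = 2                             # target rank

# Truncated SVD yields the best rank-k approximation in Frobenius norm
U, s, Vt = np.linalg.svd(R, full_matrices=False)
P = U[:, :k] * s[:k]              # 6 x k factor
Q = Vt[:k].T                      # 5 x k factor
R_hat = P @ Q.T                   # rank-k approximation of R
```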

CP Decomposition

Solve:
$\quad\quad {\textrm{argmin}}_{\mathcal{X}^{'}}||\mathcal{X}-\mathcal{X}^{'}||_F$ , where

$\quad\quad \mathcal{X}\approx \mathcal{X}^{'} = \sum_{r=1}^{R} \mathbf{a}_{r}\circ\mathbf{b}_{r}\circ\mathbf{c}_{r}$
or, elementwise,

$\quad\quad x_{ijk} \approx x_{ijk}^{'} = \sum_{r=1}^{R} a_{ir}b_{jr}c_{kr}$
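The elementwise form maps directly onto an einsum; a sketch reconstructing $\mathcal{X}^{'}$ from (randomly generated, purely illustrative) factor matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 3, 4, 2, 2
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

# x'_{ijk} = sum_r a_{ir} * b_{jr} * c_{kr}: a sum of R rank-1 (outer-product) tensors
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
```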

CP Decomposition

Different names of the CANDECOMP/PARAFAC decomposition

Name                                          Source
Polyadic Form of a Tensor                     Hitchcock, 1927
PARAFAC (Parallel Factors)                    Harshman, 1970
CANDECOMP or CAND (Canonical Decomposition)   Carroll and Chang, 1970
Topographic Components Model                  Möcks, 1988
CP (CANDECOMP/PARAFAC)                        Kiers, 2000

CP Decomposition

Why the sudden fascination with tensors?


CP Decomposition

The modal unfoldings

if $\mathcal{X}\approx⟦\mathcal{A},\mathcal{B},\mathcal{C}⟧$ with the factor matrices partitioned into rows, $\mathcal{A}=\begin{bmatrix}\mathcal{A}_{1}\\\mathcal{A}_{2}\end{bmatrix}$, $\mathcal{B}=\begin{bmatrix}\mathcal{B}_{1}\\\mathcal{B}_{2}\\\mathcal{B}_{3}\end{bmatrix}$, $\mathcal{C}=\begin{bmatrix}\mathcal{C}_{1}\\\mathcal{C}_{2}\end{bmatrix}$ ,
then
$\begin{align} \mathbf{X}_{(0)}\approx& \begin{bmatrix} \mathcal{A}_{1}\mathcal{B}_{1}\mathcal{C}_{1} && \mathcal{A}_{1}\mathcal{B}_{2}\mathcal{C}_{1} && \mathcal{A}_{1}\mathcal{B}_{3}\mathcal{C}_{1} && \mathcal{A}_{1}\mathcal{B}_{1}\mathcal{C}_{2} && \mathcal{A}_{1}\mathcal{B}_{2}\mathcal{C}_{2} && \mathcal{A}_{1}\mathcal{B}_{3}\mathcal{C}_{2} \\ \mathcal{A}_{2}\mathcal{B}_{1}\mathcal{C}_{1} && \mathcal{A}_{2}\mathcal{B}_{2}\mathcal{C}_{1} && \mathcal{A}_{2}\mathcal{B}_{3}\mathcal{C}_{1} && \mathcal{A}_{2}\mathcal{B}_{1}\mathcal{C}_{2} && \mathcal{A}_{2}\mathcal{B}_{2}\mathcal{C}_{2} && \mathcal{A}_{2}\mathcal{B}_{3}\mathcal{C}_{2} \\ \end{bmatrix}\\=& \begin{bmatrix} \mathcal{A}_{1}(\mathcal{C}\odot\mathcal{B})^\top\\ \mathcal{A}_{2}(\mathcal{C}\odot\mathcal{B})^\top\\ \end{bmatrix}\\=& \mathcal{A}(\mathcal{C}\odot\mathcal{B})^\top \end{align}$
Accordingly, we have
$\mathbf{X}_{(1)}\approx\mathcal{B}(\mathcal{C}\odot\mathcal{A})^\top$
$\mathbf{X}_{(2)}\approx\mathcal{C}(\mathcal{B}\odot\mathcal{A})^\top$

$\Longrightarrow\mathcal{X} \approx ⟦\mathcal{A},\mathcal{B},\mathcal{C}⟧ \equiv \sum_{r=1}^{R} \mathbf{a}_{r}\circ\mathbf{b}_{r}\circ\mathbf{c}_{r}$

CP Decomposition

The modal unfoldings (continued)

Assuming the columns of $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$ are normalized to length $1$, we introduce a weight vector $\boldsymbol{\lambda}\in\mathbb{R}^{R}$,
so that

$\mathcal{X} \approx ⟦\boldsymbol{\lambda};\mathcal{A},\mathcal{B},\mathcal{C}⟧ \equiv \sum_{r=1}^{R} \lambda_{r}\,\mathbf{a}_{r}\circ\mathbf{b}_{r}\circ\mathbf{c}_{r}$
Generally, for an $N$th-order tensor $\mathcal{X}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$, the CP decomposition is

$\mathcal{X} \approx ⟦\boldsymbol{\lambda};\mathcal{A}^{(1)},\mathcal{A}^{(2)}, \dots,\mathcal{A}^{(N)}⟧ \equiv \sum_{r=1}^{R} \lambda_{r}\,\mathbf{a}^{(1)}_{r}\circ\mathbf{a}^{(2)}_{r}\circ\dots\circ\mathbf{a}^{(N)}_{r}$
In this case, the mode-$n$ matricized form is

$\mathbf{X}_{(n)} \approx \mathbf{A}^{(n)}\mathbf{\Lambda}(\mathbf{A}^{(N)}\odot\cdots\odot \mathbf{A}^{(n+1)}\odot\mathbf{A}^{(n-1)}\odot\cdots\odot\mathbf{A}^{(1)})^\top$

where $\mathbf{\Lambda} = \textrm{diag}(\boldsymbol{\lambda})$.
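Extracting the weights $\boldsymbol{\lambda}$ is just column normalization of the factor matrices; a minimal sketch for the three-way case (the helper name and data are illustrative, and columns are assumed nonzero):

```python
import numpy as np

def normalize_factors(A, B, C):
    # Scale each column to unit length; the product of the removed norms
    # becomes the weight lambda_r of component r.
    na, nb, nc = (np.linalg.norm(M, axis=0) for M in (A, B, C))
    lam = na * nb * nc
    return lam, A / na, B / nb, C / nc
```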

CP Decomposition

Computing the CP Decomposition (Alternating Least Squares)

Take a 3rd-rank tensor $\mathcal{X}$ as an example:
$\mathbf{X}_{(0)}\approx\mathcal{A}(\mathcal{C}\odot\mathcal{B})^\top$
$\mathbf{X}_{(1)}\approx\mathcal{B}(\mathcal{C}\odot\mathcal{A})^\top$
$\mathbf{X}_{(2)}\approx\mathcal{C}(\mathcal{B}\odot\mathcal{A})^\top$

$\Longrightarrow$
${\textrm{argmin}}_{\mathcal{A}} ||\mathbf{X}_{(0)}-\mathcal{A}(\mathcal{C}\odot\mathcal{B})^\top||_F$
${\textrm{argmin}}_{\mathcal{B}} ||\mathbf{X}_{(1)}-\mathcal{B}(\mathcal{C}\odot\mathcal{A})^\top||_F$
${\textrm{argmin}}_{\mathcal{C}} ||\mathbf{X}_{(2)}-\mathcal{C}(\mathcal{B}\odot\mathcal{A})^\top||_F$
Here we apply ALS to solve the decomposition:

First, we fix $\mathcal{B}$ and $\mathcal{C}$ and solve for $\mathcal{A}$; writing $\hat{\mathcal{A}}=\mathcal{A}\cdot\textrm{diag}(\boldsymbol{\lambda})$,
the optimum is

$\begin{align} \hat{\mathcal{A}}=&\mathbf{X}_{(0)}[(\mathcal{C}\odot\mathcal{B})^\top]^{\dagger}\\ =&\mathbf{X}_{(0)}(\mathcal{C}\odot\mathcal{B}) (\mathcal{C}^\top\mathcal{C}\circledast\mathcal{B}^\top\mathcal{B})^{\dagger} \end{align}$

Similarly, we obtain the solutions for $\mathcal{B}$ and $\mathcal{C}$.

CP Decomposition

CP Decomposition via Alternating Least Squares: an $N$th-order tensor $\mathcal{X}$
of size $\mathit{I}_1\times\mathit{I}_2\times\cdots\times\mathit{I}_N$ is decomposed into $R$ components.
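The whole procedure can be sketched end-to-end in NumPy for the third-order case. This is a minimal illustration, not a production solver: `unfold` and `khatri_rao` are hypothetical helper names, modes are $0$-indexed with column-major unfolding as in the earlier example, and the weights $\boldsymbol{\lambda}$ are absorbed into the factors for brevity:

```python
import numpy as np

def unfold(X, mode):
    # Mode-n unfolding, earliest remaining index varying fastest (column-major).
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order='F')

def khatri_rao(A, B):
    # Columnwise Kronecker product: column r of the result is kron(a_r, b_r).
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_als(X, R, n_iter=300, seed=0):
    # Rank-R CP decomposition of a 3rd-order tensor via alternating least
    # squares: fix two factors, solve a least-squares problem for the third.
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((s, R)) for s in X.shape)
    for _ in range(n_iter):
        A = unfold(X, 0) @ khatri_rao(C, B) @ np.linalg.pinv((C.T @ C) * (B.T @ B))
        B = unfold(X, 1) @ khatri_rao(C, A) @ np.linalg.pinv((C.T @ C) * (A.T @ A))
        C = unfold(X, 2) @ khatri_rao(B, A) @ np.linalg.pinv((B.T @ B) * (A.T @ A))
    return A, B, C
```

In practice one would stop on convergence of the fit rather than after a fixed number of iterations, and normalize the factor columns to recover $\boldsymbol{\lambda}$.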

One Application Case
           - C.Huang's Tensor

Tensor Factorization based Expert Recommendation

One Application Case
           - C.Huang's Tensor


$\begin{align} f(\mathbf{U_1}, \mathbf{U_2}, \mathbf{U_3}, \mathbf{U_4}, \mathbf{S}, \mathbf{A}, \mathbf{T})&= Tensor(\mathbf{U_1}, \mathbf{U_2}, \mathbf{U_3}, \mathbf{U_4})\\ &+Weight(\mathbf{U}_1) + Networks(\mathbf{S}, \mathbf{A})\\ &+ Topic(\mathbf{T}, \mathbf{A}) + Site(\mathbf{S}, \mathbf{U_2}) \end{align}$
where

$\begin{align} Tensor(\mathbf{U_1}, \mathbf{U_2}, \mathbf{U_3}, \mathbf{U_4})&=\frac{1}{2} \|\mathcal{X}-{⟦ \mathbf{U_1}, \mathbf{U_2}, \mathbf{U_3}, \mathbf{U_4} ⟧}\|_F^2 \quad \leftarrow \textrm{Tensor decomposition}\\\\ &+ \frac{\lambda_{\mathcal{X}}}{2} (\|\mathbf{U_1}\|_F^2 + \|\mathbf{U_2}\|_F^2 + \|\mathbf{U_3}\|_F^2 + \|\mathbf{U_4}\|_F^2) \quad \leftarrow \textrm{Moreau-Yosida regularization} \end{align}$
and here

$ \mathcal{X}$ denotes a Question × Questioner × Score × Expertise tensor.

Chaoran Huang, 19 Jun., 2017

Weekly Report
