READING GROUP

Generating Videos

with Scene Dynamics

Chaoran Huang, chaoranh@cse.unsw.edu.au

Generating Videos with Scene Dynamics

NIPS 2016

Carl Vondrick

rich predictive models

computer vision and machine learning

Ph.D. Student, MIT

https://github.com/cvondrick

Generative Adversarial Networks

Spatio-temporal 3D Models

Discriminator Network (D)

Video Generator Network (G)

apply SGD on:

$\min_{\omega_G} \max_{\omega_D} \; \mathbb{E}_{x \sim p_x(x)}[\log D(x;\omega_D)] + \mathbb{E}_{z \sim p_z(z)}[\log (1 - D(G(z;\omega_G);\omega_D))]$
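
To make the objective concrete, here is a minimal sketch that estimates the two expectations from a batch of discriminator outputs. The function name `gan_value` and the sample probabilities are hypothetical and chosen only for illustration; in practice D and G are networks and the gradients come from an autodiff framework.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical discriminator outputs, for illustration only.
# A discriminator that separates real from fake scores a high value;
# one that outputs 0.5 everywhere (fully fooled) scores 2 * log(0.5).
confident = gan_value(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

D is updated to increase this value and G to decrease it, which is what alternating the SGD steps amounts to.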

Fig1. - Discriminator Network

$\text{Recognize} \left\{ \begin{array}{l} \text{scenes}\\ \text{motions} \end{array} \right.$

One Stream Architecture

Consistent in both time and space

Low dimension input, high dimension output

Only objects move
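
As a rough sketch of how a low-dimensional input can grow into a full video, the snippet below tracks the (time, height, width) shape through a stack of transposed 3D convolutions. The kernel sizes, strides, padding values, and layer count are assumptions that do not appear on these slides; they are picked so that each layer after the first doubles every axis and the stack ends at 32 frames of 64x64.

```python
def deconv_out(size, kernel, stride, pad):
    """Output length of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * pad + kernel

def generator_shapes():
    # (time, height, width) after each layer, starting from a 1x1x1 latent.
    shape = (1, 1, 1)
    shapes = [shape]
    # First layer: 2x4x4 kernel; stride 1 and no padding are assumptions.
    shape = tuple(deconv_out(s, k, 1, 0) for s, k in zip(shape, (2, 4, 4)))
    shapes.append(shape)
    # Four more layers: 4x4x4 kernel, stride 2; padding 1 is an assumption.
    for _ in range(4):
        shape = tuple(deconv_out(s, 4, 2, 1) for s in shape)
        shapes.append(shape)
    return shapes

shapes = generator_shapes()
```

Under these assumptions `shapes` ends at `(32, 64, 64)`, so time and space are upsampled jointly by the same layers.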

Two Stream Architecture

Enforced static background (a single picture)

Moving foreground

Summarize with mask
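
One way to read the mask step is as a per-pixel blend, `m * f + (1 - m) * b`, where the foreground `f` and mask `m` change per frame and the background `b` is one image reused for every frame. The sketch below assumes mask values in [0, 1] and uses flat lists of made-up pixel values; the function name `composite` is hypothetical.

```python
def composite(foreground, mask, background):
    """Blend per-frame foreground with one static background image.

    foreground: list of T frames, each a flat list of pixel values
    mask:       list of T frames, same layout, values in [0, 1]
    background: a single flat list of pixel values, reused for every frame
    """
    video = []
    for f_frame, m_frame in zip(foreground, mask):
        frame = [m * f + (1.0 - m) * b
                 for f, m, b in zip(f_frame, m_frame, background)]
        video.append(frame)
    return video

# Toy example: 2 frames, 3 pixels each, with made-up values.
fg = [[1.0, 1.0, 1.0], [0.5, 0.5, 0.5]]
mk = [[1.0, 0.0, 0.5], [0.0, 0.0, 1.0]]
bg = [0.2, 0.4, 0.6]
video = composite(fg, mk, bg)
```

Where the mask is 0 the pixel is copied from the background in every frame, which is how the static background is enforced; where it is 1 the pixel comes entirely from the moving foreground.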

Fig2. - Video Generator Network

Fig3. - An example of the two stream architecture

Data

2 years of Flickr Videos

9 TB, 35 million clips

5,000+ hours of video

26 TB raw data
