Self-Distilled StyleGAN: Towards Generation from Internet Photos

Abstract

StyleGAN is known to produce high-fidelity images, while also offering unprecedented semantic editing. However, these fascinating abilities have been demonstrated only on a limited set of datasets, which are usually structurally aligned and well curated. In this paper, we show how StyleGAN can be adapted to work on raw uncurated images collected from the Internet. Such image collections pose two main challenges for StyleGAN: they contain many outlier images, and they are characterized by a multi-modal distribution. Training StyleGAN on such raw image collections results in degraded image synthesis quality. To meet these challenges, we propose a StyleGAN-based self-distillation approach, which consists of two main components: (i) a generative-based self-filtering of the dataset to eliminate outlier images, in order to generate an adequate training set, and (ii) perceptual clustering of the generated images to detect the inherent data modalities, which are then employed to improve StyleGAN's "truncation trick" in the image synthesis process. The presented technique enables the generation of high-quality images, while minimizing the loss in diversity of the data. Through qualitative and quantitative evaluation, we demonstrate the power of our approach on new, challenging, and diverse domains collected from the Internet.

Image Generation Results for a Variety of Domains

Our Self-Distilled StyleGAN framework enables high-quality and diverse image generation from uncurated Internet datasets.

Lions

Giraffes

Elephants

Parrots

Dogs

LSUN - Bicycles

LSUN - Horses

Cluster Center Visualization

Raw uncurated images collected from the Internet tend to be rich and diverse, consisting of multiple modalities, each with distinct geometry and texture characteristics. To maintain the diversity of the generated images while improving their visual quality, we introduce a multi-modal truncation trick. Our key idea is to incorporate multiple cluster centers, and then truncate each sampled code towards the most similar center.
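The multi-modal truncation idea can be illustrated with a minimal sketch. This is not the released implementation: the function name `multimodal_truncate`, the parameter `psi`, and the use of Euclidean distance in latent space as the measure of "most similar" are all assumptions made for illustration. The cluster centers are taken as given.

```python
import numpy as np

def multimodal_truncate(w, centers, psi=0.5):
    """Truncate each latent code towards its nearest cluster center.

    w:       (N, D) array of sampled latent codes
    centers: (K, D) array of cluster centers in the same latent space
    psi:     truncation strength (hypothetical name); 1.0 leaves the codes
             unchanged, 0.0 collapses every code onto its center
    """
    w = np.asarray(w, dtype=np.float64)
    centers = np.asarray(centers, dtype=np.float64)
    # Pairwise squared Euclidean distances, shape (N, K).
    # Assumption: "most similar" is approximated by latent-space distance.
    d2 = ((w[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)   # index of the closest center per code
    c = centers[nearest]          # (N, D) center assigned to each code
    # Interpolate from the assigned center towards the sampled code.
    return c + psi * (w - c), nearest

# Toy usage with two well-separated centers in a 2-D latent space.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
codes = np.array([[1.0, 1.0], [9.0, 11.0]])
truncated, assigned = multimodal_truncate(codes, centers, psi=0.5)
```

With a single center this reduces to the standard truncation trick, which pulls every code towards one global mean. Using several centers keeps each code within its own modality.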

Here we show random walks between our cluster centers in the latent space of various domains. As can be seen, the cluster centers are highly diverse and capture the multi-modal nature of the data well.

Lions	Dogs	Parrots	Potted Plant

Semantic Editing Examples

Our approach retains StyleGAN's remarkable semantic editing capabilities, enabling us to perform various edits in each domain.

Lions

Roar

Add / Remove Mane

Head Pane Rotation

Parrots

Head Pane Rotation

Zoom In/Out

Horses

Head Up/Down

Running/Standing Still

Paper

"Self-Distilled StyleGAN: Towards Generation from Internet Photos",
Ron Mokady, Michal Yarom, Omer Tov, Oran Lang, Daniel Cohen-Or, Tali Dekel, Michal Irani and Inbar Mosseri.

[PDF]

Results / Supplementary Material

[Link]

Datasets and Models

[Datasets&Code]

Bibtex

@misc{mokady2022selfdistilled,
title={Self-Distilled StyleGAN: Towards Generation from Internet Photos},
author={Ron Mokady and Michal Yarom and Omer Tov and Oran Lang and
Daniel Cohen-Or and Tali Dekel and Michal Irani and Inbar Mosseri},
year={2022},
eprint={2202.12211},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

Last updated: Apr 2021