Jargon
•
Scene graph
A scene graph is a general data structure commonly used by vector-based graphics editing applications and modern computer games, which arranges the logical and often spatial representation of a graphical scene. It is a collection of nodes in a graph or tree structure.
•
Scene layout
The spatial arrangement of the objects in a scene, typically given as bounding boxes or segmentation masks
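The two terms above can be made concrete with a small sketch. The `SceneGraph` class, its method names, and the example objects and boxes below are illustrative assumptions, not taken from the paper or from any library: objects are nodes, relations are directed labeled edges, and a layout is one bounding box per object.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneGraph:
    """Hypothetical minimal scene graph: nodes are objects, edges are relations."""
    objects: List[str] = field(default_factory=list)
    # Each relation is (subject index, predicate, object index).
    relations: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, name: str) -> int:
        """Add a node and return its index."""
        self.objects.append(name)
        return len(self.objects) - 1

    def add_relation(self, subj: int, predicate: str, obj: int) -> None:
        """Add a directed, labeled edge between two existing nodes."""
        self.relations.append((subj, predicate, obj))

    def triples(self) -> List[Tuple[str, str, str]]:
        """Return relations as readable (subject, predicate, object) triples."""
        return [(self.objects[s], p, self.objects[o]) for s, p, o in self.relations]


# A scene layout, by contrast, only stores where each object is:
# object index -> normalized bounding box (x0, y0, x1, y1).
Layout = Dict[int, Tuple[float, float, float, float]]

graph = SceneGraph()
sheep = graph.add_object("sheep")
grass = graph.add_object("grass")
tree = graph.add_object("tree")
graph.add_relation(sheep, "standing on", grass)
graph.add_relation(tree, "behind", sheep)

# Made-up boxes, for illustration only.
layout: Layout = {
    sheep: (0.35, 0.50, 0.65, 0.85),
    grass: (0.00, 0.60, 1.00, 1.00),
    tree: (0.30, 0.05, 0.70, 0.70),
}

print(graph.triples())
```

Note that the layout has no field for the predicate: the relation labels live only in the graph.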
Task
Image generation from scene graph
Challenges
•
Previous studies converted scene graphs into an image-like intermediate representation, most often a scene layout
◦
Scene layouts are crafted manually and are not specifically designed to facilitate the alignment between images and graphs
◦
Some relations, such as "behind", "inside", and "in front of", all correspond to similar spatial arrangements in a scene layout, so the layout cannot tell them apart
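A toy sketch of the ambiguity described in the challenges above. Everything here is an assumption made for illustration: the helper `to_layout`, the object names, and the box coordinates are invented, and the example assumes the three relations are drawn with the same overlapping boxes, which is the situation the note describes.

```python
def to_layout(triples, boxes):
    """Keep only what a box-based layout stores: one box per object name.

    The predicate of each triple is dropped, because a 2D box layout
    has no place to record it.
    """
    names = {name for subj, _, obj in triples for name in (subj, obj)}
    return {name: boxes[name] for name in sorted(names)}


# Made-up normalized boxes (x0, y0, x1, y1); the cat box lies within the box's box.
boxes = {"cat": (0.3, 0.4, 0.6, 0.8), "box": (0.2, 0.3, 0.7, 0.9)}

graph_inside = [("cat", "inside", "box")]
graph_front = [("cat", "in front of", "box")]
graph_behind = [("cat", "behind", "box")]

layouts = [to_layout(g, boxes) for g in (graph_inside, graph_front, graph_behind)]

# Three different graphs, one identical layout.
same_layout = layouts[0] == layouts[1] == layouts[2]
print(same_layout)
```

A generator conditioned only on such a layout receives the same input for all three graphs, which is the motivation for conditioning on a graph embedding instead.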
Goal
Image generation from a scene graph without using a scene layout
→ learning intermediate representations that explicitly maximize the alignment between scene graphs and images
Methods
•
Dataset
◦
Visual Genome (VG) dataset with 108,077 scene graph & image pairs
◦
COCO-Stuff dataset with 40,000 training images and 5,000 validation images, each with pixel-wise annotations (corresponding bounding boxes and segmentation masks)
•
Method
◦
Input: scene graph & image pairs
◦
Output: synthetic image
◦
Detailed method
Stage 1: Masked contrastive pre-training of the scene graph (SG) encoder
Stage 2: Diffusion-based scene-graph-to-image generation
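The notes only name the two stages, so the following is a minimal sketch of the general idea behind Stage 1, not the paper's objective. It assumes a generic symmetric InfoNCE-style contrastive loss over paired graph and image embeddings and omits the masking entirely; `info_nce`, the `temperature` value, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(graph_embs, image_embs, temperature=0.1):
    """Symmetric contrastive loss: pair k of graphs matches pair k of images."""
    n = len(graph_embs)
    logits = [[cosine(g, i) / temperature for i in image_embs] for g in graph_embs]

    def cross_entropy_on_diagonal(rows):
        total = 0.0
        for k, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_denominator = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_denominator - row[k]
        return total / n

    # Graph-to-image uses the rows, image-to-graph uses the columns.
    columns = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(columns))


# Toy embeddings: graph k points in roughly the same direction as image k.
graph_embs = [[1.0, 0.0], [0.0, 1.0]]
image_embs = [[0.9, 0.1], [0.1, 0.9]]

loss_aligned = info_nce(graph_embs, image_embs)
loss_shuffled = info_nce(graph_embs, image_embs[::-1])  # wrong pairing

print(loss_aligned < loss_shuffled)
```

The loss is low when each graph embedding is closest to its own image and high when the pairing is wrong, which is the sense in which such pre-training maximizes graph-image alignment. In Stage 2, the resulting graph embedding would serve as the conditioning signal for the diffusion model in place of a scene layout.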
Results
•
Comparison with other methods
•
Semantic image manipulation