Taskonomy: Disentangling Task Transfer Learning

Tags

Multitask learning

Transfer learning

Affiliation

Stanford University

Article type

Research

Date

2022/12/14

Journal

CVPR

Published Year

2018

keywords

Computer vision

STEP 1 : task-specific modeling

STEP 2: Train transfer functions among tasks

High-order transfers

Transfer results

STEP 3: Get task affinities

STEP 4: Compute global taxonomy

Results

Sanity test (how well the task-specific networks were trained)

Evaluation of computed taxonomies

Generalization to novel task

Significance test of the taxonomy structure

Evaluation on external dataset

openaccess.thecvf.com

https://openaccess.thecvf.com/content_cvpr_2018/papers/Zamir_Taskonomy_Disentangling_Task_CVPR_2018_paper.pdf

taskonomy/taskbank at master · StanfordVL/taskonomy

This repository shares a unified bank of pretrained models for 25 vision tasks spanning a wide range of 2D, 3D, and semantic problems. Given a query image, the produced 25 estimations give a broad visual understanding useful for different purposes. The networks can be used individually as well.

https://github.com/StanfordVL/taskonomy/tree/master/taskbank

[Summary and explanation]

Challenges

•

Indoor scene inference

◦

There are many tasks

◦

Labels for each task are hard to obtain, and many tasks are similar to one another

Research question

Do visual tasks have a relationship, or are they unrelated?

•

Propose a fully computational method for choosing the best source tasks and target tasks

•

Infer the task structure

◦

“structure” is a collection of computationally found relations specifying which tasks supply useful information to another, and by how much

Hypothesis

•

If the relationships among tasks were uncovered,

•

could only a few tasks be fully annotated, with the rest fine-tuned from other tasks,

•

and the whole set of tasks still be learned well?

Goal

•

mapping from X to Y

•

computes an affinity matrix among tasks based on whether the solution for one task can be sufficiently easily read out of the representation trained for another task

•

computationally found directed hypergraph that captures the notion of task transferability over any given task dictionary.

Methods

Notations

•

task dictionary $V = T \cup S$

◦

$T$: set of tasks that we want solved (target)

◦

$S$: set of tasks that can be trained (source)

Methods

•

input : 4 million images for indoor scene inference

◦

26 tasks

◦

training (120k), validation (16k), and test (17k) images

•

output : A hypergraph of tasks

•

Schematic overview

STEP 1 : task-specific modeling

•

fully supervised task-specific network for each task in S

•

encoder-decoder architecture homogeneous across all tasks

•

Encoder's architecture : a fully convolutional ResNet-50 without pooling, identical across all task-specific networks

•

Decoder's architecture : depends on the task, since the output structures of different tasks vary

•

Task-specific networks are trained on the training set

STEP 2: Train transfer functions among tasks

•

Transfer network from $s \in S$ to $t \in T$

•

Learns readout function $D_{s \rightarrow t}$

◦

$f_t(I)$: ground truth of $t$ for image $I$

•

Transfer architecture: identical shallow networks with 2 conv layers (input representations are concatenated channel-wise for higher-order transfers)

•

Transfer networks are trained on a subset of validation set, ranging from 1k images to 16k, in order to model the transfer patterns under different data regimes
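The readout training in this step can be written as a single objective. This is a sketch following the formulation in the Taskonomy paper; the symbols $E_s$ (the frozen encoder of source task $s$), $\theta$ (the readout parameters), $L_t$ (the loss of target task $t$), and $\mathcal{D}$ (the transfer training set) do not appear in the notes above and are introduced here only as notation.

```latex
D_{s \rightarrow t} := \arg\min_{\theta} \; \mathbb{E}_{I \in \mathcal{D}}
  \left[ L_t\big( D_{\theta}(E_s(I)),\; f_t(I) \big) \right]
```

Only the readout parameters are optimized while the encoder stays frozen, so the loss that can be reached indicates how much information about $t$ is already present in the representation learned for $s$.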

High-order transfers

•

Same as first-order transfers, but they receive multiple representations as input

•

e.g., $s_1 \rightarrow s_2 \rightarrow t$

•

Beam search: a sampling procedure that filters out higher-order transfers that are unlikely to yield good results, without training them

Beam search
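The pruning idea can be sketched in a few lines of Python. This is an illustrative top-k pruning in the spirit of the beam search, not the authors' implementation: the function name `candidate_transfers`, the parameters `max_order` and `beam`, the task names, and the choice to score a source set by the product of its first-order scores are all assumptions made for the example.

```python
from itertools import combinations
from math import prod


def candidate_transfers(first_order, max_order=3, beam=5):
    """Rank candidate higher-order source sets for one target task.

    first_order: dict mapping source name -> first-order transfer score
                 (higher is better). Hypothetical input format.
    Returns a list of (source_tuple, score), best first.
    """
    # Keep only the `beam` best sources by first-order performance.
    top = sorted(first_order, key=first_order.get, reverse=True)[:beam]

    candidates = []
    for k in range(2, max_order + 1):
        for combo in combinations(top, k):
            # Assumed proxy score: product of the first-order scores.
            score = prod(first_order[s] for s in combo)
            candidates.append((combo, score))

    # Best-scoring source sets first; only these would be trained.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates


# Toy first-order scores for one target (made-up numbers).
scores = {"normals": 0.9, "depth": 0.8, "edges": 0.3, "autoencoder": 0.1}
ranked = candidate_transfers(scores, max_order=2, beam=3)
print(ranked[0][0])  # ('normals', 'depth')
```

With `beam=3`, the weakest source is never combined with anything, so the number of higher-order networks that would need training shrinks before any of them is trained.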

Transfer results

STEP 3: Get task affinities

•

Derived from Analytic Hierarchy Process

•

Computing $W_t$

◦

For each $t$, we construct $W_t$, a pairwise tournament matrix between all feasible sources for transferring to $t$

◦

Value at $(i,j)$: the tournament ratio, i.e., the fraction of images in $D_{test}$ on which $s_i$ transferred to $t$ better than $s_j$ did

•

$W_t'$: normalized $W_t$

•

We quantify the final transferability of $s_i$ to $t$ as the corresponding ($i^{th}$) component of the principal eigenvector of $W_t'$ (normalized to sum to 1)
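The steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the function names, the per-image loss input format, the clipping range `[0.001, 0.999]`, and the element-wise division $W_t' = W_t / W_t^T$ used as the normalization are assumptions, and power iteration stands in for a full eigendecomposition.

```python
def tournament_matrix(losses):
    """Build W_t from per-image losses.

    losses[i][n] is the loss of source i on test image n (lower is better).
    W[i][j] is the fraction of images on which source i beats source j.
    """
    k, n_images = len(losses), len(losses[0])
    W = [[0.5] * k for _ in range(k)]  # diagonal stays at 0.5 (a tie)
    for i in range(k):
        for j in range(k):
            if i != j:
                wins = sum(a < b for a, b in zip(losses[i], losses[j]))
                W[i][j] = wins / n_images
    return W


def affinities(W, lo=0.001, hi=0.999, iters=200):
    """Principal eigenvector of the normalized matrix, summing to 1."""
    k = len(W)

    def clip(x):
        return min(max(x, lo), hi)

    # Assumed normalization: W'[i][j] = W[i][j] / W[j][i], i.e. how many
    # times better s_i is than s_j. Clipping avoids division by zero.
    Wp = [[clip(W[i][j]) / clip(W[j][i]) for j in range(k)] for i in range(k)]

    # Power iteration; Wp is strictly positive, so it converges to the
    # principal eigenvector.
    v = [1.0 / k] * k
    for _ in range(iters):
        v = [sum(Wp[i][j] * v[j] for j in range(k)) for i in range(k)]
        total = sum(v)
        v = [x / total for x in v]
    return v


# Toy losses for three sources on four test images (made-up numbers).
losses = [
    [0.1, 0.2, 0.1, 0.3],
    [0.5, 0.1, 0.4, 0.6],
    [0.9, 0.8, 0.7, 0.9],
]
print(affinities(tournament_matrix(losses)))
```

In the toy data the first source wins most pairwise comparisons, so it receives the largest affinity and the third source the smallest.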

STEP 4: Compute global taxonomy

•

A global transfer policy that maximizes collective performance across all tasks

•

Formulated as subgraph selection where tasks are nodes and transfers are edges

Solved with Boolean Integer Programming (BIP)
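To make the subgraph selection concrete, the toy sketch below solves a tiny instance by exhaustive enumeration instead of a BIP solver, which is only feasible for a handful of tasks. Everything in it is an assumption for illustration: the function name `best_taxonomy`, the `transfers` dictionary format, the supervision `budget` (the maximum number of fully supervised source tasks), and the affinity values.

```python
from itertools import combinations


def best_taxonomy(tasks, transfers, budget):
    """Pick source tasks and one incoming transfer per task.

    tasks:     list of task names.
    transfers: dict mapping (frozenset_of_sources, target) -> affinity.
    budget:    max number of source tasks that get full supervision.
    Returns (total_affinity, {target: chosen_source_set}).
    """
    best_score, best_plan = float("-inf"), None
    for k in range(1, budget + 1):
        for chosen in combinations(tasks, k):
            chosen = frozenset(chosen)
            plan, score = {}, 0.0
            for t in tasks:
                # A transfer is usable only if all its sources are selected.
                options = [
                    (aff, src)
                    for (src, tgt), aff in transfers.items()
                    if tgt == t and src <= chosen
                ]
                if not options:
                    break  # this task cannot be solved with this source set
                aff, src = max(options, key=lambda o: o[0])
                plan[t] = src
                score += aff
            else:
                if score > best_score:
                    best_score, best_plan = score, plan
    return best_score, best_plan


# Toy instance: a task trained with full supervision "transfers" to itself.
tasks = ["depth", "normals", "edges"]
transfers = {
    (frozenset({"depth"}), "depth"): 1.0,
    (frozenset({"normals"}), "normals"): 1.0,
    (frozenset({"edges"}), "edges"): 1.0,
    (frozenset({"normals"}), "depth"): 0.8,
    (frozenset({"normals"}), "edges"): 0.7,
    (frozenset({"depth"}), "normals"): 0.6,
    (frozenset({"depth"}), "edges"): 0.5,
    (frozenset({"edges"}), "depth"): 0.2,
    (frozenset({"edges"}), "normals"): 0.3,
}
score, plan = best_taxonomy(tasks, transfers, budget=1)
print(sorted(plan["depth"]))  # ['normals']
```

The enumeration enforces the same kind of constraints a BIP would encode with Boolean variables: a transfer can be used only when all of its sources are selected, every task receives exactly one incoming transfer, and the number of selected sources stays within the budget.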

Results

•

Evaluation metric: win rate (%), the proportion of test set images on which a baseline is beaten (higher is better)

•

Gain → 0.5 (a 50% win rate) is the lower bound for a useful transfer

◦

win rate (%) against a network trained from scratch using the same training data as transfer networks. That is, the best that could be done if transfer learning was not utilized.

•

Quality → 0.5 (a 50% win rate) is the practical upper bound, i.e., parity with the gold standard

◦

win rate (%) against a fully supervised network trained with 120k images (gold standard)
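Both metrics are the same win-rate computation against different baselines, which a short Python sketch makes explicit. The function name `win_rate` and the per-image loss lists are hypothetical, and the numbers are made up for illustration.

```python
def win_rate(model_losses, baseline_losses):
    """Fraction of test images on which the model has a lower loss
    than the baseline (ties count as losses for the model)."""
    wins = sum(m < b for m, b in zip(model_losses, baseline_losses))
    return wins / len(model_losses)


# Per-image losses on the same four test images (made-up numbers).
transfer = [0.2, 0.4, 0.1, 0.5]  # transfer network
scratch = [0.3, 0.5, 0.2, 0.4]   # trained from scratch on the same data
gold = [0.1, 0.3, 0.2, 0.3]      # fully supervised, 120k images

gain = win_rate(transfer, scratch)   # 0.75
quality = win_rate(transfer, gold)   # 0.25
print(gain, quality)
```

In this toy case the transfer beats training from scratch on most images (Gain above 0.5) but remains behind the gold standard (Quality below 0.5).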

Sanity test (how well the task-specific networks were trained)

Evaluation of computed taxonomies

Generalization to novel task

Significance test of the taxonomy structure

Evaluation on external dataset

•

Fine-tuned the task-specific networks on other datasets (MIT Places for scene classification, ImageNet for object classification)