arXiv:2004.12276v1 [cs.CV] 26 Apr 2020 · Fashionpedia: Ontology, Segmentation, and an Attribute...
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
Menglin Jia*1, Mengyun Shi*1,4, Mikhail Sirotenko*3, Yin Cui*3, Claire Cardie1, Bharath Hariharan1, Hartwig Adam3, Serge Belongie1,2
1Cornell University 2Cornell Tech 3Google Research 4Hearst Magazines
Abstract. In this work we explore the task of instance segmentation with attribute localization, which unifies instance segmentation (detect and segment each object instance) and fine-grained visual attribute categorization (recognize one or multiple attributes). The proposed task requires both localizing an object and describing its properties. To illustrate the various aspects of this task, we focus on the domain of fashion and introduce Fashionpedia as a step toward mapping out the visual aspects of the fashion world. Fashionpedia consists of two parts: (1) an ontology built by fashion experts containing 27 main apparel categories, 19 apparel parts, 294 fine-grained attributes and their relationships; (2) a dataset with everyday and celebrity event fashion images annotated with segmentation masks and their associated per-mask fine-grained attributes, built upon the Fashionpedia ontology. In order to solve this challenging task, we propose a novel Attribute-Mask R-CNN model to jointly perform instance segmentation and localized attribute recognition, and provide a novel evaluation metric for the task. We also demonstrate that instance segmentation models pre-trained on Fashionpedia achieve better transfer learning performance on other fashion datasets than ImageNet pre-training. Fashionpedia is available at: https://fashionpedia.github.io/home/index.html.
Keywords: Dataset, Ontology, Instance Segmentation, Fine-Grained, Attribute, Fashion
1 Introduction
Recent progress in the field of computer vision has advanced machines' ability to recognize and understand our visual world, showing significant impact in fields including autonomous driving [52], cancer detection [29], product recognition [33,14], etc. These real-world applications are fueled by various visual understanding tasks with the goals of naming, describing (attribute recognition), or localizing objects within an image.
Naming and localizing objects is formulated as an object detection task (Figure 1(a-c)). As a hallmark of computer recognition, this task is to identify and
* Equal contribution.
[Figure 1 graphic omitted in transcript: panels (a)-(f). The figure's legend lists the attribute relationship types (Part of, Textile Finishing, Textile Pattern, Silhouette, Opening Type, Length, Nickname, Waistline); its graph connects apparel-category and garment-part nodes to fine-grained attribute nodes via HAS_ATTRIBUTE edges. See the caption below.]
Fig. 1. An illustration of the Fashionpedia dataset and ontology: (a) main garment masks; (b) garment part masks; (c) both main garment and garment part masks; (d) fine-grained apparel attributes; (e) an exploded view of the annotation diagram: the image is annotated with both instance segmentation masks (white boxes) and per-mask fine-grained attributes (black boxes); (f) visualization of the Fashionpedia ontology: we created the Fashionpedia ontology and separate the concepts of categories (yellow nodes) and attributes [39] (blue nodes) in fashion. It covers the pre-defined garment categories used by both DeepFashion2 [10] and ModaNet [54]. The mapping with DeepFashion2 also shows the versatility of using attributes and categories: we are able to represent all 13 garment classes in DeepFashion2 with 11 main garment categories, 1 garment part, and 7 attributes. Best viewed digitally.
indicate the boundaries of objects in the form of bounding boxes or segmentation masks [11,38,17]. Attribute recognition [5,30,8,37] (Figure 1(d)) instead focuses on describing and comparing objects, since an object also has many other properties or attributes in addition to its category. As pointed out by Ferrari and Zisserman [8], attributes not only provide a compact and scalable way to represent objects in the world; attribute learning also enables the transfer of existing knowledge to novel classes. This is particularly useful for fine-grained visual recognition, whose goal is to distinguish subordinate visual categories such as birds [47] or natural species [44].
In the spirit of mapping the visual world, we propose a new task, instance segmentation with attribute localization, which unifies object detection and fine-grained attribute recognition. As illustrated in Figure 1(e), this task offers a structured representation of an image. Automatic recognition of a rich set of attributes for each segmented object instance complements category-level object detection and therefore advances the complexity of images and scenes we can make understandable to machines. In this work, we focus on the fashion domain as an example to illustrate this task. Fashion comes with rich and
Fashionpedia 3
complex apparel with attributes, influences many aspects of modern society, and has a strong financial and cultural impact. We anticipate that the proposed task is also suitable for other man-made product domains such as automobiles and home interiors.
Structured representations of images often rely on structured vocabularies [28]. With this in mind, we construct the Fashionpedia ontology (Figure 1(f)) and image dataset (Figure 1(a-e)), annotating fashion images with detailed segmentation masks for apparel categories, parts, and their attributes. Our proposed ontology provides a rich schema for the interpretation and organization of individuals' garments, styles, or fashion collections [24]. For example, we can create a knowledge graph (see supplementary material for more details) by aggregating structured information within each image and exploiting relationships between garments and garment parts, categories, and attributes in the Fashionpedia ontology. Our insight is that a large-scale fashion segmentation and attribute localization dataset built with a fashion ontology can help computer vision models achieve better performance on fine-grained image understanding and reasoning tasks.
The contributions of this work are as follows:
A novel task of fine-grained instance segmentation with attribute localization. The proposed task unifies instance segmentation and visual attribute recognition, which is an important step toward structural understanding of visual content in real-world applications.
A unified fashion ontology informed by product descriptions from the internet and built by fashion experts. Our ontology captures the complex structure of fashion objects and the ambiguity of descriptions obtained from the web, containing 46 apparel objects (27 main apparel categories and 19 apparel parts) and 294 fine-grained attributes (spanning 9 super-categories) in total. To facilitate the development of related efforts, we also provide a mapping to the categories of existing fashion segmentation datasets; see Figure 1(f).
A dataset with a total of 48,825 clothing images covering daily life, street style, celebrity events, runways, and online shopping, annotated by crowd workers for segmentation masks and by fashion experts for localized attributes, with the goal of developing and benchmarking computer vision models for comprehensive understanding of fashion.
A new model, Attribute-Mask R-CNN, proposed with the aim of jointly performing instance segmentation and localized attribute recognition; a novel evaluation metric for this task is also provided. We further demonstrate that instance segmentation models pre-trained on Fashionpedia achieve better transfer learning performance on DeepFashion2 and ModaNet than ImageNet pre-training.
2 Related Work
The combined task of fine-grained instance segmentation and attribute localization has not received a great deal of attention in the literature. On one hand, COCO [32] and LVIS [15] represent the benchmarks of object detection
4 Menglin Jia, Mengyun Shi, Mikhail Sirotenko, Yin Cui, et al.
Table 1. Comparison of fashion-related datasets (Cls. = Classification, Segm. = Segmentation, MG = Main Garment, GP = Garment Part, A = Accessory, S = Style, FGC = Fine-Grained Categorization). To the best of our knowledge, we include all fashion-related datasets focusing on visual recognition.
Name                             | Cls.         | BBox      | Segm.     | Unlocalized | Localized | FGC
Clothing Parsing [50]            | MG, A        | -         | -         | -           | -         | -
Chic or Social [49]              | MG, A        | -         | -         | -           | -         | -
Hipster [26]                     | MG, A, S     | -         | -         | -           | -         | -
Ups and Downs [19]               | MG           | -         | -         | -           | -         | -
Fashion550k [23]                 | MG, A        | -         | -         | -           | -         | -
Fashion-MNIST [48]               | MG           | -         | -         | -           | -         | -
Runway2Realway [45]              | -            | -         | MG, A     | -           | -         | -
ModaNet [54]                     | -            | MG, A     | MG, A     | -           | -         | -
Deepfashion2 [10]                | -            | MG        | MG        | -           | -         | -
Fashion144k [41]                 | MG, A        | -         | -         | X           | -         | -
Fashion Style-128 Floats [42]    | S            | -         | -         | X           | -         | -
UT Zappos50K [51]                | A            | -         | -         | X           | -         | -
Fashion200K [16]                 | MG           | -         | -         | X           | -         | -
FashionStyle14 [43]              | S            | -         | -         | X           | -         | -
Main Product Detection [40]      | -            | MG        | -         | X           | -         | -
StreetStyle-27K [36]             | -            | -         | -         | X           | -         | X
UT-latent look [21]              | MG, S        | -         | -         | X           | -         | X
FashionAI [6]                    | MG, GP, A    | -         | -         | X           | -         | X
iMat-Fashion Attribute [14]      | MG, GP, A, S | -         | -         | X           | -         | X
Apparel classification-Style [4] | -            | MG        | -         | X           | -         | X
DARN [22]                        | -            | MG        | -         | X           | -         | X
WTBI [25]                        | -            | MG, A     | -         | X           | -         | X
Deepfashion [33]                 | S            | MG        | -         | X           | -         | X
Fashionpedia                     | -            | MG, GP, A | MG, GP, A | -           | X         | X
for common objects. Panoptic segmentation was proposed to unify both semantic and instance segmentation, addressing both stuff and thing classes [27]. In spite of the domain differences, Fashionpedia has mask quality comparable to LVIS and a total number of segmentation masks similar to COCO. On the other hand, we have also observed an increasing effort to curate datasets for fine-grained visual recognition, evolving from CUB-200 Birds [47] to the recent iNaturalist dataset [44]. The goal of this line of work is to advance the state of the art in automatic image classification for large numbers of real-world, fine-grained categories. What remains largely unexplored in these datasets, however, is a structured representation of an image. Visual Genome [28] provides dense annotations of object bounding boxes, attributes, and relationships in the general domain, enabling a structured representation of the image. In our work, we instead focus on fine-grained attributes and provide segmentation masks in the fashion domain to advance the clothing recognition task.
Clothing recognition has recently received increasing attention in the computer vision community. A number of works provide valuable apparel-related
datasets [50,4,49,26,45,41,22,25,19,42,33,23,48,51,16,43,40,36,21,54,6,10]. These pioneering works enabled several recent advances in clothing-related recognition and knowledge discovery [9,35]. Table 1 summarizes the comparison among different fashion datasets with regard to the annotation types of clothing categories and attributes. Our dataset distinguishes itself in the following three aspects.
Exhaustive annotation of segmentation masks: Existing fashion datasets [45,54,10] offer segmentation masks for the main garment (e.g., jacket, coat, dress) and accessory categories (e.g., bag, shoe). Smaller garment objects such as collars and pockets are not annotated. However, these small objects can be valuable for real-world applications (e.g., searching for a specific collar shape during online shopping). Our dataset is annotated with segmentation masks not only for a total of 27 main garment and accessory categories but also for 19 garment parts (e.g., collar, sleeve, pocket, zipper, embroidery).
Localized attributes: The fine-grained attributes in existing datasets [22,33,40,14] tend to be noisy, mainly because most of the annotations are collected by crawling fashion product attribute-level descriptions directly from large online shopping websites. Unlike these datasets, the fine-grained attributes in our dataset are annotated manually by fashion experts. To the best of our knowledge, ours is the only dataset to annotate localized attributes: fashion experts are asked to annotate attributes associated with the segmentation masks labeled by crowd workers. Localized attributes could potentially help computational models detect and understand attributes more accurately.
Fine-grained categorization: Previous studies on fine-grained attribute categorization suffer from several issues, including: (1) repeated attributes belonging to the same category (e.g., zip, zipped, and zipper) [33,21]; (2) basic-level categorization only (object recognition) and a lack of fine-grained categorization [50,4,49,26,25,45,42,43,23,16,48,54,10]; (3) a lack of fashion taxonomies meeting the real-world needs of the fashion industry, possibly due to the research gap between fashion design and computer vision; (4) diverse taxonomy structures from different sources in the fashion domain. To facilitate research at the intersection of fashion and computer vision, our proposed ontology is built and verified by fashion experts based on their own design experience and informed by the following four sources: (1) world-leading e-commerce fashion websites (e.g., ZARA, H&M, Gap, Uniqlo, Forever21); (2) luxury fashion brands (e.g., Prada, Chanel, Gucci); (3) trend forecasting companies (e.g., WGSN); (4) academic resources [7,3].
3 Dataset Specification and Collection
3.1 Ontology specification
We propose a unified fashion ontology (Figure 1(f)), a structured vocabulary that utilizes basic-level categories and fine-grained attributes [39]. The Fashionpedia ontology relies on definitions of objects and attributes similar to those of previous well-known image datasets. For example, a Fashionpedia object is similar to an "item" in Wikidata [46], or an "object" in COCO [32] and Visual Genome [28]. In the context of Fashionpedia, objects represent common items in apparel (e.g., jacket, shirt, dress). In this section, we break down each component of the Fashionpedia ontology and illustrate the construction process. With this ontology and our image dataset, a large-scale fashion knowledge graph can be built as an extended application of our dataset (more details can be found in the supplementary material).

[Figure 2 annotation labels omitted: per-mask fine-grained attribute lists (e.g., gown: sequin, floor length, flare, high waist, lining, zip-up; jeans: washed, low waist, maxi length, fly opening, straight) and per-image totals of 53-55 masks and 15-16 categories.]

Fig. 2. Image examples with annotated segmentation masks (a-f) and fine-grained attributes (g-i).
Apparel categories. In the Fashionpedia dataset, all images are annotated with one or multiple main garments. Each main garment is also annotated with its garment parts. For example, general garment types such as jacket, dress, and pants are considered main garments. These garments consist of several garment parts such as collars, sleeves, pockets, buttons, and embroideries. Main garments are divided into three main categories: outerwear, intimates, and accessories. Garment parts also have different types: garment main parts (e.g., collars, sleeves), bra parts, closures (e.g., button, zipper), and decorations (e.g., embroidery, ruffle). On average, each image contains 1 person, 3 main garments, 3 accessories, and 12 garment parts, each delineated by a tight segmentation mask (Figure 1(a-c)). Furthermore, each object is assigned a synset ID in our Fashionpedia ontology.
Fine-grained attributes. Main garments and garment parts can be associated with apparel attributes (Figure 1(e)). For example, "button" is part of the main garment "jacket"; "jacket" can be linked with the silhouette attribute "symmetrical"; the garment part "button" could carry the attribute "metal" with a relationship of material. The Fashionpedia ontology provides attributes for 13 main outerwear garment categories and for 5 of the 19 garment parts ("sleeve", "neckline", "pocket", "lapel", and "collar"). Each image has 16.7 attributes on average (max 57 attributes). As with the main garments and garment parts, we canonicalize all attributes to our Fashionpedia ontology.
Relationships. Relationships can be formed between categories and attributes. There are three main types of relationships (Figure 1(e)): (1) outfits to main garments, and main garments to garment parts: a meronymy (part-of) relationship; (2) main garments to attributes, or garment parts to attributes: these relationship types can be garment silhouette (e.g., peplum), collar nickname (e.g., peter pan collar), garment length (e.g., knee-length), textile finishing (e.g., distressed), textile-fabric pattern (e.g., paisley), etc.; (3) within garments, garment parts, or attributes: there is a maximum of four levels of hyponymy (is-an-instance-of) relationships. For example, weft knit is an instance of knit fabric, and fleece is an instance of weft knit.
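As a concrete illustration, the three relationship types above could be encoded as a small typed edge list. The sketch below is hypothetical: the node and relation names are ours for illustration, not the released Fashionpedia schema.

```python
# Hypothetical sketch of the three Fashionpedia relationship types as a
# typed edge list; names are illustrative, not the released schema.
ontology = {
    # (1) meronymy (part-of): main garment -> garment part
    ("jacket", "button"): "part_of",
    # (2) category -> attribute, typed by the attribute super-category
    ("jacket", "symmetrical"): "silhouette",
    ("button", "metal"): "material",
    # (3) hyponymy (is-an-instance-of), up to four levels deep
    ("weft knit", "knit fabric"): "instance_of",
    ("fleece", "weft knit"): "instance_of",
}

def relations_of(node):
    """Return all (target, relation) pairs whose source is `node`."""
    return [(b, rel) for (a, b), rel in ontology.items() if a == node]
```

Aggregating such typed edges over all masks in an image yields the per-image knowledge-graph fragments described above.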
3.2 Image Collection and Annotation Pipeline
Image collection. A total of 50,527 images were harvested from Flickr and free-license photo websites, including Unsplash, Burst by Shopify, Freestocks, Kaboompics, and Pexels. Two fashion experts manually verified the quality of the collected images. Specifically, the experts checked scene diversity and made sure clothing items were visible in the images. Fdupes [34] was used to remove duplicate images. After filtering, 48,825 images remained and were used to build our Fashionpedia dataset.
Annotation pipeline. Expert annotation is often a time-consuming process. In order to accelerate it, we decoupled the work between crowd workers and experts, dividing the annotation process into the following two phases.
First, segmentation masks of apparel objects were annotated by 28 crowd workers, who were trained for 10 days before the annotation process (with prepared annotation tutorials for each apparel object; see supplementary material for details). We collected high-quality annotations by having the annotators follow the contours of garments in the image as closely as possible (see Section 4.2 for annotation analysis). This polygon annotation process was monitored daily and verified weekly by a supervisor and by the authors.
Second, 15 fashion experts (graduate students in the apparel domain) were recruited to annotate the fine-grained attributes for the annotated segmentation masks. Annotators were given one mask and one attribute super-category ("textile pattern" or "garment silhouette", for example) at a time. Two additional options, "not sure" and "not on the list", were available during annotation. The option "not on the list" indicates that the expert found an attribute that is not in the proposed ontology. If "not sure" is selected, the expert cannot identify the attribute of a mask; common reasons for this selection include occlusion of the masks and the viewing angle of the image (for example, a top underneath a closed jacket). More details can be found in Figure 2. Each attribute super-category was assigned to one or two fashion experts, depending on the number of masks. The annotations were also checked by another expert annotator before delivery.
We split the data into training, validation, and test sets, with 45,623, 1,158, and 2,044 images respectively. The dataset will be publicly available by the time of paper publication. More details of the dataset creation can be found in the supplementary material.
4 Dataset Analysis
This section presents a detailed analysis of our dataset using the training images. We begin with general image statistics, followed by an analysis of segmentation masks, categories, and attributes. We compare Fashionpedia with four other segmentation datasets: two recent fashion datasets, DeepFashion2 [10] and ModaNet [54], and two general-domain datasets, COCO [32] and LVIS [15].
4.1 Image Analysis
We chose to use high-resolution images during the curation process, since the Fashionpedia ontology includes diverse fine-grained attributes for both garments and garment parts. The Fashionpedia training images have an average dimension of 1710 (width) × 2151 (height). High-resolution images are able to show apparel objects in detail, leading to more accurate and faster annotations for both segmentation masks and attributes. These high-resolution images can also benefit downstream tasks such as detection and image generation. Examples of detailed annotations can be found in Figure 2.
4.2 Mask Analysis
We define a "mask" as one apparel instance, which may have more than one separate component (the jacket in Figure 1, for example), and a "polygon" as a single disjoint area.
Mask quantity. On average, there are 7.3 masks per image (median 7, max 74) in the Fashionpedia training set. Figure 3(a) shows that Fashionpedia has the largest median value among the 5 datasets used for comparison. Fashionpedia also has the widest range of mask counts among the three fashion datasets, while COCO and LVIS maintain wider distributions owing to the more common objects in their datasets. Figure 3(d) illustrates the distribution within the Fashionpedia dataset: one image usually contains more garment parts and accessories than outerwear.
Mask sizes. Figures 3(b) and 3(e) compare relative mask sizes within Fashionpedia and against other datasets. Ours has a distribution similar to COCO and LVIS, except for a lack of larger masks (area > 0.95). DeepFashion2 has a heavier tail, meaning it contains a larger portion of garments with a zoomed-in view; unlike DeepFashion2, our images mainly focus on the whole ensemble of clothing. Since ModaNet focuses on outerwear and accessories, it has more masks with relative area between 0.2 and 0.4, whereas ours has an additional 19 apparel parts categories. As illustrated in Figure 3(e), garment parts and accessories are relatively small compared with outerwear (e.g., "dress", "coat").
Mask quality. Apparel categories also tend to have complex silhouettes. Table 2 shows that the Fashionpedia masks have the most complex boundaries among the five datasets (according to the measurement used in [15]). This suggests that our masks represent the complex silhouettes of apparel categories more accurately than those of ModaNet and DeepFashion2. We also report the number of vertices per polygon, a measure of the granularity of the produced masks. Table 2 shows that we have the second-highest average number of vertices among the five datasets, next to LVIS.

[Figure 3 plots omitted: (a) mask count per image across datasets (median | max: DeepFashion2 2|8, ModaNet 5|16, COCO2017 4|93, LVIS 6|556, Fashionpedia 7|74); (b) relative mask size across datasets; (c) category diversity per image across datasets; (d) mask count per image within Fashionpedia (Outerwear 1|6, Garment Parts 3|69, Accessories 2|18); (e) relative mask size within Fashionpedia; (f) category and attribute distribution in Fashionpedia.]

Fig. 3. Dataset statistics: the first row presents comparisons among datasets; the second row presents comparisons within Fashionpedia. Y-axes are in log scale. Relative segmentation mask size was calculated in the same way as in [15] and rounded to a precision of 2. For the mask-count-per-image comparisons (Figures 3(a) and 3(d)), legends follow a [median | max] format and X-axes are in log scale. X-axis values in Figure 3(a) were discretized for better visual effect. Best viewed digitally.
4.3 Category and Attributes Analysis
There are 46 apparel categories and 294 attributes in the Fashionpedia dataset. On average, each image is annotated with 7.3 instances, 5.4 categories, and 16.7 attributes. Of all the masks with categories and attributes, each mask has 3.7 attributes on average (max 14 attributes). Fashionpedia has the most diverse set of categories within one image among the three fashion datasets, and is comparable to COCO (Figure 3(c)), since we provide a comprehensive ontology for the annotation. In addition, Figure 3(f) shows the distributions of categories and attributes in the training set and highlights the long-tailed nature of our data.
During the fine-grained attribute annotation process, we also asked the experts to choose "not sure" if they were uncertain, and "not on the list" if they found an attribute that was not provided. The majority of "not sure" selections come from three attribute superclasses, namely "Opening Type", "Waistline", and "Length". Since some masks show only a limited portion of an apparel item (a top inside a jacket, for example), the annotators cannot identify those attributes due to occlusion. Less than 15% of the masks for each attribute superclass account for "not on the list", which illustrates the comprehensiveness of our proposed ontology (see supplementary material for more details of the extra dataset analysis).

Table 2. Comparison of segmentation mask complexity among segmentation datasets in both the fashion and general domains (COCO 2017 instance training data was used). Each statistic (mean and median) is reported as a bootstrapped 95% confidence interval following [15]. Boundary complexity was calculated according to [15,2]. The reported mask boundary complexity for COCO and LVIS differs from [15] due to different image resolutions and image sets. The number of vertices per polygon is the count of vertices in one polygon, where a polygon is defined as one disjoint area. Masks (and polygons) with zero area were ignored.

Dataset           | Boundary complexity (mean / median) | Vertices per polygon (mean / median) | Image count
COCO [32]         | 6.65-6.66 / 6.07-6.08               | 21.14-21.21 / 15.96-16.04            | 118,287
LVIS [15]         | 6.78-6.80 / 5.89-5.91               | 35.77-35.95 / 22.91-23.09            | 57,263
ModaNet [54]      | 5.87-5.89 / 5.26-5.27               | 22.50-22.60 / 18.95-19.05            | 52,377
Deepfashion2 [10] | 4.63-4.64 / 4.45-4.46               | 14.68-14.75 / 8.96-9.04              | 191,960
Fashionpedia      | 8.36-8.39 / 7.35-7.37               | 31.82-32.01 / 20.90-21.10            | 45,623
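The bootstrapped 95% confidence intervals in Table 2 follow the standard percentile-bootstrap recipe. The sketch below shows the general procedure; the number of resamples and other details used in [15] are assumptions here, and the boundary-complexity measure itself is not reproduced.

```python
import numpy as np

def bootstrap_ci(values, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a statistic.
    Resamples `values` with replacement, recomputes `stat` each time,
    and returns the (alpha/2, 1 - alpha/2) quantiles of the estimates."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    estimates = [stat(rng.choice(values, size=len(values), replace=True))
                 for _ in range(n_boot)]
    lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

Applied per dataset to, e.g., the per-polygon vertex counts, this yields interval estimates of the mean and median like those reported in Table 2.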
5 Evaluation Protocol and Baselines
5.1 Evaluation metric
In object detection, a true positive (TP) for each category c is defined as a single detected object that matches a ground-truth object with an Intersection over Union (IoU) above a threshold τIoU. COCO's main evaluation metric uses average precision averaged across all 10 IoU thresholds τIoU ∈ [0.5 : 0.05 : 0.95] and all 80 categories. We denote this metric as APIoU.
In the case of instance segmentation with attribute localization, we extend the standard COCO metric by adding one more constraint: the macro F1 score for the predicted attributes of a single detected object with category c (see supplementary material for the averaging choice of the F1 score). We denote the F1 threshold as τF1; it has the same range as τIoU (τF1 ∈ [0.5 : 0.05 : 0.95]). The main metric APIoU+F1 reports the averaged precision score across all 10 IoU thresholds, all 10 macro F1 thresholds, and all categories. Our evaluation API, code, and trained models will be released.
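To make the true-positive rule concrete, the sketch below checks both constraints for one detection. The set-based F1 used here is a simplified stand-in: the exact macro-F1 averaging is specified in the paper's supplementary material, so treat the helper names and the averaging choice as assumptions.

```python
def attribute_f1(pred_attrs, gt_attrs):
    """F1 between predicted and ground-truth attribute sets for one
    detection (a simplified, set-based stand-in for the macro F1)."""
    pred_attrs, gt_attrs = set(pred_attrs), set(gt_attrs)
    if not pred_attrs and not gt_attrs:
        return 1.0  # nothing to predict and nothing predicted
    tp = len(pred_attrs & gt_attrs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_attrs)
    recall = tp / len(gt_attrs)
    return 2 * precision * recall / (precision + recall)

def is_true_positive(iou, pred_attrs, gt_attrs, tau_iou, tau_f1):
    """A detection counts as a TP only if it clears BOTH thresholds:
    the usual IoU match plus the attribute-F1 constraint of APIoU+F1."""
    return iou >= tau_iou and attribute_f1(pred_attrs, gt_attrs) >= tau_f1
```

Sweeping both thresholds over [0.5 : 0.05 : 0.95] and averaging the resulting precisions over categories gives an APIoU+F1-style aggregate.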
5.2 Attribute-Mask R-CNN
We develop baseline models to perform two tasks on Fashionpedia: (1) apparel instance segmentation (ignoring attributes) using Mask R-CNN; (2) instance segmentation with attribute localization. For the second task, we present a strong baseline model named Attribute-Mask R-CNN, built upon Mask R-CNN [17]. As illustrated in Figure 4, we extend the existing Mask R-CNN heads with an additional multi-label attribute prediction head, trained with a sigmoid cross-entropy loss. Attribute-Mask R-CNN can be trained end to end to jointly perform instance segmentation and localized attribute recognition.

[Figure 4 diagram omitted: a ResNet-FPN backbone feeds RoI features into the standard class, box, and mask heads plus the added attribute head; the example prediction is a jacket with the localized attributes symmetrical, above-the-hip length, single-breasted, dropped shoulder, regular fit, plain, and washed.]

Fig. 4. Attribute-Mask R-CNN adds a multi-label attribute prediction head on top of Mask R-CNN for instance segmentation with attribute localization.
We leverage ResNet-50/101 (R-50/101) [18] with a feature pyramid network (FPN) [31] as the backbone. The input image is resized so that its longer edge is 1024 pixels before being fed to the network. Random horizontal flipping and scale augmentation with a random ratio between [0.8, 1.2] are applied during training. We use an open-sourced TensorFlow [1] codebase1 for the implementation, and all models are trained on a single Cloud TPU v3 with a batch size of 64. We follow the 1× training schedule used in Detectron [12], except that the learning rate is set 4× larger using the linear learning rate scaling suggested by Goyal et al. [13].
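The attribute head's sigmoid cross-entropy loss can be written down directly. The sketch below is a NumPy rendering of the standard numerically stable formulation; the shapes are illustrative (one logit per attribute per RoI), not the exact implementation.

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    """Mean multi-label sigmoid cross-entropy, the kind of loss used for
    the attribute head. `logits` are raw scores of shape
    (n_rois, n_attributes); `labels` are binary multi-hot targets of the
    same shape. Stable form: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    x = np.asarray(logits, dtype=float)
    z = np.asarray(labels, dtype=float)
    per_attr = np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))
    return float(per_attr.mean())
```

Because each mask carries only 3.7 of the 294 attributes on average, the targets are sparse multi-hot vectors, which is why independent per-attribute sigmoids are used rather than a softmax over attributes.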
Table 3. Baseline results of Mask R-CNN and Attribute-Mask R-CNN on Fashionpedia. The big performance gap between APIoU and APIoU+F1 suggests the challenging nature of our proposed task.
Model                | Backbone  | APbox IoU | APmask IoU | APbox IoU+F1 | APmask IoU+F1
Mask R-CNN           | R-50-FPN  | 38.2      | 34.9       | -            | -
Mask R-CNN           | R-101-FPN | 40.2      | 35.9       | -            | -
Attribute-Mask R-CNN | R-50-FPN  | 38.5      | 35.1       | 26.5         | 25.3
Attribute-Mask R-CNN | R-101-FPN | 40.7      | 36.4       | 27.8         | 26.3
1 https://github.com/tensorflow/tpu/tree/master/models/official/detection
Table 4. Per-super-category results (for masks) using Attribute-Mask R-CNN with an R-101-FPN backbone. We follow the same COCO sub-metrics for overall and for the three super-categories of apparel objects. The result format follows [APIoU / APIoU+F1] or [ARIoU / ARIoU+F1] (see supplementary material for per-class results).
Category  | AP          | AP50        | AP75        | APl         | APm         | APs
overall   | 36.4 / 26.3 | 53.0 / 35.1 | 40.8 / 29.9 | 46.9 / 28.7 | 33.6 / 22.4 | 10.7 / 6.40
outerwear | 54.7 / 30.5 | 68.1 / 38.1 | 62.8 / 35.0 | 57.2 / 32.0 | 36.8 / 22.5 | 9.10 / 4.50
parts     | 15.1 / 12.8 | 28.5 / 21.0 | 14.3 / 13.6 | 29.1 / 16.1 | 17.6 / 15.7 | 7.40 / 5.80
accessory | 48.3 / -    | 72.3 / -    | 56.4 / -    | 57.3 / -    | 54.4 / -    | 17.1 / -
5.3 Results Discussion
Attribute-Mask R-CNN. From the results in Table 3, we make the following observations: (1) Our baseline models achieve promising performance on the challenging Fashionpedia dataset. (2) Compared with Mask R-CNN, Attribute-Mask R-CNN achieves slightly better performance on both box and mask APIoU. This suggests that learning attributes with an additional attribute prediction head is helpful for detection. (3) There is a significant drop (e.g., from 40.7 to 27.8 for R-101-FPN) in box AP when we add τF1 as another constraint for true positives. This is further verified by the per-super-category mask results in Table 4, and suggests that joint instance segmentation and attribute localization is a significantly more difficult task than instance segmentation alone, leaving much room for future improvements.
Main apparel detection analysis. We also provide an in-depth detector analysis following the COCO detection challenge evaluation [32], inspired by Hoiem et al. [20]. Figure 5 illustrates a detailed breakdown of the bounding-box false positives produced by the detectors.
Figures 5(a) and 5(b) compare two detectors trained on Fashionpedia and COCO. Errors of the COCO detector are dominated by imperfect localization (AP increases by 28.3 over the overall AP at τIoU = 0.75) and background confusion (+15.7) (Figure 5(b)). Unlike the COCO detector, no particular mistake dominates the errors produced by the Fashionpedia detector: Figure 5(a) shows errors from localization (+8.0), classification (+6.6), and background confusion (+7.8). Due to space constraints, we leave the super-category analysis to the supplementary material.
Generalization to other fashion datasets. Other fashion datasets such as DeepFashion2 [10] and ModaNet [54] also contain instance segmentation masks. To demonstrate that models trained on Fashionpedia generalize well to other fashion datasets, we conduct instance segmentation transfer learning on DeepFashion2 and ModaNet by fine-tuning Mask R-CNN (R-101-FPN) pre-trained on Fashionpedia. The results in Table 5 show that Fashionpedia pre-training outperforms ImageNet pre-training. We believe this is because the mask annotations in Fashionpedia have higher quality, and therefore Fashionpedia pre-training provides additional benefits to DeepFashion2 and ModaNet. In addition, the performance of the same Mask R-CNN (R-101-FPN) is much worse on Fashionpedia (Table 3) than on DeepFashion2 and ModaNet, suggesting that Fashionpedia is a much more challenging benchmark.

[Figure 5 plots omitted: precision-recall curves for (a) the Fashionpedia detector (AUC: C75 .408, C50 .530, Loc .610, Sim .655, Oth .676, BG .754, FN 1.00) and (b) the COCO 2015 challenge winner (C75 .399, C50 .589, Loc .682, Sim .695, Oth .713, BG .870, FN 1.00).]

Fig. 5. Main apparel detector analysis. Each plot shows 7 precision-recall curves, where each evaluation setting is more permissive than the previous one. Specifically, C75: strict IoU (τIoU = 0.75); C50: PASCAL IoU (τIoU = 0.5); Loc: localization errors ignored (τIoU = 0.1); Sim: super-category false positives (FPs) removed; Oth: category FPs removed; BG: background (and class confusion) FPs removed; FN: false negatives removed. The two plots compare detectors trained on Fashionpedia and COCO, respectively. Results are averaged over all categories. Legends show the area under each curve (corresponding to the AP metric) in brackets. Best viewed digitally.
Table 5. Transfer learning of instance segmentation using Mask R-CNN (R-101-FPN). Models pre-trained on Fashionpedia achieve better performance than ImageNet pre-training.
Dataset      | Pre-training | APbox       | APmask
DeepFashion2 | ImageNet     | 68.3        | 63.9
DeepFashion2 | Fashionpedia | 69.7 (+1.4) | 65.1 (+1.2)
ModaNet      | ImageNet     | 59.4        | 55.1
ModaNet      | Fashionpedia | 62.6 (+3.2) | 58.2 (+3.1)
Prediction visualization. Baseline outputs (with both segmentation masks and localized attributes) are visualized in Figure 6. Our Attribute-Mask R-CNN achieves good results even for small objects like shoes and glasses. On one hand, the model can correctly predict fine-grained attributes for some masks (e.g., Figure 6(b)); on the other hand, it also incorrectly predicts the wrong nickname (welt) for a pocket (e.g., Figure 6(c)). These results show that
[Figure 6 prediction labels omitted: per-mask categories (e.g., jacket, top, skirt, pants, sleeve, collar, lapel, neckline, pocket, shoe, glasses) in panels (a)-(d), each with predicted localized attributes such as silhouette, textile pattern, length, opening type, and nickname.]
Fig. 6. Attribute-Mask R-CNN results on the Fashionpedia validation set, using Mask R-CNN with an R-101-FPN backbone. Masks, bounding boxes, and apparel categories (category score > 0.6) are shown. The localized attributes from the top 5 masks (that contain attributes) in each image are also shown. Correctly predicted categories and localized attributes are bolded. Best viewed digitally.
there is headroom remaining for the future development of more advanced computer vision models on this task (see the supplementary material for more details of the baseline analysis).
6 Conclusion
In this work, we focus on a new task that unifies instance segmentation and attribute recognition. To solve the challenging problems entailed in this task, we introduced the Fashionpedia ontology and dataset. To the best of our knowledge, Fashionpedia is the first dataset that combines part-level segmentation masks with fine-grained attributes. We presented Attribute-Mask R-CNN, a novel model for this task, along with a novel evaluation metric. We expect models trained on Fashionpedia can be applied to many applications, including better product recommendation in online shopping, enhanced visual search results, and resolving ambiguous fashion-related words for text queries. We hope Fashionpedia will contribute to advances in fine-grained image understanding in the fashion domain.
7 Acknowledgements
We thank Kavita Bala, Carla Gomes, Dustin Hwang, Rohun Tripathi, Omid Poursaeed, Hector Liu, and Nayanathara Palanivel for their helpful feedback and discussion in the development of the Fashionpedia dataset. We also thank Zeqi Gu, Fisher Yu, Wenqi Xian, Chao Suo, Junwen Bai, Paul Upchurch, Anmol Kabra, and Brendan Rappazzo for their help developing the fine-grained attribute annotation tool.
8 Supplementary Material
In our work we presented the new task of instance segmentation with attribute localization. We introduced a new ontology and dataset, Fashionpedia, to further describe the various aspects of this task. We also proposed a novel evaluation metric and the Attribute-Mask R-CNN model for this task. In the supplementary material, we provide the following items that shed further insight on these contributions:
– More comprehensive experimental results (§ 8.1)
– An extended discussion of the Fashionpedia ontology and potential knowledge graph applications (§ 8.2)
– More details of the dataset analysis (§ 8.3)
– Additional information on the annotation process (§ 8.4)
– Other concerns about Fashionpedia (§ 8.5)
– Attribute-Mask R-CNN inference demo (§ 8.6)
8.1 Attribute-Mask R-CNN
Per-class evaluation. Fig. 7 presents detailed evaluation results per supercategory and per category. In Fig. 7, we follow the same metrics as the COCO leaderboard (AP, AP50, AP75, APl, APm, APs, AR@1, AR@10, AR@100, ARs@100, ARm@100, ARl@100), with τIoU and τF1 where possible. Fig. 7 shows that metrics considering both constraints τIoU and τF1 are always lower than those using τIoU alone, across all supercategories and categories. This further demonstrates the challenging nature of our proposed task.
In general, categories belonging to "garment parts" have a lower AP and AR compared with "outerwear" and "accessories". Interestingly, the category "shirt, blouse" does not have any predictions, yet there is a relatively large number of instances in this category (see Fig. 8). We hypothesize that the model might be confused by other similar categories such as top, t-shirt, sweatshirt, or sweater. Fig. 9(a) also confirms that one of the main errors of the detectors is class-similarity confusion.
A detailed breakdown of detection errors is presented in Fig. 9 for supercategories and three main categories. In terms of supercategories in Fashionpedia, "outerwear" errors are dominated by within-supercategory class confusions (Fig. 9(a)). Within this supercategory, ignoring localization errors would
Fig. 7. Detailed results (for masks) using Mask R-CNN with an R-101-FPN backbone. We present the same metrics as the COCO leaderboard for the overall categories, three supercategories of apparel objects, and 46 fine-grained apparel categories. We use both constraints (for example, APIoU and APIoU+F1) where possible. For categories without attributes, the value represents APIoU or ARIoU. "top" is short for "top, t-shirt, sweatshirt". "head acc" is short for "headband, head covering, hair accessory"
[Figure 8: histogram of mask counts for the 46 apparel categories, ranging from sleeve (59,448) down to leg warmer (112)]
Fig. 8. Mask counts per apparel category
only raise AP slightly from 68.1 to 69.7 (+1.6). A similar trend can be observed in the class "skirt", which belongs to "outerwear" (Fig. 9(d)). Detection errors of "part" (Fig. 9(b), 9(e)) and "accessory" (Fig. 9(c), 9(f)), on the other hand, are dominated by both background confusion and localization. "Part" also has a lower AP in general, compared with the other two supercategories. A possible reason is that objects belonging to "part" usually have smaller sizes and lower counts.
F1 score calculation. Since we measure the F1 score between predicted attributes and ground-truth attributes per mask, we consider both options: multi-label multi-class classification with 294 classes for one instance, and binary classification for 294 instances. Multi-label multi-class classification is a straightforward choice, as it is a common setting for most fine-grained classification tasks. In the binary classification scenario, we consider the 1s and 0s of the multi-hot encodings of both the predictions and the ground-truth labels as the positive and negative classes, respectively. There are also two averaging choices: "micro" and "macro". "Micro" averaging calculates the score globally by counting the total true positives, false negatives, and false positives. "Macro" averaging calculates the metric for each attribute class and reports the unweighted mean. In sum, there are four options for F1-score averaging: 1) "micro", 2) "macro", 3) "binary-micro", 4) "binary-macro".
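The four averaging options can be sketched as follows for a single mask's multi-hot attribute vectors. This is an illustrative reconstruction, not the authors' implementation; `attribute_f1` is a hypothetical helper name:

```python
import numpy as np

def f1(tp, fp, fn):
    # Harmonic mean of precision and recall; 0 when there are no positives.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def attribute_f1(pred, gt, average):
    """F1 between predicted and ground-truth multi-hot attribute vectors
    (294-dim in Fashionpedia) under the four averaging options."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    tp = (pred == 1) & (gt == 1)
    fp = (pred == 1) & (gt == 0)
    fn = (pred == 0) & (gt == 1)
    tn = (pred == 0) & (gt == 0)
    if average == "micro":         # pool counts over all attribute classes
        return f1(tp.sum(), fp.sum(), fn.sum())
    if average == "macro":         # per-class F1, unweighted mean
        return float(np.mean([f1(tp[i], fp[i], fn[i]) for i in range(gt.size)]))
    if average == "binary-micro":  # treat entries as binary samples, pool 1s and 0s
        return f1(tp.sum() + tn.sum(), fp.sum() + fn.sum(), fn.sum() + fp.sum())
    if average == "binary-macro":  # mean of F1 for the 1-class and the 0-class
        return 0.5 * (f1(tp.sum(), fp.sum(), fn.sum())
                      + f1(tn.sum(), fn.sum(), fp.sum()))
    raise ValueError(average)

pred, gt = [1, 1, 0, 0], [1, 0, 1, 0]
print(attribute_f1(pred, gt, "micro"))  # 0.5
print(attribute_f1(pred, gt, "macro"))  # 0.25
```

With only a handful of positive attributes out of 294, "binary-micro" is dominated by the many correctly predicted 0s (it reduces to accuracy), while "macro" divides by all 294 classes, which matches the extreme values discussed below.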
As shown in Fig. 10, we present AP^{F1=τF1}_{IoU+F1}, with τIoU averaged over the range [0.5 : 0.05 : 0.95]. τF1 is increased from 0.0 to 1.0 with an increment of 0.01. Fig. 10 illustrates that as the value of τF1 increases, AP^{F1=τF1}_{IoU+F1} decreases at different rates given different choices of F1-score calculation. There are 294 attributes in total, and an average of 3.7 attributes per mask in the Fashionpedia training data. It is not surprising that "binary-micro" produces high F1 scores in general (higher than 0.97), as the AP_{IoU+F1} score only decreases once τF1 ≥ 0.97. On the other hand, "macro" averaging in the multi-label multi-class classification scenario gives extremely low F1 scores (0.01 − 0.03). This further demonstrates the room for improvement on the localized attribute classification task. We used "binary-macro" as our main metric.
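The monotone drop as τF1 grows follows directly from the matching rule: with τIoU fixed, raising τF1 can only disqualify detections, never add them. A small sketch under that rule (the detection list is made-up data for illustration):

```python
# Each detection is (mask IoU with its matched ground truth, attribute F1).
detections = [(0.90, 0.95), (0.80, 0.60), (0.75, 0.30), (0.55, 0.10)]

def num_true_positives(tau_iou, tau_f1):
    # A detection is a TP under AP_IoU+F1 only if BOTH its mask IoU and
    # its attribute F1 clear their respective thresholds.
    return sum(iou >= tau_iou and f1 >= tau_f1 for iou, f1 in detections)

prev = None
for tau_f1 in [0.0, 0.25, 0.5, 0.75, 1.0]:
    tps = num_true_positives(0.5, tau_f1)
    assert prev is None or tps <= prev  # TP count never increases with tau_F1
    prev = tps
    print(tau_f1, tps)
```

Since precision and recall are computed from these TP counts, the AP curves in Fig. 10 can only move down as τF1 sweeps from 0 to 1.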
Result visualization. More baseline results are also visualized in Figure 11. It shows that our simple baseline model can detect most of the apparel categories correctly. However, it also sometimes produces false positives. For example, it
[Figure 9: precision-recall curves for six settings, with area under each curve:
(a) Super category: outerwear (overall-outerwear-all): C75=.628, C50=.681, Loc=.697, Sim=.825, Oth=.833, BG=.845, FN=1.00
(b) Super category: parts (overall-part-all): C75=.143, C50=.285, Loc=.421, Sim=.434, Oth=.458, BG=.596, FN=1.00
(c) Super category: accessory (overall-accessory-all): C75=.564, C50=.723, Loc=.786, Sim=.799, Oth=.827, BG=.883, FN=1.00
(d) Outerwear: skirt (outerwear-skirt-all): C75=.741, C50=.788, Loc=.795, Sim=.894, Oth=.894, BG=.911, FN=1.00
(e) Parts: pocket (part-pocket-all): C75=.264, C50=.468, Loc=.686, Sim=.690, Oth=.700, BG=.851, FN=1.00
(f) Accessory: tights, stockings (accessory-tights, stockings-all): C75=.719, C50=.855, Loc=.866, Sim=.868, Oth=.877, BG=.921, FN=1.00]
Fig. 9. Main apparel detectors analysis. Each plot shows 7 precision-recall curves where each evaluation setting is more permissive than the previous. Specifically, C75: strict IoU (τIoU = 0.75); C50: PASCAL IoU (τIoU = 0.5); Loc: localization errors ignored (τIoU = 0.1); Sim: supercategory False Positives (FPs) removed; Oth: category FPs removed; BG: background (and class confusion) FPs removed; FN: False Negatives removed. The first row (overall-[supercategory]-[size]) contains results for the three supercategories in Fashionpedia; the second row ([supercategory]-[category]-[size]) shows results for three fine-grained categories (one per supercategory). Legends present the area under each curve (corresponding to the AP metric) in brackets
segments legs as tights and stockings (Figure 11(f)). A possible reason is that both objects have the same shape and stockings are worn on the legs.
Predicting fine-grained attributes, on the other hand, is a more challenging problem for the baseline model. We summarize several issues: (1) predicting more attributes than needed (Figure 11(a), (b), (c)); (2) failing to distinguish among fine-grained attributes: for example, dropped-shoulder sleeve (ground truth) vs. set-in sleeve (predicted) (Figure 11(e)); (3) other false positives: Figure 11(e) has a double-breasted opening, yet the model predicted it as a zip opening.
These results further show that there is room for improvement and for the future development of more advanced computer vision models on this instance segmentation with attribute localization task.
Result visualization on other datasets. Other fashion datasets such as ModaNet and DeepFashion2 also contain instance segmentation masks. Aside from the overall AP results presented in the main paper (see Table 5 of the
[Figure 10: AP_{IoU+F1} score as a function of the F1-score threshold τF1, for the four averaging options (micro, macro, binary-micro, binary-macro)]
Fig. 10. AP^{F1=τF1}_{IoU+F1} score with different τF1. The values presented are averaged over τIoU ∈ [0.5 : 0.05 : 0.95]. We use "binary-macro" as our main metric
main paper), we present a qualitative analysis of the segmentation masks generated on Fashionpedia (Fig. 11), ModaNet (Fig. 12(a-f)), and DeepFashion2 (Fig. 12(g-l)). Photos in the first row of Fig. 12 are from ModaNet. They show that the quality of the generated masks on ModaNet is fairly good and comparable to Fashionpedia in general (Fig. 12(a)). We also have a couple of observations on the failure cases: (1) failure to detect apparel objects: for example, the shoe in Fig. 12(c) is not detected, and parts of the pants (Fig. 12(c)) and coat (Fig. 12(d)) are not detected; (2) failure to detect some categories: Fig. 12(e) shows that the shoes on the shoe rack and the right foot are not detected, possibly due to a lack of such instances in the ModaNet training dataset. Similar to Fashionpedia, ModaNet mostly consists of street style images (see Fig. 13(b) for example predictions from a model trained on Fashionpedia); (3) close-up images: ModaNet contains mostly full-body images, which might be the reason for the decreased quality of predicted masks on close-up shots like Fig. 12(f).
For DeepFashion2 (Fig. 12(g,h,k)), the generated segmentation masks tend not to tightly follow the contours of the garments in the images. The main reason is possibly that the average number of vertices per polygon is 14.7 for DeepFashion2, which is lower than Fashionpedia and ModaNet (see Table 2 in the main text). Our qualitative analysis also shows that: (1) the model generates segmentation masks for pants (Fig. 12(i)) and tops (Fig. 12(j)) that are not visible in the images; both are covered by a jacket, and we find that in DeepFashion2, some parts of garments that are covered by other objects are indeed annotated with segmentation masks; (2) the model performs better on objects that are not on a human body (Fig. 12(l)): DeepFashion2 contains many commercial-customer image pairs (images both with and without a human body) in its training dataset. In contrast, both Fashionpedia and ModaNet contain more images with a human body than without in their training datasets.
Generalization to other image domains. For Fashionpedia, we also run inference on images found on online shopping websites, which usually display a single apparel category, with or without a fashion model. We found that the learned model works reasonably well if the apparel item is worn by a model (Fig. 13).
[Figure 11: six example predictions (a)–(f), each showing per-mask categories (e.g., Pants, Top, Dress, Coat, Jacket, Bag, Belt, Tights, Shoes, Glasses, and their parts) with localized attributes]
Fig. 11. Baseline results on the Fashionpedia validation set, using R-101-FPN. Masks, bounding boxes, and apparel categories (category score > 0.6) are shown. Attributes from the top 10 masks (that contain attributes) from each image are also shown. Correct predictions of objects and attributes are bolded
8.2 Fashionpedia Ontology and Knowledge Graph
Fig. 14 presents our Fashionpedia ontology in detail. Utilizing the proposed ontology and the image dataset, a large-scale fashion knowledge graph can be constructed to represent the fashion world at the product level. Fig. 15 illustrates a subset of the Fashionpedia knowledge graph.
Apparel graphs. Integrating the main garments, garment parts, attributes, and relationships presented in one outfit ensemble, we can create an apparel graph representation for each outfit in an image. Each apparel graph is a structured representation of an outfit ensemble, containing certain types of garments. Nodes in the graph represent the main garments, garment parts, and attributes. Main garments and garment parts are linked to their respective attributes through different types of relationships. Figure 16 shows more image examples with apparel graphs.
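The structure described above can be sketched as a small directed graph. The class below (`ApparelGraph`) and the sample categories and attributes are illustrative assumptions, not the dataset's API; the relation names follow the figures:

```python
from collections import defaultdict

class ApparelGraph:
    """Minimal apparel-graph sketch: garments CONTAIN parts, and both
    garments and parts carry HAS_ATTRIBUTE edges to fine-grained attributes."""

    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(relation, target), ...]

    def contain(self, garment, part):
        self.edges[garment].append(("CONTAIN", part))

    def has_attribute(self, node, attribute):
        self.edges[node].append(("HAS_ATTRIBUTE", attribute))

    def attributes_of(self, node):
        return [t for rel, t in self.edges[node] if rel == "HAS_ATTRIBUTE"]

# One outfit ensemble from a single image:
g = ApparelGraph()
g.contain("jacket", "sleeve 1")
g.contain("jacket", "pocket 1")
g.has_attribute("jacket", "single breasted")
g.has_attribute("sleeve 1", "set-in sleeve")
g.has_attribute("sleeve 1", "wrist-length")
print(g.attributes_of("sleeve 1"))  # ['set-in sleeve', 'wrist-length']
```

A Fashionpedia-style knowledge graph would then be the union of such per-image graphs' nodes and edges.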
Fashionpedia knowledge graph. While apparel graphs are localized representations of certain outfit ensembles in fashion images, we can also create a single Fashionpedia knowledge graph (Figure 15). The Fashionpedia knowledge
[Figure 12: panels (a)–(f) from ModaNet; panels (g)–(l) from DeepFashion2]
Fig. 12. Baseline results on the ModaNet and DeepFashion2 validation sets
(a) (b)
Fig. 13. Generated masks on online-shopping images [53]. (a) and (b) show the same types of shoes in different settings. Our model correctly detects and categorizes the pair of shoes worn by a fashion model, yet mistakenly detects the shoes as a jacket and a bag in (b)
graph is the union of all apparel graphs and includes all the main garments, garment parts, attributes, and relationships in the dataset. In this way, we are able to represent and understand fashion images in a more structured way.
We expect our Fashionpedia knowledge graph and the database to have applicability to extending existing knowledge graphs (such as WikiData [46]) with novel domain-specific knowledge, improving the underlying fashion product recommendation systems, enhancing search engines' results for fashion visual search, resolving ambiguous fashion-related words for text search, and more.
8.3 Dataset Analysis
Fig. 16 shows more annotation examples, represented as exploded views of annotation diagrams. Table 6 displays the details about the "not sure" and "not on the list" results during the attribute annotation process. We present the results per super-category of attributes. The label "not sure" means the expert annotator is uncertain about the choice given the segmentation mask. "Not on the list" means the annotator is certain that the given mask presents another attribute
(a) Categories (b) Attributes (partially)
Fig. 14. Apparel categories (a) and fine-grained attributes (b) hierarchy in Fashionpedia
that is not present in the Fashionpedia ontology. Other than "nicknames" (which is the specific name for a certain apparel category), less than 6% of the total masks fall into the "not on the list" category.
Fig. 17 and 18 also compare Fashionpedia and other image datasets in terms of image size and vertices per polygon.
We compare image resolutions between Fashionpedia and four other segmentation datasets (COCO and LVIS share the same images). Fig. 17 shows that images in Fashionpedia have the most diverse widths and heights, while ModaNet has the most consistent image resolutions. Note that high-resolution images will burden the data loading process during training. With that in mind, we will release our dataset in both resized and original versions.
We also report the distribution of the number of vertices per polygon in Fig. 18. This measures the annotation effort in mask annotation. Fashionpedia has the second-widest range, next to LVIS.
8.4 Fashionpedia dataset creation details
Image collection. To avoid photo bias, all images are randomly collected from Flickr and free-license stock photo websites (Unsplash, Burst by Shopify, Free stocks, Kaboompics, and Pexels). The collected images are further verified manually by two fashion experts. Specifically, they check the scene diversity and make sure the clothing items are visible and annotatable in the images. The estimated image type breakdown is as follows: street style images (30%
[Figure 15: knowledge-graph visualization with CONTAIN and HAS_ATTRIBUTE edges linking 20 products (Jacket/Blazer, Coat, Shorts, Skirt, Tops/T-Shirt/Sweatshirt, Dress) and their parts to fine-grained attributes]
Fig. 15. Fashionpedia Knowledge Graph: we present a subset of the Fashionpedia knowledge graph by aggregating 20 annotated products. The knowledge graph can be used as a tool for generating structural information
Table 6. Percentage of attributes in Fashionpedia broken down by super-class. "Tex finish, manu-tech." is short for "Textile finishing, Manufacturing techniques". Summaries of "not sure" and "not on the list" during attribute annotation are also presented, calculated as the counts divided by the total number of masks with attributes. "Not sure" is mainly due to occlusion inside the images, which causes some super-classes (such as waistline, opening type, and length) to be unidentifiable. The percentage of "not on the list" is less than 15%. This demonstrates the comprehensiveness of our Fashionpedia ontology
Super-category          class count   not sure   not on the list
Length                  15            12.79%     0.01%
Nickname                153           9.15%      12.76%
Opening Type            10            32.69%     3.90%
Silhouettes             25            2.90%      0.27%
Tex finish, manu-tech   21            4.47%      1.34%
Textile Pattern         24            2.18%      5.30%
None-Textile Type       14            4.90%      4.07%
Neckline                25            9.57%      3.38%
Waistline               7             30.46%     0.17%
of the full dataset), celebrity event images (30%), runway show images (30%), and online shopping product images (10%). In terms of gender distribution, 80% of the images show female subjects and 20% show male subjects.
We did aim to address the issue of photographic bias in the image collection process. Our dataset includes images that are not centered, are not full shots, and contain occlusion (see examples in Fig. 19). Furthermore, our focus during the image collection process was to identify clothing items, not to identify people.
Crowd workers and 10-day training for mask annotation. So that all annotators share the same apparel vocabulary, we prepared a detailed tutorial (with text descriptions and image examples) for each category and attribute in the Fashionpedia ontology (see Fig. 20 for an example). Before the official annotation process, we spent 10 days training the 28 crowd workers, for the following three main reasons.
24 Menglin Jia, Mengyun Shi, Mikhail Sirotenko, Yin Cui, et al.
(Figure content: annotated example images with per-mask labels such as Pants, Jacket, Dress, Coat, Shoes, Sleeve, Collar, and attributes such as ankle length, normal waist, double breasted, check/plaid, ruched; a legend lists the attribute super-classes Textile Finishing, Textile Pattern, Silhouette, Opening Type, Length, Nickname, and Waistline, linked by "part of" relationships.)
Fig. 16. Example images and annotations from our dataset: the images are annotatedwith both instance segmentation masks and fine-grained attributes (black boxes)
Fashionpedia 25
(Axis label: Image Diagonal Length)
Fig. 17. Image size comparison among Fashionpedia, ModaNet, DeepFashion2, COCO2017, and LVIS. Only training images are shown. The Fashionpedia images have the most diverse resolutions. Note that COCO2017 and LVIS have higher-resolution images for annotation; the distributions presented here are from the publicly available photos
(Axes: number of vertices per polygon vs. percent of polygons, log scale; series: DeepFashion2, ModaNet, COCO2017, LVIS, Fashionpedia)
Fig. 18. The number of vertices per polygon. This reflects the quality of the masks and the effort of the annotators. Values on the x-axis were discretized for better visual effect; the y-axis is on a log scale. Fashionpedia has the second widest range, next to LVIS
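The per-polygon vertex counts plotted in Fig. 18 can be computed directly from COCO-style polygon segmentations (flat [x1, y1, x2, y2, ...] lists). A minimal sketch with toy annotations; real ones would be loaded from the dataset's annotation JSON:

```python
import numpy as np

# Toy COCO-style annotations; each "segmentation" is a list of polygons,
# and each polygon is a flat list of alternating x and y coordinates.
annotations = [
    {"segmentation": [[10, 10, 50, 10, 50, 60, 10, 60]]},                    # 4 vertices
    {"segmentation": [[0, 0, 5, 1, 9, 4, 8, 9, 2, 8], [1, 1, 2, 1, 2, 2]]},  # 5 and 3
]

# One polygon contributes len(poly) // 2 vertices (x/y pairs).
vertex_counts = [len(poly) // 2
                 for ann in annotations
                 for poly in ann["segmentation"]]
print(vertex_counts)           # [4, 5, 3]
print(np.mean(vertex_counts))  # 4.0
```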
First, some apparel categories are commonly referred to by other names. For example, "top" is a general term for "shirt", "sweater", "t-shirt", and "sweatshirt", so some annotators could mistakenly annotate a "shirt" as a "top". We needed to train the workers so that they shared the same understanding of the proposed Fashionpedia ontology. Utilizing the prepared tutorials (see Fig. 20 for an example), we trained and educated annotators on how to distinguish among different apparel categories.
Second, there are fine-grained differences among apparel categories. For example, we observed that some workers initially had difficulty understanding the difference between certain garment parts, such as tassel and fringe. To help them understand the difference between these objects, we asked them to practice identifying more sample images before the annotation process. Fig. 21 shows our tutorials for these two categories, which specifically present correct and wrong examples of annotations.
Third, we emphasized the quality of annotations. In particular, we asked the annotators to follow the contours of garments in the images as closely as possible. The polygon annotation process was monitored and verified for a few days before the workers started the actual annotation process.
Quality control of debatable apparel categories. During the annotation process, we allowed the annotators to ask questions about uncertain categories. Two fashion experts monitored the annotation process by answering questions, checking the final annotation quality, and providing weekly feedback to the annotators.

Fig. 19. Examples of different types (people position/gesture, full/half shot, occlusion, scenes, garment types, etc.) of images in the Fashionpedia dataset
Instead of asking annotators to rate their confidence level for each segmentation mask, we asked them to send all the uncertain masks back to us during the annotation. The same two fashion experts made the final judgment and gave feedback to the workers on these debatable or unsure fashion categories. Some examples of debatable or fuzzy fashion items that we have documented can be found in Figure 22.
8.5 Other concerns and thoughts
Does this dataset include the images or labels of previous datasets? We only include the previous datasets for comparison. Our dataset does not intentionally use any images or labels from previous datasets. All the images and labels in Fashionpedia are newly collected and annotated.
Who were the fashion experts annotating localized attributes in the Fashionpedia dataset? The fashion experts are the 15 fashion graduate students whom we recruited from one of the top fashion design institutes. Due to the double-blind policy, we cannot mention the name of the university, but we will release the name of the university and the collaborators from it in the final version of this paper.
Instance segmentation vs. semantic segmentation. We did not conduct semantic segmentation experiments on our dataset for the following two reasons: 1) Although semantic segmentation is a useful task, we believe instance segmentation is more meaningful for fashion images. For example, to distinguish the different shoe styles in a fashion image containing three pairs of different shoes, instance segmentation (Figure 23(a)) can distinguish each shoe separately, whereas semantic segmentation (Figure 23(b)) will mix all the shoe instances together. 2) Semantic segmentation is a sub-problem of instance segmentation: if we merge the same detected object class from our instance segmentation results, we obtain the corresponding semantic segmentation results.
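The merging step in reason 2) is a simple paint-by-class operation. A minimal sketch with toy binary instance masks (shapes and class ids are illustrative):

```python
import numpy as np

# Toy instance-segmentation output: two binary masks, both predicted as the
# same class (e.g., "shoe", id 3 here, chosen arbitrarily for illustration).
h, w = 4, 6
instance_masks = [np.zeros((h, w), bool), np.zeros((h, w), bool)]
instance_masks[0][1:3, 0:2] = True   # first shoe instance
instance_masks[1][1:3, 4:6] = True   # second shoe instance
classes = [3, 3]

# Painting every instance's pixels with its class id collapses instances of
# the same class into one region: a semantic segmentation map (0 = background).
semantic = np.zeros((h, w), np.int32)
for mask, cls in zip(instance_masks, classes):
    semantic[mask] = cls

print(np.unique(semantic))  # [0 3] -- the two shoes are no longer separable
```

After this merge, the per-instance identity is lost, which is exactly why instance segmentation is the more informative task for fashion images.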
Fig. 20. Annotation tutorial example for shirt and top
Which image has the most annotated masks? In the Fashionpedia dataset, the maximum number of segmentation masks in an image is 74 (Fig. 24). We observe that most of the masks in this image are "rivets" (which belong to garment parts).
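Such a statistic is easy to reproduce from a COCO-style annotation file like instances_attributes_train2019.json. A sketch with an inlined stand-in for the loaded JSON (in practice one would json.load the file):

```python
from collections import Counter

# Stand-in for json.load(open('instances_attributes_train2019.json')):
# only the "annotations" list with "image_id" fields is needed here.
coco = {"annotations": [{"image_id": 27}] * 74 + [{"image_id": 3}] * 10}

# Count masks per image and take the image with the most.
counts = Counter(ann["image_id"] for ann in coco["annotations"])
image_id, n_masks = counts.most_common(1)[0]
print(image_id, n_masks)  # 27 74
```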
What is the difference between Fashionpedia and other fine-grained datasets like CUB-200? We propose to localize fine-grained attributes within segmentation masks of images; to the best of our knowledge, this is a novel task with real-world applications. The differences between Fashionpedia and CUB are as follows: 1) CUB uses keypoints as annotations to indicate different locations on birds, while Fashionpedia has segmentation masks of garments, garment parts, and accessories; 2) Fashionpedia attributes are associated with garment or garment-part instances in images, whereas CUB provides global attributes that are not associated with any specific keypoints.
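The contrast between localized and global attributes can be made concrete with two simplified annotation records (field names are illustrative, not the exact schemas of either dataset):

```python
# Fashionpedia-style record: attributes are tied to one specific instance mask.
fashionpedia_ann = {
    "image_id": 1,
    "category": "sleeve",
    "segmentation": [[10, 10, 40, 10, 40, 80, 10, 80]],  # polygon for this part
    "attribute_ids": [12, 57],                            # localized to this mask
}

# CUB-style record: attributes describe the whole image; keypoints mark
# locations but carry no masks and no per-part attributes.
cub_style_ann = {
    "image_id": 1,
    "attribute_ids": [3, 8, 21],                          # global, image-level
    "keypoints": {"beak": (120, 44), "tail": (300, 190)},
}

# With localized attributes, one can ask "which attributes does THIS sleeve
# have?"; with global attributes, that question has no well-defined answer.
print(fashionpedia_ann["category"], fashionpedia_ann["attribute_ids"])
```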
8.6 Attribute-Mask R-CNN inference demo
Fig. 25 shows the inference code and results of Attribute-Mask R-CNN. The inference code and model are available at: https://fashionpedia.github.io/home/Model_and_API.html.
What is Fringe? Correct & Wrong way to trace Fringe
What is Tassel? Correct & Wrong way to trace Tassel
Fig. 21. Annotation tutorial for fringe and tassel
Fig. 22. Examples of debatable fashion items in the Fashionpedia dataset. The questions were asked by the crowd workers; the answers were provided by two fashion experts
8.7 Fashionpedia API
The Fashionpedia API is available at: https://fashionpedia.github.io/home/Model_and_API.html.
8.8 iMat-Fashion Kaggle challenges
To advance the state of the art in visual analysis of clothing, we hosted two iMaterialist-Fashion challenges on Kaggle in 2019 and 2020, respectively. The challenge links are: https://www.kaggle.com/c/imaterialist-fashion-2019-FGVC6 and https://www.kaggle.com/c/imaterialist-fashion-2020-fgvc7.
Fig. 23. Instance segmentation (left) and semantic segmentation (right)
References
1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: Tensorflow: A system for large-scale machine learning. In: OSDI (2016) 11
2. Attneave, F., Arnoult, M.D.: The quantitative study of shape and pattern perception. Psychological bulletin (1956) 10
3. Bloomsbury.com: Fashion photography archive, retrieved May 9, 2019 from https://www.bloomsbury.com/dr/digital-resources/products/fashion-photography-archive/ 5
4. Bossard, L., Dantone, M., Leistner, C., Wengert, C., Quack, T., Van Gool, L.: Apparel classification with style. In: ACCV (2012) 4, 5
5. Farhadi, A., Endres, I., Hoiem, D., Forsyth, D.: Describing objects by their attributes. In: CVPR (2009) 2
6. FashionAI: Retrieved May 9, 2019 from http://fashionai.alibaba.com/ 4, 5
7. Fashionary.org: Fashionpedia - the visual dictionary of fashion design, retrieved May 9, 2019 from https://fashionary.org/products/fashionpedia 5
8. Ferrari, V., Zisserman, A.: Learning visual attributes. In: Advances in neural information processing systems (2008) 2
9. Fu, C.Y., Berg, T.L., Berg, A.C.: Imp: Instance mask projection for high accuracy semantic segmentation of things. arXiv preprint arXiv:1906.06597 (2019) 5
10. Ge, Y., Zhang, R., Wang, X., Tang, X., Luo, P.: Deepfashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images. In: CVPR (2019) 2, 4, 5, 8, 10, 12
11. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014) 2
12. Girshick, R., Radosavovic, I., Gkioxari, G., Dollar, P., He, K.: Detectron. https://github.com/facebookresearch/detectron (2018) 11
13. Goyal, P., Dollar, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677 (2017) 11
14. Guo, S., Huang, W., Zhang, X., Srikhanta, P., Cui, Y., Li, Y., Adam, H., Scott, M.R., Belongie, S.: The imaterialist fashion attribute dataset. In: ICCV Workshops (2019) 1, 4, 5
15. Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR (2019) 3, 8, 9, 10
16. Han, X., Wu, Z., Huang, P.X., Zhang, X., Zhu, M., Li, Y., Zhao, Y., Davis, L.S.: Automatic spatially-aware fashion concept discovery. In: ICCV (2017) 4, 5
17. He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask r-cnn. In: ICCV (2017) 2, 11
18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016) 11
19. He, R., McAuley, J.: Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In: WWW (2016) 4, 5
20. Hoiem, D., Chodpathumwan, Y., Dai, Q.: Diagnosing error in object detectors. In: ECCV (2012) 12
21. Hsiao, W.L., Grauman, K.: Learning the latent look: Unsupervised discovery of a style-coherent embedding from fashion images. In: ICCV (2017) 4, 5
22. Huang, J., Feris, R., Chen, Q., Yan, S.: Cross-domain image retrieval with a dual attribute-aware ranking network. In: ICCV (2015) 4, 5
23. Inoue, N., Simo-Serra, E., Yamasaki, T., Ishikawa, H.: Multi-label fashion image classification with minimal human supervision. In: ICCV (2017) 4, 5
24. Kendall, E.F., McGuinness, D.L.: Ontology engineering. Synthesis Lectures on The Semantic Web: Theory and Technology (2019) 3
25. Kiapour, M.H., Han, X., Lazebnik, S., Berg, A.C., Berg, T.L.: Where to buy it: Matching street clothing photos in online shops. In: ICCV (2015) 4, 5
26. Kiapour, M.H., Yamaguchi, K., Berg, A.C., Berg, T.L.: Hipster wars: Discovering elements of fashion styles. In: ECCV (2014) 4, 5
27. Kirillov, A., He, K., Girshick, R., Rother, C., Dollar, P.: Panoptic segmentation. In: CVPR (2019) 4
28. Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.J., Shamma, D.A., et al.: Visual genome: Connecting language and vision using crowdsourced dense image annotations. IJCV (2017) 3, 4, 5
29. Kuan, K., Ravaut, M., Manek, G., Chen, H., Lin, J., Nazir, B., Chen, C., Howe, T.C., Zeng, Z., Chandrasekhar, V.: Deep learning for lung cancer detection: tackling the kaggle data science bowl 2017 challenge. arXiv preprint arXiv:1705.09435 (2017) 1
30. Kumar, N., Berg, A.C., Belhumeur, P.N., Nayar, S.K.: Attribute and simile classifiers for face verification. In: ICCV (2009) 2
31. Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR (2017) 11
32. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV (2014) 3, 5, 8, 10, 12
33. Liu, Z., Luo, P., Qiu, S., Wang, X., Tang, X.: Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In: CVPR (2016) 1, 4, 5
34. Lopez, A.: Fdupes is a program for identifying or deleting duplicate files residing within specified directories., retrieved May 9, 2019 from https://github.com/adrianlopezroche/fdupes 7
35. Mall, U., Matzen, K., Hariharan, B., Snavely, N., Bala, K.: Geostyle: Discovering fashion trends and events. In: ICCV (2019) 5
36. Matzen, K., Bala, K., Snavely, N.: StreetStyle: Exploring world-wide clothing styles from millions of photos. arXiv preprint arXiv:1706.01869 (2017) 4, 5
37. Parikh, D., Grauman, K.: Relative attributes. In: ICCV (2011) 2
38. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems (2015) 2
39. Rosch, E.: Cognitive representations of semantic categories. Journal of experimental psychology: General (1975) 2, 5
40. Rubio, A., Yu, L., Simo-Serra, E., Moreno-Noguer, F.: Multi-modal embedding for main product detection in fashion. In: ICCV (2017) 4, 5
41. Simo-Serra, E., Fidler, S., Moreno-Noguer, F., Urtasun, R.: Neuroaesthetics in fashion: Modeling the perception of fashionability. In: CVPR (2015) 4, 5
42. Simo-Serra, E., Ishikawa, H.: Fashion style in 128 floats: Joint ranking and classification using weak data for feature extraction. In: CVPR (2016) 4, 5
43. Takagi, M., Simo-Serra, E., Iizuka, S., Ishikawa, H.: What makes a style: Experimental analysis of fashion prediction. In: ICCV (2017) 4, 5
44. Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., Belongie, S.: The inaturalist species classification and detection dataset. In: CVPR (2018) 2, 4
45. Vittayakorn, S., Yamaguchi, K., Berg, A.C., Berg, T.L.: Runway to realway: Visual analysis of fashion. In: WACV (2015) 4, 5
46. Vrandecic, D., Krotzsch, M.: Wikidata: a free collaborative knowledge base (2014) 5, 21
47. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 Dataset. Tech. Rep. CNS-TR-2011-001, California Institute of Technology (2011) 2, 4
48. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017) 4, 5
49. Yamaguchi, K., Berg, T.L., Ortiz, L.E.: Chic or social: Visual popularity analysis in online fashion networks. In: ACM MM (2014) 4, 5
50. Yamaguchi, K., Kiapour, M.H., Ortiz, L.E., Berg, T.L.: Parsing clothing in fashion photographs. In: CVPR (2012) 4, 5
51. Yu, A., Grauman, K.: Semantic jitter: Dense supervision for visual comparisons via synthetic images. In: ICCV (2017) 4, 5
52. Yu, F., Xian, W., Chen, Y., Liu, F., Liao, M., Madhavan, V., Darrell, T.: Bdd100k: A diverse driving video database with scalable annotation tooling. arXiv preprint arXiv:1805.04687 (2018) 1
53. ZARA.com: Zara white leather flat ankle boots with top stitching size 5 bnwt, retrieved May 9, 2019 from https://www.zara.com 21
54. Zheng, S., Yang, F., Kiapour, M.H., Piramuthu, R.: Modanet: A large-scale street fashion dataset with polygon annotations. In: ACM MM (2018) 2, 4, 5, 8, 10, 12
instances_attributes_train2019.json, image 27 has 74 masks
(Panels: original image; original image with masks; detailed info of each mask with associated localized attributes)
Fig. 24. The image with 74 masks in the Fashionpedia dataset
[1]: import warnings
     warnings.filterwarnings('ignore')

[2]: %matplotlib inline
     from matplotlib import cm
     import matplotlib.patches as patches
     import matplotlib.pyplot as plt
     from PIL import Image
     import numpy as np
     import PIL.ImageFont as ImageFont
     import copy
     import json
     import sys

     from pycocotools import mask
     from scipy.special import softmax

     import tensorflow as tf

[3]: session = tf.Session(graph=tf.Graph())
     saved_model_dir = 'model'
     _ = tf.saved_model.loader.load(session, ['serve'], saved_model_dir)

[4]: val = json.load(open('val2020.json'))
     n_class = len(val['categories'])
     score_th = 0.05

     image_path = 'input.jpg'
     with open(image_path, 'rb') as f:
         np_image_string = np.array([f.read()])
     image_raw = Image.open(image_path)
     width, height = image_raw.size
     image_raw = np.array(image_raw.getdata()).reshape(height, width, 3).astype(np.uint8)
     plt.imshow(image_raw, interpolation='none')
     plt.axis('off')

     num_detections, detection_boxes, detection_classes, detection_scores, \
         detection_masks, detection_logits, attribute_logits, image_info = session.run(
             ['NumDetections:0', 'DetectionBoxes:0', 'DetectionClasses:0',
              'DetectionScores:0', 'DetectionMasks:0', 'DetectionLogits:0',
              'AttributeLogits:0', 'ImageInfo:0'],
             feed_dict={'Placeholder:0': np_image_string})
     num_detections = np.squeeze(num_detections.astype(np.int32), axis=(0,))
     detection_boxes = np.squeeze(detection_boxes * image_info[0, 2], axis=(0,))[0:num_detections]
     detection_scores = np.squeeze(detection_scores, axis=(0,))[0:num_detections]
     detection_classes = np.squeeze(detection_classes.astype(np.int32), axis=(0,))[0:num_detections]
     detection_masks = np.squeeze(detection_masks, axis=(0,))[0:num_detections]
     detection_logits = np.squeeze(detection_logits, axis=(0,))[0:num_detections]
     attribute_logits = np.squeeze(attribute_logits, axis=(0,))[0:num_detections]

     # keep attributes whose softmax probability exceeds the threshold
     attributes = []
     for i in range(num_detections):
         prob = softmax(attribute_logits[i, :])
         attributes.append([j for j in range(len(prob)) if prob[j] > score_th])

[5]: max_boxes_to_draw = 10
     linewidth = 2
     fontsize = 10
     line_alpha = 0.8
     mask_alpha = 0.5
     output_image_path = 'results.pdf'

     image = copy.deepcopy(image_raw)
     cm_subsection = np.linspace(0., 1., min(max_boxes_to_draw, len(detection_scores)))
     colors = [cm.jet(x) for x in cm_subsection]

     plt.figure()
     fig, ax = plt.subplots(1)
     for i in range(len(detection_scores) - 1, -1, -1):
         if i < max_boxes_to_draw:
             # draw segmentation mask
             seg_mask = detection_masks[i, :, :]
             color = list(np.array(colors[i][:3]) * 255)
             pil_image = Image.fromarray(image)
             solid_color = np.expand_dims(
                 np.ones_like(seg_mask), axis=2) * np.reshape(color, [1, 1, 3])
             pil_solid_color = Image.fromarray(np.uint8(solid_color)).convert('RGBA')
             pil_mask = Image.fromarray(np.uint8(255.0 * mask_alpha * seg_mask)).convert('L')
             pil_image = Image.composite(pil_solid_color, pil_image, pil_mask)
             image = np.array(pil_image.convert('RGB')).astype(np.uint8)

             # draw bbox
             top, left, bottom, right = detection_boxes[i, :]
             width = right - left
             height = bottom - top
             bbox = patches.Rectangle((left, top), width, height,
                                      linewidth=linewidth, edgecolor=colors[i],
                                      facecolor='none', alpha=line_alpha)
             ax.add_patch(bbox)

             # draw text
             attributes_str = ", ".join([val['attributes'][attr]['name'] for attr in attributes[i]])
             detections_str = '{} ({}%)'.format(val['categories'][detection_classes[i]]['name'],
                                                int(100 * detection_scores[i]))
             display_str = '{}: {}'.format(detections_str, attributes_str)
             font = ImageFont.truetype('arial.ttf', fontsize)
             text_width, text_height = font.getsize(detections_str)
             props = dict(boxstyle='Round, pad=0.05', facecolor=colors[i],
                          linewidth=0, alpha=mask_alpha)
             ax.text(left, bottom, detections_str, fontsize=fontsize,
                     verticalalignment='top', bbox=props)
             print(display_str)

     plt.imshow(image, interpolation='none')
     plt.axis('off')
     plt.savefig(output_image_path, transparent=True, bbox_inches='tight', pad_inches=0.05)

Output:
     pocket (95%): patch (pocket), slash (pocket), curved (pocket), flap (pocket)
     buckle (98%): curved (pocket)
     pocket (99%): patch (pocket), welt (pocket), slash (pocket), curved (pocket), flap (pocket)
     pants (99%): maxi (length), fly (opening), no non-textile material, no special manufacturing technique, plain (pattern)
     collar (99%): shirt (collar)
     glasses (99%):
     lapel (99%): symmetrical, notched (lapel), single breasted
     belt (99%): plain (pattern)
     sleeve (99%): wrist-length, set-in sleeve
     sleeve (99%): wrist-length, set-in sleeve
Fig. 25. Attribute-Mask R-CNN inference demo