Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis (Supplementary Materials)

Chaoyi Zhang, University of Sydney

[email protected]

Jianhui Yu, University of Sydney

[email protected]

Yang Song, University of New South Wales

[email protected]

Weidong Cai, University of Sydney

[email protected]

The supplementary materials for [8] contain implementation and training details, as well as additional specifications for the following studies:

A. 3D SGGpoint on Real-World 3D Scans.

B. 3D SGGpoint on Synthetic 3D Scenes.

C. Traditional Graph Representation Learning.

A. 3D SGGpoint on Real-World 3D Scans

We adopted the same dataset split as [3] for method comparisons. To alleviate the serious object class imbalance that appeared in SG node recognition, we selected the so-called RIO27 annotation set (27 object classes; see footnote 1) for our SGGpoint studies, rather than the initially published annotations (160 object classes) [4]. The RIO27 annotation set is a subset mapping of the raw 160-class set and was later officially released in the dataset repository. Similarly, we first filtered out the annotated comparative relationships (e.g., bigger than and darker than) and, following [4], considered only a subset of the relationships (16 structural relationships; see footnote 2) to formulate SG edge recognition as a multi-class classification problem. All irrelevant objects and inter-object structural relationships were removed to obtain our cleaned SG node and edge annotations. Our densely sampled point cloud representations of 3D real-world scenes, together with these cleaned SG annotations, will be published online for reproducibility and to foster further SGGpoint research.

1 C_object := {wall, floor, cabinet, bed, chair, sofa, table, door, window, counter, shelf, curtain, pillow, clothes, ceiling, fridge, tv, towel, plant, box, nightstand, toilet, sink, lamp, bathtub, object, blanket}.

2 C_relationship := {supported by, attached to, standing on, lying on, hanging on, connected to, leaning against, part of, belonging to, build in, standing in, cover, lying in, hanging in, spatial proximity, close by}.
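The annotation cleaning described above can be sketched as a simple filter. This is a minimal illustration, not the authors' preprocessing code: the function name `clean_scene_graph` and the data layout (a dict of node ids to class names, and a list of subject-predicate-object triples) are assumptions made for this example, and nodes are assumed to have already been mapped to RIO27 class names.

```python
# Label sets copied from footnotes 1 and 2.
OBJECT_CLASSES = {
    "wall", "floor", "cabinet", "bed", "chair", "sofa", "table", "door",
    "window", "counter", "shelf", "curtain", "pillow", "clothes", "ceiling",
    "fridge", "tv", "towel", "plant", "box", "nightstand", "toilet", "sink",
    "lamp", "bathtub", "object", "blanket",
}
STRUCTURAL_RELATIONSHIPS = {
    "supported by", "attached to", "standing on", "lying on", "hanging on",
    "connected to", "leaning against", "part of", "belonging to", "build in",
    "standing in", "cover", "lying in", "hanging in", "spatial proximity",
    "close by",
}


def clean_scene_graph(nodes, edges):
    """Keep only RIO27 objects and structural relationships between them.

    nodes: dict mapping node id -> class name (hypothetical layout).
    edges: list of (subject_id, predicate, object_id) triples (hypothetical layout).
    """
    kept_nodes = {nid: cls for nid, cls in nodes.items() if cls in OBJECT_CLASSES}
    kept_edges = [
        (s, p, o)
        for s, p, o in edges
        # Drop comparative predicates and edges touching a removed object.
        if p in STRUCTURAL_RELATIONSHIPS and s in kept_nodes and o in kept_nodes
    ]
    return kept_nodes, kept_edges
```

Comparative predicates such as "bigger than" fall outside the 16-label set and are dropped, as is any edge whose subject or object was removed.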

We chose Adam as the optimizer, with the learning rate and weight decay set to 1e-3 and 1e-4, respectively. The SGGpoint framework was trained for 50 epochs with early stopping applied on a held-out validation set, and the batch size was set to 4. We randomly cropped 4096 points on-the-fly for each scene while keeping the same sampling ratio across all object instances within a given scene. This design is insensitive to varying object sizes and thus ensures balanced point sampling at the instance level. The SGGpoint framework for real-world 3D scans was built and trained with four 11GB NVIDIA GeForce GTX 1080Ti GPUs. More qualitative results can be found in Fig. 1.
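One way to read the shared-ratio cropping is sketched below, under stated assumptions: every instance contributes the same fraction of its points, and rounding leftovers are handed out by largest fractional remainder so the crop holds exactly 4096 points. The function name `crop_scene`, the per-point instance-id input, and the remainder rule are illustrative choices that the text does not specify.

```python
import random
from collections import defaultdict


def crop_scene(instance_ids, num_samples=4096, rng=None):
    """Return point indices sampled with one ratio shared by all instances.

    instance_ids: per-point instance id for one scene (hypothetical layout).
    """
    rng = rng or random.Random()
    total = len(instance_ids)
    if total < num_samples:
        raise ValueError("scene has fewer points than requested")

    # Group point indices by the object instance they belong to.
    groups = defaultdict(list)
    for idx, inst in enumerate(instance_ids):
        groups[inst].append(idx)

    ratio = num_samples / total  # the ratio shared by every instance
    quotas = {inst: int(len(pts) * ratio) for inst, pts in groups.items()}

    # Assumption: distribute the rounding leftover by largest remainder.
    leftover = num_samples - sum(quotas.values())
    by_remainder = sorted(
        groups, key=lambda i: len(groups[i]) * ratio - quotas[i], reverse=True
    )
    for inst in by_remainder[:leftover]:
        quotas[inst] += 1

    chosen = []
    for inst, pts in groups.items():
        chosen.extend(rng.sample(pts, quotas[inst]))
    return chosen
```

Calling `crop_scene` again at each epoch yields a fresh random crop, which matches the on-the-fly behaviour described above.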

B. 3D SGGpoint on Synthetic 3D Scenes

We followed the released dataset split and three-class structural relationship annotations [9] to establish our training procedures. The optimizer was again Adam, with the learning rate and weight decay set to 1e-3 and 1e-4, respectively. The total number of training epochs was set to 100, and the batch size was set to 4. Since each room category may have its own object classes, the number of object classes per room category is as follows: C_bedroom = C_living = 51, C_office = 42, and C_bathroom = 31. Any scene containing more than 60 object nodes was divided into sub-graphs for training. Two 11GB NVIDIA GeForce GTX 1080Ti GPUs were employed for this group of experiments.
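The text does not say how large scenes were divided, so the following is only one plausible sketch of the 60-node limit: nodes are split into consecutive chunks of at most 60, and an edge is kept only when both endpoints fall in the same chunk. The function name `split_scene`, the chunking order, and the edge-dropping rule are all assumptions made for illustration.

```python
def split_scene(node_ids, edges, max_nodes=60):
    """Split a scene graph into sub-graphs of at most max_nodes nodes.

    node_ids: list of node ids; edges: list of (subject_id, label, object_id).
    Assumption: consecutive chunking, with cross-chunk edges discarded.
    """
    if len(node_ids) <= max_nodes:
        return [(list(node_ids), list(edges))]

    subgraphs = []
    for start in range(0, len(node_ids), max_nodes):
        chunk = list(node_ids[start:start + max_nodes])
        members = set(chunk)
        # Keep only edges that stay inside this chunk.
        kept = [(s, p, o) for s, p, o in edges if s in members and o in members]
        subgraphs.append((chunk, kept))
    return subgraphs
```

A spatially aware partition would preserve more edges than consecutive chunking; the simple rule here only illustrates the size constraint.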

C. Traditional Graph Representation Learning

Our contributions could also be validated on conventional graph representation learning tasks, such as node-wise classification and whole-graph recognition problems. More specifically, our method was evaluated on three popular citation network datasets (Cora, CiteSeer, and Pubmed) [7] and two molecular datasets (Tox21 and BBBP) [6]. The evaluations were completed with two universal benchmark scripts available for graph representation learning studies, with all training settings left unchanged for every method, except that the procedure was repeated 50 times for each approach. The following experiments were conducted on a single 8GB NVIDIA GeForce GTX 1070Ti GPU.

Figure 1. Qualitative visualization of the SGGpoint framework, where misclassified object and structural relationship samples are marked with ground truth values in red, while the correct ones are shown in green with ground truth values omitted.

C.1. Node-wise classification on citation datasets

We applied a PyTorch Geometric [1] script to replicate the experiments on citation network datasets for evaluations among node-wise classification approaches. More specifically, we adopted Adam as the optimizer, with the learning rate and weight decay set to 1e-2 and 5e-4, respectively. All methods under investigation were trained for 200 epochs per run, over 50 runs in total, to obtain a stable averaged accuracy for performance comparisons. All GNNs were instantiated as two-layer networks with ReLU as the intermediate non-linearity, except for EGNN, which was reproduced following the settings reported in [2]. Their hidden channels were set to 16 by default, unless otherwise specified.
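The repeated-run protocol can be sketched as below. This is a minimal illustration rather than the benchmark script itself: `evaluate_over_runs` and `dummy_train_and_test` are hypothetical names, the dummy function stands in for one full training run and returns a made-up accuracy, and reporting the standard deviation alongside the mean is an added assumption. The values in `HPARAMS` are the ones stated in this section.

```python
import random
import statistics

# Settings stated in the text for the citation-network experiments.
HPARAMS = {
    "optimizer": "Adam",
    "lr": 1e-2,
    "weight_decay": 5e-4,
    "epochs": 200,
    "num_layers": 2,
    "hidden_channels": 16,
}


def evaluate_over_runs(train_and_test, num_runs=50, base_seed=0):
    """Run one train/test procedure num_runs times and average the accuracy."""
    accuracies = [train_and_test(seed=base_seed + run) for run in range(num_runs)]
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def dummy_train_and_test(seed):
    """Placeholder for one 200-epoch run; returns a synthetic test accuracy."""
    rng = random.Random(seed)
    return 0.80 + rng.uniform(-0.01, 0.01)


if __name__ == "__main__":
    mean_acc, std_acc = evaluate_over_runs(dummy_train_and_test)
    print(f"accuracy over {50} runs: {mean_acc:.4f} +/- {std_acc:.4f}")
```

Seeding each run differently keeps the runs independent while making the averaged figure reproducible.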

C.2. Whole-graph recognition on molecular datasets

We adopted a DGL [5] script to evaluate whole-graph recognition approaches for molecular analysis. More specifically, Adam was used for parameter optimization, with early stopping applied over a maximum of 1000 training epochs. A scaffold splitting policy was employed to divide each dataset into 80% training, 10% validation, and 10% testing sets. Hyper-parameter searches were conducted with Bayesian Optimization for 32 trials: a randomly initialized model was trained in each trial, and the model achieving the highest validation performance across trials was selected for final evaluation on the testing set. We constructed the GNNs with their default architectures; their configuration details, as well as their fine-tuned hyper-parameters such as learning rate and batch size, are available in the online DGL repository.
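The idea behind an 80/10/10 scaffold split can be sketched as follows. This is not the DGL script's implementation: the function name `scaffold_split` is hypothetical, scaffold strings are assumed to be precomputed per molecule (for instance with a cheminformatics toolkit), and the greedy largest-group-first assignment is a common convention assumed here.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_val=0.1):
    """Split molecule indices so that no scaffold spans two subsets.

    scaffolds: list with one precomputed scaffold string per molecule.
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest scaffold groups first; ties broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cap = frac_train * n
    val_cap = (frac_train + frac_val) * n

    train, val, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(val) + len(group) <= val_cap:
            val.extend(group)
        else:
            test.extend(group)
    return train, val, test
```

Because whole scaffold groups are assigned together, the validation and testing sets contain scaffolds unseen in training, which makes the split harder than a random one.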

References

[1] Matthias Fey and Jan E. Lenssen. Fast graph representation learning with PyTorch Geometric. In Proceedings of ICLR Workshop on Representation Learning on Graphs and Manifolds (ICLRW), 2019.

[2] Liyu Gong and Qiang Cheng. Exploiting edge features for graph neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[3] Johanna Wald, Armen Avetisyan, Nassir Navab, Federico Tombari, and Matthias Nießner. RIO: 3D object instance re-localization in changing indoor environments. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019.

[4] Johanna Wald, Helisa Dhamo, Nassir Navab, and Federico Tombari. Learning 3D semantic scene graphs from 3D indoor reconstructions. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

[5] Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, Tianjun Xiao, Tong He, George Karypis, Jinyang Li, and Zheng Zhang. Deep Graph Library: A graph-centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315, 2019.

[6] Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci., 9:513–530, 2018.

[7] Zhilin Yang, William Cohen, and Ruslan Salakhudinov. Revisiting semi-supervised learning with graph embeddings. In Proceedings of the International Conference on Machine Learning (ICML), volume 48, pages 40–48, New York, New York, USA, 2016. PMLR.

[8] Chaoyi Zhang, Jianhui Yu, Yang Song, and Weidong Cai. Exploiting edge-oriented reasoning for 3D point-based scene graph analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

[9] Yang Zhou, Zachary While, and Evangelos Kalogerakis. SceneGraphNet: Neural message passing for 3D indoor scene augmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019.