zhb10086 committed on
Commit 9968cfd · verified · 1 Parent(s): 5c56007

Upload 8 files

20240921_173230.log ADDED
@@ -0,0 +1,1106 @@
+ 2024-09-21 17:32:30,446 - mmdet - INFO - Environment info:
+ ------------------------------------------------------------
+ sys.platform: linux
+ Python: 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
+ CUDA available: True
+ GPU 0,1,2,3,4,5,6,7: NVIDIA RTX A5000
+ CUDA_HOME: /data/home/hanbo/cuda-11.6
+ NVCC: Cuda compilation tools, release 11.6, V11.6.55
+ GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
+ PyTorch: 1.12.1+cu116
+ PyTorch compiling details: PyTorch built with:
+   - GCC 9.3
+   - C++ Version: 201402
+   - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
+   - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
+   - OpenMP 201511 (a.k.a. OpenMP 4.5)
+   - LAPACK is enabled (usually provided by MKL)
+   - NNPACK is enabled
+   - CPU capability usage: AVX2
+   - CUDA Runtime 11.6
+   - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
+   - CuDNN 8.3.2 (built against CUDA 11.5)
+   - Magma 2.6.1
+   - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
+
+ TorchVision: 0.13.1+cu116
+ OpenCV: 4.10.0
+ MMCV: 1.7.2
+ MMCV Compiler: GCC 9.3
+ MMCV CUDA Compiler: 11.6
+ MMDetection: 2.28.2+d592e33
+ ------------------------------------------------------------
+
+ 2024-09-21 17:32:31,819 - mmdet - INFO - Distributed training: True
+ 2024-09-21 17:32:33,172 - mmdet - INFO - Config:
+ norm_cfg = dict(
+     type='BN',
+     requires_grad=False,
+     mean=[123.675, 116.28, 103.53],
+     std=[1.0, 1.0, 1.0],
+     to_rgb=True)
+ model = dict(
+     type='FasterRCNNRelAfford',
+     backbone=dict(
+         type='mmdet.ResNet',
+         depth=101,
+         num_stages=3,
+         strides=(1, 2, 2),
+         dilations=(1, 1, 1),
+         out_indices=(2, ),
+         frozen_stages=1,
+         norm_cfg=dict(type='BN', requires_grad=False),
+         norm_eval=True,
+         style='caffe',
+         init_cfg=dict(
+             type='Pretrained',
+             checkpoint='open-mmlab://detectron2/resnet101_caffe')),
+     rpn_head=dict(
+         type='mmdet.RPNHead',
+         in_channels=1024,
+         feat_channels=1024,
+         anchor_generator=dict(
+             type='AnchorGenerator',
+             scales=[8, 16, 32],
+             ratios=[0.33, 0.5, 1.0, 2.0, 3.0],
+             strides=[16]),
+         bbox_coder=dict(
+             type='DeltaXYWHBBoxCoder',
+             target_means=[0.0, 0.0, 0.0, 0.0],
+             target_stds=[1.0, 1.0, 1.0, 1.0]),
+         loss_cls=dict(
+             type='mmdet.CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
+         loss_bbox=dict(type='mmdet.L1Loss', loss_weight=1.0)),
+     roi_head=None,
+     child_head=dict(
+         type='invigorate.PairedRoIHead',
+         shared_head=dict(
+             type='invigorate.PairedResLayer',
+             depth=50,
+             stage=3,
+             stride=1,
+             style='caffe',
+             norm_eval=False,
+             share_weights=False),
+         paired_roi_extractor=dict(
+             type='invigorate.VMRNPairedRoIExtractor',
+             roi_layer=dict(type='RoIPool', output_size=7),
+             out_channels=1024,
+             featmap_strides=[16]),
+         relation_head=dict(
+             type='invigorate.BBoxPairHead',
+             with_avg_pool=True,
+             roi_feat_size=7,
+             in_channels=2048,
+             num_relations=1,
+             loss_cls=dict(
+                 type='mmdet.CrossEntropyLoss',
+                 use_sigmoid=False,
+                 loss_weight=1.0))),
+     leaf_head=dict(
+         type='mmdet.StandardRoIHead',
+         shared_head=dict(
+             type='mmdet.ResLayer',
+             depth=50,
+             stage=3,
+             stride=1,
+             style='caffe',
+             norm_cfg=dict(type='BN', requires_grad=False),
+             norm_eval=True),
+         bbox_roi_extractor=dict(
+             type='mmdet.SingleRoIExtractor',
+             roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
+             out_channels=1024,
+             featmap_strides=[16]),
+         bbox_head=dict(
+             type='mmdet.BBoxHead',
+             with_avg_pool=True,
+             with_reg=False,
+             roi_feat_size=7,
+             in_channels=2048,
+             num_classes=2,
+             loss_cls=dict(
+                 type='mmdet.CrossEntropyLoss',
+                 use_sigmoid=False,
+                 loss_weight=1.0))),
+     train_cfg=dict(
+         rpn=dict(
+             assigner=dict(
+                 type='MaxIoUAssigner',
+                 pos_iou_thr=0.7,
+                 neg_iou_thr=0.3,
+                 min_pos_iou=0.3,
+                 match_low_quality=True,
+                 ignore_iof_thr=-1),
+             sampler=dict(
+                 type='RandomSampler',
+                 num=256,
+                 pos_fraction=0.5,
+                 neg_pos_ub=-1,
+                 add_gt_as_proposals=False),
+             allowed_border=0,
+             pos_weight=-1,
+             debug=False),
+         rpn_proposal=dict(
+             nms_pre=12000,
+             max_per_img=2000,
+             nms=dict(type='nms', iou_threshold=0.7),
+             min_bbox_size=0),
+         rcnn=dict(
+             assigner=dict(
+                 type='MaxIoUAssigner',
+                 pos_iou_thr=0.5,
+                 neg_iou_thr=0.5,
+                 min_pos_iou=0.5,
+                 match_low_quality=False,
+                 ignore_iof_thr=-1),
+             sampler=dict(
+                 type='RandomSampler',
+                 num=256,
+                 pos_fraction=0.25,
+                 neg_pos_ub=-1,
+                 add_gt_as_proposals=True),
+             pos_weight=-1,
+             debug=False),
+         child_head=dict(
+             assigner=dict(
+                 type='MaxIoUAssigner',
+                 pos_iou_thr=0.7,
+                 neg_iou_thr=0.5,
+                 min_pos_iou=0.7,
+                 match_low_quality=False,
+                 ignore_iof_thr=-1),
+             relation_sampler=dict(
+                 type='RandomRelationSampler',
+                 num=32,
+                 pos_fraction=0.5,
+                 cls_ratio_ub=1.0,
+                 add_gt_as_proposals=True,
+                 num_relation_cls=1,
+                 neg_id=0),
+             pos_weight=-1,
+             online_data=True,
+             online_start_iteration=0),
+         leaf_head=dict(
+             assigner=dict(
+                 type='MaxIoUAssigner',
+                 pos_iou_thr=0.5,
+                 neg_iou_thr=0.5,
+                 min_pos_iou=0.5,
+                 match_low_quality=False,
+                 ignore_iof_thr=-1),
+             sampler=dict(
+                 type='RandomSampler',
+                 num=64,
+                 pos_fraction=0.25,
+                 neg_pos_ub=3.0,
+                 add_gt_as_proposals=True),
+             pos_weight=-1,
+             debug=False)),
+     test_cfg=dict(
+         rpn=dict(
+             nms_pre=6000,
+             max_per_img=300,
+             nms=dict(type='nms', iou_threshold=0.7),
+             min_bbox_size=0),
+         rcnn=dict(
+             score_thr=0.05,
+             nms=dict(type='nms', iou_threshold=0.3),
+             max_per_img=100),
+         child_head=dict(
+             bbox_score_thr=0.5, verbose_relation=False, average_scores=False),
+         leaf_head=dict(score_thr=0.5, nms=None, max_per_img=100)))
+ dataset_type = 'REGRADAffordDataset'
+ data_root = 'data/regrad/'
+ img_norm_cfg = dict(
+     mean=[123.675, 116.28, 103.53], std=[1.0, 1.0, 1.0], to_rgb=True)
+ train_pipeline = [
+     dict(type='LoadImageFromFile', to_float32=True),
+     dict(
+         type='LoadAnnotationsCustom',
+         keys=['gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves']),
+     dict(type='RandomFlip', flip_ratio=0.5),
+     dict(type='PhotoMetricDistortion'),
+     dict(
+         type='RandomCrop', crop_type='random_keep', allow_negative_crop=False),
+     dict(type='Expand', mean=[123.675, 116.28, 103.53], ratio_range=(1, 2)),
+     dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
+     dict(
+         type='Normalize',
+         mean=[123.675, 116.28, 103.53],
+         std=[1.0, 1.0, 1.0],
+         to_rgb=True),
+     dict(type='Pad', size_divisor=32),
+     dict(
+         type='DefaultFormatBundleCustom',
+         keys=['img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
+               'gt_relleaves']),
+     dict(
+         type='Collect',
+         keys=['img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'])
+ ]
+ test_pipeline = [
+     dict(type='LoadImageFromFile'),
+     dict(type='LoadRelationProposals'),
+     dict(
+         type='MultiScaleFlipAug',
+         img_scale=(1000, 600),
+         flip=False,
+         transforms=[
+             dict(type='Resize', keep_ratio=True),
+             dict(
+                 type='Normalize',
+                 mean=[123.675, 116.28, 103.53],
+                 std=[1.0, 1.0, 1.0],
+                 to_rgb=True),
+             dict(type='Pad', size_divisor=32),
+             dict(type='ImageToTensor', keys=['img']),
+             dict(type='Collect', keys=['img', 'relation_proposals'])
+         ])
+ ]
+ data = dict(
+     train=dict(
+         _delete_=True,
+         type='ConcatDataset',
+         datasets=[
+             dict(
+                 type='REGRADAffordDataset',
+                 data_root='data/regrad/',
+                 meta_info_file='dataset_train_5k/meta_infos.json',
+                 ann_file='dataset_train_5k/objects.json',
+                 img_prefix='dataset_train_5k/RGBImages',
+                 seg_prefix='dataset_train_5k/SegmentationImages',
+                 depth_prefix='dataset_train_5k/DepthImages',
+                 pipeline=[
+                     dict(type='LoadImageFromFile', to_float32=True),
+                     dict(
+                         type='LoadAnnotationsCustom',
+                         keys=[
+                             'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ]),
+                     dict(type='RandomFlip', flip_ratio=0.5),
+                     dict(type='PhotoMetricDistortion'),
+                     dict(
+                         type='RandomCrop',
+                         crop_type='random_keep',
+                         allow_negative_crop=False),
+                     dict(type='Expand', mean=[123.675, 116.28, 103.53]),
+                     dict(
+                         type='Resize', img_scale=(1000, 600), keep_ratio=True),
+                     dict(
+                         type='Normalize',
+                         mean=[123.675, 116.28, 103.53],
+                         std=[1.0, 1.0, 1.0],
+                         to_rgb=True),
+                     dict(type='Pad', size_divisor=32),
+                     dict(
+                         type='DefaultFormatBundleCustom',
+                         keys=[
+                             'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ]),
+                     dict(
+                         type='Collect',
+                         keys=[
+                             'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ])
+                 ],
+                 min_pos_relation=1,
+                 class_agnostic=True),
+             dict(
+                 type='MetaGraspNetAffordDataset',
+                 data_root='data/metagraspnet/sim/',
+                 meta_info_file='meta_infos_train.json',
+                 pipeline=[
+                     dict(type='LoadImageFromFile', to_float32=True),
+                     dict(
+                         type='LoadAnnotationsCustom',
+                         keys=[
+                             'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ]),
+                     dict(type='RandomFlip', flip_ratio=0.5),
+                     dict(type='PhotoMetricDistortion'),
+                     dict(
+                         type='RandomCrop',
+                         crop_type='random_keep',
+                         allow_negative_crop=False),
+                     dict(
+                         type='Expand',
+                         mean=[123.675, 116.28, 103.53],
+                         ratio_range=(1, 2)),
+                     dict(
+                         type='Resize', img_scale=(1000, 600), keep_ratio=True),
+                     dict(
+                         type='Normalize',
+                         mean=[123.675, 116.28, 103.53],
+                         std=[1.0, 1.0, 1.0],
+                         to_rgb=True),
+                     dict(type='Pad', size_divisor=32),
+                     dict(
+                         type='DefaultFormatBundleCustom',
+                         keys=[
+                             'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ]),
+                     dict(
+                         type='Collect',
+                         keys=[
+                             'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ])
+                 ],
+                 min_pos_relation=1,
+                 class_agnostic=True),
+             dict(
+                 type='VMRDAffordDataset',
+                 ann_file='data/vmrd/ImageSets/Main/trainval.txt',
+                 img_prefix='data/vmrd/',
+                 pipeline=[
+                     dict(type='LoadImageFromFile', to_float32=True),
+                     dict(
+                         type='LoadAnnotationsCustom',
+                         keys=[
+                             'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ]),
+                     dict(type='RandomFlip', flip_ratio=0.5),
+                     dict(type='PhotoMetricDistortion'),
+                     dict(type='Expand', mean=[123.675, 116.28, 103.53]),
+                     dict(
+                         type='Resize', img_scale=(1000, 600), keep_ratio=True),
+                     dict(
+                         type='Normalize',
+                         mean=[123.675, 116.28, 103.53],
+                         std=[1.0, 1.0, 1.0],
+                         to_rgb=True),
+                     dict(type='Pad', size_divisor=32),
+                     dict(
+                         type='DefaultFormatBundleCustom',
+                         keys=[
+                             'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ]),
+                     dict(
+                         type='Collect',
+                         keys=[
+                             'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ])
+                 ],
+                 class_agnostic=True),
+             dict(
+                 type='VRDAffordDataset',
+                 data_root='data/vrd/',
+                 ann_file='sg_dataset/sg_train_annotations.json',
+                 img_prefix='sg_dataset/sg_train_images/',
+                 pipeline=[
+                     dict(type='LoadImageFromFile', to_float32=True),
+                     dict(
+                         type='LoadAnnotationsCustom',
+                         keys=[
+                             'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ]),
+                     dict(type='RandomFlip', flip_ratio=0.5),
+                     dict(
+                         type='Resize', img_scale=(1000, 600), keep_ratio=True),
+                     dict(
+                         type='Normalize',
+                         mean=[123.675, 116.28, 103.53],
+                         std=[1.0, 1.0, 1.0],
+                         to_rgb=True),
+                     dict(type='Pad', size_divisor=32),
+                     dict(
+                         type='DefaultFormatBundleCustom',
+                         keys=[
+                             'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ]),
+                     dict(
+                         type='Collect',
+                         keys=[
+                             'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ])
+                 ],
+                 class_agnostic=True),
+             dict(
+                 type='VGAffordDataset',
+                 data_root='data/vg/downloads',
+                 ann_file='relationships.json',
+                 img_prefix='',
+                 pipeline=[
+                     dict(type='LoadImageFromFile', to_float32=True),
+                     dict(
+                         type='LoadAnnotationsCustom',
+                         keys=[
+                             'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ]),
+                     dict(type='RandomFlip', flip_ratio=0.5),
+                     dict(
+                         type='Resize', img_scale=(1000, 600), keep_ratio=True),
+                     dict(
+                         type='Normalize',
+                         mean=[123.675, 116.28, 103.53],
+                         std=[1.0, 1.0, 1.0],
+                         to_rgb=True),
+                     dict(type='Pad', size_divisor=32),
+                     dict(
+                         type='DefaultFormatBundleCustom',
+                         keys=[
+                             'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ]),
+                     dict(
+                         type='Collect',
+                         keys=[
+                             'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
+                             'gt_relleaves'
+                         ])
+                 ],
+                 class_agnostic=True)
+         ],
+         separate_eval=True,
+         class_agnostic=True),
+     val=dict(
+         _delete_=True,
+         type='ConcatDataset',
+         datasets=[
+             dict(
+                 type='REGRADAffordDataset',
+                 data_root='data/regrad/',
+                 using_depth=False,
+                 using_gt_proposals=True,
+                 meta_info_file='dataset_seen_val_1k/meta_infos.json',
+                 ann_file='dataset_seen_val_1k/objects.json',
+                 img_prefix='dataset_seen_val_1k/RGBImages',
+                 seg_prefix='dataset_seen_val_1k/SegmentationImages',
+                 depth_prefix='dataset_seen_val_1k/DepthImages',
+                 test_mode=True,
+                 pipeline=[
+                     dict(type='LoadImageFromFile'),
+                     dict(type='LoadRelationProposals'),
+                     dict(
+                         type='MultiScaleFlipAug',
+                         img_scale=(1000, 600),
+                         flip=False,
+                         transforms=[
+                             dict(type='Resize', keep_ratio=True),
+                             dict(
+                                 type='Normalize',
+                                 mean=[123.675, 116.28, 103.53],
+                                 std=[1.0, 1.0, 1.0],
+                                 to_rgb=True),
+                             dict(type='Pad', size_divisor=32),
+                             dict(type='ImageToTensor', keys=['img']),
+                             dict(
+                                 type='Collect',
+                                 keys=['img', 'relation_proposals'])
+                         ])
+                 ],
+                 class_agnostic=True,
+                 max_sample_num=1000),
+             dict(
+                 type='VMRDAffordDataset',
+                 ann_file='data/vmrd/ImageSets/Main/test.txt',
+                 img_prefix='data/vmrd/',
+                 using_gt_proposals=True,
+                 pipeline=[
+                     dict(type='LoadImageFromFile'),
+                     dict(type='LoadRelationProposals'),
+                     dict(
+                         type='MultiScaleFlipAug',
+                         img_scale=(1000, 600),
+                         flip=False,
+                         transforms=[
+                             dict(type='Resize', keep_ratio=True),
+                             dict(
+                                 type='Normalize',
+                                 mean=[123.675, 116.28, 103.53],
+                                 std=[1.0, 1.0, 1.0],
+                                 to_rgb=True),
+                             dict(type='Pad', size_divisor=32),
+                             dict(type='ImageToTensor', keys=['img']),
+                             dict(
+                                 type='Collect',
+                                 keys=['img', 'relation_proposals'])
+                         ])
+                 ],
+                 class_agnostic=True)
+         ],
+         separate_eval=True,
+         class_agnostic=True),
+     test=dict(
+         _delete_=True,
+         type='ConcatDataset',
+         datasets=[
+             dict(
+                 type='REGRADAffordDataset',
+                 data_root='data/regrad/',
+                 using_depth=False,
+                 using_gt_proposals=True,
+                 meta_info_file='dataset_seen_val_1k/meta_infos.json',
+                 ann_file='dataset_seen_val_1k/objects.json',
+                 img_prefix='dataset_seen_val_1k/RGBImages',
+                 seg_prefix='dataset_seen_val_1k/SegmentationImages',
+                 depth_prefix='dataset_seen_val_1k/DepthImages',
+                 test_mode=True,
+                 pipeline=[
+                     dict(type='LoadImageFromFile'),
+                     dict(type='LoadRelationProposals'),
+                     dict(
+                         type='MultiScaleFlipAug',
+                         img_scale=(1000, 600),
+                         flip=False,
+                         transforms=[
+                             dict(type='Resize', keep_ratio=True),
+                             dict(
+                                 type='Normalize',
+                                 mean=[123.675, 116.28, 103.53],
+                                 std=[1.0, 1.0, 1.0],
+                                 to_rgb=True),
+                             dict(type='Pad', size_divisor=32),
+                             dict(type='ImageToTensor', keys=['img']),
+                             dict(
+                                 type='Collect',
+                                 keys=['img', 'relation_proposals'])
+                         ])
+                 ],
+                 class_agnostic=True,
+                 max_sample_num=1000),
+             dict(
+                 type='VMRDAffordDataset',
+                 ann_file='data/vmrd/ImageSets/Main/test.txt',
+                 img_prefix='data/vmrd/',
+                 using_gt_proposals=True,
+                 pipeline=[
+                     dict(type='LoadImageFromFile'),
+                     dict(type='LoadRelationProposals'),
+                     dict(
+                         type='MultiScaleFlipAug',
+                         img_scale=(1000, 600),
+                         flip=False,
+                         transforms=[
+                             dict(type='Resize', keep_ratio=True),
+                             dict(
+                                 type='Normalize',
+                                 mean=[123.675, 116.28, 103.53],
+                                 std=[1.0, 1.0, 1.0],
+                                 to_rgb=True),
+                             dict(type='Pad', size_divisor=32),
+                             dict(type='ImageToTensor', keys=['img']),
+                             dict(
+                                 type='Collect',
+                                 keys=['img', 'relation_proposals'])
+                         ])
+                 ],
+                 class_agnostic=True)
+         ],
+         separate_eval=True,
+         class_agnostic=True),
+     samples_per_gpu=4,
+     workers_per_gpu=2)
+ evaluation = dict(interval=1, metric=['mAP', 'ImgAcc'])
+ optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)
+ optimizer_config = dict(grad_clip=dict(max_norm=100, norm_type=2))
+ lr_config = dict(
+     policy='step',
+     warmup='linear',
+     warmup_iters=4000,
+     warmup_ratio=0.001,
+     step=[12, 18])
+ runner = dict(type='EpochBasedRunner', max_epochs=20)
+ checkpoint_config = dict(interval=1, max_keep_ckpts=5)
+ log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
+ dist_params = dict(backend='nccl')
+ log_level = 'INFO'
+ load_from = None
+ resume_from = None
+ workflow = [('train', 1)]
+ opencv_num_threads = 0
+ mp_start_method = 'fork'
+ auto_scale_lr = dict(enable=False, base_batch_size=16)
+ mmdet = None
+ mmdet_root = '/data/home/hanbo/projects/cloud_services/service/vmrn/vmrn_models/mmdetection/mmdet'
+ test_with_object_detector = False
+ test_crop_config = (174, 79, 462, 372)
+ kinect_img_pipeline = [
+     dict(type='LoadImageFromFile'),
+     dict(type='LoadRelationProposals'),
+     dict(
+         type='FixedCrop',
+         crop_type='absolute',
+         top_left=(174, 79),
+         bottom_right=(462, 372)),
+     dict(
+         type='MultiScaleFlipAug',
+         img_scale=(1000, 600),
+         flip=False,
+         transforms=[
+             dict(type='Resize', keep_ratio=True),
+             dict(
+                 type='Normalize',
+                 mean=[123.675, 116.28, 103.53],
+                 std=[1.0, 1.0, 1.0],
+                 to_rgb=True),
+             dict(type='Pad', size_divisor=32),
+             dict(type='ImageToTensor', keys=['img']),
+             dict(type='Collect', keys=['img', 'relation_proposals'])
+         ])
+ ]
+ seen_val_dataset = dict(
+     type='REGRADAffordDataset',
+     data_root='data/regrad/',
+     using_depth=False,
+     using_gt_proposals=True,
+     meta_info_file='dataset_seen_val_1k/meta_infos.json',
+     ann_file='dataset_seen_val_1k/objects.json',
+     img_prefix='dataset_seen_val_1k/RGBImages',
+     seg_prefix='dataset_seen_val_1k/SegmentationImages',
+     depth_prefix='dataset_seen_val_1k/DepthImages',
+     test_mode=True,
+     pipeline=[
+         dict(type='LoadImageFromFile'),
+         dict(type='LoadRelationProposals'),
+         dict(
+             type='MultiScaleFlipAug',
+             img_scale=(1000, 600),
+             flip=False,
+             transforms=[
+                 dict(type='Resize', keep_ratio=True),
+                 dict(
+                     type='Normalize',
+                     mean=[123.675, 116.28, 103.53],
+                     std=[1.0, 1.0, 1.0],
+                     to_rgb=True),
+                 dict(type='Pad', size_divisor=32),
+                 dict(type='ImageToTensor', keys=['img']),
+                 dict(type='Collect', keys=['img', 'relation_proposals'])
+             ])
+     ],
+     class_agnostic=True,
+     max_sample_num=1000)
+ unseen_val_dataset = dict(
+     type='REGRADAffordDataset',
+     data_root='data/regrad/',
+     using_depth=False,
+     using_gt_proposals=True,
+     meta_info_file='dataset_unseen_val_1k/meta_infos.json',
+     ann_file='dataset_unseen_val_1k/objects.json',
+     img_prefix='dataset_unseen_val_1k/RGBImages',
+     seg_prefix='dataset_unseen_val_1k/SegmentationImages',
+     depth_prefix='dataset_unseen_val_1k/DepthImages',
+     test_mode=True,
+     pipeline=[
+         dict(type='LoadImageFromFile'),
+         dict(type='LoadRelationProposals'),
+         dict(
+             type='MultiScaleFlipAug',
+             img_scale=(1000, 600),
+             flip=False,
+             transforms=[
+                 dict(type='Resize', keep_ratio=True),
+                 dict(
+                     type='Normalize',
+                     mean=[123.675, 116.28, 103.53],
+                     std=[1.0, 1.0, 1.0],
+                     to_rgb=True),
+                 dict(type='Pad', size_divisor=32),
+                 dict(type='ImageToTensor', keys=['img']),
+                 dict(type='Collect', keys=['img', 'relation_proposals'])
+             ])
+     ],
+     class_agnostic=True,
+     max_sample_num=1000)
+ real_val_dataset = dict(
+     type='REGRADAffordDataset',
+     data_root='data/regrad/',
+     using_depth=False,
+     using_gt_proposals=True,
+     meta_info_file='real/meta_infos.json',
+     ann_file='real/objects.json',
+     img_prefix='real/RGBImages',
+     img_suffix='png',
+     depth_prefix='real/DepthImages',
+     test_mode=True,
+     test_gt_bbox_offset=(174, 79),
+     pipeline=[
+         dict(type='LoadImageFromFile'),
+         dict(type='LoadRelationProposals'),
+         dict(
+             type='FixedCrop',
+             crop_type='absolute',
+             top_left=(174, 79),
+             bottom_right=(462, 372)),
+         dict(
+             type='MultiScaleFlipAug',
+             img_scale=(1000, 600),
+             flip=False,
+             transforms=[
+                 dict(type='Resize', keep_ratio=True),
+                 dict(
+                     type='Normalize',
+                     mean=[123.675, 116.28, 103.53],
+                     std=[1.0, 1.0, 1.0],
+                     to_rgb=True),
+                 dict(type='Pad', size_divisor=32),
+                 dict(type='ImageToTensor', keys=['img']),
+                 dict(type='Collect', keys=['img', 'relation_proposals'])
+             ])
+     ],
+     class_agnostic=True)
+ regrad_datatype = 'REGRADAffordDataset'
+ regrad_root = 'data/regrad/'
+ vmrd_datatype = 'VMRDAffordDataset'
+ vmrd_root = 'data/vmrd/'
+ vmrd_train = dict(
+     type='VMRDAffordDataset',
+     ann_file='data/vmrd/ImageSets/Main/trainval.txt',
+     img_prefix='data/vmrd/',
+     pipeline=[
+         dict(type='LoadImageFromFile', to_float32=True),
+         dict(
+             type='LoadAnnotationsCustom',
+             keys=['gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves']),
+         dict(type='RandomFlip', flip_ratio=0.5),
+         dict(type='PhotoMetricDistortion'),
+         dict(type='Expand', mean=[123.675, 116.28, 103.53]),
+         dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
+         dict(
+             type='Normalize',
+             mean=[123.675, 116.28, 103.53],
+             std=[1.0, 1.0, 1.0],
+             to_rgb=True),
+         dict(type='Pad', size_divisor=32),
+         dict(
+             type='DefaultFormatBundleCustom',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ]),
+         dict(
+             type='Collect',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ])
+     ],
+     class_agnostic=True)
+ regrad_train = dict(
+     type='REGRADAffordDataset',
+     data_root='data/regrad/',
+     meta_info_file='dataset_train_5k/meta_infos.json',
+     ann_file='dataset_train_5k/objects.json',
+     img_prefix='dataset_train_5k/RGBImages',
+     seg_prefix='dataset_train_5k/SegmentationImages',
+     depth_prefix='dataset_train_5k/DepthImages',
+     pipeline=[
+         dict(type='LoadImageFromFile', to_float32=True),
+         dict(
+             type='LoadAnnotationsCustom',
+             keys=['gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves']),
+         dict(type='RandomFlip', flip_ratio=0.5),
+         dict(type='PhotoMetricDistortion'),
+         dict(
+             type='RandomCrop',
+             crop_type='random_keep',
+             allow_negative_crop=False),
+         dict(type='Expand', mean=[123.675, 116.28, 103.53]),
+         dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
+         dict(
+             type='Normalize',
+             mean=[123.675, 116.28, 103.53],
+             std=[1.0, 1.0, 1.0],
+             to_rgb=True),
+         dict(type='Pad', size_divisor=32),
+         dict(
+             type='DefaultFormatBundleCustom',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ]),
+         dict(
+             type='Collect',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ])
+     ],
+     min_pos_relation=1,
+     class_agnostic=True)
+ metagraspnet_sim_train = dict(
+     type='MetaGraspNetAffordDataset',
+     data_root='data/metagraspnet/sim/',
+     meta_info_file='meta_infos_train.json',
+     pipeline=[
+         dict(type='LoadImageFromFile', to_float32=True),
+         dict(
+             type='LoadAnnotationsCustom',
+             keys=['gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves']),
+         dict(type='RandomFlip', flip_ratio=0.5),
+         dict(type='PhotoMetricDistortion'),
+         dict(
+             type='RandomCrop',
+             crop_type='random_keep',
+             allow_negative_crop=False),
+         dict(
+             type='Expand', mean=[123.675, 116.28, 103.53], ratio_range=(1, 2)),
+         dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
+         dict(
+             type='Normalize',
+             mean=[123.675, 116.28, 103.53],
+             std=[1.0, 1.0, 1.0],
+             to_rgb=True),
+         dict(type='Pad', size_divisor=32),
+         dict(
+             type='DefaultFormatBundleCustom',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ]),
+         dict(
+             type='Collect',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ])
+     ],
+     min_pos_relation=1,
+     class_agnostic=True)
+ vgvrd_train_pipeline = [
+     dict(type='LoadImageFromFile', to_float32=True),
+     dict(
+         type='LoadAnnotationsCustom',
+         keys=['gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves']),
+     dict(type='RandomFlip', flip_ratio=0.5),
+     dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
+     dict(
+         type='Normalize',
+         mean=[123.675, 116.28, 103.53],
+         std=[1.0, 1.0, 1.0],
+         to_rgb=True),
+     dict(type='Pad', size_divisor=32),
+     dict(
+         type='DefaultFormatBundleCustom',
+         keys=['img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
+               'gt_relleaves']),
+     dict(
+         type='Collect',
+         keys=['img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'])
+ ]
+ vrd_train = dict(
+     type='VRDAffordDataset',
+     data_root='data/vrd/',
+     ann_file='sg_dataset/sg_train_annotations.json',
+     img_prefix='sg_dataset/sg_train_images/',
+     pipeline=[
+         dict(type='LoadImageFromFile', to_float32=True),
+         dict(
+             type='LoadAnnotationsCustom',
+             keys=['gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves']),
+         dict(type='RandomFlip', flip_ratio=0.5),
+         dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
+         dict(
+             type='Normalize',
+             mean=[123.675, 116.28, 103.53],
+             std=[1.0, 1.0, 1.0],
+             to_rgb=True),
+         dict(type='Pad', size_divisor=32),
+         dict(
+             type='DefaultFormatBundleCustom',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ]),
+         dict(
+             type='Collect',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ])
+     ],
+     class_agnostic=True)
+ vg_train = dict(
+     type='VGAffordDataset',
+     data_root='data/vg/downloads',
+     ann_file='relationships.json',
+     img_prefix='',
+     pipeline=[
+         dict(type='LoadImageFromFile', to_float32=True),
+         dict(
+             type='LoadAnnotationsCustom',
+             keys=['gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves']),
+         dict(type='RandomFlip', flip_ratio=0.5),
+         dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
+         dict(
+             type='Normalize',
+             mean=[123.675, 116.28, 103.53],
+             std=[1.0, 1.0, 1.0],
+             to_rgb=True),
+         dict(type='Pad', size_divisor=32),
+         dict(
+             type='DefaultFormatBundleCustom',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ]),
+         dict(
+             type='Collect',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ])
+     ],
+     class_agnostic=True)
+ real_test_pipeline = [
+     dict(type='LoadImageFromFile'),
+     dict(type='LoadRelationProposals'),
+     dict(
+         type='FixedCrop',
+         crop_type='absolute',
+         top_left=(174, 79),
+         bottom_right=(462, 372)),
+     dict(
+         type='MultiScaleFlipAug',
+         img_scale=(1000, 600),
+         flip=False,
+         transforms=[
+             dict(type='Resize', keep_ratio=True),
+             dict(
+                 type='Normalize',
+                 mean=[123.675, 116.28, 103.53],
+                 std=[1.0, 1.0, 1.0],
+                 to_rgb=True),
+             dict(type='Pad', size_divisor=32),
+             dict(type='ImageToTensor', keys=['img']),
+             dict(type='Collect', keys=['img', 'relation_proposals'])
+         ])
+ ]
+ regrad_seen_val_dataset = dict(
+     type='REGRADAffordDataset',
+     data_root='data/regrad/',
+     using_depth=False,
+     using_gt_proposals=True,
+     meta_info_file='dataset_seen_val_1k/meta_infos.json',
+     ann_file='dataset_seen_val_1k/objects.json',
+     img_prefix='dataset_seen_val_1k/RGBImages',
+     seg_prefix='dataset_seen_val_1k/SegmentationImages',
+     depth_prefix='dataset_seen_val_1k/DepthImages',
+     test_mode=True,
+     pipeline=[
+         dict(type='LoadImageFromFile'),
+         dict(type='LoadRelationProposals'),
+         dict(
+             type='MultiScaleFlipAug',
+             img_scale=(1000, 600),
+             flip=False,
+             transforms=[
+                 dict(type='Resize', keep_ratio=True),
+                 dict(
+                     type='Normalize',
995
+ mean=[123.675, 116.28, 103.53],
996
+ std=[1.0, 1.0, 1.0],
997
+ to_rgb=True),
998
+ dict(type='Pad', size_divisor=32),
999
+ dict(type='ImageToTensor', keys=['img']),
1000
+ dict(type='Collect', keys=['img', 'relation_proposals'])
1001
+ ])
1002
+ ],
1003
+ class_agnostic=True,
1004
+ max_sample_num=1000)
1005
+ regrad_unseen_val_dataset = dict(
1006
+ type='REGRADAffordDataset',
1007
+ data_root='data/regrad/',
1008
+ using_depth=False,
1009
+ using_gt_proposals=True,
1010
+ meta_info_file='dataset_unseen_val_1k/meta_infos.json',
1011
+ ann_file='dataset_unseen_val_1k/objects.json',
1012
+ img_prefix='dataset_unseen_val_1k/RGBImages',
1013
+ seg_prefix='dataset_unseen_val_1k/SegmentationImages',
1014
+ depth_prefix='dataset_unseen_val_1k/DepthImages',
1015
+ test_mode=True,
1016
+ pipeline=[
1017
+ dict(type='LoadImageFromFile'),
1018
+ dict(type='LoadRelationProposals'),
1019
+ dict(
1020
+ type='MultiScaleFlipAug',
1021
+ img_scale=(1000, 600),
1022
+ flip=False,
1023
+ transforms=[
1024
+ dict(type='Resize', keep_ratio=True),
1025
+ dict(
1026
+ type='Normalize',
1027
+ mean=[123.675, 116.28, 103.53],
1028
+ std=[1.0, 1.0, 1.0],
1029
+ to_rgb=True),
1030
+ dict(type='Pad', size_divisor=32),
1031
+ dict(type='ImageToTensor', keys=['img']),
1032
+ dict(type='Collect', keys=['img', 'relation_proposals'])
1033
+ ])
1034
+ ],
1035
+ class_agnostic=True,
1036
+ max_sample_num=1000)
1037
+ regrad_real_val_dataset = dict(
1038
+ type='REGRADAffordDataset',
1039
+ data_root='data/regrad/',
1040
+ using_depth=False,
1041
+ using_gt_proposals=True,
1042
+ meta_info_file='real/meta_infos.json',
1043
+ ann_file='real/objects.json',
1044
+ img_prefix='real/RGBImages',
1045
+ img_suffix='png',
1046
+ depth_prefix='real/DepthImages',
1047
+ test_mode=True,
1048
+ test_gt_bbox_offset=(174, 79),
1049
+ pipeline=[
1050
+ dict(type='LoadImageFromFile'),
1051
+ dict(type='LoadRelationProposals'),
1052
+ dict(
1053
+ type='FixedCrop',
1054
+ crop_type='absolute',
1055
+ top_left=(174, 79),
1056
+ bottom_right=(462, 372)),
1057
+ dict(
1058
+ type='MultiScaleFlipAug',
1059
+ img_scale=(1000, 600),
1060
+ flip=False,
1061
+ transforms=[
1062
+ dict(type='Resize', keep_ratio=True),
1063
+ dict(
1064
+ type='Normalize',
1065
+ mean=[123.675, 116.28, 103.53],
1066
+ std=[1.0, 1.0, 1.0],
1067
+ to_rgb=True),
1068
+ dict(type='Pad', size_divisor=32),
1069
+ dict(type='ImageToTensor', keys=['img']),
1070
+ dict(type='Collect', keys=['img', 'relation_proposals'])
1071
+ ])
1072
+ ],
1073
+ class_agnostic=True)
1074
+ vmrd_val_dataset = dict(
1075
+ type='VMRDAffordDataset',
1076
+ ann_file='data/vmrd/ImageSets/Main/test.txt',
1077
+ img_prefix='data/vmrd/',
1078
+ using_gt_proposals=True,
1079
+ pipeline=[
1080
+ dict(type='LoadImageFromFile'),
1081
+ dict(type='LoadRelationProposals'),
1082
+ dict(
1083
+ type='MultiScaleFlipAug',
1084
+ img_scale=(1000, 600),
1085
+ flip=False,
1086
+ transforms=[
1087
+ dict(type='Resize', keep_ratio=True),
1088
+ dict(
1089
+ type='Normalize',
1090
+ mean=[123.675, 116.28, 103.53],
1091
+ std=[1.0, 1.0, 1.0],
1092
+ to_rgb=True),
1093
+ dict(type='Pad', size_divisor=32),
1094
+ dict(type='ImageToTensor', keys=['img']),
1095
+ dict(type='Collect', keys=['img', 'relation_proposals'])
1096
+ ])
1097
+ ],
1098
+ class_agnostic=True)
1099
+ train_sampler = dict(
1100
+ type='DistributedWeightedSampler',
1101
+ weights=[0.1, 0.1, 0.05, 0.05, 0.7],
1102
+ sample_per_epoch=150000,
1103
+ shuffle=True)
1104
+ work_dir = './work_dirs/relation_afford_r101_caffe_c4_1x_regrad_vmrd_metagraspnet_vrd_vg_class_agnostic'
1105
+ gpu_ids = range(0, 8)
1106
+
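For context on the `train_sampler` in the logged config above: a `DistributedWeightedSampler` with `weights=[0.1, 0.1, 0.05, 0.05, 0.7]` and `sample_per_epoch=150000` draws each training sample from one of the five concatenated datasets (REGRAD, MetaGraspNet, VMRD, VRD, VG) with the given probability. A minimal stdlib sketch of that weighted draw — a hypothetical stand-in, not the project's `DistributedWeightedSampler` implementation:

```python
import random

def weighted_dataset_draw(weights, sample_per_epoch, seed=0):
    """Draw dataset indices with per-dataset probabilities.

    Hypothetical stand-in for the config's DistributedWeightedSampler:
    each of the sample_per_epoch samples comes from dataset i with
    probability weights[i] (the real sampler also shards across GPUs).
    """
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=sample_per_epoch)

draws = weighted_dataset_draw([0.1, 0.1, 0.05, 0.05, 0.7], 150000)
# With weight 0.7, dataset index 4 (VG) should dominate the epoch.
vg_share = draws.count(4) / len(draws)
```

With 150k draws per epoch, the empirical share of each dataset lands very close to its configured weight.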
20240921_173230.log.json ADDED
The diff for this file is too large to render. See raw diff
 
epoch_16.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:57843004a514095bbedf8f99328d2a2f76919a3619ebd321d1ede68026731940
+ size 909495892
epoch_17.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3734323578d29b58043eecfc969f726e789ad029b92d52a8b4702a7b25c1e64a
+ size 909495892
epoch_18.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d6c786bd32aa51e14988b63467163e29941d976cc5a6b9dde100468c5b3ecd98
+ size 909495892
epoch_19.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:01b04bbb288a755d88ab3b7321e71c1b7a36b205e5857a8bd8c9829a21e0bc08
+ size 909495892
epoch_20.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3bccd494e3ac72b27b8bf3fa27f864aa0d0305bee772e20467b9ea7fb51117f1
+ size 909495892
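The `epoch_*.pth` diffs above are Git LFS pointer files, not the checkpoints themselves: each is three `key value` lines (`version`, `oid sha256:<hash>`, `size <bytes>`), while the ~909 MB weights live in LFS storage. A small stdlib sketch of reading such a pointer (`parse_lfs_pointer` is a hypothetical helper name):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value fields.

    A pointer file is a few lines of the form "key value"; the standard
    keys are version, oid, and size.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The epoch_20.pth pointer from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:3bccd494e3ac72b27b8bf3fa27f864aa0d0305bee772e20467b9ea7fb51117f1
size 909495892"""
info = parse_lfs_pointer(pointer)
# info["size"] is the byte count of the real checkpoint: "909495892" (~909 MB).
```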
relation_afford_r101_caffe_c4_1x_regrad_vmrd_metagraspnet_vrd_vg_class_agnostic.py ADDED
@@ -0,0 +1,1070 @@
1
+ norm_cfg = dict(
2
+ type='BN',
3
+ requires_grad=False,
4
+ mean=[123.675, 116.28, 103.53],
5
+ std=[1.0, 1.0, 1.0],
6
+ to_rgb=True)
7
+ model = dict(
8
+ type='FasterRCNNRelAfford',
9
+ backbone=dict(
10
+ type='mmdet.ResNet',
11
+ depth=101,
12
+ num_stages=3,
13
+ strides=(1, 2, 2),
14
+ dilations=(1, 1, 1),
15
+ out_indices=(2, ),
16
+ frozen_stages=1,
17
+ norm_cfg=dict(type='BN', requires_grad=False),
18
+ norm_eval=True,
19
+ style='caffe',
20
+ init_cfg=dict(
21
+ type='Pretrained',
22
+ checkpoint='open-mmlab://detectron2/resnet101_caffe')),
23
+ rpn_head=dict(
24
+ type='mmdet.RPNHead',
25
+ in_channels=1024,
26
+ feat_channels=1024,
27
+ anchor_generator=dict(
28
+ type='AnchorGenerator',
29
+ scales=[8, 16, 32],
30
+ ratios=[0.33, 0.5, 1.0, 2.0, 3.0],
31
+ strides=[16]),
32
+ bbox_coder=dict(
33
+ type='DeltaXYWHBBoxCoder',
34
+ target_means=[0.0, 0.0, 0.0, 0.0],
35
+ target_stds=[1.0, 1.0, 1.0, 1.0]),
36
+ loss_cls=dict(
37
+ type='mmdet.CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
38
+ loss_bbox=dict(type='mmdet.L1Loss', loss_weight=1.0)),
39
+ roi_head=None,
40
+ child_head=dict(
41
+ type='invigorate.PairedRoIHead',
42
+ shared_head=dict(
43
+ type='invigorate.PairedResLayer',
44
+ depth=50,
45
+ stage=3,
46
+ stride=1,
47
+ style='caffe',
48
+ norm_eval=False,
49
+ share_weights=False),
50
+ paired_roi_extractor=dict(
51
+ type='invigorate.VMRNPairedRoIExtractor',
52
+ roi_layer=dict(type='RoIPool', output_size=7),
53
+ out_channels=1024,
54
+ featmap_strides=[16]),
55
+ relation_head=dict(
56
+ type='invigorate.BBoxPairHead',
57
+ with_avg_pool=True,
58
+ roi_feat_size=7,
59
+ in_channels=2048,
60
+ num_relations=1,
61
+ loss_cls=dict(
62
+ type='mmdet.CrossEntropyLoss',
63
+ use_sigmoid=False,
64
+ loss_weight=1.0))),
65
+ leaf_head=dict(
66
+ type='mmdet.StandardRoIHead',
67
+ shared_head=dict(
68
+ type='mmdet.ResLayer',
69
+ depth=50,
70
+ stage=3,
71
+ stride=1,
72
+ style='caffe',
73
+ norm_cfg=dict(type='BN', requires_grad=False),
74
+ norm_eval=True),
75
+ bbox_roi_extractor=dict(
76
+ type='mmdet.SingleRoIExtractor',
77
+ roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
78
+ out_channels=1024,
79
+ featmap_strides=[16]),
80
+ bbox_head=dict(
81
+ type='mmdet.BBoxHead',
82
+ with_avg_pool=True,
83
+ with_reg=False,
84
+ roi_feat_size=7,
85
+ in_channels=2048,
86
+ num_classes=2,
87
+ loss_cls=dict(
88
+ type='mmdet.CrossEntropyLoss',
89
+ use_sigmoid=False,
90
+ loss_weight=1.0))),
91
+ train_cfg=dict(
92
+ rpn=dict(
93
+ assigner=dict(
94
+ type='MaxIoUAssigner',
95
+ pos_iou_thr=0.7,
96
+ neg_iou_thr=0.3,
97
+ min_pos_iou=0.3,
98
+ match_low_quality=True,
99
+ ignore_iof_thr=-1),
100
+ sampler=dict(
101
+ type='RandomSampler',
102
+ num=256,
103
+ pos_fraction=0.5,
104
+ neg_pos_ub=-1,
105
+ add_gt_as_proposals=False),
106
+ allowed_border=0,
107
+ pos_weight=-1,
108
+ debug=False),
109
+ rpn_proposal=dict(
110
+ nms_pre=12000,
111
+ max_per_img=2000,
112
+ nms=dict(type='nms', iou_threshold=0.7),
113
+ min_bbox_size=0),
114
+ rcnn=dict(
115
+ assigner=dict(
116
+ type='MaxIoUAssigner',
117
+ pos_iou_thr=0.5,
118
+ neg_iou_thr=0.5,
119
+ min_pos_iou=0.5,
120
+ match_low_quality=False,
121
+ ignore_iof_thr=-1),
122
+ sampler=dict(
123
+ type='RandomSampler',
124
+ num=256,
125
+ pos_fraction=0.25,
126
+ neg_pos_ub=-1,
127
+ add_gt_as_proposals=True),
128
+ pos_weight=-1,
129
+ debug=False),
130
+ child_head=dict(
131
+ assigner=dict(
132
+ type='MaxIoUAssigner',
133
+ pos_iou_thr=0.7,
134
+ neg_iou_thr=0.5,
135
+ min_pos_iou=0.7,
136
+ match_low_quality=False,
137
+ ignore_iof_thr=-1),
138
+ relation_sampler=dict(
139
+ type='RandomRelationSampler',
140
+ num=32,
141
+ pos_fraction=0.5,
142
+ cls_ratio_ub=1.0,
143
+ add_gt_as_proposals=True,
144
+ num_relation_cls=1,
145
+ neg_id=0),
146
+ pos_weight=-1,
147
+ online_data=True,
148
+ online_start_iteration=0),
149
+ leaf_head=dict(
150
+ assigner=dict(
151
+ type='MaxIoUAssigner',
152
+ pos_iou_thr=0.5,
153
+ neg_iou_thr=0.5,
154
+ min_pos_iou=0.5,
155
+ match_low_quality=False,
156
+ ignore_iof_thr=-1),
157
+ sampler=dict(
158
+ type='RandomSampler',
159
+ num=64,
160
+ pos_fraction=0.25,
161
+ neg_pos_ub=3.0,
162
+ add_gt_as_proposals=True),
163
+ pos_weight=-1,
164
+ debug=False)),
165
+ test_cfg=dict(
166
+ rpn=dict(
167
+ nms_pre=6000,
168
+ max_per_img=300,
169
+ nms=dict(type='nms', iou_threshold=0.7),
170
+ min_bbox_size=0),
171
+ rcnn=dict(
172
+ score_thr=0.05,
173
+ nms=dict(type='nms', iou_threshold=0.3),
174
+ max_per_img=100),
175
+ child_head=dict(
176
+ bbox_score_thr=0.5, verbose_relation=False, average_scores=False),
177
+ leaf_head=dict(score_thr=0.5, nms=None, max_per_img=100)))
178
+ dataset_type = 'REGRADAffordDataset'
179
+ data_root = 'data/regrad/'
180
+ img_norm_cfg = dict(
181
+ mean=[123.675, 116.28, 103.53], std=[1.0, 1.0, 1.0], to_rgb=True)
182
+ train_pipeline = [
183
+ dict(type='LoadImageFromFile', to_float32=True),
184
+ dict(
185
+ type='LoadAnnotationsCustom',
186
+ keys=['gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves']),
187
+ dict(type='RandomFlip', flip_ratio=0.5),
188
+ dict(type='PhotoMetricDistortion'),
189
+ dict(
190
+ type='RandomCrop', crop_type='random_keep', allow_negative_crop=False),
191
+ dict(type='Expand', mean=[123.675, 116.28, 103.53], ratio_range=(1, 2)),
192
+ dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
193
+ dict(
194
+ type='Normalize',
195
+ mean=[123.675, 116.28, 103.53],
196
+ std=[1.0, 1.0, 1.0],
197
+ to_rgb=True),
198
+ dict(type='Pad', size_divisor=32),
199
+ dict(
200
+ type='DefaultFormatBundleCustom',
201
+ keys=['img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
202
+ 'gt_relleaves']),
203
+ dict(
204
+ type='Collect',
205
+ keys=['img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'])
206
+ ]
207
+ test_pipeline = [
208
+ dict(type='LoadImageFromFile'),
209
+ dict(type='LoadRelationProposals'),
210
+ dict(
211
+ type='MultiScaleFlipAug',
212
+ img_scale=(1000, 600),
213
+ flip=False,
214
+ transforms=[
215
+ dict(type='Resize', keep_ratio=True),
216
+ dict(
217
+ type='Normalize',
218
+ mean=[123.675, 116.28, 103.53],
219
+ std=[1.0, 1.0, 1.0],
220
+ to_rgb=True),
221
+ dict(type='Pad', size_divisor=32),
222
+ dict(type='ImageToTensor', keys=['img']),
223
+ dict(type='Collect', keys=['img', 'relation_proposals'])
224
+ ])
225
+ ]
226
+ data = dict(
227
+ train=dict(
228
+ _delete_=True,
229
+ type='ConcatDataset',
230
+ datasets=[
231
+ dict(
232
+ type='REGRADAffordDataset',
233
+ data_root='data/regrad/',
234
+ meta_info_file='dataset_train_5k/meta_infos.json',
235
+ ann_file='dataset_train_5k/objects.json',
236
+ img_prefix='dataset_train_5k/RGBImages',
237
+ seg_prefix='dataset_train_5k/SegmentationImages',
238
+ depth_prefix='dataset_train_5k/DepthImages',
239
+ pipeline=[
240
+ dict(type='LoadImageFromFile', to_float32=True),
241
+ dict(
242
+ type='LoadAnnotationsCustom',
243
+ keys=[
244
+ 'gt_bboxes', 'gt_labels', 'gt_relchilds',
245
+ 'gt_relleaves'
246
+ ]),
247
+ dict(type='RandomFlip', flip_ratio=0.5),
248
+ dict(type='PhotoMetricDistortion'),
249
+ dict(
250
+ type='RandomCrop',
251
+ crop_type='random_keep',
252
+ allow_negative_crop=False),
253
+ dict(type='Expand', mean=[123.675, 116.28, 103.53]),
254
+ dict(
255
+ type='Resize', img_scale=(1000, 600), keep_ratio=True),
256
+ dict(
257
+ type='Normalize',
258
+ mean=[123.675, 116.28, 103.53],
259
+ std=[1.0, 1.0, 1.0],
260
+ to_rgb=True),
261
+ dict(type='Pad', size_divisor=32),
262
+ dict(
263
+ type='DefaultFormatBundleCustom',
264
+ keys=[
265
+ 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
266
+ 'gt_relleaves'
267
+ ]),
268
+ dict(
269
+ type='Collect',
270
+ keys=[
271
+ 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
272
+ 'gt_relleaves'
273
+ ])
274
+ ],
275
+ min_pos_relation=1,
276
+ class_agnostic=True),
277
+ dict(
278
+ type='MetaGraspNetAffordDataset',
279
+ data_root='data/metagraspnet/sim/',
280
+ meta_info_file='meta_infos_train.json',
281
+ pipeline=[
282
+ dict(type='LoadImageFromFile', to_float32=True),
283
+ dict(
284
+ type='LoadAnnotationsCustom',
285
+ keys=[
286
+ 'gt_bboxes', 'gt_labels', 'gt_relchilds',
287
+ 'gt_relleaves'
288
+ ]),
289
+ dict(type='RandomFlip', flip_ratio=0.5),
290
+ dict(type='PhotoMetricDistortion'),
291
+ dict(
292
+ type='RandomCrop',
293
+ crop_type='random_keep',
294
+ allow_negative_crop=False),
295
+ dict(
296
+ type='Expand',
297
+ mean=[123.675, 116.28, 103.53],
298
+ ratio_range=(1, 2)),
299
+ dict(
300
+ type='Resize', img_scale=(1000, 600), keep_ratio=True),
301
+ dict(
302
+ type='Normalize',
303
+ mean=[123.675, 116.28, 103.53],
304
+ std=[1.0, 1.0, 1.0],
305
+ to_rgb=True),
306
+ dict(type='Pad', size_divisor=32),
307
+ dict(
308
+ type='DefaultFormatBundleCustom',
309
+ keys=[
310
+ 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
311
+ 'gt_relleaves'
312
+ ]),
313
+ dict(
314
+ type='Collect',
315
+ keys=[
316
+ 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
317
+ 'gt_relleaves'
318
+ ])
319
+ ],
320
+ min_pos_relation=1,
321
+ class_agnostic=True),
322
+ dict(
323
+ type='VMRDAffordDataset',
324
+ ann_file='data/vmrd/ImageSets/Main/trainval.txt',
325
+ img_prefix='data/vmrd/',
326
+ pipeline=[
327
+ dict(type='LoadImageFromFile', to_float32=True),
328
+ dict(
329
+ type='LoadAnnotationsCustom',
330
+ keys=[
331
+ 'gt_bboxes', 'gt_labels', 'gt_relchilds',
332
+ 'gt_relleaves'
333
+ ]),
334
+ dict(type='RandomFlip', flip_ratio=0.5),
335
+ dict(type='PhotoMetricDistortion'),
336
+ dict(type='Expand', mean=[123.675, 116.28, 103.53]),
337
+ dict(
338
+ type='Resize', img_scale=(1000, 600), keep_ratio=True),
339
+ dict(
340
+ type='Normalize',
341
+ mean=[123.675, 116.28, 103.53],
342
+ std=[1.0, 1.0, 1.0],
343
+ to_rgb=True),
344
+ dict(type='Pad', size_divisor=32),
345
+ dict(
346
+ type='DefaultFormatBundleCustom',
347
+ keys=[
348
+ 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
349
+ 'gt_relleaves'
350
+ ]),
351
+ dict(
352
+ type='Collect',
353
+ keys=[
354
+ 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
355
+ 'gt_relleaves'
356
+ ])
357
+ ],
358
+ class_agnostic=True),
359
+ dict(
360
+ type='VRDAffordDataset',
361
+ data_root='data/vrd/',
362
+ ann_file='sg_dataset/sg_train_annotations.json',
363
+ img_prefix='sg_dataset/sg_train_images/',
364
+ pipeline=[
365
+ dict(type='LoadImageFromFile', to_float32=True),
366
+ dict(
367
+ type='LoadAnnotationsCustom',
368
+ keys=[
369
+ 'gt_bboxes', 'gt_labels', 'gt_relchilds',
370
+ 'gt_relleaves'
371
+ ]),
372
+ dict(type='RandomFlip', flip_ratio=0.5),
373
+ dict(
374
+ type='Resize', img_scale=(1000, 600), keep_ratio=True),
375
+ dict(
376
+ type='Normalize',
377
+ mean=[123.675, 116.28, 103.53],
378
+ std=[1.0, 1.0, 1.0],
379
+ to_rgb=True),
380
+ dict(type='Pad', size_divisor=32),
381
+ dict(
382
+ type='DefaultFormatBundleCustom',
383
+ keys=[
384
+ 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
385
+ 'gt_relleaves'
386
+ ]),
387
+ dict(
388
+ type='Collect',
389
+ keys=[
390
+ 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
391
+ 'gt_relleaves'
392
+ ])
393
+ ],
394
+ class_agnostic=True),
395
+ dict(
396
+ type='VGAffordDataset',
397
+ data_root='data/vg/downloads',
398
+ ann_file='relationships.json',
399
+ img_prefix='',
400
+ pipeline=[
401
+ dict(type='LoadImageFromFile', to_float32=True),
402
+ dict(
403
+ type='LoadAnnotationsCustom',
404
+ keys=[
405
+ 'gt_bboxes', 'gt_labels', 'gt_relchilds',
406
+ 'gt_relleaves'
407
+ ]),
408
+ dict(type='RandomFlip', flip_ratio=0.5),
409
+ dict(
410
+ type='Resize', img_scale=(1000, 600), keep_ratio=True),
411
+ dict(
412
+ type='Normalize',
413
+ mean=[123.675, 116.28, 103.53],
414
+ std=[1.0, 1.0, 1.0],
415
+ to_rgb=True),
416
+ dict(type='Pad', size_divisor=32),
417
+ dict(
418
+ type='DefaultFormatBundleCustom',
419
+ keys=[
420
+ 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
421
+ 'gt_relleaves'
422
+ ]),
423
+ dict(
424
+ type='Collect',
425
+ keys=[
426
+ 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
427
+ 'gt_relleaves'
428
+ ])
429
+ ],
430
+ class_agnostic=True)
431
+ ],
432
+ separate_eval=True,
433
+ class_agnostic=True),
434
+ val=dict(
435
+ _delete_=True,
436
+ type='ConcatDataset',
437
+ datasets=[
438
+ dict(
439
+ type='REGRADAffordDataset',
440
+ data_root='data/regrad/',
441
+ using_depth=False,
442
+ using_gt_proposals=True,
443
+ meta_info_file='dataset_seen_val_1k/meta_infos.json',
444
+ ann_file='dataset_seen_val_1k/objects.json',
445
+ img_prefix='dataset_seen_val_1k/RGBImages',
446
+ seg_prefix='dataset_seen_val_1k/SegmentationImages',
447
+ depth_prefix='dataset_seen_val_1k/DepthImages',
448
+ test_mode=True,
449
+ pipeline=[
450
+ dict(type='LoadImageFromFile'),
451
+ dict(type='LoadRelationProposals'),
452
+ dict(
453
+ type='MultiScaleFlipAug',
454
+ img_scale=(1000, 600),
455
+ flip=False,
456
+ transforms=[
457
+ dict(type='Resize', keep_ratio=True),
458
+ dict(
459
+ type='Normalize',
460
+ mean=[123.675, 116.28, 103.53],
461
+ std=[1.0, 1.0, 1.0],
462
+ to_rgb=True),
463
+ dict(type='Pad', size_divisor=32),
464
+ dict(type='ImageToTensor', keys=['img']),
465
+ dict(
466
+ type='Collect',
467
+ keys=['img', 'relation_proposals'])
468
+ ])
469
+ ],
470
+ class_agnostic=True,
471
+ max_sample_num=1000),
472
+ dict(
473
+ type='VMRDAffordDataset',
474
+ ann_file='data/vmrd/ImageSets/Main/test.txt',
475
+ img_prefix='data/vmrd/',
476
+ using_gt_proposals=True,
477
+ pipeline=[
478
+ dict(type='LoadImageFromFile'),
479
+ dict(type='LoadRelationProposals'),
480
+ dict(
481
+ type='MultiScaleFlipAug',
482
+ img_scale=(1000, 600),
483
+ flip=False,
484
+ transforms=[
485
+ dict(type='Resize', keep_ratio=True),
486
+ dict(
487
+ type='Normalize',
488
+ mean=[123.675, 116.28, 103.53],
489
+ std=[1.0, 1.0, 1.0],
490
+ to_rgb=True),
491
+ dict(type='Pad', size_divisor=32),
492
+ dict(type='ImageToTensor', keys=['img']),
493
+ dict(
494
+ type='Collect',
495
+ keys=['img', 'relation_proposals'])
496
+ ])
497
+ ],
498
+ class_agnostic=True)
499
+ ],
500
+ separate_eval=True,
501
+ class_agnostic=True),
502
+ test=dict(
503
+ _delete_=True,
504
+ type='ConcatDataset',
505
+ datasets=[
506
+ dict(
507
+ type='REGRADAffordDataset',
508
+ data_root='data/regrad/',
509
+ using_depth=False,
510
+ using_gt_proposals=True,
511
+ meta_info_file='dataset_seen_val_1k/meta_infos.json',
512
+ ann_file='dataset_seen_val_1k/objects.json',
513
+ img_prefix='dataset_seen_val_1k/RGBImages',
514
+ seg_prefix='dataset_seen_val_1k/SegmentationImages',
515
+ depth_prefix='dataset_seen_val_1k/DepthImages',
516
+ test_mode=True,
517
+ pipeline=[
518
+ dict(type='LoadImageFromFile'),
519
+ dict(type='LoadRelationProposals'),
520
+ dict(
521
+ type='MultiScaleFlipAug',
522
+ img_scale=(1000, 600),
523
+ flip=False,
524
+ transforms=[
525
+ dict(type='Resize', keep_ratio=True),
526
+ dict(
527
+ type='Normalize',
528
+ mean=[123.675, 116.28, 103.53],
529
+ std=[1.0, 1.0, 1.0],
530
+ to_rgb=True),
531
+ dict(type='Pad', size_divisor=32),
532
+ dict(type='ImageToTensor', keys=['img']),
533
+ dict(
534
+ type='Collect',
535
+ keys=['img', 'relation_proposals'])
536
+ ])
537
+ ],
538
+ class_agnostic=True,
539
+ max_sample_num=1000),
540
+ dict(
541
+ type='VMRDAffordDataset',
542
+ ann_file='data/vmrd/ImageSets/Main/test.txt',
543
+ img_prefix='data/vmrd/',
544
+ using_gt_proposals=True,
545
+ pipeline=[
546
+ dict(type='LoadImageFromFile'),
547
+ dict(type='LoadRelationProposals'),
548
+ dict(
549
+ type='MultiScaleFlipAug',
550
+ img_scale=(1000, 600),
551
+ flip=False,
552
+ transforms=[
553
+ dict(type='Resize', keep_ratio=True),
554
+ dict(
555
+ type='Normalize',
556
+ mean=[123.675, 116.28, 103.53],
557
+ std=[1.0, 1.0, 1.0],
558
+ to_rgb=True),
559
+ dict(type='Pad', size_divisor=32),
560
+ dict(type='ImageToTensor', keys=['img']),
561
+ dict(
562
+ type='Collect',
563
+ keys=['img', 'relation_proposals'])
564
+ ])
565
+ ],
566
+ class_agnostic=True)
567
+ ],
568
+ separate_eval=True,
569
+ class_agnostic=True),
570
+ samples_per_gpu=4,
571
+ workers_per_gpu=2)
572
+ evaluation = dict(interval=1, metric=['mAP', 'ImgAcc'])
573
+ optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)
574
+ optimizer_config = dict(grad_clip=dict(max_norm=100, norm_type=2))
575
+ lr_config = dict(
576
+ policy='step',
577
+ warmup='linear',
578
+ warmup_iters=4000,
579
+ warmup_ratio=0.001,
580
+ step=[12, 18])
581
+ runner = dict(type='EpochBasedRunner', max_epochs=20)
582
+ checkpoint_config = dict(interval=1, max_keep_ckpts=5)
583
+ log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
584
+ dist_params = dict(backend='nccl')
585
+ log_level = 'INFO'
586
+ load_from = None
587
+ resume_from = None
588
+ workflow = [('train', 1)]
589
+ opencv_num_threads = 0
590
+ mp_start_method = 'fork'
591
+ auto_scale_lr = dict(enable=False, base_batch_size=16)
592
+ mmdet = None
593
+ mmdet_root = '/data/home/hanbo/projects/cloud_services/service/vmrn/vmrn_models/mmdetection/mmdet'
594
+ test_with_object_detector = False
595
+ test_crop_config = (174, 79, 462, 372)
596
+ kinect_img_pipeline = [
597
+ dict(type='LoadImageFromFile'),
598
+ dict(type='LoadRelationProposals'),
599
+ dict(
600
+ type='FixedCrop',
601
+ crop_type='absolute',
602
+ top_left=(174, 79),
603
+ bottom_right=(462, 372)),
604
+ dict(
605
+ type='MultiScaleFlipAug',
606
+ img_scale=(1000, 600),
607
+ flip=False,
608
+ transforms=[
609
+ dict(type='Resize', keep_ratio=True),
610
+ dict(
611
+ type='Normalize',
612
+ mean=[123.675, 116.28, 103.53],
613
+ std=[1.0, 1.0, 1.0],
614
+ to_rgb=True),
615
+ dict(type='Pad', size_divisor=32),
616
+ dict(type='ImageToTensor', keys=['img']),
617
+ dict(type='Collect', keys=['img', 'relation_proposals'])
618
+ ])
619
+ ]
620
+ seen_val_dataset = dict(
621
+ type='REGRADAffordDataset',
622
+ data_root='data/regrad/',
623
+ using_depth=False,
624
+ using_gt_proposals=True,
625
+ meta_info_file='dataset_seen_val_1k/meta_infos.json',
626
+ ann_file='dataset_seen_val_1k/objects.json',
627
+ img_prefix='dataset_seen_val_1k/RGBImages',
628
+ seg_prefix='dataset_seen_val_1k/SegmentationImages',
629
+ depth_prefix='dataset_seen_val_1k/DepthImages',
630
+ test_mode=True,
631
+ pipeline=[
632
+ dict(type='LoadImageFromFile'),
633
+ dict(type='LoadRelationProposals'),
634
+ dict(
635
+ type='MultiScaleFlipAug',
636
+ img_scale=(1000, 600),
637
+ flip=False,
638
+ transforms=[
639
+ dict(type='Resize', keep_ratio=True),
640
+ dict(
641
+ type='Normalize',
642
+ mean=[123.675, 116.28, 103.53],
643
+ std=[1.0, 1.0, 1.0],
644
+ to_rgb=True),
645
+ dict(type='Pad', size_divisor=32),
646
+ dict(type='ImageToTensor', keys=['img']),
647
+ dict(type='Collect', keys=['img', 'relation_proposals'])
648
+ ])
649
+ ],
650
+ class_agnostic=True,
651
+ max_sample_num=1000)
652
+ unseen_val_dataset = dict(
653
+ type='REGRADAffordDataset',
654
+ data_root='data/regrad/',
655
+ using_depth=False,
656
+ using_gt_proposals=True,
657
+ meta_info_file='dataset_unseen_val_1k/meta_infos.json',
658
+ ann_file='dataset_unseen_val_1k/objects.json',
659
+ img_prefix='dataset_unseen_val_1k/RGBImages',
660
+ seg_prefix='dataset_unseen_val_1k/SegmentationImages',
661
+ depth_prefix='dataset_unseen_val_1k/DepthImages',
662
+ test_mode=True,
663
+ pipeline=[
664
+ dict(type='LoadImageFromFile'),
665
+ dict(type='LoadRelationProposals'),
666
+ dict(
667
+ type='MultiScaleFlipAug',
668
+ img_scale=(1000, 600),
669
+ flip=False,
670
+ transforms=[
671
+ dict(type='Resize', keep_ratio=True),
672
+ dict(
673
+ type='Normalize',
674
+ mean=[123.675, 116.28, 103.53],
675
+ std=[1.0, 1.0, 1.0],
676
+ to_rgb=True),
677
+ dict(type='Pad', size_divisor=32),
678
+ dict(type='ImageToTensor', keys=['img']),
679
+ dict(type='Collect', keys=['img', 'relation_proposals'])
680
+ ])
681
+ ],
682
+ class_agnostic=True,
683
+ max_sample_num=1000)
684
+ real_val_dataset = dict(
685
+ type='REGRADAffordDataset',
686
+ data_root='data/regrad/',
687
+ using_depth=False,
688
+ using_gt_proposals=True,
689
+ meta_info_file='real/meta_infos.json',
690
+ ann_file='real/objects.json',
691
+ img_prefix='real/RGBImages',
692
+ img_suffix='png',
693
+ depth_prefix='real/DepthImages',
694
+ test_mode=True,
695
+ test_gt_bbox_offset=(174, 79),
696
+ pipeline=[
697
+ dict(type='LoadImageFromFile'),
698
+ dict(type='LoadRelationProposals'),
699
+ dict(
700
+ type='FixedCrop',
701
+ crop_type='absolute',
702
+ top_left=(174, 79),
703
+ bottom_right=(462, 372)),
704
+ dict(
705
+ type='MultiScaleFlipAug',
706
+ img_scale=(1000, 600),
707
+ flip=False,
708
+ transforms=[
709
+ dict(type='Resize', keep_ratio=True),
710
+ dict(
711
+ type='Normalize',
712
+ mean=[123.675, 116.28, 103.53],
713
+ std=[1.0, 1.0, 1.0],
714
+ to_rgb=True),
715
+ dict(type='Pad', size_divisor=32),
716
+ dict(type='ImageToTensor', keys=['img']),
717
+ dict(type='Collect', keys=['img', 'relation_proposals'])
718
+ ])
719
+ ],
720
+ class_agnostic=True)
721
+ regrad_datatype = 'REGRADAffordDataset'
722
+ regrad_root = 'data/regrad/'
723
+ vmrd_datatype = 'VMRDAffordDataset'
724
+ vmrd_root = 'data/vmrd/'
725
+ vmrd_train = dict(
726
+ type='VMRDAffordDataset',
727
+ ann_file='data/vmrd/ImageSets/Main/trainval.txt',
728
+ img_prefix='data/vmrd/',
729
+ pipeline=[
730
+ dict(type='LoadImageFromFile', to_float32=True),
731
+ dict(
732
+ type='LoadAnnotationsCustom',
733
+ keys=['gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves']),
734
+ dict(type='RandomFlip', flip_ratio=0.5),
735
+ dict(type='PhotoMetricDistortion'),
736
+ dict(type='Expand', mean=[123.675, 116.28, 103.53]),
737
+ dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
738
+ dict(
739
+ type='Normalize',
740
+ mean=[123.675, 116.28, 103.53],
741
+ std=[1.0, 1.0, 1.0],
742
+ to_rgb=True),
743
+ dict(type='Pad', size_divisor=32),
744
+ dict(
745
+ type='DefaultFormatBundleCustom',
746
+ keys=[
747
+ 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
748
+ ]),
749
+ dict(
750
+ type='Collect',
751
+ keys=[
752
+ 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
753
+ ])
754
+ ],
755
+ class_agnostic=True)
756
+ regrad_train = dict(
+     type='REGRADAffordDataset',
+     data_root='data/regrad/',
+     meta_info_file='dataset_train_5k/meta_infos.json',
+     ann_file='dataset_train_5k/objects.json',
+     img_prefix='dataset_train_5k/RGBImages',
+     seg_prefix='dataset_train_5k/SegmentationImages',
+     depth_prefix='dataset_train_5k/DepthImages',
+     pipeline=[
+         dict(type='LoadImageFromFile', to_float32=True),
+         dict(
+             type='LoadAnnotationsCustom',
+             keys=['gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves']),
+         dict(type='RandomFlip', flip_ratio=0.5),
+         dict(type='PhotoMetricDistortion'),
+         dict(
+             type='RandomCrop',
+             crop_type='random_keep',
+             allow_negative_crop=False),
+         dict(type='Expand', mean=[123.675, 116.28, 103.53]),
+         dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
+         dict(
+             type='Normalize',
+             mean=[123.675, 116.28, 103.53],
+             std=[1.0, 1.0, 1.0],
+             to_rgb=True),
+         dict(type='Pad', size_divisor=32),
+         dict(
+             type='DefaultFormatBundleCustom',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ]),
+         dict(
+             type='Collect',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ])
+     ],
+     min_pos_relation=1,
+     class_agnostic=True)
+ metagraspnet_sim_train = dict(
+     type='MetaGraspNetAffordDataset',
+     data_root='data/metagraspnet/sim/',
+     meta_info_file='meta_infos_train.json',
+     pipeline=[
+         dict(type='LoadImageFromFile', to_float32=True),
+         dict(
+             type='LoadAnnotationsCustom',
+             keys=['gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves']),
+         dict(type='RandomFlip', flip_ratio=0.5),
+         dict(type='PhotoMetricDistortion'),
+         dict(
+             type='RandomCrop',
+             crop_type='random_keep',
+             allow_negative_crop=False),
+         dict(
+             type='Expand', mean=[123.675, 116.28, 103.53], ratio_range=(1, 2)),
+         dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
+         dict(
+             type='Normalize',
+             mean=[123.675, 116.28, 103.53],
+             std=[1.0, 1.0, 1.0],
+             to_rgb=True),
+         dict(type='Pad', size_divisor=32),
+         dict(
+             type='DefaultFormatBundleCustom',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ]),
+         dict(
+             type='Collect',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ])
+     ],
+     min_pos_relation=1,
+     class_agnostic=True)
+ vgvrd_train_pipeline = [
+     dict(type='LoadImageFromFile', to_float32=True),
+     dict(
+         type='LoadAnnotationsCustom',
+         keys=['gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves']),
+     dict(type='RandomFlip', flip_ratio=0.5),
+     dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
+     dict(
+         type='Normalize',
+         mean=[123.675, 116.28, 103.53],
+         std=[1.0, 1.0, 1.0],
+         to_rgb=True),
+     dict(type='Pad', size_divisor=32),
+     dict(
+         type='DefaultFormatBundleCustom',
+         keys=['img', 'gt_bboxes', 'gt_labels', 'gt_relchilds',
+               'gt_relleaves']),
+     dict(
+         type='Collect',
+         keys=['img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'])
+ ]
+ vrd_train = dict(
+     type='VRDAffordDataset',
+     data_root='data/vrd/',
+     ann_file='sg_dataset/sg_train_annotations.json',
+     img_prefix='sg_dataset/sg_train_images/',
+     pipeline=[
+         dict(type='LoadImageFromFile', to_float32=True),
+         dict(
+             type='LoadAnnotationsCustom',
+             keys=['gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves']),
+         dict(type='RandomFlip', flip_ratio=0.5),
+         dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
+         dict(
+             type='Normalize',
+             mean=[123.675, 116.28, 103.53],
+             std=[1.0, 1.0, 1.0],
+             to_rgb=True),
+         dict(type='Pad', size_divisor=32),
+         dict(
+             type='DefaultFormatBundleCustom',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ]),
+         dict(
+             type='Collect',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ])
+     ],
+     class_agnostic=True)
+ vg_train = dict(
+     type='VGAffordDataset',
+     data_root='data/vg/downloads',
+     ann_file='relationships.json',
+     img_prefix='',
+     pipeline=[
+         dict(type='LoadImageFromFile', to_float32=True),
+         dict(
+             type='LoadAnnotationsCustom',
+             keys=['gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves']),
+         dict(type='RandomFlip', flip_ratio=0.5),
+         dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
+         dict(
+             type='Normalize',
+             mean=[123.675, 116.28, 103.53],
+             std=[1.0, 1.0, 1.0],
+             to_rgb=True),
+         dict(type='Pad', size_divisor=32),
+         dict(
+             type='DefaultFormatBundleCustom',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ]),
+         dict(
+             type='Collect',
+             keys=[
+                 'img', 'gt_bboxes', 'gt_labels', 'gt_relchilds', 'gt_relleaves'
+             ])
+     ],
+     class_agnostic=True)
+ real_test_pipeline = [
+     dict(type='LoadImageFromFile'),
+     dict(type='LoadRelationProposals'),
+     dict(
+         type='FixedCrop',
+         crop_type='absolute',
+         top_left=(174, 79),
+         bottom_right=(462, 372)),
+     dict(
+         type='MultiScaleFlipAug',
+         img_scale=(1000, 600),
+         flip=False,
+         transforms=[
+             dict(type='Resize', keep_ratio=True),
+             dict(
+                 type='Normalize',
+                 mean=[123.675, 116.28, 103.53],
+                 std=[1.0, 1.0, 1.0],
+                 to_rgb=True),
+             dict(type='Pad', size_divisor=32),
+             dict(type='ImageToTensor', keys=['img']),
+             dict(type='Collect', keys=['img', 'relation_proposals'])
+         ])
+ ]
+ regrad_seen_val_dataset = dict(
+     type='REGRADAffordDataset',
+     data_root='data/regrad/',
+     using_depth=False,
+     using_gt_proposals=True,
+     meta_info_file='dataset_seen_val_1k/meta_infos.json',
+     ann_file='dataset_seen_val_1k/objects.json',
+     img_prefix='dataset_seen_val_1k/RGBImages',
+     seg_prefix='dataset_seen_val_1k/SegmentationImages',
+     depth_prefix='dataset_seen_val_1k/DepthImages',
+     test_mode=True,
+     pipeline=[
+         dict(type='LoadImageFromFile'),
+         dict(type='LoadRelationProposals'),
+         dict(
+             type='MultiScaleFlipAug',
+             img_scale=(1000, 600),
+             flip=False,
+             transforms=[
+                 dict(type='Resize', keep_ratio=True),
+                 dict(
+                     type='Normalize',
+                     mean=[123.675, 116.28, 103.53],
+                     std=[1.0, 1.0, 1.0],
+                     to_rgb=True),
+                 dict(type='Pad', size_divisor=32),
+                 dict(type='ImageToTensor', keys=['img']),
+                 dict(type='Collect', keys=['img', 'relation_proposals'])
+             ])
+     ],
+     class_agnostic=True,
+     max_sample_num=1000)
+ regrad_unseen_val_dataset = dict(
+     type='REGRADAffordDataset',
+     data_root='data/regrad/',
+     using_depth=False,
+     using_gt_proposals=True,
+     meta_info_file='dataset_unseen_val_1k/meta_infos.json',
+     ann_file='dataset_unseen_val_1k/objects.json',
+     img_prefix='dataset_unseen_val_1k/RGBImages',
+     seg_prefix='dataset_unseen_val_1k/SegmentationImages',
+     depth_prefix='dataset_unseen_val_1k/DepthImages',
+     test_mode=True,
+     pipeline=[
+         dict(type='LoadImageFromFile'),
+         dict(type='LoadRelationProposals'),
+         dict(
+             type='MultiScaleFlipAug',
+             img_scale=(1000, 600),
+             flip=False,
+             transforms=[
+                 dict(type='Resize', keep_ratio=True),
+                 dict(
+                     type='Normalize',
+                     mean=[123.675, 116.28, 103.53],
+                     std=[1.0, 1.0, 1.0],
+                     to_rgb=True),
+                 dict(type='Pad', size_divisor=32),
+                 dict(type='ImageToTensor', keys=['img']),
+                 dict(type='Collect', keys=['img', 'relation_proposals'])
+             ])
+     ],
+     class_agnostic=True,
+     max_sample_num=1000)
+ regrad_real_val_dataset = dict(
+     type='REGRADAffordDataset',
+     data_root='data/regrad/',
+     using_depth=False,
+     using_gt_proposals=True,
+     meta_info_file='real/meta_infos.json',
+     ann_file='real/objects.json',
+     img_prefix='real/RGBImages',
+     img_suffix='png',
+     depth_prefix='real/DepthImages',
+     test_mode=True,
+     test_gt_bbox_offset=(174, 79),
+     pipeline=[
+         dict(type='LoadImageFromFile'),
+         dict(type='LoadRelationProposals'),
+         dict(
+             type='FixedCrop',
+             crop_type='absolute',
+             top_left=(174, 79),
+             bottom_right=(462, 372)),
+         dict(
+             type='MultiScaleFlipAug',
+             img_scale=(1000, 600),
+             flip=False,
+             transforms=[
+                 dict(type='Resize', keep_ratio=True),
+                 dict(
+                     type='Normalize',
+                     mean=[123.675, 116.28, 103.53],
+                     std=[1.0, 1.0, 1.0],
+                     to_rgb=True),
+                 dict(type='Pad', size_divisor=32),
+                 dict(type='ImageToTensor', keys=['img']),
+                 dict(type='Collect', keys=['img', 'relation_proposals'])
+             ])
+     ],
+     class_agnostic=True)
+ vmrd_val_dataset = dict(
+     type='VMRDAffordDataset',
+     ann_file='data/vmrd/ImageSets/Main/test.txt',
+     img_prefix='data/vmrd/',
+     using_gt_proposals=True,
+     pipeline=[
+         dict(type='LoadImageFromFile'),
+         dict(type='LoadRelationProposals'),
+         dict(
+             type='MultiScaleFlipAug',
+             img_scale=(1000, 600),
+             flip=False,
+             transforms=[
+                 dict(type='Resize', keep_ratio=True),
+                 dict(
+                     type='Normalize',
+                     mean=[123.675, 116.28, 103.53],
+                     std=[1.0, 1.0, 1.0],
+                     to_rgb=True),
+                 dict(type='Pad', size_divisor=32),
+                 dict(type='ImageToTensor', keys=['img']),
+                 dict(type='Collect', keys=['img', 'relation_proposals'])
+             ])
+     ],
+     class_agnostic=True)
+ train_sampler = dict(
+     type='DistributedWeightedSampler',
+     weights=[0.1, 0.1, 0.05, 0.05, 0.7],
+     sample_per_epoch=150000,
+     shuffle=True)
+ work_dir = './work_dirs/relation_afford_r101_caffe_c4_1x_regrad_vmrd_metagraspnet_vrd_vg_class_agnostic'
+ gpu_ids = range(0, 8)
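The `train_sampler` entry mixes the five training sources (vmrd, regrad, metagraspnet_sim, vrd, vg) with weights `[0.1, 0.1, 0.05, 0.05, 0.7]`. `DistributedWeightedSampler` is a project-specific class not shown in this log; as a minimal single-process sketch (function name and dataset sizes are hypothetical), the weighting can be read as: each of the `sample_per_epoch` draws first picks a source dataset by weight, then an item uniformly inside it:

```python
import random

def weighted_epoch_indices(dataset_sizes, weights, sample_per_epoch, seed=0):
    """Build one epoch of (dataset_idx, item_idx) pairs.

    Each draw first selects a source dataset according to `weights`,
    then a uniformly random item inside it, so the mixing ratio is set
    by the weights rather than by the raw dataset sizes.
    """
    rng = random.Random(seed)
    epoch = []
    for _ in range(sample_per_epoch):
        ds = rng.choices(range(len(dataset_sizes)), weights=weights, k=1)[0]
        epoch.append((ds, rng.randrange(dataset_sizes[ds])))
    return epoch

# Hypothetical sizes for vmrd, regrad, metagraspnet_sim, vrd, vg.
indices = weighted_epoch_indices(
    dataset_sizes=[4000, 5000, 3000, 4000, 100000],
    weights=[0.1, 0.1, 0.05, 0.05, 0.7],
    sample_per_epoch=1000)
```

Under this reading, roughly 70% of each 150000-sample epoch comes from the vg source regardless of its raw size; the real sampler presumably also shards the draws across the 8 processes implied by `gpu_ids`.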