Preface

SSDD (SAR Ship Detection Dataset) is one of the classic datasets for SAR ship detection. The paper SAR Ship Detection Dataset (SSDD): Official Release and Comprehensive Data Analysis contains the following passage:

The images with the last digits of the file number 1 and 9 are uniquely determined as the test set, and the rest are regarded as the training set. Such a rule can also maintain the distribution consistency of the training set and test set, which is conducive to network feature learning.

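In other words, images whose file number ends in 1 or 9 are fixed as the test set, and all remaining images form the training set (my note: this includes the validation set). According to the paper, this rule also keeps the distributions of the training and test sets consistent, which helps the network learn features.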

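The dataset is small, with only 1160 images, so a random split could break the distribution consistency between the training and test sets and lead to different results from one split to the next. With so few images, every sample is valuable. The paper does not specify how to divide the remaining images into training and validation sets, although it does suggest using cross-validation. Here I use the images whose file number ends in 8 as the validation set, so that the validation set contains both inshore and offshore targets.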
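
Because the rule depends only on the last digit of the file number, it can be sketched as a small function. This is a minimal illustration rather than the script itself: the name `assign_split` and the `val_digit` parameter are hypothetical, and the numbering from 1 to 1160 is taken from the image count mentioned above.

```python
from collections import Counter


def assign_split(image_id: int, val_digit: int = 8) -> str:
    """Return 'test', 'val' or 'train' based on the last digit of image_id."""
    last_digit = image_id % 10
    if last_digit in (1, 9):       # test set: fixed by the rule quoted from the paper
        return "test"
    if last_digit == val_digit:    # validation set: my own choice, not prescribed by the paper
        return "val"
    return "train"


# Count how many of the 1160 images land in each split.
counts = Counter(assign_split(i) for i in range(1, 1161))
print(counts)
```

Each last digit occurs 116 times among the numbers 1 to 1160, so this gives 232 test images, 116 validation images, and 812 training images.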

So I wrote a script to split the data into training, validation, and test sets.


Code

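import os

# Create the output directory if it does not exist yet.
os.makedirs("ImageSets/Main", exist_ok=True)

suffix_1 = list(range(1,1160,10)) # file numbers ending in 1
suffix_9 = list(range(9,1160,10)) # file numbers ending in 9
suffix_8 = list(range(8,1160,10)) # validation set; change this if you do not want to use last digit 8
suffix_1_9 = suffix_1+suffix_9
suffix_1_9.sort()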
#-----------------------test---------------------#
test = [str(i).zfill(6) for i in suffix_1_9]

with open("ImageSets/Main/test.txt", 'w') as f:
    for i in test:
        f.write(i+'\n')
#-------------------train&val--------------------#
# File numbers run from 1 to 1160; keep everything that is not in the test set.
suf_not_1_9 = [i for i in range(1, 1161) if i not in suffix_1_9]

trainval = [str(i).zfill(6) for i in suf_not_1_9]

with open("ImageSets/Main/trainval.txt", 'w') as f:
    for i in trainval:
        f.write(i+'\n')
#-----------------val----------------------------#
val = [str(i).zfill(6) for i in suffix_8]

with open("ImageSets/Main/val.txt", 'w') as f:
    for i in val:
        f.write(i+'\n')
#-----------------train--------------------------#
# The training set is trainval minus the validation set.
suf_not_1_8_9 = [i for i in suf_not_1_9 if i not in suffix_8]

train = [str(i).zfill(6) for i in suf_not_1_8_9]

with open("ImageSets/Main/train.txt", 'w') as f:
    for i in train:
        f.write(i+'\n')
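
As a quick sanity check, the four list files can be read back and compared against each other. The following is a minimal sketch, assuming the files were written to ImageSets/Main as in the script above; `read_ids` and `check_splits` are hypothetical helper names, not part of any library.

```python
import os


def read_ids(path):
    """Read one image ID per line, ignoring blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def check_splits(split_dir="ImageSets/Main"):
    """Check that the split files are consistent and return the size of each split."""
    splits = {
        name: set(read_ids(os.path.join(split_dir, name + ".txt")))
        for name in ("train", "val", "test", "trainval")
    }
    # train, val and test must not share any image.
    assert not splits["train"] & splits["val"]
    assert not splits["train"] & splits["test"]
    assert not splits["val"] & splits["test"]
    # trainval must be exactly train plus val.
    assert splits["trainval"] == splits["train"] | splits["val"]
    # Every test ID must end in 1 or 9.
    assert all(image_id[-1] in "19" for image_id in splits["test"])
    return {name: len(ids) for name, ids in splits.items()}
```

Calling `check_splits()` from the dataset root after running the script should report 812 training, 116 validation, 232 test, and 928 trainval entries.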
