Environment
Python 3.8.10
Annotation tool: labelImg
YOLO
The YOLO annotation format is mainly used by the Ultralytics yolov5 project. Labels are saved as plain txt files.
A YOLO annotation line looks like this:
<object-class> <x> <y> <width> <height>
For example:
0 0.412500 0.318981 0.358333 0.636111
object-class: the index of the object's class label.
x, y: the center coordinates of the target, normalized by the image width W and height H, i.e. x/W and y/H.
width, height: the width and height of the target bounding box, also normalized by W and H.
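To make the normalization concrete, here is a minimal Python sketch (the image size and pixel box below are made-up illustrative values, not from any real dataset) that turns a pixel-space box into a YOLO label line:

# Convert a pixel-space box (xmin, ymin, xmax, ymax) to a YOLO label line.
# The image size and box values here are illustrative only.
W, H = 960, 540                      # image width and height in pixels
xmin, ymin, xmax, ymax = 100, 50, 400, 300

x = (xmin + xmax) / 2 / W            # normalized center x
y = (ymin + ymax) / 2 / H            # normalized center y
w = (xmax - xmin) / W                # normalized width
h = (ymax - ymin) / H                # normalized height

print(f"0 {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
# -> 0 0.260417 0.324074 0.312500 0.462963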
The yolov5-style train/valid/test directories used later in this post can be created in advance with:

mkdir -p test/images test/labels train/images train/labels valid/images valid/labels
VOC
A VOC dataset consists of five parts: JPEGImages, Annotations, ImageSets, SegmentationClass and SegmentationObject.
JPEGImages: all of the training and test images.
Annotations: the XML annotation file produced for each image.
ImageSets: only the Main folder is relevant here; it holds four text files, test.txt, train.txt, trainval.txt and val.txt, which list the file names of the test, training, train+val and validation images respectively.
SegmentationClass and SegmentationObject: image segmentation result maps, not used for object detection. Class segmentation labels the class of every pixel; object segmentation labels which object every pixel belongs to.
VOC labels are stored as xml files.
The xml annotation format is as follows:
<annotation>
    <folder>17</folder>
    <filename>77258.bmp</filename>
    <path>~/frcnn-image/61/ADAS/image/frcnn-image/17/77258.bmp</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>  <!-- one annotated object in the image -->
        <name>car</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>2</xmin>
            <ymin>156</ymin>
            <xmax>111</xmax>
            <ymax>259</ymax>
        </bndbox>
    </object>
</annotation>
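Such a file can be read with Python's standard xml.etree.ElementTree module. A minimal sketch (the file name refers to the example above; adjust the path to your own data):

import xml.etree.ElementTree as ET

# Parse the example annotation shown above and print each object's box.
tree = ET.parse('77258.xml')          # path to a VOC xml file
root = tree.getroot()
size = root.find('size')
w, h = int(size.find('width').text), int(size.find('height').text)
print('image size:', w, h)
for obj in root.iter('object'):
    name = obj.find('name').text
    box = obj.find('bndbox')
    xmin, ymin = int(box.find('xmin').text), int(box.find('ymin').text)
    xmax, ymax = int(box.find('xmax').text), int(box.find('ymax').text)
    print(name, xmin, ymin, xmax, ymax)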
Building a custom VOC dataset
First, following the VOC2007 layout, create the folders VOCdevkit, VOC2007, Annotations, ImageSets, Main and JPEGImages with the hierarchy shown below:
└─VOCdevkit
    └─VOC2007
        ├─Annotations
        ├─ImageSets
        │   └─Main
        └─JPEGImages
These folders can be created quickly with:
mkdir -p VOCdevkit/VOC2007/{Annotations,ImageSets/Main,JPEGImages}
Annotations holds the xml annotation files, JPEGImages holds the image files, and ImageSets/Main holds several txt files whose contents are the names (without extensions) of the images in the training, validation and test sets. We have to generate these txt files ourselves, as described below.
Next, copy the image files from the images folder into JPEGImages, and copy the xml annotation files into Annotations. Then create a script named test.py and place it under the VOCdevkit/VOC2007 folder:
─VOCdevkit
  └─VOC2007
      │   test.py
      │
      ├─Annotations
      ├─ImageSets
      │   └─Main
      └─JPEGImages
The script's contents are as follows:
import os
import random

# Fraction of all xml files that go into trainval.txt (split further below),
# and the fraction of that subset written to test.txt.
trainval_percent = 0.1
train_percent = 0.9
xmlfilepath = 'Annotations'
txtsavepath = 'ImageSets/Main'
total_xml = os.listdir(xmlfilepath)

num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

ftrainval = open('ImageSets/Main/trainval.txt', 'w')
ftest = open('ImageSets/Main/test.txt', 'w')
ftrain = open('ImageSets/Main/train.txt', 'w')
fval = open('ImageSets/Main/val.txt', 'w')

for i in indices:
    name = total_xml[i][:-4] + '\n'   # xml file name without the .xml extension
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftest.write(name)
        else:
            fval.write(name)
    else:
        ftrain.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
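Note that with the defaults above (trainval_percent = 0.1, train_percent = 0.9), roughly 90% of the names end up in train.txt, about 9% in test.txt and about 1% in val.txt: the script first samples 10% of the files into trainval.txt and then splits that subset between test.txt and val.txt. Adjust the two percentages (or which files the inner branch writes to) if you want a different split.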
Then cd into VOCdevkit/VOC2007 and run the script. When it finishes, four txt files have been generated under ImageSets/Main:
├─ImageSets
│   └─Main
│       test.txt
│       train.txt
│       trainval.txt
│       val.txt
│
└─JPEGImages
All four files have the same format: each line is an image name with the extension removed (identical to the xml annotation file name without .xml).
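For example, if Annotations contains the file 77258.xml from the earlier VOC example, the line written for it is simply:

77258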
Next, create a script named voc_labels.py:
import xml.etree.ElementTree as ET
import os

sets = [('2007', 'train'), ('2007', 'val'), ('2007', 'test')]
classes = ["hat"]


def convert(size, box):
    # Convert a VOC box (xmin, xmax, ymin, ymax) into a normalized
    # YOLO box (x_center, y_center, width, height).
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


def convert_annotation(year, image_id):
    in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml' % (year, image_id))
    out_file = open('VOCdevkit/VOC%s/labels/%s.txt' % (year, image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        # Skip classes we don't care about and objects marked "difficult".
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
             float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')


wd = os.getcwd()
for year, image_set in sets:
    if not os.path.exists('VOCdevkit/VOC%s/labels/' % (year)):
        os.makedirs('VOCdevkit/VOC%s/labels/' % (year))
    image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt' % (year, image_set)).read().strip().split()
    # One list file per split, holding the absolute path of every image in it.
    list_file = open('%s_%s.txt' % (year, image_set), 'w')
    for image_id in image_ids:
        list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg\n' % (wd, year, image_id))
        convert_annotation(year, image_id)
    list_file.close()
After running this script, 2007_train.txt, 2007_val.txt and 2007_test.txt are generated in the directory that contains VOCdevkit. The custom VOC2007 dataset is now ready. The corresponding darknet config file cfg/voc.data can be written like this:
classes = 1
train  = 2007_train.txt
valid  = 2007_val.txt
names  = data/voc.names
backup = backup/
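data/voc.names holds the class names for darknet, one per line; for the single hat class used throughout this post it would contain just:

hat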
VOC_To_YOLO
labelImg can export annotations in YOLO format directly, but if the data you receive is annotated as xml, it has to be converted.
Put all images in an images folder and the xml annotation files in an Annotations folder, then create a labels folder:
├─Annotations
├─images
├─labels
└─voc_to_yolo.py
The folders can be created quickly with:
mkdir Annotations images labels
Create a script named voc_to_yolo.py:
import xml.etree.ElementTree as ET
import os

classes = ["hat"]


def convert(size, box):
    # Convert a VOC box (xmin, xmax, ymin, ymax) into a normalized
    # YOLO box (x_center, y_center, width, height).
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


def convert_annotation(image_id):
    # Skip images that have no xml annotation.
    if not os.path.exists('Annotations/%s.xml' % (image_id)):
        return
    in_file = open('Annotations/%s.xml' % (image_id))
    out_file = open('labels/%s.txt' % (image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        cls = obj.find('name').text
        if cls not in classes:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
             float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')


for image in os.listdir('images'):
    image_id = os.path.splitext(image)[0]
    convert_annotation(image_id)
After running this script, txt annotation files are generated in the labels folder.
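As an optional sanity check, here is a small sketch (assuming the layout above, with images/ and labels/ in the current directory) that lists any image that ended up without a label file, for example because no xml annotation was found for it:

import os

# Compare image base names against generated label base names.
image_ids = {os.path.splitext(f)[0] for f in os.listdir('images')}
label_ids = {os.path.splitext(f)[0] for f in os.listdir('labels')}
missing = sorted(image_ids - label_ids)
print('images without labels:', missing if missing else 'none')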
The dataset structure yolov5 uses for training is:
├─test
│   ├─images
│   └─labels
├─train
│   ├─images
│   └─labels
└─valid
    ├─images
    └─labels
The folders can be created quickly with:
mkdir -p {test,train,valid}/{images,labels}
The resulting directory structure:
├─Annotations
├─images
├─labels
├─test
│   ├─images
│   └─labels
├─train
│   ├─images
│   └─labels
├─valid
│   ├─images
│   └─labels
├─voc_to_yolo.py
└─distribution.py
We still need to split the images and their corresponding txt label files into these sets. With the outer train, valid and test folders created, each containing images and labels subfolders, create a script named distribution.py:
import os
import shutil
import random

# Target split: 10% test, 20% valid, 70% train.
test_percent = 0.1
valid_percent = 0.2
train_percent = 0.7

image_path = 'images'
label_path = 'labels'

images_files_list = sorted(os.listdir(image_path))
labels_files_list = sorted(os.listdir(label_path))
print('images files: {}'.format(images_files_list))
print('labels files: {}'.format(labels_files_list))
total_num = len(images_files_list)
print('total_num: {}'.format(total_num))

test_num = int(total_num * test_percent)
valid_num = int(total_num * valid_percent)
train_num = int(total_num * train_percent)

# Randomly choose which indices go to each split. The samples are drawn
# independently, so the if/elif/else below gives test priority over valid,
# and everything not picked for test or valid falls into train.
test_image_index = random.sample(range(total_num), test_num)
valid_image_index = random.sample(range(total_num), valid_num)
train_image_index = random.sample(range(total_num), train_num)

for i in range(total_num):
    image_name = images_files_list[i]
    # The label file shares the image's base name, with a .txt extension.
    label_name = os.path.splitext(image_name)[0] + '.txt'
    print('src image: {}, i={}'.format(image_name, i))
    if i in test_image_index:
        shutil.copyfile('images/{}'.format(image_name), 'test/images/{}'.format(image_name))
        shutil.copyfile('labels/{}'.format(label_name), 'test/labels/{}'.format(label_name))
    elif i in valid_image_index:
        shutil.copyfile('images/{}'.format(image_name), 'valid/images/{}'.format(image_name))
        shutil.copyfile('labels/{}'.format(label_name), 'valid/labels/{}'.format(label_name))
    else:
        shutil.copyfile('images/{}'.format(image_name), 'train/images/{}'.format(image_name))
        shutil.copyfile('labels/{}'.format(label_name), 'train/labels/{}'.format(label_name))
After running the script, you should see a file hierarchy similar to this:
─test
│   ├─images
│   │       1234565343231.jpg
│   │       1559035146628.jpg
│   │       2019032210151.jpg
│   │
│   └─labels
│           1234565343231.txt
│           1559035146628.txt
│           2019032210151.txt
│
├─train
│   ├─images
│   │       1213211.jpg
│   │       12i4u33112.jpg
│   │       1559092537114.jpg
│   │
│   └─labels
│           1213211.txt
│           12i4u33112.txt
│           1559092537114.txt
│
└─valid
    ├─images
    │       120131247621.jpg
    │       124iuy311.jpg
    │       1559093141383.jpg
    │
    └─labels
            120131247621.txt
            124iuy311.txt
            1559093141383.txt
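At this point the yolov5-style dataset is ready. yolov5 also expects a dataset config yaml (passed via --data) that points at these image folders. A minimal sketch, assuming the train/valid/test folders sit next to the yaml file and reusing the single hat class from this post (adjust the paths, nc and names to your own project):

train: ./train/images
val: ./valid/images
test: ./test/images

nc: 1            # number of classes
names: ['hat']   # class names, index order matches the label files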
YOLO_To_VOC
If you have txt annotations but need VOC format, a conversion is also required. The script below handles it; explanations are given as comments in the code.
Create a script named yolo_to_voc.py:
import os
import xml.etree.ElementTree as ET
from PIL import Image
import numpy as np

img_path = 'images/'
labels_path = 'labels/'
annotations_path = 'Annotations/'
labels = os.listdir(labels_path)
classes = ["hat"]


def write_xml(imgname, sw, sh, sd, filepath, labeldicts):
    '''
    imgname: image name without extension
    '''
    root = ET.Element('annotation')
    ET.SubElement(root, 'filename').text = str(imgname)
    sizes = ET.SubElement(root, 'size')
    ET.SubElement(sizes, 'width').text = str(sw)
    ET.SubElement(sizes, 'height').text = str(sh)
    ET.SubElement(sizes, 'depth').text = str(sd)
    for labeldict in labeldicts:
        objects = ET.SubElement(root, 'object')
        ET.SubElement(objects, 'name').text = labeldict['name']
        ET.SubElement(objects, 'pose').text = 'Unspecified'
        ET.SubElement(objects, 'truncated').text = '0'
        ET.SubElement(objects, 'difficult').text = '0'
        bndbox = ET.SubElement(objects, 'bndbox')
        ET.SubElement(bndbox, 'xmin').text = str(int(labeldict['xmin']))
        ET.SubElement(bndbox, 'ymin').text = str(int(labeldict['ymin']))
        ET.SubElement(bndbox, 'xmax').text = str(int(labeldict['xmax']))
        ET.SubElement(bndbox, 'ymax').text = str(int(labeldict['ymax']))
    tree = ET.ElementTree(root)
    tree.write(filepath, encoding='utf-8')


for label in labels:
    with open(labels_path + label, 'r') as f:
        img_id = os.path.splitext(label)[0]
        contents = f.readlines()
        labeldicts = []
        # Read the image once to get its size for de-normalizing the boxes.
        img = np.array(Image.open(img_path + img_id + '.jpg'))
        sh, sw, sd = img.shape[0], img.shape[1], img.shape[2]
        for content in contents:
            content = content.strip('\n').split()
            # YOLO stores normalized center/width/height; scale back to pixels.
            x = float(content[1]) * sw
            y = float(content[2]) * sh
            w = float(content[3]) * sw
            h = float(content[4]) * sh
            # The +1 shifts to VOC's 1-based pixel coordinate convention.
            new_dict = {'name': classes[int(content[0])],
                        'difficult': '0',
                        'xmin': x + 1 - w / 2,
                        'ymin': y + 1 - h / 2,
                        'xmax': x + 1 + w / 2,
                        'ymax': y + 1 + h / 2}
            labeldicts.append(new_dict)
        write_xml(img_id, sw, sh, sd, annotations_path + img_id + '.xml', labeldicts)
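Note that the script assumes it is run from the directory containing images/, labels/ and an already existing Annotations/ folder, and that the images are .jpg files; create Annotations/ first (mkdir Annotations), otherwise writing the xml files will fail.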
References
https://zhuanlan.zhihu.com/p/383660741
https://zhuanlan.zhihu.com/p/461488682