[AR Lab] mulberryAR: Extracting ORB Features in Parallel
Please credit the source when reposting - polobymulberry, cnblogs (博客园)
0x00 - Introduction
At the end of [AR Lab] mulberryAR: ORBSLAM2+VVSION, I reported real-device test results on an iPhone 5s, which showed that the ExtractORB function, the one that extracts ORB features from the image, accounts for a considerable chunk of the runtime. So this is currently the top optimization priority. For the experiments here I use the recorded image sequence introduced in [AR Lab] mulberryAR: Adding Continuous Images as Input. This has two benefits: first, the input is guaranteed to be identical, which makes the comparison between single-threaded and parallel feature extraction credible; second, the whole program can run in the iOS simulator, since no camera is needed, which makes testing very convenient and offers plenty of device models to choose from.
For this part, feature extraction, I currently have only two ideas for optimization:
- Parallelize the feature extraction process.
- Reduce the number of extracted feature points.
The second approach is trivial: just change the number of feature points to extract in the configuration file, so I won't dwell on it. This article focuses on the first approach, a first attempt at parallelizing feature extraction.
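For reference, the feature-extractor parameters live in the ORB-SLAM2 settings file; a typical block looks like the following (the values shown are the common defaults and purely illustrative here):

ORBextractor.nFeatures: 1000   # features per image - the knob for the second idea
ORBextractor.scaleFactor: 1.2  # scale factor between pyramid levels
ORBextractor.nLevels: 8        # number of pyramid levels
ORBextractor.iniThFAST: 20     # initial FAST threshold
ORBextractor.minThFAST: 7      # fallback FAST threshold when no corners are found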
0x01 - Timing Analysis of ORB-SLAM2 Feature Extraction
In ORB-SLAM2 the feature extraction function is called ExtractORB, a member function of the Frame class that extracts the ORB feature points of the current Frame.
// flag is used for stereo cameras; for a monocular camera flag defaults to 0
// Extract the ORB feature points of im
void Frame::ExtractORB(int flag, const cv::Mat &im)
{
    if(flag==0)
        // mpORBextractorLeft is an ORBextractor object, and ORBextractor
        // overloads operator(), which is why the call below works
        (*mpORBextractorLeft)(im,cv::Mat(),mvKeys,mDescriptors);
    else
        (*mpORBextractorRight)(im,cv::Mat(),mvKeysRight,mDescriptorsRight);
}
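As a side note for readers unfamiliar with the (*mpORBextractorLeft)(...) syntax: a class that overloads operator() can be invoked like a function. A minimal standalone sketch (the Greeter class here is made up purely for illustration):

#include <iostream>
#include <string>

// A "functor": overloading operator() makes instances callable like functions
struct Greeter {
    std::string prefix;
    void operator()(const std::string& name) const {
        std::cout << prefix << ", " << name << "!" << std::endl;
    }
};

int main() {
    Greeter greeter{"Hello"};
    Greeter* p = &greeter;
    (*p)("world");   // same shape as (*mpORBextractorLeft)(im, ...)
    return 0;
}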
As the code above shows, ORB-SLAM2's feature extraction boils down to a call to ORBextractor's overloaded operator(). Let's instrument the important parts of that function and measure how long each takes.
Important note: timing code execution
There are many ways to time a section of code, for example:
clock_t begin = clock();
//...
clock_t end = clock();
cout << "execute time = " << double(end - begin) / CLOCKS_PER_SEC << "s" << endl;
However, when I used this approach to time multi-threaded summation in [原] C++11 Parallel Computing - Array Summation, I found it buggy for multi-threaded code: clock() measures CPU time accumulated across all threads rather than wall-clock time. Since I'm currently targeting iOS anyway, I use an iOS timing API here instead. And because Foundation cannot be used directly from a C++ file, I use the corresponding CoreFoundation API.
CFAbsoluteTime beginTime = CFAbsoluteTimeGetCurrent();
CFDateRef beginDate = CFDateCreate(kCFAllocatorDefault, beginTime);
// ...
CFAbsoluteTime endTime = CFAbsoluteTimeGetCurrent();
CFDateRef endDate = CFDateCreate(kCFAllocatorDefault, endTime);
CFTimeInterval timeInterval = CFDateGetTimeIntervalSinceDate(endDate, beginDate);
cout << "execute time = " << (double)(timeInterval) * 1000.0 << "ms" << endl;
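For non-Apple platforms, a portable equivalent is std::chrono::steady_clock, which also measures wall-clock time and is therefore safe around multi-threaded code. A minimal sketch:

#include <chrono>
#include <iostream>

int main() {
    auto begin = std::chrono::steady_clock::now();
    // ... code under test ...
    auto end = std::chrono::steady_clock::now();
    // duration<double, milli> converts the tick count to fractional milliseconds
    double ms = std::chrono::duration<double, std::milli>(end - begin).count();
    std::cout << "execute time = " << ms << "ms" << std::endl;
    return 0;
}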
Insert the timing code above into operator(). The function now looks like this overall; three parts are timed: ComputePyramid, ComputeKeyPointsOctTree, and ComputeDescriptors:
void ORBextractor::operator()( InputArray _image, InputArray _mask, vector<KeyPoint>& _keypoints,
                               OutputArray _descriptors)
{
    if(_image.empty())
        return;

    Mat image = _image.getMat();
    assert(image.type() == CV_8UC1);

    // 1. Time the image pyramid computation
    CFAbsoluteTime beginComputePyramidTime = CFAbsoluteTimeGetCurrent();
    CFDateRef computePyramidBeginDate = CFDateCreate(kCFAllocatorDefault, beginComputePyramidTime);

    // Pre-compute the scale pyramid
    ComputePyramid(image);

    CFAbsoluteTime endComputePyramidTime = CFAbsoluteTimeGetCurrent();
    CFDateRef computePyramidEndDate = CFDateCreate(kCFAllocatorDefault, endComputePyramidTime);
    CFTimeInterval computePyramidTimeInterval = CFDateGetTimeIntervalSinceDate(computePyramidEndDate, computePyramidBeginDate);
    cout << "ComputePyramid time = " << (double)(computePyramidTimeInterval) * 1000.0 << endl;

    vector < vector<KeyPoint> > allKeypoints;

    // 2. Time the KeyPoint computation
    CFAbsoluteTime beginComputeKeyPointsTime = CFAbsoluteTimeGetCurrent();
    CFDateRef computeKeyPointsBeginDate = CFDateCreate(kCFAllocatorDefault, beginComputeKeyPointsTime);

    ComputeKeyPointsOctTree(allKeypoints);
    //ComputeKeyPointsOld(allKeypoints);

    CFAbsoluteTime endComputeKeyPointsTime = CFAbsoluteTimeGetCurrent();
    CFDateRef computeKeyPointsEndDate = CFDateCreate(kCFAllocatorDefault, endComputeKeyPointsTime);
    CFTimeInterval computeKeyPointsTimeInterval = CFDateGetTimeIntervalSinceDate(computeKeyPointsEndDate, computeKeyPointsBeginDate);
    cout << "ComputeKeyPointsOctTree time = " << (double)(computeKeyPointsTimeInterval) * 1000.0 << endl;

    Mat descriptors;

    int nkeypoints = 0;
    for (int level = 0; level < nlevels; ++level)
        nkeypoints += (int)allKeypoints[level].size();
    if( nkeypoints == 0 )
        _descriptors.release();
    else
    {
        _descriptors.create(nkeypoints, 32, CV_8U);
        descriptors = _descriptors.getMat();
    }

    _keypoints.clear();
    _keypoints.reserve(nkeypoints);

    int offset = 0;

    // 3. Time the descriptor computation
    CFAbsoluteTime beginComputeDescriptorsTime = CFAbsoluteTimeGetCurrent();
    CFDateRef computeDescriptorsBeginDate = CFDateCreate(kCFAllocatorDefault, beginComputeDescriptorsTime);

    for (int level = 0; level < nlevels; ++level)
    {
        vector<KeyPoint>& keypoints = allKeypoints[level];
        int nkeypointsLevel = (int)keypoints.size();

        if(nkeypointsLevel==0)
            continue;

        // preprocess the resized image
        Mat workingMat = mvImagePyramid[level].clone();
        GaussianBlur(workingMat, workingMat, cv::Size(7, 7), 2, 2, BORDER_REFLECT_101);

        // Compute the descriptors
        Mat desc = descriptors.rowRange(offset, offset + nkeypointsLevel);
        computeDescriptors(workingMat, keypoints, desc, pattern);

        offset += nkeypointsLevel;

        // Scale keypoint coordinates
        if (level != 0)
        {
            float scale = mvScaleFactor[level]; //getScale(level, firstLevel, scaleFactor);
            for (vector<KeyPoint>::iterator keypoint = keypoints.begin(),
                 keypointEnd = keypoints.end(); keypoint != keypointEnd; ++keypoint)
                keypoint->pt *= scale;
        }
        // And add the keypoints to the output
        _keypoints.insert(_keypoints.end(), keypoints.begin(), keypoints.end());
    }

    CFAbsoluteTime endComputeDescriptorsTime = CFAbsoluteTimeGetCurrent();
    CFDateRef computeDescriptorsEndDate = CFDateCreate(kCFAllocatorDefault, endComputeDescriptorsTime);
    CFTimeInterval computeDescriptorsTimeInterval = CFDateGetTimeIntervalSinceDate(computeDescriptorsEndDate, computeDescriptorsBeginDate);
    cout << "ComputeDescriptors time = " << (double)(computeDescriptorsTimeInterval) * 1000.0 << endl;
}
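The begin/end boilerplate above repeats three times. A small RAII helper that prints on scope exit would tidy it up; here is a sketch using std::chrono instead of CoreFoundation (the ScopedTimer name is my own, not part of ORB-SLAM2):

#include <chrono>
#include <iostream>
#include <string>

// Prints the elapsed wall-clock time of the enclosing scope on destruction
struct ScopedTimer {
    explicit ScopedTimer(const std::string& l)
        : label(l), begin(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        auto end = std::chrono::steady_clock::now();
        std::cout << label << " time = "
                  << std::chrono::duration<double, std::milli>(end - begin).count()
                  << "ms" << std::endl;
    }
    std::string label;
    std::chrono::steady_clock::time_point begin;
};

// Usage:
// {
//     ScopedTimer t("ComputePyramid");
//     ComputePyramid(image);
// } // prints when t goes out of scope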
Now run mulberryAR in the iPhone 7 simulator on the sequence of image frames I recorded earlier. The results are as follows (only the first three frames shown):
It is clear that optimization should focus on ComputeKeyPointsOctTree and ComputeDescriptors.
0x02 - Optimization Approach for ORB-SLAM2 Feature Extraction
ComputePyramid, ComputeKeyPointsOctTree, and ComputeDescriptors all repeat the same operation across the different levels of the image pyramid, so the per-level work can be parallelized. I modified the code of all three parts along these lines; the common fan-out/join pattern is sketched right below.
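The recurring pattern is: launch one std::thread per pyramid level, then join them all. As a generic sketch (ParallelOverLevels is a name I made up; it is not part of ORB-SLAM2):

#include <thread>
#include <vector>

// Run doWork(level) on its own thread for each pyramid level, then wait for all
template <typename F>
void ParallelOverLevels(int nlevels, F doWork) {
    std::vector<std::thread> threads;
    threads.reserve(nlevels);
    for (int level = 0; level < nlevels; ++level)
        threads.emplace_back(doWork, level);
    for (auto& t : threads)
        t.join();
}

// Usage: ParallelOverLevels(8, [&](int level) { /* per-level work */ });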
1. Parallelizing ComputePyramid
This function cannot be parallelized this way for now, because computing level n of the image pyramid depends on the level n-1 image. Moreover, it accounts for only a small share of the total extraction time, so parallelizing it would not buy much anyway. The dependency is sketched below.
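A simplified sketch of why the levels are serial (this omits the border handling that the real ORB-SLAM2 ComputePyramid performs, so treat it as an illustration only):

#include <opencv2/imgproc.hpp>
#include <vector>

// Each level is produced by downscaling the previous one,
// so level n cannot start before level n-1 is finished
void ComputePyramidSketch(const cv::Mat& image, std::vector<cv::Mat>& pyramid,
                          int nlevels, double scaleFactor)
{
    pyramid.resize(nlevels);
    pyramid[0] = image;
    for (int level = 1; level < nlevels; ++level) {
        cv::Size sz(cvRound(pyramid[level - 1].cols / scaleFactor),
                    cvRound(pyramid[level - 1].rows / scaleFactor));
        cv::resize(pyramid[level - 1], pyramid[level], sz, 0, 0, cv::INTER_LINEAR);
    }
}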
2. Parallelizing ComputeKeyPointsOctTree
Parallelizing this function is easy: pull the body of its for (int i = 0; i < nlevels; ++i) loop out into a separate function and give each level its own thread. This is safe because each thread writes only to its own slot, allKeypoints[i]. Without further ado, the code:
void ORBextractor::ComputeKeyPointsOctTree(vector<vector<KeyPoint> >& allKeypoints)
{
    allKeypoints.resize(nlevels);

    vector<thread> computeKeyPointsThreads;
    for (int i = 0; i < nlevels; ++i) {
        computeKeyPointsThreads.push_back(thread(&ORBextractor::ComputeKeyPointsOctTreeEveryLevel,
                                                 this, i, std::ref(allKeypoints)));
    }
    for (int i = 0; i < nlevels; ++i) {
        computeKeyPointsThreads[i].join();
    }

    // compute orientations
    vector<thread> computeOriThreads;
    for (int level = 0; level < nlevels; ++level) {
        computeOriThreads.push_back(thread(computeOrientation, mvImagePyramid[level],
                                           std::ref(allKeypoints[level]), umax));
    }
    for (int level = 0; level < nlevels; ++level) {
        computeOriThreads[level].join();
    }
}
where ComputeKeyPointsOctTreeEveryLevel is:
void ORBextractor::ComputeKeyPointsOctTreeEveryLevel(int level, vector<vector<KeyPoint> >& allKeypoints)
{
    const float W = 30;

    const int minBorderX = EDGE_THRESHOLD-3;
    const int minBorderY = minBorderX;
    const int maxBorderX = mvImagePyramid[level].cols-EDGE_THRESHOLD+3;
    const int maxBorderY = mvImagePyramid[level].rows-EDGE_THRESHOLD+3;

    vector<cv::KeyPoint> vToDistributeKeys;
    vToDistributeKeys.reserve(nfeatures*10);

    const float width = (maxBorderX-minBorderX);
    const float height = (maxBorderY-minBorderY);

    const int nCols = width/W;
    const int nRows = height/W;
    const int wCell = ceil(width/nCols);
    const int hCell = ceil(height/nRows);

    for(int i=0; i<nRows; i++)
    {
        const float iniY = minBorderY+i*hCell;
        float maxY = iniY+hCell+6;

        if(iniY>=maxBorderY-3)
            continue;
        if(maxY>maxBorderY)
            maxY = maxBorderY;

        for(int j=0; j<nCols; j++)
        {
            const float iniX = minBorderX+j*wCell;
            float maxX = iniX+wCell+6;
            if(iniX>=maxBorderX-6)
                continue;
            if(maxX>maxBorderX)
                maxX = maxBorderX;

            vector<cv::KeyPoint> vKeysCell;
            FAST(mvImagePyramid[level].rowRange(iniY,maxY).colRange(iniX,maxX),
                 vKeysCell,iniThFAST,true);

            if(vKeysCell.empty())
            {
                FAST(mvImagePyramid[level].rowRange(iniY,maxY).colRange(iniX,maxX),
                     vKeysCell,minThFAST,true);
            }

            if(!vKeysCell.empty())
            {
                for(vector<cv::KeyPoint>::iterator vit=vKeysCell.begin(); vit!=vKeysCell.end();vit++)
                {
                    (*vit).pt.x+=j*wCell;
                    (*vit).pt.y+=i*hCell;
                    vToDistributeKeys.push_back(*vit);
                }
            }
        }
    }

    vector<KeyPoint> & keypoints = allKeypoints[level];
    keypoints.reserve(nfeatures);

    keypoints = DistributeOctTree(vToDistributeKeys, minBorderX, maxBorderX,
                                  minBorderY, maxBorderY, mnFeaturesPerLevel[level], level);

    const int scaledPatchSize = PATCH_SIZE*mvScaleFactor[level];

    // Add border to coordinates and scale information
    const int nkps = keypoints.size();
    for(int i=0; i<nkps; i++)
    {
        keypoints[i].pt.x+=minBorderX;
        keypoints[i].pt.y+=minBorderY;
        keypoints[i].octave=level;
        keypoints[i].size = scaledPatchSize;
    }
}
Testing in the iPhone 7 simulator gives the following results (first 5 frames):
As the numbers show, parallelization speeds ComputeKeyPointsOctTree up by a factor of 2-3.
3. Parallelizing the ComputeDescriptors part
I call this one a "part" rather than a "function" because, compared with ComputeKeyPointsOctTree, the code involved is more entangled and touches more variables. Those relationships have to be untangled before it can be parallelized safely.
Without belaboring the point, here is the modified, parallelized code:
vector<thread> computeDescThreads;
vector<vector<KeyPoint> > keypointsEveryLevel;
keypointsEveryLevel.resize(nlevels);
// Each pyramid level's offset depends on the offsets of all previous levels,
// so it cannot be computed inside ComputeDescriptorsEveryLevel
for (int level = 0; level < nlevels; ++level) {
    computeDescThreads.push_back(thread(&ORBextractor::ComputeDescriptorsEveryLevel, this, level,
                                        std::ref(allKeypoints), descriptors, offset,
                                        std::ref(keypointsEveryLevel[level])));
    int keypointsNum = (int)allKeypoints[level].size();
    offset += keypointsNum;
}
for (int level = 0; level < nlevels; ++level) {
    computeDescThreads[level].join();
}
// _keypoints must be inserted in level order, so this cannot be done
// inside ComputeDescriptorsEveryLevel either
for (int level = 0; level < nlevels; ++level) {
    _keypoints.insert(_keypoints.end(), keypointsEveryLevel[level].begin(), keypointsEveryLevel[level].end());
}

// The ComputeDescriptorsEveryLevel function is as follows
void ORBextractor::ComputeDescriptorsEveryLevel(int level, std::vector<std::vector<KeyPoint> > &allKeypoints,
                                                const Mat& descriptors, int offset, vector<KeyPoint>& _keypoints)
{
    vector<KeyPoint>& keypoints = allKeypoints[level];
    int nkeypointsLevel = (int)keypoints.size();

    if(nkeypointsLevel==0)
        return;

    // preprocess the resized image
    Mat workingMat = mvImagePyramid[level].clone();
    GaussianBlur(workingMat, workingMat, cv::Size(7, 7), 2, 2, BORDER_REFLECT_101);

    // Compute the descriptors
    Mat desc = descriptors.rowRange(offset, offset + nkeypointsLevel);
    computeDescriptors(workingMat, keypoints, desc, pattern);

    // offset += nkeypointsLevel; // now accumulated by the caller

    // Scale keypoint coordinates
    if (level != 0)
    {
        float scale = mvScaleFactor[level]; //getScale(level, firstLevel, scaleFactor);
        for (vector<KeyPoint>::iterator keypoint = keypoints.begin(),
             keypointEnd = keypoints.end(); keypoint != keypointEnd; ++keypoint)
            keypoint->pt *= scale;
    }
    // And add the keypoints to the output
    // _keypoints.insert(_keypoints.end(), keypoints.begin(), keypoints.end());
    _keypoints = keypoints;
}
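Why can several threads write into the one shared descriptors Mat without a lock? Because each thread's rowRange is disjoint: with, say, 100, 80 and 60 keypoints on three levels (numbers made up), the threads write rows [0,100), [100,180) and [180,240) respectively. A minimal standalone sketch of the idea:

#include <opencv2/core.hpp>
#include <thread>
#include <vector>

int main() {
    // 240 rows total, partitioned among 3 "levels" (hypothetical sizes)
    cv::Mat descriptors(240, 32, CV_8U);
    std::vector<int> counts = {100, 80, 60};

    std::vector<std::thread> threads;
    int offset = 0;
    for (int level = 0; level < 3; ++level) {
        // rowRange returns a header onto a disjoint slice of the same buffer
        cv::Mat rows = descriptors.rowRange(offset, offset + counts[level]);
        threads.emplace_back([rows, level]() mutable {
            rows.setTo(cv::Scalar(level));  // each thread touches only its own rows
        });
        offset += counts[level];
    }
    for (auto& t : threads)
        t.join();
    return 0;
}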
Testing again in the iPhone 7 simulator gives the following results (first 5 frames):
Parallelization likewise speeds ComputeDescriptors up by a factor of 2-3.
0x03 - Analysis of the Parallelization Results
Section 0x02 already compared the results of each optimization step; here is a brief look at the overall picture, again comparing the first 5 frames in the iPhone 7 simulator:
The results show a 2-3x speedup in ORB feature extraction, and its share of TrackMonocular has dropped considerably, so for now ORB feature extraction no longer needs to be the focus of performance tuning. Next I will optimize other parts of ORB-SLAM2.