Abstract:
Text in video carries concise, important content information that helps with video scene understanding, annotation, and search. Most previous approaches to extracting text from video rely on low-level features such as edge, color, and texture information. However, these methods have difficulty handling text with varying contrast or text embedded in a complex background. In this paper, we propose a novel framework to detect and extract text from video scenes. A morphological binary map is generated by calculating the difference between the closing image and the opening image. Candidate regions are then connected using a morphological dilation operation, and text regions are determined based on the occurrence of text in each candidate. The detected text regions are localized accurately using the projection of text pixels in the morphological binary map, and text extraction is finally performed. The proposed method is robust to differences in character size, position, contrast, and color, and it is language independent. Text region updates between frames are also employed to reduce processing time. Experiments on diverse videos confirm the efficiency of the proposed method.
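The binary-map and projection steps described above can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the grayscale frame stored as a list of rows, the 3x3 square structuring element, the fixed threshold of 50, and all function names are assumptions made for illustration only. The dilation-based region merging and the candidate verification step are omitted.

```python
def _window_filter(img, k, pick):
    """Apply `pick` (min or max) over a k x k window; windows are clipped at borders."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ]
            out[y][x] = pick(window)
    return out


def erode(img, k=3):
    return _window_filter(img, k, min)


def dilate(img, k=3):
    return _window_filter(img, k, max)


def closing(img, k=3):
    # Closing = dilation followed by erosion; fills thin dark structures.
    return erode(dilate(img, k), k)


def opening(img, k=3):
    # Opening = erosion followed by dilation; removes thin bright structures.
    return dilate(erode(img, k), k)


def morphological_binary_map(img, k=3, threshold=50):
    """Mark pixels where closing minus opening exceeds an assumed threshold."""
    closed, opened = closing(img, k), opening(img, k)
    return [
        [1 if c - o > threshold else 0 for c, o in zip(c_row, o_row)]
        for c_row, o_row in zip(closed, opened)
    ]


def projection_bounds(binary):
    """Bounding box (top, bottom, left, right) from row and column projections."""
    row_proj = [sum(row) for row in binary]
    col_proj = [sum(col) for col in zip(*binary)]
    rows = [i for i, v in enumerate(row_proj) if v > 0]
    cols = [i for i, v in enumerate(col_proj) if v > 0]
    if not rows:
        return None  # no text pixels found
    return (rows[0], rows[-1], cols[0], cols[-1])


# Toy frame: a one-pixel-thick bright stroke on a dark, flat background.
frame = [[10] * 9 for _ in range(7)]
for x in range(2, 7):
    frame[3][x] = 200

binary = morphological_binary_map(frame)
bounds = projection_bounds(binary)
print(bounds)  # (3, 3, 2, 6)

# The same stroke with inverted contrast yields the same map.
inverted = [[210 - v for v in row] for row in frame]
binary_inverted = morphological_binary_map(inverted)
```

Because closing suppresses thin dark structures and opening suppresses thin bright ones, their difference responds to thin strokes of either polarity, which is one plausible reading of why the method tolerates varying text contrast and color. A flat background gives a zero difference, so it produces no candidate pixels in this sketch.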