DVB (Digital Video Broadcasting) is the digital television standard used in a large number of countries, and is especially prevalent in Europe. In a DVB video stream, subtitles are present as colored bitmap images, which are simply overlaid on the video if subtitles are turned on in the viewing system.
CCExtractor already had excellent support for recognizing the text of DVB subtitles using Tesseract. This was done by first binarizing the bitmap so that the text and the background were separated. Cleaning up the image in this way gave accurate text recognition, but it discarded color information, which matters when a single bitmap contains text in several colors. The additional requirement was therefore to detect the color of each word in the subtitle.
Why Color Is Important
Color changes in DVB subtitles indicate speaker changes in the program. Assigning a different color to each speaker makes captions more useful as an assistive tool (e.g. for hearing-impaired viewers). Speaker change detection is also very important for the various text processing algorithms for which CCExtractor is a major data source.
Bitmaps and Color Histograms
Bitmaps are just 2-D arrays of numbers, along with an accompanying palette. A palette is like a dictionary that maps each pixel value in the bitmap to an actual RGB value. For example, the bitmap may have values ranging from 1 to 8, where 1 represents black (0,0,0), 8 represents white (255,255,255), and so on. DVB subtitles are also bitmaps with their corresponding palettes. A color histogram represents the frequency of each color in the image: for every pixel value in the bitmap, it records how many pixels have that value. The more often a particular pixel value occurs in the image, the higher its histogram value.
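To make the palette and histogram idea concrete, here is a minimal Python sketch. The tiny bitmap, the two-entry palette and the function name `color_histogram` are made up for illustration and are not taken from CCExtractor, which is written in C.

```python
from collections import Counter

# Hypothetical 3x4 indexed bitmap: each entry is a palette index, not a color.
bitmap = [
    [1, 1, 8, 1],
    [1, 8, 8, 1],
    [1, 1, 8, 1],
]

# Hypothetical palette mapping each index to an RGB triple.
palette = {1: (0, 0, 0), 8: (255, 255, 255)}


def color_histogram(bitmap):
    """Count how often each palette index occurs in the bitmap."""
    return Counter(value for row in bitmap for value in row)


histogram = color_histogram(bitmap)

# The most frequent index, looked up in the palette, gives the dominant color.
dominant_index = histogram.most_common(1)[0][0]
dominant_rgb = palette[dominant_index]
```

Here the histogram is built over palette indices rather than RGB values, so the palette is only consulted once the interesting indices are known.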
Word-Wise Color Quantization
The color of every word is detected by iterating over the bounding boxes of the words obtained in the original DVB OCR results. For every bounding box, a color quantization process is performed. Color quantization essentially means changing a pixel value to a nearby value that has a much higher frequency in the histogram. A two-bin color quantization reduces each box to two colors, from which the background and foreground colors are determined, and the foreground (text) color is assigned as the detected color of the word.
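The per-word step can be sketched roughly as follows. This is an illustration under stated assumptions, not CCExtractor's actual code: the function names, the `(left, top, right, bottom)` box layout, the use of squared RGB distance as the notion of "nearby", and the rule that the background is the more frequent of the two colors are all assumptions of this sketch.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def word_colors(bitmap, palette, box):
    """Return (background_rgb, foreground_rgb) for one word bounding box.

    box is (left, top, right, bottom); right and bottom are exclusive.
    """
    left, top, right, bottom = box
    pixels = [bitmap[y][x] for y in range(top, bottom) for x in range(left, right)]
    histogram = Counter(pixels)

    # The two most frequent palette indices become the two bins.
    bins = [index for index, _ in histogram.most_common(2)]
    if not bins:
        return None, None  # empty box
    if len(bins) == 1:
        return palette[bins[0]], None  # a single color, so no text to report

    # Two-bin quantization: move every pixel to the nearer dominant color.
    quantized = Counter(
        min(bins, key=lambda b: squared_distance(palette[b], palette[index]))
        for index in pixels
    )

    # Assumption: the background covers more pixels than the glyphs do.
    (background, _), (foreground, _) = quantized.most_common(2)
    return palette[background], palette[foreground]


# Hypothetical data: black background, green text, one darker anti-aliased pixel.
palette = {0: (0, 0, 0), 1: (0, 255, 0), 2: (0, 200, 0)}
bitmap = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 2, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
background_rgb, foreground_rgb = word_colors(bitmap, palette, (0, 0, 6, 4))
```

In this example the anti-aliased pixel with index 2 is absorbed into the green bin, so the word is reported with the pure text color rather than an edge color.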
Successive words with the same color are grouped together, and the points at which the color changes are marked with <font> tags.
The output for a DVB frame with three colors is shown below:
00:01:47,780 --> 00:01:51,339
<font color="#00ff00">So he spent last night in a cell?</font>
<font color="#ececec">It’s a ROOM. Not a cell. </font><font color="#ffff00">Ian!</font>
The color values in the output are exactly the values that the corresponding pixels have in the bitmap.
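Grouping successive same-colored words into <font> tags could look something like the sketch below. The helper names `rgb_to_hex` and `tag_line` and the input format (a list of word and RGB pairs) are hypothetical, and the exact placement of spaces around the tags may differ from what CCExtractor emits.

```python
from itertools import groupby


def rgb_to_hex(rgb):
    """Format an RGB triple as a lowercase #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def tag_line(words):
    """Wrap each run of same-colored words in a single <font> tag.

    words is a list of (text, rgb) pairs for one subtitle line.
    """
    parts = []
    # groupby only merges adjacent items, which is what we want here:
    # a new tag starts exactly where the color changes.
    for color, run in groupby(words, key=lambda word: word[1]):
        text = " ".join(word_text for word_text, _ in run)
        parts.append('<font color="{}">{}</font>'.format(rgb_to_hex(color), text))
    return " ".join(parts)


grey = (236, 236, 236)
yellow = (255, 255, 0)
line = tag_line([("Not", grey), ("a", grey), ("cell.", grey), ("Ian!", yellow)])
```

Because only adjacent words are merged, a color that reappears later in the line gets its own new tag instead of being folded into the earlier run.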
DVB Crash Fix
While working on DVB color detection, I also noticed and fixed an important bug that was causing periodic crashes during continuous processing of DVB subtitles. The bug was largely due to Tesseract OCR returning multiple newlines at the end of a line. I made a quick fix by increasing the memory allocated to the resulting string variable, which greatly improved the stability of the DVB processing pipeline.
Although there are still a few issues and bugs in the program, the DVB system is quite stable.