Welcome to CCExtractor's home
A free, GPL licensed closed caption tool

Current version: 0.43, from June 20th, 2008.
Update: I no longer have access to my primary source of .TS (transport streams) files. Since I don't live in the US I can't just record my own. Having access to TS files is obviously essential to ccextractor development, so if you can help please contact me.
What's CCExtractor?
In short, CCExtractor is a small program that processes MPEG-2 files and extracts closed caption data to generate subtitle files. CCExtractor is portable: Linux and Windows versions are included in the .zip (it's less than 230 KB, so everything comes in the same file), plus a build script for OSX.
What? New versions? For real?
Yes :-) I know I've been lazy lately, mostly because ccextractor 0.34 worked well for me. I got some reports of timing problems in some files (usually where VideoReDo was involved), but no samples.
Anyway, around two months ago Volker Quetschke started submitting patches and reworking parts of the MPEG code. Also, a couple of users helped very actively by testing a new build every day and sending back reports, clips, etc.
If this version doesn't work for you, please contact me. Just be ready to help - if you have patience I'll have patience :-)
What's new in the last version?
A Windows GUI! I don't claim it's the best GUI ever but it should do. There's a small window of time for GUI development, so go ahead and send your suggestions and bug reports as soon as possible.
Note: The GUI requires the .NET runtime.

What's the point of generating separate files for subtitles, if they are already in the source file?
There are several reasons to have subtitles separated from the video file, including:
- Closed captions never survive MPEG processing. If you take an MPEG file and encode it to any other format (such as DivX), the resulting file will not have closed captions. This means that if you want to keep the subtitles, you need to keep the original file. This is hardly practical if you are archiving HDTV shows, for example.
- Subtitle files are small - so small (around 250 KB for a movie) that you can quickly download them, or email them, etc., in case you have a recording without subtitles.
- Subtitle files are indexable: You can have a database with all your subtitles if you want (there are many available), so you can search the dialog.
- Subtitle files are a de facto standard: Almost every player can use them. In fact, many set-top box players accept subtitle files in .srt format - so you can have subtitles in your DivX movies and not just in your original DVDs.
- Closed captions are stored in many different formats by capture cards. Upgrading to a new card, if it comes with a new player, may mean that you can't use your previously recorded closed captions, even if the audio/video are fine.
- Closed captions require a closed caption decoder. All US TVs have one (it's a legal requirement), but no European TV does, since there are no closed captions in Europe (teletext is used instead). Basically, this means that if you buy a DVD in the US that has closed captions but no DVD subtitles and watch it on a European TV, you are out of luck. This is a problem with many (most) old TV show DVDs, which only come with closed captions. DVD producers don't bother doing DVD subs, since it's another way to segment the market, same as with DVD regions.
How do I use subtitles once they are in a separate file?
CCExtractor generates files in the two most common formats: .srt (SubRip) and .smi (which is a Microsoft standard). Most players support at least .srt natively. You just need to give the .srt file the same base name as the video file you want to play it with, for example sample.avi and sample.srt.
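The naming rule above can be sketched in a few lines of Python. This is only an illustration, not part of CCExtractor: the helper names `subtitle_name_for` and `match_subtitle` are made up for this example.

```python
from pathlib import Path


def subtitle_name_for(video_file):
    """Return the .srt name a player would look for next to video_file."""
    # sample.avi -> sample.srt (same folder, same base name, new extension)
    return Path(video_file).with_suffix(".srt")


def match_subtitle(srt_file, video_file):
    """Rename an existing .srt file so that it pairs with video_file."""
    target = subtitle_name_for(video_file)
    Path(srt_file).rename(target)
    return target
```

For example, `subtitle_name_for("sample.avi")` gives `sample.srt`, the pair mentioned above.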
What kind of files can I extract closed captions from?
CCExtractor currently handles:
- DVDs.
- Most HDTV captures (where you save the Transport Stream).
- Captures where captions are recorded in bttv format. The number of cards that use this format is huge. My test samples came from a Hauppauge PVR-250. You can check the complete list here.
- DVR-MS (Microsoft digital video recording).
- TiVo files.
Usually, if you record a TV show with your capture card and CCExtractor produces the expected result, it will work for all your recordings. If it doesn't, which means that your card uses a format CCExtractor can't handle, please contact me and we'll try to make it work.
Can I edit the subtitles?
.srt files are just text files, with time information (when subtitles are supposed to be shown and for how long) and some basic formatting (use italics, bold, etc). So you can edit them with any text editor. If you need to do serious editing (such as adjusting timing), you can use subtitle editing tools - there are many available. A good source for your video needs is doom9.org.
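As a rough illustration of how simple the format is, here is a minimal Python sketch that shifts every timestamp in .srt text by a fixed offset. It assumes the usual SubRip layout, where each timing line looks like `00:00:01,000 --> 00:00:03,500`. The function names are made up for this example and the code is not part of CCExtractor; a real subtitle editor handles many more cases.

```python
import re

# Matches one SubRip timestamp, e.g. 00:01:02,500 (hours:minutes:seconds,milliseconds)
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match, offset_ms):
    """Return the matched timestamp moved by offset_ms milliseconds."""
    h, m, s, ms = (int(group) for group in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    total = max(total, 0)  # never go before the start of the video
    h, rest = divmod(total, 3600000)
    m, rest = divmod(rest, 60000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(text, offset_ms):
    """Shift the timing lines of .srt text, leaving the other lines untouched."""
    shifted = []
    for line in text.splitlines():
        if "-->" in line:  # only timing lines contain the arrow
            line = TIMESTAMP.sub(lambda mt: shift_timestamp(mt, offset_ms), line)
        shifted.append(line)
    return "\n".join(shifted)
```

Calling `shift_srt(text, 1500)` delays all subtitles by a second and a half, and a negative offset makes them appear earlier.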
Can CCExtractor generate other subtitle formats?
At this time, CCExtractor can generate .srt, .smi and raw files.
What's a raw file?
A raw file is a file that contains an exact dump of the closed caption bytes, without any processing. This lets you use any tool of your choice to process the data. For example, McPoodle's excellent tools can generate subtitle files in several formats, adjust timing, etc.
How long does it take to process an MPEG file?
Obviously, it depends on the computer and the length of the file. On my computer it takes around 90 seconds for a 45-minute show in HDTV, with CPU usage around 3% (I/O operations are what's holding it back).
What platforms does CCExtractor work on?
The distribution .zip file comes with Linux and Windows binaries, plus a build script for OSX (I don't own a Mac so I can't provide the binary myself). The C source is included too, so you can compile it for any other platform. If you do, let me know.
Where can I download it?
CCExtractor is hosted on SourceForge. This is the download page and this is the project summary page.
How can I contact the author?
Send me an email to
.
How do I use this tool (parameters, etc)?
Run it without parameters and you will get a help screen. Basically, you just give it the input file name and whether you want a .srt or .raw file, like this:
ccextractor -srt the.sopranos.ts
I am aware that the documentation is not very comprehensive. This is partially intentional. I prefer to keep the required info as a help screen, so you don't need to have several files around.
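If you have a whole folder of recordings, the same command can be run in a loop. The following Python sketch is an assumption-laden example, not something shipped with CCExtractor: it assumes the `ccextractor` binary is on your PATH, it only uses the `-srt` option shown above, and the helper names `build_command` and `extract_all` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_command(input_file):
    """Build the same command line as in the example above."""
    return ["ccextractor", "-srt", str(input_file)]


def extract_all(folder, pattern="*.ts"):
    """Run ccextractor on every matching file and return the ones that failed."""
    failed = []
    for path in sorted(Path(folder).glob(pattern)):
        # Assumes ccextractor is installed and reachable through PATH
        result = subprocess.run(build_command(path))
        if result.returncode != 0:
            failed.append(path)
    return failed
```

For instance, `extract_all("recordings")` would process every .ts file in a (hypothetical) `recordings` folder and return the list of files for which the program reported an error.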
How can I contribute to this project?
There are several ways:
- If you are a developer, since the source code is available, you can fix things or add features yourself and send them to me.
- If you are a user and find any bug, or have good suggestions, let me know.
- If you are doing your own recordings and have any particular one that CCExtractor can't process correctly, I'd definitely like to take a look at it and try to fix it.
- If you really hate that there is not a lot of documentation, you can write it yourself. I'll answer any question you might have.
- Finally, you can give CCExtractor a good rating at freshmeat.
Does CCExtractor use code from other projects?
Yes. Lots of code comes from McPoodle's tools (even though it was ported from Perl to C). I've also taken code from MythTV (which in turn took some from other places).
A good thing about Open Source is that you don't need to reinvent the wheel unless you want to (or unless you think you can come up with a 'rounder' wheel).