Command Line Usage

This page is no longer maintained. Please go to the new one.

CCExtractor's main program is console based. There's a GUI for Windows, as well as provisions so other programs can easily interface with CCExtractor, but the heavy lifting is done by a command line program (which can be called by scripts, so integration with larger processes is straightforward).

Running CCExtractor without any parameter will display a help screen with all the options. As of version 0.68 the help screen is as follows:

CCExtractor 0.68, Carlos Fernandez Sanz, Volker Quetschke.

Teletext portions taken from Petr Kutalek's telxcc

--------------------------------------------------------------------------

Originally based on McPoodle's tools. Check his page for lots of information on closed caption technical details.

(http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_TOOLS.HTML)

This tool's home page:

http://ccextractor.sourceforge.net

Extracts closed captions and teletext subtitles from video streams (DVB, .TS, ReplayTV 4000 and 5000, dvr-ms, bttv, Tivo, Dish Network, .mp4 and HDHomeRun are known to work).

Syntax:

ccextractor [options] inputfile1 [inputfile2...] [-o outputfilename] [-o1 outputfilename1] [-o2 outputfilename2]

File name related options:

inputfile: file(s) to process

-o outputfilename: Use -o parameters to define the output file name if you don't like the default ones (the input file name plus _1 or _2 when needed, and the file extension, e.g. .srt).

-o or -o1 -> Name of the first (maybe only) output file.

-o2 -> Name of the second output file, when it applies.
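As a rough illustration of the default naming rule, the Python sketch below predicts output names when no -o option is given. `default_output_names` is a hypothetical helper, not part of CCExtractor. It assumes the input extension is replaced by the output format's extension and that the _1/_2 suffix is only added when both fields are written (-12).

```python
import os
from typing import List


def default_output_names(infile: str, ext: str, fields: List[int]) -> List[str]:
    """Guess default output names when no -o/-o1/-o2 is given (illustrative).

    Assumptions: the input extension is replaced by `ext`, and the _1/_2
    suffix is added only when more than one field is written (-12).
    """
    base, _ = os.path.splitext(infile)
    if len(fields) == 1:
        return [f"{base}.{ext}"]
    return [f"{base}_{field}.{ext}" for field in fields]


print(default_output_names("movie.mpg", "srt", [1]))     # ['movie.srt']
print(default_output_names("movie.mpg", "srt", [1, 2]))  # ['movie_1.srt', 'movie_2.srt']
```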

-cf filename: Write 'clean' data to a file. Clean means the ES (elementary stream) without TS or PES headers.

-stdout: Write output to stdout (console) instead of a file. If stdout is used, then -o, -o1 and -o2 can't be used. Also, -stdout will redirect all messages to stderr (error).

You can pass as many input files as you need. They will be processed in order. If a file name is suffixed by +, ccextractor will try to follow a numerical sequence. For example, DVD001.VOB+ means DVD001.VOB, DVD002.VOB and so on until there are no more files.

Output will be one single file (either raw or srt). Use this if you made your recording in several cuts (to skip commercials, for example) but you want one subtitle file with contiguous timing.
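A rough Python sketch of how a '+'-suffixed name such as DVD001.VOB+ could be expanded into a list of existing files. The rules here are assumptions, not taken from CCExtractor's source: the counter is the last run of digits in the name, its zero-padded width is preserved, and expansion stops at the first missing file. `expand_sequence` and its `exists` callback are hypothetical names.

```python
import os
import re
from typing import Callable, List


def expand_sequence(name: str,
                    exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Expand 'DVD001.VOB+' into ['DVD001.VOB', 'DVD002.VOB', ...]."""
    if not name.endswith("+"):
        return [name]
    name = name[:-1]
    # Assumption: the last run of digits in the name is the counter.
    matches = list(re.finditer(r"\d+", name))
    if not matches:
        return [name]
    m = matches[-1]
    prefix, digits, suffix = name[:m.start()], m.group(), name[m.end():]
    width, n = len(digits), int(digits)
    files = []
    while True:
        candidate = f"{prefix}{n:0{width}d}{suffix}"  # keep zero padding
        if not exists(candidate):
            break  # stop at the first gap in the sequence
        files.append(candidate)
        n += 1
    return files


present = {"DVD001.VOB", "DVD002.VOB", "DVD003.VOB"}
print(expand_sequence("DVD001.VOB+", present.__contains__))
# ['DVD001.VOB', 'DVD002.VOB', 'DVD003.VOB']
```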

Network support:

-udp port: Read the input via UDP (listening on the specified port) instead of reading a file.

-udp [host:]port: Read the input via UDP (listening on the specified port) instead of reading a file. Host can be a hostname or IPv4 address. If host is not specified, it listens on the local host.
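To test -udp with an existing recording, something has to send the datagrams. The Python sketch below is a minimal sender and is not part of CCExtractor; the script name send_ts.py and the helpers `parse_target` and `chunked` are hypothetical. It assumes a transport stream input and packs 7 TS packets (7 x 188 = 1316 bytes) per datagram, a common convention for TS over UDP.

```python
import socket
import sys
import time
from typing import Iterator, Optional, Tuple

TS_PACKET_SIZE = 188
DATAGRAM_SIZE = 7 * TS_PACKET_SIZE  # 1316 bytes, fits a 1500-byte Ethernet MTU


def parse_target(arg: str) -> Tuple[Optional[str], int]:
    """Split '[host:]port' into (host or None, port)."""
    host, sep, port = arg.rpartition(":")
    return (host if sep else None), int(port)


def chunked(data: bytes, size: int = DATAGRAM_SIZE) -> Iterator[bytes]:
    """Split a byte string into consecutive pieces of at most `size` bytes."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def send_file(path: str, target: str, pause: float = 0.0) -> None:
    """Send a file as UDP datagrams; no rate control beyond `pause`."""
    host, port = parse_target(target)
    address = (host or "127.0.0.1", port)  # assumption: default to this machine
    with open(path, "rb") as f, socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        while True:
            block = f.read(DATAGRAM_SIZE * 256)  # a multiple of the datagram size
            if not block:
                break
            for chunk in chunked(block):
                sock.sendto(chunk, address)
                if pause:
                    time.sleep(pause)  # crude pacing; UDP has no flow control


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python send_ts.py recording.ts [host:]port
    send_file(sys.argv[1], sys.argv[2])
```

For example, start something like `ccextractor -udp 1234 -o live.srt` in one terminal and run `python send_ts.py recording.ts 127.0.0.1:1234` in another. Because the sender does no real rate control, very large files may overrun the receiver's socket buffer; a small `pause` is a crude mitigation.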

Options that affect what will be processed:

-1, -2, -12: Output Field 1 data, Field 2 data, or both (DEFAULT is -1).

-cc2: When in srt/sami mode, process captions in channel 2 instead of channel 1.

-svc --service N,N...: Enable CEA-708 caption processing for the listed services. The parameter is a comma-delimited list of service numbers, such as "1,2" to process the primary and secondary language services.
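As a sketch of the -svc argument format, the Python helper below checks a comma-delimited service list before it is handed to ccextractor. The 1 to 63 range reflects CEA-708 service numbering (1 to 6 standard, 7 to 63 extended). Whether this CCExtractor version accepts every one of them is not stated here, so treat the bound as an assumption. `parse_services` and `build_svc_args` are hypothetical names.

```python
from typing import List


def parse_services(spec: str) -> List[int]:
    """Turn '1,2' into [1, 2], rejecting values outside 1..63."""
    services = []
    for part in spec.split(","):
        n = int(part.strip())
        if not 1 <= n <= 63:
            raise ValueError(f"CEA-708 service number out of range: {n}")
        if n not in services:
            services.append(n)  # drop duplicates, keep order
    return services


def build_svc_args(spec: str, infile: str) -> List[str]:
    """Build an argv list enabling the given CEA-708 services."""
    services = parse_services(spec)
    return ["ccextractor", "-svc", ",".join(map(str, services)), infile]


print(build_svc_args("1, 2", "news.ts"))
# ['ccextractor', '-svc', '1,2', 'news.ts']
```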

In general, if you want English subtitles you don't need to use these options, as they are broadcast in field 1, channel 1. If you want the second language (usually Spanish) you may need to try -2, or -cc2, or both.

Input formats:

With the exception of McPoodle's raw format, which is just the closed caption data with no other info, CCExtractor can usually detect the input format correctly. To force a specific format:

-in=format

where format is one of these:

ts -> For Transport Streams.

ps -> For Program Streams.

es -> For Elementary Streams.

asf -> ASF container (such as DVR-MS).

bin -> CCExtractor's own binary format.

raw -> For McPoodle's raw files.

mp4 -> MP4/MOV/M4V and similar.

hex -> Hexadecimal dump as generated by wtvccdump.

-ts, -ps, -es, -mp4 and -asf (or --dvr-ms) can be used as shortcuts.

Output formats:

-out=format

where format is one of these:

srt -> SubRip (default, so not actually needed).

sami -> MS Synchronized Accessible Media Interface.

bin -> CC data in CCExtractor's own binary format.

raw -> CC data in McPoodle's Broadcast format.

dvdraw -> CC data in McPoodle's DVD format.

txt -> Transcript (no time codes, no roll-up captions, just the plain transcription).

ttxt -> Timed Transcript (transcription with time info).

smptett -> SMPTE Timed Text (W3C TTML) format.

spupng -> Set of .xml and .png files for use with dvdauthor's spumux. See "Notes on spupng output format" below.

null -> Don't produce any file output

Note: Teletext output can only be srt, txt or ttxt for now.

Options that affect how input files will be processed.

-gt --goptime: Use GOP for timing instead of PTS. This only applies to Program or Transport Streams with MPEG2 data and overrides the default PTS timing. GOP timing is always used for Elementary Streams.

-nogt --nogoptime: Never use GOP timing (use PTS), even if ccextractor detects GOP timing is the reasonable choice.

-fp --fixpadding: Fix padding: some cards (or providers, or whatever) seem to send 0000 as CC padding instead of 8080. If you get bad timing, this might solve it.
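Here is what -fp is about at the byte level. EIA-608 bytes carry an odd-parity bit in bit 7, so the padding (null) pair 0x00 0x00 is transmitted as 0x80 0x80, while a literal 0x00 0x00 fails the parity check. The Python sketch below uses hypothetical names `has_odd_parity` and `fix_padding` to show the check and the substitution. It illustrates the idea and is not CCExtractor's implementation.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def has_odd_parity(byte: int) -> bool:
    """True if the byte has an odd number of 1 bits, as EIA-608 requires."""
    return bin(byte & 0xFF).count("1") % 2 == 1


def fix_padding(pairs: List[Pair]) -> List[Pair]:
    """Replace bogus 0x00 0x00 padding pairs with the standard 0x80 0x80."""
    return [(0x80, 0x80) if pair == (0x00, 0x00) else pair for pair in pairs]


raw = [(0x00, 0x00), (0x94, 0x2C), (0x00, 0x00)]  # 0x94 0x2C: a non-padding pair
fixed = fix_padding(raw)
print([f"{a:02X}{b:02X}" for a, b in fixed])  # ['8080', '942C', '8080']
```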

-90090: Use 90090 (instead of 90000) as MPEG clock frequency (reported to be needed at least by Panasonic DMR-ES15 DVD Recorder).
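The effect of -90090 is a constant scaling of every timestamp. A PTS counts ticks of a nominal 90 kHz clock, so converting to milliseconds divides by the clock frequency. Dividing by 90090 instead makes every time 1000/1001 as long (about 0.1% shorter). The Python helper below is illustrative only; `pts_to_ms` is a hypothetical name.

```python
def pts_to_ms(pts: int, clock_hz: int = 90000) -> float:
    """Convert an MPEG PTS tick count to milliseconds."""
    return pts * 1000 / clock_hz


pts = 90090 * 600  # ten minutes' worth of ticks from a 90090 Hz source
print(pts_to_ms(pts))         # 600600.0 (interpreted with the default 90000 Hz)
print(pts_to_ms(pts, 90090))  # 600000.0 (interpreted as with -90090)
```

Over ten minutes the two interpretations drift apart by 600 ms, which is why a wrong clock assumption shows up as subtitles that slowly go out of sync.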

-ve --videoedited: By default, ccextractor will process input files in sequence as if they were all one large file (i.e. split by a generic, non video-aware tool). If you are processing video that was split with an editing tool, use -ve so ccextractor doesn't try to rebuild the original timing.

-s --stream [secs]: Consider the file as a continuous stream that is growing as ccextractor processes it, so don't try to figure out its size and don't terminate processing when reaching the current end (i.e. wait for more data to arrive). If the optional parameter secs is present, it means the number of seconds without any new data after which ccextractor should exit. Use this parameter if you want to process a live stream but not kill ccextractor externally.

Note: If -s is used then only one input file is allowed.

-poc --usepicorder: Use the pic_order_cnt_lsb in AVC/H.264 data streams to order the CC information. The default way is to use the PTS information. Use this switch only when needed.

-myth: Force MythTV code branch.

-nomyth: Disable MythTV code branch.

The MythTV branch is needed for analog captures where the closed caption data is stored in the VBI, such as those with bttv cards (Hauppauge 250 for example). This is detected automatically so you don't need to worry about this unless autodetection doesn't work for you.

-wtvconvertfix: This switch works around a bug in Windows 7's built-in software to convert *.wtv to *.dvr-ms. For analog NTSC recordings the CC information is marked as digital captions. Use this switch only when needed.

-pn --program-number: In TS mode, specifically select a program to process. Not needed if the TS only has one. If this parameter is not specified and CCExtractor detects more than one program in the input, it will list the programs found and terminate without doing anything, unless -autoprogram (see below) is used.

-autoprogram: If there's more than one program in the stream, just use the first one we find that contains a suitable stream.

-datapid: Don't try to find out the stream for caption/teletext data, just use this one instead.

-datastreamtype: Instead of selecting the stream by its PID, select it by its type (pick the stream that has this type in the PMT).

-streamtype: Assume the data is of this type, don't autodetect. This parameter may be needed if -datapid or -datastreamtype is used and CCExtractor cannot determine how to process the stream. The value will usually be 2 (MPEG video) or 6 (MPEG private data).

-haup --hauppauge: If the video was recorded using a Hauppauge card, it might need special processing. This parameter will force the special treatment.

-mp4vidtrack: In MP4 files the closed caption data can be embedded in the video track or in a dedicated CC track. If a dedicated track is detected it will be processed instead of the video track. If you need to force the video track to be processed instead, use this option.

-noautotimeref: Some streams come with broadcast date information. When such data is available, CCExtractor will set its time reference to the received data. Use this parameter if you prefer your own reference. Note: Currently this only affects Teletext in timed transcript with -datets.

Options that affect what kind of output will be produced:

-unicode: Encode subtitles in Unicode instead of Latin-1.

-utf8: Encode subtitles in UTF-8 (no longer needed, because UTF-8 is now the default).

-latin1: Encode subtitles in Latin-1 instead of UTF-8.

-nofc --nofontcolor: For .srt/.sami, don't add font color tags.

-nots --notypesetting: For .srt/.sami, don't add typesetting tags.

-trim: Trim lines.

-dc --defaultcolor: Select a different default color (instead of white). This causes all output in .srt/.smi files to have a font tag, which makes the files larger. Add the color you want in RGB, such as -dc #FF0000 for red.
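One practical pitfall with -dc: in POSIX sh and bash, a word beginning with # starts a comment. An unquoted `-dc #FF0000` therefore silently drops the color and everything after it on the line. Quote the value in a shell, or pass an argument list from a script so no shell is involved. The Python sketch below validates the value first; `color_args` and `run` are hypothetical helpers, and `show.ts` is a placeholder input file.

```python
import re
import subprocess
from typing import List

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def color_args(color: str, infile: str) -> List[str]:
    """Build argv for -dc, rejecting anything that is not #RRGGBB."""
    if not HEX_COLOR.fullmatch(color):
        raise ValueError(f"expected #RRGGBB, got {color!r}")
    return ["ccextractor", "-dc", color, infile]


def run(argv: List[str]) -> int:
    """Run ccextractor without a shell, so '#' needs no quoting."""
    return subprocess.run(argv, check=False).returncode


print(color_args("#FF0000", "show.ts"))
# ['ccextractor', '-dc', '#FF0000', 'show.ts']
# Shell equivalent, quoted so '#' is not read as a comment:
#   ccextractor -dc '#FF0000' show.ts
```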

-sc --sentencecap: Sentence capitalization. Use if you hate ALL CAPS in subtitles.

--capfile -caf file: Add the contents of 'file' to the list of words that must be capitalized. For example, if file is a plain text file that contains

Tony

Alan

Whenever those words are found they will be written exactly as they appear in the file. Use one line per word. Lines starting with # are considered comments and discarded.
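The capfile format is simple enough to sketch. The Python code below reads a capfile as described (one word per line, lines starting with # ignored; blank lines are also skipped here). It then rewrites matching words, case-insensitively, using the spelling from the file. This approximates the documented behavior and is not CCExtractor's implementation. In particular, the word tokenization (a regex over letters and apostrophes) is an assumption, and `load_capfile` and `apply_capitalization` are hypothetical names.

```python
import re
from typing import Dict, Iterable


def load_capfile(lines: Iterable[str]) -> Dict[str, str]:
    """Map lower-cased words to the exact spelling given in the capfile."""
    words = {}
    for line in lines:
        word = line.strip()
        if not word or word.startswith("#"):
            continue  # skip blank lines and comments
        words[word.lower()] = word
    return words


def apply_capitalization(text: str, words: Dict[str, str]) -> str:
    """Replace each listed word with its capfile spelling."""
    return re.sub(r"[A-Za-z']+",
                  lambda m: words.get(m.group().lower(), m.group()),
                  text)


caps = load_capfile(["# names", "Tony", "Alan", "iPhone"])
print(apply_capitalization("TONY GAVE ALAN AN IPHONE.", caps))
# Tony GAVE Alan AN iPhone.
```

Words not in the file are left untouched here; in CCExtractor the remaining ALL CAPS text is what -sc is for.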

-unixts REF: For timed transcripts that have an absolute date (instead of a timestamp relative to the file start), use this time reference (UNIX timestamp). 0 => Use current system time. ccextractor will automatically switch to transport stream UTC timestamps when available.

-datets: In transcripts, write time as YYYYMMDDHHMMss,ms.

-sects: In transcripts, write time as ss,ms
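To make the transcript time formats concrete, the Python sketch below renders one caption time in the -datets and -sects styles. Several details are assumptions rather than documented behavior. The reference is a UNIX timestamp as passed to -unixts. The caption time is an offset in milliseconds from that reference. The date is shown in UTC. Milliseconds use three digits, and -sects counts whole seconds from the start of the file. `datets` and `sects` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def datets(ref_unix: int, offset_ms: int) -> str:
    """Format as YYYYMMDDHHMMss,ms (UTC assumed)."""
    t = datetime.fromtimestamp(ref_unix, tz=timezone.utc) + timedelta(milliseconds=offset_ms)
    return t.strftime("%Y%m%d%H%M%S") + f",{t.microsecond // 1000:03d}"


def sects(offset_ms: int) -> str:
    """Format as ss,ms: whole seconds, a comma, then milliseconds."""
    return f"{offset_ms // 1000},{offset_ms % 1000:03d}"


print(datets(1_000_000_000, 1_500))  # 20010909014641,500
print(sects(83_042))                 # 83,042
```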

-UCLA: Transcripts are generated with a specific format that is convenient for a specific project. Feel free to play with it, but be aware that this format is still evolving, so don't rely on its output format not changing between versions.

-lf: Use LF (UNIX) instead of CRLF (DOS, Windows) as line terminator.

-autodash: Based on position on screen, attempt to determine the different speakers and add a dash (-) when each of them talks (.srt only, -trim required).

Options that affect how ccextractor reads and writes (buffering):

-bi --bufferinput: Forces input buffering.

-nobi --nobufferinput: Disables input buffering.

-bs --buffersize val: Specify a size for reading, in bytes (suffix with K or M for kilobytes and megabytes). Default is 16M.

Note: -bo is only used when writing raw files, not .srt or .sami
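A Python sketch of how a -bs value such as 16M could be interpreted. The help text does not say whether K and M mean 1000 or 1024. This example assumes binary multiples (1024), and `parse_buffer_size` is a hypothetical name.

```python
def parse_buffer_size(value: str) -> int:
    """Parse '4096', '512K' or '16M' into a byte count (K/M = 1024-based, assumed)."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in multipliers:
        return int(value[:-1]) * multipliers[suffix]
    return int(value)  # plain byte count


print(parse_buffer_size("16M"))  # 16777216
```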

Options that affect the built-in closed caption decoder:

-dru: Direct Roll-Up. When in roll-up mode, write character by character instead of line by line. Note that this produces (much) larger files.

-noru --norollup: If you hate the repeated lines caused by the roll-up emulation, you can have ccextractor write only one line at a time, getting rid of these repeated lines.

-ru1 / -ru2 / -ru3: Roll-up captions can consist of 2, 3 or 4 visible lines at any time (the number of lines is part of the transmission). If having 3 or 4 lines annoys you, you can use -ru1, -ru2 or -ru3 to force the decoder to always use 1, 2 or 3 lines. Note that 1 line is not a real roll-up mode, so CCExtractor does what it can.

In -ru1 the start timestamp is actually the timestamp of the first character received, which is possibly more accurate.
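The effect of -ru1/-ru2/-ru3 can be pictured as a window over the most recent caption rows. The Python sketch below uses a bounded deque. Once the limit is reached, each new row pushes out the oldest one, and the rows that stay on screen are what appear as repeated lines in roll-up output. It is a deliberate simplification of a real 608 decoder (no timing, positioning or character-level updates), and `RollUpWindow` is a hypothetical name.

```python
from collections import deque
from typing import List


class RollUpWindow:
    """Keep only the last `rows` caption lines, like a roll-up display."""

    def __init__(self, rows: int) -> None:
        if rows not in (1, 2, 3, 4):
            raise ValueError("roll-up depth must be 1 to 4 rows")
        self.lines = deque(maxlen=rows)

    def carriage_return(self, line: str) -> List[str]:
        """Roll the display up by one row and return what is on screen."""
        self.lines.append(line)
        return list(self.lines)


win = RollUpWindow(2)  # roughly what -ru2 forces
for text in ["HELLO", "HOW ARE", "YOU TODAY"]:
    print(win.carriage_return(text))
# ['HELLO']
# ['HELLO', 'HOW ARE']
# ['HOW ARE', 'YOU TODAY']
```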

Options that affect timing:

-delay ms: For srt/sami, add this number of milliseconds to all times. For example, -delay 400 makes subtitles appear 400ms late. You can also use negative numbers to make subs appear early.

Notes on times: -startat and -endat times are used first, then -delay. So if you use -srt -startat 3:00 -endat 5:00 -delay 120000, ccextractor will generate a .srt file, with only data from 3:00 to 5:00 in the input file(s) and then add that (huge) delay, which would make the final file start at 5:00 and end at 7:00.

Options that affect what segment of the input file(s) to process:

-startat time: Only write caption information that starts after the given time. Time can be seconds, MM:SS or HH:MM:SS. For example, -startat 3:00 means 'start writing from minute 3'.

-endat time: Stop processing after the given time (same format as -startat).

The -startat and -endat options are honored in all output formats. In all formats with timing information the times are unchanged.

-scr --screenfuls num: Write 'num' screenfuls and terminate processing.
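The order of operations in the notes on times can be checked with a little arithmetic. The Python sketch below parses times in the seconds, MM:SS or HH:MM:SS forms accepted by -startat/-endat. It keeps only captions starting inside the window (assuming a caption must start at or after -startat and before -endat) and then adds -delay without rebasing the times. It reproduces the example above: captions between 3:00 and 5:00 with -delay 120000 end up between 5:00 and 7:00. `parse_time` and `select_and_delay` are hypothetical names.

```python
from typing import List, Tuple

Caption = Tuple[int, int, str]  # (start_ms, end_ms, text)


def parse_time(value: str) -> int:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' to milliseconds."""
    seconds = 0
    for part in value.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds * 1000


def select_and_delay(captions: List[Caption], startat: str, endat: str,
                     delay_ms: int) -> List[Caption]:
    """Apply -startat/-endat first, then -delay (times are not rebased)."""
    lo, hi = parse_time(startat), parse_time(endat)
    kept = [c for c in captions if lo <= c[0] < hi]
    return [(s + delay_ms, e + delay_ms, t) for s, e, t in kept]


caps = [(170_000, 172_000, "before"), (180_000, 183_000, "inside"),
        (299_000, 300_000, "last"), (301_000, 303_000, "after")]
print(select_and_delay(caps, "3:00", "5:00", 120_000))
# [(300000, 303000, 'inside'), (419000, 420000, 'last')]
```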

Adding start and end credits:

CCExtractor can _try_ to add a custom message (for credits for example) at the start and end of the file, looking for a window where there are no captions. If there is no such window, then no text will be added.

The start window must be between the times given and must have enough time to display the message for at least the specified time.

--startcreditstext txt: Write this text as start credits. If there are several lines, separate them with the characters \n, for example Line1\nLine 2.

--startcreditsnotbefore time: Don't display the start credits before this time (S, or MM:SS). Default: 0

--startcreditsnotafter time: Don't display the start credits after this time (S, or MM:SS). Default: 5:00

--startcreditsforatleast time: Start credits need to be displayed for at least this time (S, or MM:SS). Default: 2

--startcreditsforatmost time: Start credits should be displayed for at most this time (S, or MM:SS). Default: 5

--endcreditstext txt: Write this text as end credits. If there are several lines, separate them with the characters \n, for example Line1\nLine 2.

--endcreditsforatleast time: End credits need to be displayed for at least this time (S, or MM:SS). Default: 2

--endcreditsforatmost time: End credits should be displayed for at most this time (S, or MM:SS). Default: 5
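The start-credits rules can be read as a small search problem. Find a caption-free gap that begins no earlier than --startcreditsnotbefore and no later than --startcreditsnotafter and lasts at least --startcreditsforatleast. Then show the text there for at most --startcreditsforatmost. The Python sketch below is one interpretation under stated assumptions: times in seconds, sorted captions, the earliest qualifying gap wins, and the time after the last caption counts as open-ended. It is not CCExtractor's algorithm, and `find_start_credits_slot` is a hypothetical name. The defaults mirror those listed above.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def find_start_credits_slot(captions: List[Interval], not_before: float = 0,
                            not_after: float = 300, at_least: float = 2,
                            at_most: float = 5) -> Optional[Interval]:
    """Return (start, end) for the credits, or None if no window fits."""
    prev_end = 0.0
    # Sentinel: the time after the last caption is treated as an open gap.
    for start, end in sorted(captions) + [(float("inf"), float("inf"))]:
        gap_start = max(prev_end, not_before)
        if gap_start > not_after:
            return None  # every later gap would start too late
        if start - gap_start >= at_least:
            return gap_start, gap_start + min(start - gap_start, at_most)
        prev_end = max(prev_end, end)
    return None


print(find_start_credits_slot([(1, 2), (10, 12)]))  # (2, 7)
```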

Options that affect debug data:

-debug: Show lots of debugging output.

-608: Print debug traces from the EIA-608 decoder. If you need to submit a bug report, please send the output from this option.

-708: Print debug information from the (currently in development) EIA-708 (DTV) decoder.

-goppts: Enable lots of time stamp output.

-xdsdebug: Enable XDS debug data (lots of it).

-vides: Print debug info about the analysed elementary video stream.

-cbraw: Print debug trace with the raw 608/708 data with time stamps.

-nosync: Disable the syncing code. Only useful for debugging purposes.

-fullbin: Disable the removal of trailing padding blocks when exporting to bin format. Only useful for debugging purposes.

-parsedebug: Print debug info about the parsed container file. (Only for TS/ASF files at the moment.)

-parsePAT: Print Program Association Table dump.

-parsePMT: Print Program Map Table dump.

-investigate_packets: If no CC packets are detected based on the PMT, try to find data in all packets by scanning.

Teletext related options:

-tpage page: Use this page for subtitles (if this parameter is not used, try to autodetect). In Spain the page is always 888; it may vary in other countries.

-tverbose: Enable verbose mode in the teletext decoder.

-teletext: Force teletext mode even if teletext is not detected. If used, you should also pass -datapid to specify the stream ID you want to process.

-noteletext: Disable teletext processing. This might be needed for video streams that have both teletext packets and CEA-608/708 packets (if teletext is processed then CEA-608/708 processing is disabled).

Communication with other programs and console output:

--gui_mode_reports: Report progress and interesting events to stderr in an easy-to-parse format. This is intended to be used by other programs. See the docs directory for details.

--no_progress_bar: Suppress the output of the progress bar

-quiet: Don't write any message.

Notes on the CEA-708 decoder: While it is starting to be useful, it's a work in progress. A number of things don't work yet in the decoder itself, and many of the auxiliary tools (case conversion to name one) won't do anything yet. Feel free to submit samples that cause problems and feature requests.

Notes on spupng output format:

One .xml file is created per output field. A set of .png files is created in a directory with the same base name as the corresponding .xml file(s), but with a .d extension. Each .png file contains an image representing one caption and is named subNNNN.png, starting with sub0000.png.

For example, the command:

ccextractor -out=spupng input.mpg

will create the files:

input.xml

input.d/sub0000.png

input.d/sub0001.png

...

The command:

ccextractor -out=spupng -o /tmp/output -12 input.mpg

will create the files:

/tmp/output_1.xml

/tmp/output_1.d/sub0000.png

/tmp/output_1.d/sub0001.png

...

/tmp/output_2.xml

/tmp/output_2.d/sub0000.png

/tmp/output_2.d/sub0001.png

...
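The spupng naming scheme in the examples above is regular enough to generate, for instance when checking or cleaning up spumux inputs. A small Python sketch, assuming the .d directory name is the .xml name with its extension replaced and that images are numbered from 0 with four digits; `spupng_paths` is a hypothetical helper.

```python
import os
from typing import List


def spupng_paths(xml_path: str, count: int) -> List[str]:
    """Return the .xml path followed by the first `count` image paths."""
    base, _ = os.path.splitext(xml_path)
    # Assumption: images live in '<base>.d/' and use four-digit numbering.
    return [xml_path] + [f"{base}.d/sub{i:04d}.png" for i in range(count)]


for path in spupng_paths("/tmp/output_1.xml", 2):
    print(path)
# /tmp/output_1.xml
# /tmp/output_1.d/sub0000.png
# /tmp/output_1.d/sub0001.png
```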

Error: (This help screen was shown because there were no input files)

CCExtractor

A free, GPL licensed closed caption tool
