captionstransformer

A set of tools (API + script) to read, write and transform captions from/to many formats

captionstransformer Ranking & Summary

  • License: GPL
  • Price: FREE
  • Publisher Name: JeanMichel FRANCOIS
  • Publisher web site: http://makina-corpus.org

captionstransformer Description

captionstransformer is a set of tools to transform captions from one format to another. It provides a Reader and a Writer for each format, plus a script for command-line use.

Supported formats:

- sbv Reader and Writer
- srt Reader and Writer
- ttml Reader and Writer
- transcript Reader and Writer

How to use (API)

You can read the provided unit tests for complete examples:

    # Python 2 example (it relies on the StringIO module and the unicode type)
    from captionstransformer.sbv import Reader
    from captionstransformer.ttml import Writer
    from StringIO import StringIO

    test_content = StringIO(u"""0:00:03.490,0:00:07.430
    >> FISHER: All right. So, let's begin.
    This session is: Going Social

    0:00:07.430,0:00:11.600
    with the YouTube APIs. I am
    Jeff Fisher,

    0:00:11.600,0:00:14.009
    and this is Johann Hartmann,
    we're presenting today.

    0:00:14.009,0:00:15.889
    """)

    # read the SBV content into a list of captions
    reader = Reader(test_content)
    captions = reader.read()
    assert len(captions) == 4
    first = captions[0]
    assert type(first.text) == unicode
    assert first.text == u">> FISHER: All right. So, let's begin.\nThis session is: Going Social\n"

    # next get a writer
    filelike = StringIO()
    writer = Writer(filelike)
    writer.set_captions(captions)
    text = writer.captions_to_text()
    assert text.startswith(u"""<tt xml:lang="" xmlns="http://www.w3.org/ns/ttml"><body><div>""")
    writer.write()
    writer.close()

About Formats

It is quite hard to find simple documentation about existing caption formats. Here is a set of existing named caption formats:

SubViewer (SUB):

    00:04:35.03,00:04:38.82
    Hello guys... please sit down...

    00:05:00.19,00:05:03.47
    M. Franklin,
    are you crazy?

YouTube (SBV):

    0:00:03.490,0:00:07.430
    FISHER: All right. So, let's begin.
    This session is: Going Social

    0:00:07.430,0:00:11.600
    with the YouTube APIs. I am
    Jeff Fisher,

    0:00:11.600,0:00:14.009
    and this is Johann Hartmann,
    we're presenting today.

    0:00:14.009,0:00:15.889

SubRip (SRT):

    1
    00:00:03,490 --> 00:00:07,430
    FISHER: All right. So, let's begin.
    This session is: Going Social

    2
    00:00:07,430 --> 00:00:11,600
    with the YouTube APIs. I am
    Jeff Fisher,

    3
    00:00:11,600 --> 00:00:14,009
    and this is Johann Hartmann,
    we're presenting today.

    4
    00:00:14,009 --> 00:00:15,889

Timed Text Markup Language (TTML):

    <tt xml:lang="" xmlns="http://www.w3.org/ns/ttml">
      <body region="subtitleArea">
        <div>
          <p xml:id="subtitle1" begin="0.76s" end="3.45s">
            It seems a paradox, does it not,
          </p>
          <p xml:id="subtitle2" begin="5.0s" end="10.0s">
            that the image formed on<br/>
            the Retina should be inverted?
          </p>
        </div>
      </body>
    </tt>

Transcript (returned by http://video.google.com/timedtext?lang=en&v=VIDEOID):

    <?xml version="1.0" encoding="utf-8" ?>
    <transcript>
      <text start="10" dur="2">Hi, I'm Emily from Nomensa</text>
      <text start="12" dur="3">and today I'm going to be talking about the order of content on your pages.</text>
      <text start="16" dur="6">Making sure the content on your web pages is presented logically is a really important part of web accessibility.</text>
      <text start="23" dur="2">Page content should be ordered so it makes sense</text>
    </transcript>

Microsoft SAMI (.sami, .smi):

    <SAMI>
    <Head>
      <Title>President John F. Kennedy Speech</Title>
      <SAMIParam>
        Copyright {(C)Copyright 1997, Microsoft Corporation}
        Media {JF Kennedy.wav}
        Metrics {time:ms; duration: 73000;}
        Spec {MSFT:1.0;}
      </SAMIParam>
    </Head>
    <Body>
      <SYNC Start=0>
        <P Class=ENUSCC ID=Source>Pres. John F. Kennedy
      <SYNC Start=10>
        <P Class=ENUSCC>Let the word go forth, from this time and place to friend and foe alike that the torch
    </Body>
    </SAMI>
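
To make the difference between the SBV and SRT timing lines concrete, here is a minimal standalone sketch in Python 3 that parses SBV text and re-emits it as SRT. It does not use captionstransformer and is not how the library is implemented. The names parse_sbv, srt_timestamp and to_srt are hypothetical helpers introduced only for this illustration, and the parser assumes that cues are separated by blank lines.

```python
import re
from datetime import timedelta

# Hypothetical helpers for illustration only; none of these names
# come from captionstransformer.

# An SBV timing line looks like "0:00:03.490,0:00:07.430".
SBV_TIMING = re.compile(
    r"^(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})$"
)


def parse_sbv(text):
    """Parse SBV text into a list of (start, end, text) tuples."""
    captions = []
    # Assumption: cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if not lines:
            continue
        match = SBV_TIMING.match(lines[0].strip())
        if match is None:
            continue  # skip blocks that do not start with a timing line
        h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
        start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
        end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
        captions.append((start, end, "\n".join(lines[1:])))
    return captions


def srt_timestamp(delta):
    """Format a timedelta as HH:MM:SS,mmm, the SRT timestamp style."""
    total_ms = delta // timedelta(milliseconds=1)  # exact integer division
    hours, rest = divmod(total_ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    seconds, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


def to_srt(captions):
    """Render (start, end, text) tuples as SRT cues numbered from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(captions, start=1):
        blocks.append(
            "%d\n%s --> %s\n%s\n"
            % (index, srt_timestamp(start), srt_timestamp(end), text)
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    sbv = (
        "0:00:03.490,0:00:07.430\n"
        "FISHER: All right. So, let's begin.\n"
        "This session is: Going Social\n"
        "\n"
        "0:00:07.430,0:00:11.600\n"
        "with the YouTube APIs. I am\n"
        "Jeff Fisher,\n"
    )
    print(to_srt(parse_sbv(sbv)))
```

The sketch keeps times as timedelta values between parsing and writing, so the only format-specific parts are the timing-line pattern on the way in and the timestamp formatter on the way out. Supporting another target format would then mostly mean adding another formatter.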

