VirtualBox

source: vbox/trunk/src/libs/libogg-1.3.5/doc/oggstream.html

Last change on this file was 96360, checked in by vboxsync, 2 years ago

libogg, libvorbis: export to OSE

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
<title>Ogg Documentation</title>

<style type="text/css">
body {
  margin: 0 18px 0 18px;
  padding-bottom: 30px;
  font-family: Verdana, Arial, Helvetica, sans-serif;
  color: #333333;
  font-size: .8em;
}

a {
  color: #3366cc;
}

img {
  border: 0;
}

#xiphlogo {
  margin: 30px 0 16px 0;
}

#content p {
  line-height: 1.4;
}

h1, h1 a, h2, h2 a, h3, h3 a {
  font-weight: bold;
  color: #ff9900;
  margin: 1.3em 0 8px 0;
}

h1 {
  font-size: 1.3em;
}

h2 {
  font-size: 1.2em;
}

h3 {
  font-size: 1.1em;
}

li {
  line-height: 1.4;
}

#copyright {
  margin-top: 30px;
  line-height: 1.5em;
  text-align: center;
  font-size: .8em;
  color: #888888;
  clear: both;
}

.caption {
  color: #000000;
  background-color: #aabbff;
  margin: 1em;
  margin-left: 2em;
  margin-right: 2em;
  padding: 1em;
  padding-bottom: 0em;
  overflow: hidden;
}

.caption p {
  clear: none;
}

.caption img {
  display: block;
  margin: 0px;
  margin-left: auto;
  margin-right: auto;
  margin-bottom: 1.5em;
  background-color: #ffffff;
  padding: 10px;
}

#thepage {
  margin-left: auto;
  margin-right: auto;
  width: 840px;
}

</style>

</head>

<body>
<div id="thepage">

<div id="xiphlogo">
  <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
</div>

<h1>Ogg bitstream overview</h1>

<p>This document serves as a starting point for understanding the design
and implementation of the Ogg container format. If you're new to Ogg
or merely want a high-level technical overview, start reading here.
Other documents linked from the <a href="index.html">index page</a>
give distilled technical descriptions and references of the container
mechanisms. This document is intended to aid understanding.

<h2>Container format design points</h2>

<p>Ogg is intended to be a simplest-possible container, concerned only
with framing, ordering, and interleave. It can be used as a stream delivery
mechanism, for media file storage, or as a building block toward
implementing a more complex, non-linear container (for example, see
the <a href="skeleton.html">Skeleton</a> or <a
href="http://en.wikipedia.org/wiki/Annodex">Annodex/CMML</a>).

<p>The Ogg container is not intended to be a monolithic
'kitchen-sink'. It exists only to frame and deliver in-order stream
data and as such is vastly simpler than most other containers.
Elementary and multiplexed streams are both constructed entirely from a
single building block (an Ogg page) composed of eight fields
totalling twenty-seven bytes (the page header), a list of packet lengths
(up to 255 bytes), and payload data (up to 65025 bytes). The structure
of every page is the same. There are no optional fields or alternate
encodings.
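<p>As an illustration, the sketch below decodes the fixed part of a page header from raw bytes. It assumes the field layout given in the framing specification: capture pattern "OggS", version, header-type flags, 64-bit granule position, serial number, page sequence number, CRC and segment count, with all multi-byte fields stored little-endian. The names parse_page_header and ogg_page_info are hypothetical and are not libogg's API. libogg exposes the same fields through accessors such as ogg_page_granulepos() and ogg_page_serialno(). This sketch does not verify the CRC.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OGG_HEADER_BYTES 27  /* fixed part of every page header */

typedef struct {
    uint8_t  version;     /* stream_structure_version, currently 0 */
    uint8_t  flags;       /* 0x01 continued, 0x02 BOS, 0x04 EOS */
    int64_t  granulepos;  /* codec-defined absolute position */
    uint32_t serialno;    /* identifies the logical bitstream */
    uint32_t pageno;      /* page sequence number within that stream */
    uint32_t crc;         /* page checksum (not verified in this sketch) */
    uint8_t  nsegs;       /* entries in the segment table */
    size_t   header_len;  /* 27 + nsegs */
    size_t   body_len;    /* sum of the segment table entries */
} ogg_page_info;

/* Reads an n-byte little-endian integer. */
static uint64_t read_le(const uint8_t *p, int n) {
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 on success, -1 if buf does not start with a complete header. */
int parse_page_header(const uint8_t *buf, size_t len, ogg_page_info *out) {
    if (len < OGG_HEADER_BYTES || memcmp(buf, "OggS", 4) != 0)
        return -1;
    out->version    = buf[4];
    out->flags      = buf[5];
    out->granulepos = (int64_t)read_le(buf + 6, 8);
    out->serialno   = (uint32_t)read_le(buf + 14, 4);
    out->pageno     = (uint32_t)read_le(buf + 18, 4);
    out->crc        = (uint32_t)read_le(buf + 22, 4);
    out->nsegs      = buf[26];
    out->header_len = OGG_HEADER_BYTES + (size_t)out->nsegs;
    if (len < out->header_len)
        return -1;                      /* segment table truncated */
    out->body_len = 0;
    for (size_t i = 0; i < out->nsegs; i++)
        out->body_len += buf[OGG_HEADER_BYTES + i];
    return 0;
}
```

<p>With a full 255-entry segment table, the largest possible page is 27 + 255 + 65025 = 65307 bytes.</p>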

<p>Stream and media metadata is contained in Ogg and not built into
the Ogg container itself. Metadata is thus compartmentalized and
layered rather than part of a monolithic design, an especially good
idea as no two groups seem able to agree on what a complete or
complete-enough metadata set should be. In this way, the container and
container implementation are isolated from unnecessary metadata design
flux.

<h3>Streaming</h3>

<p>The Ogg container is primarily a streaming format,
encapsulating chronological, time-linear mixed media into a single
delivery stream or file. The design is such that an application can
always encode and/or decode all features of a bitstream in one pass
with no seeking and minimal buffering. Seeking to provide optimized
encoding (such as two-pass encoding) or interactive decoding (such as
scrubbing or instant replay) is neither disallowed nor discouraged; however,
no container feature requires nonlinear access of the bitstream.

<h3>Variable Bit Rate, Variable Payload Size</h3>

<p>Ogg is designed to contain any size data payload with bounded,
predictable efficiency. Ogg packets have no maximum size and a
zero-byte minimum size. There is no restriction on size changes from
packet to packet. Variable size packets do not require the use of any
optional or additional container features. There is no suggested
optimal packet size, though special consideration was paid to make
sure 50-200 byte packets were no less efficient than larger packet
sizes. The original design criterion was a 2% overhead at 50 byte
packets, dropping to a maximum working overhead of 1% with larger
packets, and a typical working overhead of 0.5-0.7% for most practical
uses.
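<p>These overhead figures can be checked with a little arithmetic. Each packet needs one lacing value per started 255-byte segment, so a packet of length L needs L/255 + 1 lacing bytes (rounding the division down). Each page also adds its fixed 27-byte header. The sketch below assumes equal-sized packets that all fit on one page and computes what fraction of the page is framing. lacing_bytes and page_overhead are hypothetical helpers and are not part of any library.</p>

```c
#include <stddef.h>

#define OGG_HEADER_BYTES 27   /* fixed page header size */
#define OGG_MAX_SEGMENTS 255  /* segment table entries per page */

/* Lacing values one packet needs: one 255 per full 255-byte chunk plus a
   terminating value below 255 (0 if the length is a multiple of 255). */
size_t lacing_bytes(size_t packet_len) {
    return packet_len / 255 + 1;
}

/* Fraction of a page spent on framing when it carries npackets packets of
   packet_len bytes each. Returns -1.0 if they do not fit on one page. */
double page_overhead(size_t packet_len, size_t npackets) {
    size_t lacing = npackets * lacing_bytes(packet_len);
    if (npackets == 0 || lacing > OGG_MAX_SEGMENTS)
        return -1.0;
    size_t framing = OGG_HEADER_BYTES + lacing;
    size_t payload = npackets * packet_len;
    return (double)framing / (double)(framing + payload);
}
```

<p>Take 50-byte packets packed 255 to a page. This gives 282 framing bytes against 12750 payload bytes, or about 2.2%. Most of that cost is the single lacing byte each packet needs. Four 1000-byte packets per page come to about 1.1%.</p>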

<h3>Simple pagination</h3>

<p>Ogg is a byte-aligned container with no context-dependent, optional
or variable-length fields. Ogg requires no repacking of codec data.
The page structure is written out in-line as packet data is submitted
to the streaming abstraction. In addition, it is possible to
implement both Ogg mux and demux as MT-hot zero-copy abstractions (as
is done in the Tremor sourcebase).

<h3>Capture</h3>

<p>Ogg is designed for efficient and immediate stream capture with
high confidence. Although packets have no size limit in Ogg, pages
are a maximum of just under 64kB, meaning that any Ogg stream can be
captured with confidence after seeing 128kB of data or less [worst
case; typical figure is 6kB] from any random starting point in the
stream.
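<p>Capture amounts to scanning for the "OggS" capture pattern that begins every page. A minimal sketch follows, in which find_capture is a hypothetical name. A real demuxer, such as libogg's ogg_sync_pageseek(), must also validate the candidate header and its CRC, because the pattern can occur by chance inside payload data. The worst-case bound comes from the page size limit. A reader that lands just past the start of a page may have to skip up to one maximum-size page before it finds the next one, and then read that page in full.</p>

```c
#include <stddef.h>
#include <string.h>

/* Largest possible page: header + full segment table + 255 * 255 body bytes. */
#define OGG_MAX_PAGE (27 + 255 + 255 * 255)   /* 65307 bytes */

/* Offset of the next "OggS" capture pattern at or after start, or -1.
   A hit is only a candidate: the caller must still validate the header
   and CRC, because the same four bytes can appear inside payload data. */
long find_capture(const unsigned char *buf, size_t len, size_t start) {
    for (size_t i = start; i + 4 <= len; i++)
        if (memcmp(buf + i, "OggS", 4) == 0)
            return (long)i;
    return -1;
}
```

<p>Two maximum-size pages come to 130614 bytes, which is within the 128kB (131072-byte) worst case quoted above.</p>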

<h3>Seeking</h3>

<p>Ogg implements simple coarse- and fine-grained seeking by design.

<p>Coarse seeking may be performed by simply 'moving the tone arm' to a
new position and 'dropping the needle'. Rapid capture with
accompanying timecode from any location in an Ogg file is guaranteed
by the stream design. From the acquisition of the first timecode,
all data needed to play back from that timecode forward is ahead of
the stream cursor.

<p>Ogg implements full sample-granularity seeking using an
interpolated bisection search built on the capture and timecode
mechanisms used by coarse seeking. As above, once a search finds
the desired timecode, all data needed to play back from that timecode
forward is ahead of the stream cursor.
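<p>The bisection search can be sketched as a lower-bound search over pages sorted by granule position. Each probe guesses a byte offset by linear interpolation between the two pages that currently bracket the target. In the sketch below, an in-memory array simulates the file, and capture_at stands in for "seek to this byte offset and capture the next page". The names page_ref, capture_at and seek_granule are hypothetical. A real implementation works on file offsets directly and does not know page boundaries in advance. It must also handle pages on which no packet ends, which carry a granule position of -1.</p>

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    long    offset;      /* byte position of the page in the file */
    int64_t granulepos;  /* end position of the page (continuous stream) */
} page_ref;

/* Stand-in for "seek to byte pos, then capture the next page": index of the
   first page starting at or after pos, or n if there is none. */
static size_t capture_at(const page_ref *pages, size_t n, long pos) {
    size_t i = 0;
    while (i < n && pages[i].offset < pos)
        i++;
    return i;
}

/* Index of the first page whose granulepos >= target (the page that completes
   the target position), or n if target is past the end. *seeks counts probes. */
size_t seek_granule(const page_ref *pages, size_t n, int64_t target, int *seeks) {
    *seeks = 0;
    if (n == 0 || pages[n - 1].granulepos < target)
        return n;
    size_t lo = 0, hi = n - 1;  /* pages before lo are < target; pages[hi] >= target */
    while (lo < hi) {
        int64_t glo = pages[lo].granulepos, ghi = pages[hi].granulepos;
        /* Interpolate a byte position between the bracketing pages. */
        double frac = (ghi > glo) ? (double)(target - glo) / (double)(ghi - glo) : 0.0;
        long pos = pages[lo].offset +
                   (long)(frac * (double)(pages[hi].offset - pages[lo].offset));
        size_t mid = capture_at(pages, n, pos);
        (*seeks)++;
        /* Clamp the probe into [lo, hi) so every step shrinks the range. */
        if (mid < lo) mid = lo;
        if (mid > hi - 1) mid = hi - 1;
        if (pages[mid].granulepos >= target)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}
```

<p>When bitrate is roughly constant, the interpolated guess typically lands on or next to the target page within a couple of probes. When bitrate varies, clamping each probe into [lo, hi) still guarantees that the range shrinks on every step, so the search always terminates.</p>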

<p>Both coarse and fine seeking use the page structure and sequencing
inherent to the Ogg format. All Ogg streams are fully seekable from
creation; seekability is unaffected by truncation or missing data, and
is tolerant of gross corruption. Seek operations are neither 'fuzzy' nor
heuristic.

<p>Seeking without use of an index is a major point of the Ogg
design. There are two primary reasons why Ogg transport forgoes an index:

<ol>

<li>An index is only marginally useful in Ogg for the complexity
added; it adds no new functionality and seldom improves performance
noticeably. Empirical testing shows that indexless interpolation
search does not require many more seeks in practice than using an
index would.

<li>'Optional' indexes encourage lazy implementations that can seek
only when indexes are present, or that implement indexless seeking
only by building an internal index after reading the entire file
beginning to end. This has been the fate of other containers that
specify optional indexing.

</ol>

<p>In addition, it must be possible to create an Ogg stream in a
single pass. Although an optional index can simply be tacked onto the
end of the created stream, some software groups object to
end-positioned indexes and claim to be unwilling to support indexes
not located at the stream beginning.

<p><i>All this said, it's become clear that an optional index is a
demanded feature. For this reason, the <a
href="http://wiki.xiph.org/Ogg_Index">OggSkeleton now defines a
proposed index.</a></i>

<h3>Simple multiplexing</h3>

<p>Ogg multiplexes streams by interleaving pages from multiple elementary streams into a
multiplexed stream in time order. The multiplexed pages are not
altered. Muxing an Ogg AV stream out of separate audio,
video and data streams is akin to shuffling several decks of cards
together into a single deck; the cards themselves remain unchanged.
Demultiplexing is similarly simple (as the cards are marked).
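<p>Because pages are never altered, muxing reduces to a merge: repeatedly take the earliest pending page among all input streams. The sketch assumes that each page already carries a timestamp in seconds. In practice the codec maps the granule position to a time, as described under Codec metadata. mux_page and mux_pages are hypothetical names.</p>

```c
#include <stddef.h>

#define MAX_STREAMS 16   /* arbitrary limit for this sketch */

typedef struct {
    unsigned serialno;  /* logical stream the page belongs to */
    double   time;      /* page timestamp in seconds, as mapped by the codec */
} mux_page;

/* Interleaves k per-stream page lists, each already in time order, into out.
   Pages are copied unchanged; ties go to the lower stream index.
   Returns the number of pages written (0 if k exceeds MAX_STREAMS). */
size_t mux_pages(const mux_page *const *streams, const size_t *counts,
                 size_t k, mux_page *out) {
    size_t pos[MAX_STREAMS] = {0};  /* read cursor for each input stream */
    size_t written = 0;
    if (k > MAX_STREAMS)
        return 0;
    for (;;) {
        size_t best = k;  /* k means "no page pending" */
        for (size_t s = 0; s < k; s++) {
            if (pos[s] < counts[s] &&
                (best == k || streams[s][pos[s]].time < streams[best][pos[best]].time))
                best = s;
        }
        if (best == k)
            return written;  /* every input is exhausted */
        out[written++] = streams[best][pos[best]++];
    }
}
```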

<p>The goal of this design is to make the mux/demux operation as
trivial as possible to allow live streaming systems to build and
rebuild streams on the fly with minimal CPU usage and no additional
storage or latency requirements.

<h3>Continuous and Discontinuous Media</h3>

<p>Ogg streams belong to one of two categories, "Continuous" streams and
"Discontinuous" streams.

<p>A stream that provides a gapless, time-continuous media type with a
fine-grained timebase is considered to be 'Continuous'. A continuous
stream should never be starved of data. Examples of continuous data
types include broadcast audio and video.

<p>A stream that delivers data in a potentially irregular pattern or
with widely spaced timing gaps is considered to be 'Discontinuous'. A
discontinuous stream may be best thought of as data representing
scattered events; although they happen in order, they are typically
unconnected data often located far apart. One example of a
discontinuous stream type is captioning such as <a
href="http://wiki.xiph.org/OggKate">Ogg Kate</a>. Although it's
possible to design captions as a continuous stream type, it's most
natural to think of captions as widely spaced pieces of text with
little happening between.

<p>The fundamental reason for distinction between continuous and
discontinuous streams concerns buffering.

<h3>Buffering</h3>

<p>A continuous stream is, by definition, gapless. Ogg buffering is based
on the simple premise of never allowing an active continuous stream
to starve for data during decode; buffering works ahead until all
continuous streams in a physical stream have data ready and no further.

<p>Discontinuous stream data is not assumed to be predictable. The
buffering design takes discontinuous data 'as it comes' rather than
working ahead to look for future discontinuous data for a potentially
unbounded period. Thus, the buffering process makes no attempt to fill
discontinuous stream buffers; their pages simply 'fall out' of the
stream when continuous streams are handled properly.

<p>Buffering requirements in this design need not be explicitly
declared or managed in the encoded stream. The decoder simply reads as
much data as is necessary to keep all continuous stream types gapless
and no more, with discontinuous data processed as it arrives in the
continuous data. Buffering is implicitly optimal for the given
stream. Because all pages of all data types are stamped with absolute
timing information within the stream, inter-stream synchronization
timing is always maintained without the need for explicitly declared
buffer-ahead hinting.
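<p>A simplified model of this rule follows. It reads pages in stream order until every continuous stream has at least one page buffered. Any discontinuous pages met along the way are delivered as they arrive, and the reader never waits for more of them. The model counts pages, whereas a real player would compare timestamps. stream_info and pages_to_buffer are hypothetical names.</p>

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_STREAMS 32   /* arbitrary limit for this sketch */

typedef struct {
    unsigned serialno;
    bool     continuous;  /* declared by the codec from its initial header */
} stream_info;

/* Number of pages, from the front of the physical stream, that must be read
   before every continuous stream has data. Discontinuous pages met on the
   way are simply delivered. Returns n if some continuous stream never shows up. */
size_t pages_to_buffer(const unsigned *page_serials, size_t n,
                       const stream_info *streams, size_t nstreams) {
    bool have[MAX_STREAMS] = {false};
    size_t need = 0;
    if (nstreams > MAX_STREAMS)
        return n;
    for (size_t s = 0; s < nstreams; s++)
        if (streams[s].continuous)
            need++;
    for (size_t i = 0; i < n && need > 0; i++) {
        for (size_t s = 0; s < nstreams; s++) {
            if (streams[s].serialno == page_serials[i]) {
                if (streams[s].continuous && !have[s]) {
                    have[s] = true;
                    need--;
                }
                break;
            }
        }
        if (need == 0)
            return i + 1;
    }
    return need == 0 ? 0 : n;
}
```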

<h3>Codec metadata</h3>

<p>Ogg does not replicate codec-specific metadata into the mux layer,
as some containers do in an attempt to make the mux and codec layer
implementations 'fully separable'. Things like specific timebase, keyframing strategy, frame
duration, etc., do not appear in the Ogg container. The mux layer is,
instead, expected to query a codec through a centralized interface,
left to the implementation, for this data when it is needed.

<p>Though modern design wisdom usually prefers to predict all possible
needs of current and future codecs and then embed these dependencies and
the required metadata into the container itself, this strategy
increases container specification complexity, fragility, and rigidity.
The mux and codec code becomes more independent, but the
specifications become logically less independent. A codec can't do
what a container hasn't already provided for. Novel codecs are harder
to support, and you can do fewer useful things with the ones you've
already got (e.g., try to make a good splitter without using any codecs.
Such a splitter is limited to splitting at keyframes only, or building
yet another new mechanism into the container layer to mark what frames
to skip displaying).

<p>Ogg's design goes the opposite direction, where the specification
is to be as simple, easy to understand, and 'proofed' against novel
codecs as possible. When an Ogg mux layer requires codec-specific
information, it queries the codec (or a codec stub). This trades a
more complex implementation for a simpler, more flexible
specification.
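<p>One way to picture "querying the codec" is a small table of callbacks that the mux layer holds for each logical stream. The codec_stub structure below is hypothetical and is not a libogg interface. Its example mapping treats the granule position as a PCM sample count, as Vorbis does. For simplicity it returns a fixed packet duration, whereas real Vorbis derives each packet's duration from its block size.</p>

```c
#include <stdint.h>

/* Hypothetical codec stub: all the codec knowledge the mux layer needs. */
typedef struct {
    const char *name;
    double  (*granule_to_time)(const void *state, int64_t granulepos);
    int64_t (*packet_duration)(const void *state, const unsigned char *pkt, long len);
    const void *state;  /* codec-private data parsed from the headers */
} codec_stub;

/* Example mapping: granule position counts PCM samples (Vorbis-like). */
typedef struct {
    long    rate;                /* samples per second */
    int64_t samples_per_packet;  /* fixed here only to keep the sketch short */
} pcm_state;

static double pcm_granule_to_time(const void *st, int64_t gp) {
    const pcm_state *s = st;
    return (double)gp / (double)s->rate;
}

static int64_t pcm_packet_duration(const void *st, const unsigned char *pkt, long len) {
    (void)pkt;
    (void)len;
    return ((const pcm_state *)st)->samples_per_packet;
}

/* Mux-layer code asks the stub instead of interpreting granule positions. */
double page_end_time(const codec_stub *c, int64_t granulepos) {
    return c->granule_to_time(c->state, granulepos);
}

/* Granule position after one more packet, derived from the previous one. */
int64_t next_granule(const codec_stub *c, int64_t prev_gp,
                     const unsigned char *pkt, long len) {
    return prev_gp + c->packet_duration(c->state, pkt, len);
}
```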

<h3>Stream structure metadata</h3>

<p>The Ogg container itself does not define a metadata system for
declaring the structure and interrelations between multiple media
types in a muxed stream. That is, the Ogg container itself does not
specify data like 'which stream is the subtitle stream?' or 'which
video stream is the primary angle?'. This metadata still exists, but
is stored by the Ogg container rather than being built into the Ogg
container itself. Xiph specifies the 'Skeleton' metadata format for Ogg
streams, but this decoupling of container and stream structure
metadata means it is possible to use Ogg with any metadata
specification without altering the container itself, or without stream
structure metadata at all.

<h3>Frame accurate absolute position</h3>

<p>Every Ogg page is stamped with a 64-bit 'granule position' that
serves as an absolute timestamp for mux and seeking. A few nifty
little tricks are usually also embedded in the granpos state, but
we'll leave those aside for the moment (strictly speaking, they're
part of each codec's mapping, not Ogg).

<p>As mentioned above, granule positions are mapped into
absolute timestamps by the codec, rather than being a hard timestamp.
This allows maximally efficient use of the available 64 bits to
address every sample/frame position without approximation while
supporting new and previously unknown timebase encodings without
needing to extend or update the mux layer. When a codec needs a novel
timebase, it simply brings the code for that mapping along with it.
This is not a theoretical curiosity; new, wholly novel timebases were
deployed with the adoption of both Theora and Dirac. "Rolling INTRA"
(keyframeless video) also benefits from novel use of the granule
position.
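<p>Theora's mapping is one example of a codec-defined timebase. It splits the granule position into two fields: the high bits hold the frame number of the most recent keyframe, and the low bits hold the count of frames since that keyframe. The sketch below shows the idea using hypothetical function names. It leaves out Theora's exact conventions, such as how the shift width is signalled and the frame-numbering offset in later bitstream versions, so treat it as illustrative only.</p>

```c
#include <stdint.h>

/* Pack a split granule position: keyframe number in the high bits, frames
   since that keyframe in the low `shift` bits (since_key < 1 << shift). */
int64_t pack_granule(int64_t keyframe, int64_t since_key, int shift) {
    return (keyframe << shift) | since_key;
}

/* Absolute frame index, recovered without any container-level field. */
int64_t granule_to_frame(int64_t gp, int shift) {
    int64_t keyframe  = gp >> shift;
    int64_t since_key = gp & (((int64_t)1 << shift) - 1);
    return keyframe + since_key;
}

/* Frame number of the keyframe a decoder must start from for this position. */
int64_t granule_keyframe(int64_t gp, int shift) {
    return gp >> shift;
}
```

<p>A seeking demuxer can therefore read the required keyframe number directly from a page's granule position and back up to it, with no keyframe flag in the container.</p>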

<h2>Ogg stream arrangement</h2>

<h3>Packets, pages, and bitstreams</h3>

<p>Ogg codecs place raw compressed data into <em>packets</em>.
Packets are octet payloads containing the data needed for a single
decompressed unit, e.g., one video frame. Packets have no maximum size
and may be zero length. They do not generally have any framing
information; strung together, the unframed packets form a <em>logical
bitstream</em> of codec data with no internal landmarks.

<div class="caption">
  <img src="packets.png" alt="packets"/>

  <p> Packets of raw codec data are not typically internally framed.
  When they are strung together into a stream without any container to
  provide framing, they lose their individual boundaries. Seek and
  capture are not possible within an unframed stream, and for many
  codecs with variable length payloads and/or early-packet termination
  (such as Vorbis), it may become impossible to recover the original
  frame boundaries even if the stream is scanned linearly from
  beginning to end.

</div>

<p>Logical bitstream packets are grouped and framed into Ogg pages
along with a unique stream <em>serial number</em> to produce a
<em>physical bitstream</em>. An <em>elementary stream</em> is a
physical bitstream containing only a single logical bitstream. Each
page is a self-contained entity, although a packet may be split and
encoded across one or more pages. The page decode mechanism is
designed to recognize, verify and handle single pages at a time from
the overall bitstream.

<div class="caption">
  <img src="pages.png" alt="pages"/>

  <p> The primary purpose of a container is to provide framing for raw
  packets, marking the packet boundaries so the exact packets can be
  retrieved for decode later. The container also provides secondary
  functions such as capture, timestamping, sequencing, stream
  identification and so on. Not all of these functions are represented in the diagram.

  <p>In the Ogg container, pages do not necessarily contain
  integer numbers of packets. Packets may span across page boundaries
  or even multiple pages. This is necessary as pages have a maximum
  possible size in order to provide capture guarantees, but packet
  size is unbounded.
</div>
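<p>Each page records packet boundaries in its segment table as lacing values. A packet of length L is written as L/255 values of 255 (rounding the division down), followed by one terminating value of L mod 255. A value below 255 therefore ends a packet, and a table that ends in 255 means the packet continues on the next page. The two hypothetical helpers below encode one packet's lacing values and decode a segment table back into packet lengths.</p>

```c
#include <stddef.h>

/* Writes the lacing values for one packet of len bytes into lace (capacity
   cap) and returns how many were written, or 0 if cap is too small. */
size_t packet_lacing(size_t len, unsigned char *lace, size_t cap) {
    size_t n = len / 255 + 1;  /* full 255-byte segments plus a terminator */
    if (n > cap)
        return 0;
    for (size_t i = 0; i + 1 < n; i++)
        lace[i] = 255;
    lace[n - 1] = (unsigned char)(len % 255);  /* < 255 marks packet end */
    return n;
}

/* Splits a page's segment table into packet lengths. Returns the number of
   packets that end on this page (the first may be the remainder of a packet
   begun on an earlier page); *tail receives the bytes of a final packet that
   continues on the next page (0 if the table ends on a packet boundary). */
size_t packets_from_lacing(const unsigned char *lace, size_t n,
                           size_t *lens, size_t *tail) {
    size_t count = 0, acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += lace[i];
        if (lace[i] < 255) {
            lens[count++] = acc;
            acc = 0;
        }
    }
    *tail = acc;
    return count;
}
```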

<p><a href="framing.html">Ogg Bitstream Framing</a> specifies
the page format of an Ogg bitstream, the packet coding process
and elementary bitstreams in detail.

<h3>Multiplexed bitstreams</h3>

<p>Multiple logical/elementary bitstreams can be combined into a single
<em>multiplexed bitstream</em> by interleaving whole pages from each
contributing elementary stream in time order. The result is a single
physical stream that multiplexes and frames multiple logical streams.
Each logical stream is identified by the unique stream serial number
stamped in its pages. A physical stream may include a 'meta-header'
(such as the <a href="skeleton.html">Ogg Skeleton</a>) comprising its
own Ogg page at the beginning of the physical stream. A decoder
recovers the original logical/elementary bitstreams out of the
physical bitstream by taking the pages in order from the physical
bitstream and redirecting them into the appropriate logical decoding
entity.
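<p>On the decoding side, redirecting a page means looking up its serial number. The sketch uses a hypothetical logical_stream table, with a page counter standing in for a real decoder. With libogg, the equivalent is to compare ogg_page_serialno() against each stream and pass the page to that stream's ogg_stream_pagein().</p>

```c
#include <stddef.h>

/* Hypothetical per-stream decoding slot. */
typedef struct {
    unsigned serialno;
    size_t   pages_seen;  /* stand-in for handing the page to a decoder */
} logical_stream;

/* Routes one page to the logical stream whose serial number it carries.
   Returns 0 on success, -1 if no stream has that serial number (for
   example, a stream the application chose not to decode). */
int demux_page(logical_stream *streams, size_t n, unsigned page_serial) {
    for (size_t i = 0; i < n; i++) {
        if (streams[i].serialno == page_serial) {
            streams[i].pages_seen++;
            return 0;
        }
    }
    return -1;
}
```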

<div class="caption">
  <img src="multiplex1.png" alt="multiplex"/>

<p>Multiple media types are multiplexed into a single Ogg stream by
interleaving the pages from each elementary physical stream.

</div>

<p><a href="ogg-multiplex.html">Ogg Bitstream Multiplexing</a> specifies
proper multiplexing of an Ogg bitstream in detail.

<h3>Chaining</h3>

<p>Multiple Ogg physical bitstreams may be concatenated into a single new
stream; this is <em>chaining</em>. The bitstreams do not overlap; the
final page of a given logical bitstream is immediately followed by the
initial page of the next.</p>

<p>Each logical bitstream in a chain must have a unique serial number
within the scope of the full physical bitstream, not only within a
particular <em>link</em> or <em>segment</em> of the chain.</p>

<h3>Continuous and discontinuous streams</h3>

<p>Within Ogg, each stream must be declared (by the codec) to be
continuous- or discontinuous-time. Most codecs treat all streams they
use as either inherently continuous- or discontinuous-time, although
this is not a requirement. A codec may, as part of its mapping, choose
according to data in the initial header.

<p>Continuous-time pages are stamped by end-time, discontinuous pages
are stamped by begin-time. Pages in a multiplexed stream are
interleaved in order of the time stamp regardless of stream type.
Both continuous and discontinuous logical streams are used to seek
within a physical stream; however, only continuous streams are used to
determine buffering depth. Because discontinuous streams are stamped
by start time, they will always 'fall out' at the proper time when
buffering the continuous streams. See 'Examples' for an illustration
of the buffering mechanism.

<h2>Multiplexing Requirements</h2>

<p>Multiplexing requirements within Ogg are straightforward. When
constructing a single-link (unchained) physical bitstream consisting
of multiple elementary streams:

<ol>

<li><p> The initial header for each stream appears in sequence, each
header on a single page. All initial headers must appear with no
intervening data (no auxiliary header pages or packets, no data pages
or packets). Order of the initial headers is unspecified. The
'beginning of stream' flag is set on each initial header.

<li><p> All auxiliary headers for all streams must follow. Order
is unspecified. The final auxiliary header of each stream must flush
its page.

<li><p>Data pages for each stream follow, interleaved in time order.

<li><p>The final page of each stream sets the 'end of stream' flag.
Unlike initial pages, terminal pages for the logical bitstreams need
not occur contiguously; indeed it may not be possible for them to do so.
</ol>

<p>Each grouped bitstream must have a unique serial number within the
scope of the physical bitstream.</p>
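<p>Some of these rules can be checked mechanically from page headers alone. The hypothetical check_link function below takes the serial number and header-type flags of each page in one unchained link. It checks the beginning-of-stream ordering, serial-number uniqueness and end-of-stream rules. It does not inspect header packets, auxiliary-header page flushing or time order. The flag values 0x02 and 0x04 are the 'beginning of stream' and 'end of stream' bits defined in the framing specification.</p>

```c
#include <stdbool.h>
#include <stddef.h>

#define OGG_FLAG_BOS 0x02
#define OGG_FLAG_EOS 0x04
#define MAX_STREAMS  32   /* arbitrary limit for this sketch */

typedef struct {
    unsigned      serialno;
    unsigned char flags;   /* header_type_flag from the page header */
} page_summary;

bool check_link(const page_summary *p, size_t n) {
    unsigned serials[MAX_STREAMS];
    bool ended[MAX_STREAMS] = {false};
    size_t nstreams = 0, i = 0;

    /* Rule 1: all BOS pages come first, one per stream, serials unique. */
    for (; i < n && (p[i].flags & OGG_FLAG_BOS); i++) {
        if (nstreams == MAX_STREAMS)
            return false;
        for (size_t s = 0; s < nstreams; s++)
            if (serials[s] == p[i].serialno)
                return false;
        ended[nstreams] = (p[i].flags & OGG_FLAG_EOS) != 0;  /* one-page stream */
        serials[nstreams++] = p[i].serialno;
    }
    if (nstreams == 0)
        return false;

    /* Later pages: no new BOS, only known streams, nothing after a stream's EOS. */
    for (; i < n; i++) {
        size_t s = 0;
        if (p[i].flags & OGG_FLAG_BOS)
            return false;
        while (s < nstreams && serials[s] != p[i].serialno)
            s++;
        if (s == nstreams || ended[s])
            return false;
        if (p[i].flags & OGG_FLAG_EOS)
            ended[s] = true;
    }

    /* Rule 4: every stream's final page sets EOS. */
    for (size_t s = 0; s < nstreams; s++)
        if (!ended[s])
            return false;
    return true;
}
```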

<h3>Chaining and multiplexing</h3>

<p>Multiplexed and/or unmultiplexed bitstreams may be chained
consecutively. Such a physical bitstream obeys all the rules of both
chained and multiplexed streams. Each link, when unchained, must
stand on its own as a valid physical bitstream. Chained streams do
not mix or interleave; a new segment may not begin until all streams
in the preceding segment have terminated. </p>

<h2>Codec Mapping Requirements</h2>

<p>Each codec is allowed some freedom in deciding how its logical
bitstream is encapsulated into an Ogg bitstream (even if it is a
trivial mapping, e.g., 'plop the packets in and go'). This is the
codec's <em>mapping</em>. Ogg imposes a few mapping requirements
on any codec.

<ol>

<li><p>The <a href="framing.html">framing specification</a> defines
'beginning of stream' and 'end of stream' page markers via a header
flag (it is possible for a stream to consist of a single page). A
correct stream always consists of an integer number of pages, an easy
requirement given the variable size nature of pages.</p>

<li><p>The first page of an elementary Ogg bitstream consists of a single,
small 'initial header' packet that must include sufficient information
to identify the exact codec type. From this initial header, the codec
must also be able to determine its timebase and whether it is a
continuous- or discontinuous-time stream. The initial header must fit
on a single page. If a codec makes use of auxiliary headers (for
example, Vorbis uses two auxiliary headers), these headers must follow
the initial header immediately. The last header finishes its page;
data begins on a fresh page.

<p>As an example, Ogg Vorbis places the name and revision of the
Vorbis codec, the audio rate and the audio quality into this initial
header. Vorbis comments and detailed codec setup appear in the larger
auxiliary headers.</p>

<li><p>Granule positions must be translatable to an exact absolute
time value. As described above, the mux layer is permitted to query a
codec or codec stub plugin to perform this mapping. It is not
necessary for an absolute time to be mappable into a single unique
granule position value.

<li><p>Codecs are not required to use a fixed duration-per-packet (for
example, Vorbis does not). The mux layer is permitted to query a
codec or codec stub plugin for the time duration of a packet.

<li><p>Although an absolute time need not be translatable to a unique
granule position, a codec must be able to determine the unique granule
position of the current packet using the granule position of a
preceding packet.

<li><p>Packets and pages must be arranged in ascending
granule-position and time order.

</ol>

<h2>Examples</h2>

<em>[More to come shortly; this section is currently being revised and expanded]</em>

<p>Below, we present an example of a multiplexed and chained bitstream:</p>

<p><img src="stream.png" alt="stream"/></p>

<p>In this example, we see pages from a total of five logical bitstreams
multiplexed into a physical bitstream. Note the following
characteristics:</p>

<ol>
<li>Multiplexed bitstreams in a given link begin together; all of the
initial pages must appear before any data pages. When concurrently
multiplexed groups are chained, the new group does not begin until all
the bitstreams in the previous group have terminated.</li>

<li>The ordering of pages of concurrently multiplexed bitstreams is
governed by timestamp (not shown here); there is no regular
interleaving order. Pages within a logical bitstream appear in
sequence order.</li>
</ol>

<div id="copyright">
  The Xiph Fish Logo is a
  trademark (&trade;) of Xiph.Org.<br/>

  These pages &copy; 1994 - 2010 Xiph.Org. All rights reserved.
</div>

</div>
</body>
</html>