On October 19, the US Patent & Trademark Office published Apple’s patent application titled “Single pass constrained constant bit-rate encoding,” which was originally filed in April 2005.
Data, such as video data, is encoded by identifying a data segment to be encoded. The data segment includes multiple frames. A bit-rate profile for encoding the data segment is generated. The bit-rate profile defines a number of bits associated with each frame in the data segment. Frames are encoded using the bit-rate profile. The bit-rate profile is updated periodically to incorporate past encoding statistics and to compensate for any deviations in encoded bits from the initial profile.
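The profile-and-update loop in the abstract can be illustrated with a short sketch. This is not Apple's claimed method. It assumes a uniform initial profile and a hypothetical update rule that spreads any overspend or underspend evenly across the frames still to be encoded. The function names are hypothetical.

```python
def initial_profile(num_frames: int, total_bits: float) -> list[float]:
    """Start with a uniform profile: every frame gets an equal share of the budget."""
    return [total_bits / num_frames] * num_frames


def update_profile(profile: list[float], actual_bits: list[float]) -> list[float]:
    """Spread the deviation so far evenly over the frames not yet encoded.

    actual_bits holds the sizes of the frames already encoded, in order.
    Entries for encoded frames are left unchanged; only the remaining entries
    are adjusted, so bits actually spent plus the remaining budget still equal
    the original total. (Hypothetical rule, for illustration only.)
    """
    done = len(actual_bits)
    remaining = len(profile) - done
    if remaining <= 0:
        return list(profile)
    # Positive deviation means we overspent, so remaining frames get fewer bits.
    deviation = sum(actual_bits) - sum(profile[:done])
    correction = deviation / remaining
    return profile[:done] + [b - correction for b in profile[done:]]


profile = initial_profile(4, 4000)              # [1000.0, 1000.0, 1000.0, 1000.0]
profile = update_profile(profile, [1300, 900])  # first two frames overspent by 200 bits
print(profile)                                  # [1000.0, 1000.0, 900.0, 900.0]
```

A real encoder would also weight the correction by frame type and complexity and clamp it so no frame budget goes negative; the even split simply keeps the example easy to follow.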
Video Encoding and Decoding Process
Patent FIG. 1 is a block diagram illustrating the operation of a video encoding and decoding process 100. An input video sequence 105 includes a series of sequential frames 110. A typical video stream, for example, has a frame rate of 30 frames per second. The frames 110 are encoded by an encoder 115, such as an MPEG-2-compliant encoder.
In MPEG-2 encoding, the frames 110 are reordered from display order to transmission order, as is well known in the art. In addition, the size of each encoded frame varies with the complexity of the frame and the type of its dependency on other frames. Thus, the encoded video sequence 120 has a variable bit rate.
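The reordering exists because a B-frame is predicted from the anchor frames (I- or P-frames) on both sides of it in display order, so the later anchor must reach the decoder before the B-frames that depend on it. A minimal sketch, assuming frames are labelled by a type letter plus display index (a hypothetical convention) and ignoring open-GOP details:

```python
def display_to_transmission(frames: list[str]) -> list[str]:
    """Reorder frame labels from display order to transmission (coded) order.

    B-frames are held back until the next I- or P-frame (anchor) has been
    emitted, because they need that anchor for backward prediction.
    """
    out: list[str] = []
    pending_b: list[str] = []
    for f in frames:
        if f.startswith("B"):
            pending_b.append(f)   # wait for the following anchor
        else:                     # I- or P-frame anchor
            out.append(f)
            out.extend(pending_b)
            pending_b.clear()
    out.extend(pending_b)  # trailing B-frames with no later anchor (simplification)
    return out


print(display_to_transmission(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```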
Many video applications, however, require video data to be provided at a constant bit rate. In such applications, the encoder 115 typically makes calculations to account for buffer loads and/or processing demands on a hypothetical decoder 155. The hypothetical decoder 155 may or may not represent an actual decoder, but it is useful for illustrating the conceptual design of the encoder 115. Because the hypothetical decoder 155 needs to receive data at a constant bit rate, video data is transferred over a data channel 130 at a constant bit rate. For example, transferring video data to and from a digital video camera generally uses a constant bit rate. The rate itself depends on the application: video transmitted over DSL typically uses about 300 kbit/s, while 1080i video encoding uses about 25 Mbit/s. In a constant bit-rate encoder, video data is generally buffered to absorb both the frame reordering performed by the encoder 115 and the variable bit rate of the encoded video sequence.
Variations in the amount of buffered data are permitted within an allowable range. If data is buffered faster than it is consumed for decoding, an overflow condition eventually occurs, in which the amount of buffered data exceeds a maximum threshold. Overflow can occur, for example, if too many simple frames are encoded consecutively. Overflow is easy to address, however, by padding a frame with extra unused bits or by intentionally encoding a frame inefficiently. If data is consumed for decoding faster than it is buffered, an underflow condition eventually occurs, in which there is not enough buffered data to decode a video frame. Underflow can occur, for example, if too many complex frames are encoded consecutively.
An encoder buffer 125 monitors the amount of buffered data. The encoder buffer 125 can buffer data for storage on a storage medium or for transmission over a constant bit-rate channel (e.g., data channel 130). In some cases, the encoder buffer 125 is not an actual buffer (e.g., if data is written to a file instead of sent over a data channel) but a virtual buffer that emulates the amount of buffering needed to decode the encoded video sequence 120.
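The overflow and underflow conditions translate into a lower and upper bound on the size of the next encoded frame. The sketch below assumes the decoder-side model used in the patent figures: the level is measured just before a frame is removed, and the channel delivers a fixed number of bits per frame period. Function and parameter names are hypothetical; padding up to the lower bound stands in for the "extra unused bits" fix.

```python
def frame_size_bounds(level: float, bits_per_period: float, max_level: float) -> tuple[float, float]:
    """Allowed size range (min_bits, max_bits) for the next encoded frame.

    level: bits in the decoder buffer just before this frame is removed.
    bits_per_period: bits the constant-rate channel delivers per frame period.
    max_level: buffer capacity in bits.
    """
    # Underflow: the whole frame must already be buffered when it is removed.
    max_bits = level
    # Overflow: after removal plus one period of refill, the level must stay <= capacity.
    min_bits = max(0.0, level + bits_per_period - max_level)
    return min_bits, max_bits


def pad_to_avoid_overflow(encoded_bits: float, level: float,
                          bits_per_period: float, max_level: float) -> float:
    """Return the frame size after adding stuffing bits, if any are needed."""
    min_bits, max_bits = frame_size_bounds(level, bits_per_period, max_level)
    if encoded_bits > max_bits:
        # Stuffing cannot fix an oversized frame; it must be re-encoded with fewer bits.
        raise ValueError("underflow: frame is larger than the buffered data")
    return max(encoded_bits, min_bits)


print(frame_size_bounds(950, 200, 1000))           # (150, 950)
print(pad_to_avoid_overflow(100, 950, 200, 1000))  # 150
```

The asymmetry matches the text: a frame that is too small is fixed cheaply by stuffing, while a frame that is too large has no fix short of re-encoding.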
In the hypothetical decoder 155, data enters a virtual buffer verifier (VBV) 135 from the data channel 130 at a constant bit rate. In some cases, the virtual buffer verifier 135 is a calculated buffer level, which may or may not correspond to an actual buffer level. Data is extracted from the VBV 135 at a variable rate, depending on the amount of data used to encode each frame. In particular, a decoder 140 extracts video data from the VBV 135 at a variable rate that corresponds to the variable rate at which the encoder 115 generates the encoded video sequence 120. The decoder 140 decodes the extracted video data to produce a video sequence 145 having frames 150 corresponding to the input video sequence 105.
A State of a Virtual Buffer Verifier for a Constant Bit-Rate Encoder
FIG. 2 is a graph 200 illustrating a state of a virtual buffer verifier for a constant bit-rate encoder. The state or level of the virtual buffer verifier can be measured either as a number of stored bits or as a VBV delay, which is the time needed to fill the buffer to its current level. Because the buffer is filled at a constant bit rate, the two measures are interchangeable: the VBV delay equals the number of stored bits divided by the channel bit rate. For purposes of this description, the number of stored bits for the virtual buffer verifier and the VBV delay can be referred to as a calculated buffer level or calculated buffer state. The graph 200 illustrates a VBV level along a vertical axis 205 and time along a horizontal axis 210.
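Because the fill rate is constant, converting between the two measures of buffer state is a single division or multiplication. A minimal sketch with hypothetical function names, expressing the delay in seconds:

```python
def bits_to_delay(bits: float, bit_rate: float) -> float:
    """VBV delay in seconds: time for a constant-rate channel to deliver `bits`."""
    return bits / bit_rate


def delay_to_bits(delay_s: float, bit_rate: float) -> float:
    """Inverse mapping: buffer level in bits for a given VBV delay."""
    return delay_s * bit_rate


# A full 7,340,032-bit buffer filled at 25 Mbit/s holds about 0.294 s of data.
print(bits_to_delay(7_340_032, 25_000_000))
```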
When encoding a constant bit-rate video sequence, the VBV level cannot exceed a maximum level 215 (e.g., 7,340,032 bits for MPEG-2 Main Profile at High-1440 Level) or fall below a minimum level 220 (e.g., 0 bits). The initial VBV level can be arbitrarily selected. In this example, the initial VBV level is set at the maximum VBV level. When a frame is displayed (or data corresponding to a frame is extracted from the buffer for decoding), the VBV level decreases (225) by an amount corresponding to the size of the encoded frame in bits. Because data enters the VBV at a constant rate corresponding to the bit rate of the data channel, the VBV is filled at a constant bit rate (230). Each frame is displayed for a predetermined amount of time (i.e., the frame duration 235), which corresponds to the frame rate. Accordingly, while each frame is displayed, the VBV fills at a constant rate until the next frame is displayed, at which time the VBV level again decreases (225) by an amount corresponding to the size of the next encoded frame. In this manner, the VBV level fluctuates over time.
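The sawtooth in FIG. 2 can be reproduced with a small simulation: subtract each frame's size at removal, then add one frame duration's worth of channel bits. This is a sketch under assumed toy numbers (a 40,000-bit buffer instead of the real 7,340,032 bits) with a hypothetical function name; like the example in the text, it starts the buffer at its maximum level.

```python
def vbv_trace(frame_bits, bit_rate, frame_rate, max_level, initial_level):
    """Return the VBV level just before each frame removal, plus the final level.

    Raises ValueError on underflow (frame larger than buffered data) or
    overflow (level above max_level after one frame period of refill).
    """
    fill = bit_rate / frame_rate  # bits arriving during one frame duration
    level = initial_level
    trace = [level]
    for k, bits in enumerate(frame_bits):
        if bits > level:
            raise ValueError(f"underflow at frame {k}")
        level = level - bits + fill  # drop at removal (225), then constant refill (230)
        if level > max_level:
            raise ValueError(f"overflow at frame {k}")
        trace.append(level)
    return trace


# 300 kbit/s at 30 frames/s gives 10,000 bits of refill per frame.
print(vbv_trace([25_000, 5_000, 15_000, 10_000], 300_000, 30, 40_000, 40_000))
# [40000, 25000.0, 30000.0, 25000.0, 25000.0]
```

Feeding the same function a single small frame while the buffer is full (for example `[5_000]` with the settings above) raises the overflow error, which is the "too many simple frames" case described earlier.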
To maintain the VBV level within the permissible range, an average target encoding bit rate is defined. The average target encoding bit rate corresponds to the constant bit rate associated with the data channel for the particular application. By attempting to track the average target encoding bit rate (e.g., following a frame having a relatively high encoding bit rate with a frame having a relatively low encoding bit rate to effectively average out the overall encoding bit rate), the VBV level can be maintained within the allowable range even though the encoding bit rate may vary widely for individual frames.
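The average target encoding bit rate corresponds to a fixed number of bits per frame, and the VBV level at any point differs from its starting level by exactly the running overspend against that target. A minimal sketch with hypothetical names shows how alternating heavy and light frames average out:

```python
def per_frame_target(bit_rate: float, frame_rate: float) -> float:
    """Average bits per frame that keeps pace with the constant-rate channel."""
    return bit_rate / frame_rate


def cumulative_drift(frame_bits, target):
    """Running (bits spent - bits budgeted) after each frame.

    Measured just before each removal, the VBV level equals the starting
    level minus this drift, so a drift that returns to zero means the
    buffer is back where it started.
    """
    drift, out = 0.0, []
    for b in frame_bits:
        drift += b - target
        out.append(drift)
    return out


print(per_frame_target(300_000, 30))  # 10000.0 bits per frame (DSL example)
print(cumulative_drift([14_000, 6_000, 13_000, 7_000], 10_000))
# [4000.0, 0.0, 3000.0, 0.0]
```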
To prevent underflow or overflow conditions, a constant bit-rate encoder needs to monitor a buffer or VBV level and adjust encoding rates accordingly. At the same time, it is generally desirable to maintain the best possible video quality, which includes consistent video quality over time. Accordingly, a constant bit-rate encoder generally tries to avoid two situations: having to encode complex frames with relatively few bits to avoid an underflow condition, and encoding simple frames inefficiently with many bits at the expense of the encoding quality of subsequent complex frames.
Conventional constant bit-rate encoders are unconstrained in that the buffer level can start at an arbitrary level at the beginning of a video sequence and end at any arbitrary level, provided that it remains within the maximum and minimum thresholds. In other words, conventional constant bit-rate encoders do not impose boundary conditions.
In accordance with some aspects of the present invention, a constrained constant bit-rate (CCBR) encoder allows the buffer levels at both the beginning and the end of a video sequence to be constrained to particular values. In other words, the constrained constant bit-rate encoder can generate a video sequence that begins with a buffer level having a first particular value and ends with a buffer level having a second particular value. This can be used, for example, to selectively re-encode edited portions of an overall video sequence, as described in U.S. patent application Ser. No. ______ (Attorney Docket No. P3836US1/18814-009001), entitled “SELECTIVE REENCODING FOR GOP CONFORMITY”, filed Apr. 15, 2005. Selective re-encoding allows segments of a video sequence to be edited without re-encoding the entire sequence, including its unedited segments. Edited segments can include segments to which video effects are added or segments from different sources that are spliced together. Alternatively or in addition, edited segments can include frame sequences that need to be re-encoded because they span a boundary of an added video effect or a boundary between spliced video sources.
When a video segment within an overall video sequence is to be re-encoded, the buffer levels at the boundaries generally need to be continuous to comply with encoding standards and to avoid inadvertent drift in buffer levels, which can lead to overflow or underflow conditions. As shown in FIG. 2, a video segment can be added beginning at a first time 240 until a second time 245. The added video segment can be, for example, a modification of an original video segment from the same time period or a different video segment that is inserted into the video sequence. The unedited portions of the video sequence have corresponding buffer levels that fluctuate in accordance with a previous encoding of the video sequence. The buffer level 250 at the first time 240 is used as a starting buffer level for re-encoding the added video segment, and the buffer level 255 at the second time 245 is used as an ending buffer level for re-encoding the added video segment.
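Fixing both boundary levels fixes the total number of bits the re-encoded segment may spend. With the level measured just before each frame removal, N frames, channel rate R and frame rate f, the level evolves as B_end = B_start - (sum of frame sizes) + N·R/f, so the segment budget is B_start - B_end + N·R/f. A sketch with a hypothetical function name and illustrative numbers:

```python
def segment_bit_budget(start_level: float, end_level: float, num_frames: int,
                       bit_rate: float, frame_rate: float) -> float:
    """Total bits a re-encoded segment must spend to move the buffer from
    start_level (e.g., level 250) to end_level (e.g., level 255).

    Derived from: end_level = start_level - total_bits + num_frames * bit_rate / frame_rate
    """
    return start_level - end_level + num_frames * bit_rate / frame_rate


# One second of 300 kbit/s video at 30 frames/s, ending 5,000 bits lower than it started.
print(segment_bit_budget(30_000, 25_000, 30, 300_000, 30))  # 305000.0
```

Spending more than this budget leaves the buffer below the required ending level (a discontinuity that drifts toward underflow); spending less leaves it above, which can be corrected with stuffing bits.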
The added video segment is encoded in accordance with the average target encoding bit rate and in a manner that avoids underflow and overflow conditions. Encoding is performed to maintain a substantially consistent video quality. For example, the encoding bit rate can initially be assumed to be uniform for all frames to be encoded and then adjusted depending on the type of frame (e.g., an I-frame, P-frame, or B-frame) and/or the complexity of the frame. The added video segment is also encoded to comply with the starting and ending buffer levels. In some implementations, the encoder targets an ending buffer level slightly above the actual ending buffer level, and the excess is cleared by adding extra bits to the current frame. This margin simplifies encoding and adds flexibility: the encoder does not have to hit the ending buffer level precisely, because the small excess can be cleared with spare bits that the decoder essentially ignores but that ensure buffer level continuity at the ending boundary. Alternatively or in addition, the effective target bit rate for one or more frames can be set slightly below the actual budgeted target bit rate.
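The frame-type adjustment and the ending-level margin can be sketched together. The I:P:B weights of 4:2:1 below are a hypothetical assumption, not values from the patent, and the function names are likewise illustrative. Allocating the budget minus a small margin makes the segment end slightly above the required level, and the leftover is then cleared with stuffing bits.

```python
# Hypothetical relative costs per frame type (assumption, not from the patent).
DEFAULT_WEIGHTS = {"I": 4.0, "P": 2.0, "B": 1.0}


def weighted_profile(frame_types, budget, weights=DEFAULT_WEIGHTS):
    """Split a bit budget across frames in proportion to their type weights."""
    total_w = sum(weights[t] for t in frame_types)
    return [budget * weights[t] / total_w for t in frame_types]


def plan_segment(frame_types, budget, margin_bits, weights=DEFAULT_WEIGHTS):
    """Allocate the budget minus a safety margin, so the segment should end
    slightly above the required ending buffer level."""
    return weighted_profile(frame_types, budget - margin_bits, weights)


def stuffing_bits(actual_end_level, required_end_level):
    """Stuffing bits to append to the last frame so the buffer ends exactly
    at the required level (each stuffing bit lowers the level by one bit)."""
    excess = actual_end_level - required_end_level
    if excess < 0:
        raise ValueError("ended below the required level; stuffing cannot fix this")
    return excess


print(plan_segment(["I", "B", "B", "P"], 8_400, 400))  # [4000.0, 1000.0, 1000.0, 2000.0]
print(stuffing_bits(25_300, 25_000))                   # 300
```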
The inventors of patent application 20060233237 are: Jian Lu, Wenqing Jiang and Gregory Kent Wallace.
NOTICE: MacNN presents only a brief summary of patents and/or trademarks with associated graphic(s) for journalistic news purposes as each such patent application and/or grant is revealed by the U.S. Patent & Trademark Office. Readers are cautioned that the full text of any patent and/or trademark applications and/or grants should be read in its entirety for further details.
Written and researched by Neo.