Posted by: digitalpelican | May 3, 2009

Book Review: The Technology of Audio and Video Streaming

Ever want to know more about the nuts and bolts of how digital video works, how the magic of web video happens?  Ok, it’s not on the top of most people’s list, but I have a geeky side and I’ve given up trying to deny it.  Occasionally I put on my propeller beanie and pretend I understand technology.  Enter “The Technology of Audio and Video Streaming” (Kindle version) by David Austerberry (Second edition, 2005, ISBN: 9780240805801).

Reading The Technology of Audio and Video Streaming, I found myself wishing there were a companion “Dummies” version for some of the more technical areas.  I got the impression Austerberry’s approach would be perfect as the starting place for the server guy in the IT department who has just been given the job of supporting the new streaming media servers.  Considerable attention is paid to Internet transport protocols specific to streaming media applications and the associated file types.  Which is fine, if that’s your thing.  I wished Austerberry had given more context to help the camera and microphone people understand how the protocols work and why they exist.

At the same time, I enjoyed the data dump the book gives on what is going on under the hood to make digital media work.  As far as this book is concerned, video and audio are just another data type.  I came away with an appreciation for all those very bright people who figured out the math and engineering to make it all work.  I also have a better understanding of why we need high-compression formats to make the pictures on our TVs and monitors work.  Actually, I’m sort of in awe it works as well as it does.  Consider:

A standard video frame is 720 x 483 pixels and is sampled at 13.5 MHz at 8-bit (R=8 + G=8 + B=8) depth at 30 frames per second, for a total stream of about 248 Mbits/second of data without any synchronization or control information.  Add the sync and control data, and we’re talking about 270 Mbits/s for what is known as “601” quality.  That’s a lot of ones and zeros that have to arrive in the right order at the right time.  Remember the late 1990s, when people still had 56K dial-up modems?  To play video over a dial-up connection, the video data had to be reduced by a factor of more than 4,000!  And we were complaining about postage-stamp-size video windows in our browsers.
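Those numbers are easy to sanity-check. Here’s a quick back-of-the-envelope sketch in Python, using the frame size, bit depth, and frame rate quoted above (the small difference from the book’s figure is just rounding):

```python
# Back-of-the-envelope check of the raw "601"-style video data rate.
width, height = 720, 483          # frame size quoted in the text
bits_per_pixel = 8 * 3            # 8 bits for each of three components
frames_per_second = 30

raw_bps = width * height * bits_per_pixel * frames_per_second
print(f"Raw video: {raw_bps / 1e6:.0f} Mbit/s")            # ~250 Mbit/s

dialup_bps = 56_000               # late-1990s 56K modem
print(f"Squeeze factor for dial-up: {raw_bps / dialup_bps:,.0f}x")
```

That works out to a reduction factor around 4,500 before a single frame would fit down a phone line.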

Ever hear of 4:2:2 video?  The numbers come from how many times the analog video signal is sampled, a convention adapted from earlier composite digital video systems, which sampled the analog signal at four times the color sub-carrier rate.  For component digital video, the base sampling rate was standardized at 13.5 MHz (4 x 3.375 MHz), and that approach carried over to today’s digital video.  So, the “4” in 4:2:2 means the luminance (dark-light) information is sampled four times, and the two “2s” mean each color component is sampled only twice for every four luminance samples.  With this emphasis on the luminance information, the approach is similar to that used to create JPEG files.  Evidently, we humans get a lot of our visual information from the dark-light, or grayscale, aspect of the visual world and not so much from the color information, so we can lose a lot of that color information without any problem.  There is also a considerable data-rate benefit to sampling the color this way: 4:2:2 carries only 2/3 of the data of the full signal, and 4:1:1 (used by some formats) carries only half.
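The arithmetic behind those ratios is simple enough to sketch; a minimal version, assuming 8-bit samples and ignoring sync and blanking:

```python
# Average bits per pixel for a Y:Cb:Cr sampling ratio such as 4:2:2.
# The ratio counts samples per group of four luminance samples.
def bits_per_pixel(y, cb, cr, sample_bits=8):
    return sample_bits * (y + cb + cr) / 4

bpp_444 = bits_per_pixel(4, 4, 4)   # no chroma subsampling
bpp_422 = bits_per_pixel(4, 2, 2)   # broadcast-style 4:2:2
bpp_411 = bits_per_pixel(4, 1, 1)   # used by some consumer formats

print(bpp_444, bpp_422, bpp_411)             # 24.0 16.0 12.0
print(bpp_422 / bpp_444, bpp_411 / bpp_444)  # 2/3 and 1/2 of full
```

Throwing away half or more of the color samples, before any “real” compression even starts, is why the subsampling step matters so much.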

Next, the heavy lifting of compression: temporal compression, used in formats such as MPEG-1, 2, and 4.  This is a real simplification of the idea, but temporal compression only updates the information in a frame that has changed since the previous frame.  That’s basically what’s going on.  Rather than send each frame as completely new information (i.e., the complete data needed for an entire video frame), the encoder mostly sends just the parts that changed.  This is the basics of how video is compressed for use on cable channels, DVDs, and satellites.  Austerberry then moves on to some useful hints and tips to improve your video for compression and web delivery.
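The send-only-what-changed idea can be sketched in a few lines. This is a toy frame-differencing scheme, not real MPEG (which works on blocks, motion vectors, and prediction), but it shows the principle:

```python
# Toy temporal compression: send a full keyframe, then only the
# (position, new_value) pairs for pixels that changed since last frame.
def delta_encode(prev_frame, frame):
    """Return (index, new_value) pairs for pixels that differ."""
    return [(i, v) for i, (p, v) in enumerate(zip(prev_frame, frame)) if p != v]

key = [10, 10, 10, 10]         # keyframe: full data is sent
nxt = [10, 12, 10, 10]         # only one pixel changed
print(delta_encode(key, nxt))  # [(1, 12)]
```

For a mostly static shot, each update is a tiny fraction of a full frame, which is exactly why the shooting tips later in the review push for still cameras and uncluttered frames.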

It seems noise in the video signal is a major file-size inflator.  The compression application (the “codec,” for coder-decoder) can’t tell which part of the video is information and which part is noise.  Reducing the noise in the video signal, as well as the audio signal, is an important step in getting video ready for the web.  Audio for web video is like AM radio: intelligibility improves if the dynamic range is compressed, which in this case means the average loudness of the signal is held relatively constant.
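Here’s a crude sketch of that kind of dynamic-range compression (the threshold and ratio values are hypothetical; real audio compressors also deal with attack, release, and make-up gain):

```python
# Toy dynamic-range compressor: sample magnitudes above the threshold
# are scaled down, pulling loud peaks closer to the average level.
def compress(samples, threshold=0.5, ratio=4.0):
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(mag if s >= 0 else -mag)
    return out

quiet_and_loud = [0.2, 0.9, -1.0]
print(compress(quiet_and_loud))   # peaks above 0.5 are pulled down
```

With the peaks tamed, the whole track can be turned up, so the quiet parts stay intelligible on tinny computer speakers.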

Two more file-size reduction steps: de-interlace the video (if it was shot in an interlaced format) and crop the frame size down to the safe action area.  The reason for this last step is that web video is seen as full-frame video, unlike the stuff we broadcast or distribute via cable to our homes, where the edges of the frame are hidden by the TV’s overscan anyway.
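Cropping to the safe action area pays off in pixels, too. A quick sketch, assuming the common convention that safe action is roughly the central 90% of the frame (the book may define it differently):

```python
# Pixels saved by cropping a frame to an assumed ~90% safe action area.
w, h = 720, 483
safe_w, safe_h = int(w * 0.9), int(h * 0.9)   # 648 x 434
saved = 1 - (safe_w * safe_h) / (w * h)
print(f"{safe_w}x{safe_h}, about {saved:.0%} fewer pixels per frame")
```

Roughly a fifth fewer pixels in every single frame, before the codec has done anything at all.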

A few more video for the web tips or How to Shoot for the Small Screen:

  • Move in close
  • Keep the frame simple and uncluttered
  • The less motion, the better
  • Keep the camera still – use a tripod
  • Let the subject move around in a still frame
  • If you have to move the camera, use a dolly or Steadicam-like device for smooth movement
  • Avoid panning or tilting shots


