Dr. Dobb's | Architectures for Forensic Watermarking

Architectures for Forensic Watermarking

This paper introduces watermarking concepts and describes how forensic watermarking is positioned in that framework. It then develops a system model and identifies architectural alternatives for forensic watermark implementation.

December 05, 2006
URL:http://drdobbs.com/security/architectures-for-forensic-watermarking/196601786

Background
An ever-increasing variety of entertainment content is being distributed under the control of DRM technology. DRM imposes limitations on the storage and rendering (playback) of the content, consistent with the agreement between the content provider and the consumer. Such agreements typically constrain the consumer's ability to make copies or redistribute the content, but DRM technology alone cannot enforce this aspect of the agreement: once the DRM decrypts (unscrambles) the content to plain-text, the content leaves the logical protection domain of the DRM.

Figure 1: Content Exposure

Common exposures are shown in Figure 1. Technologies devised to protect content after decryption, such as link encryption and local storage encryption, have been routinely circumvented. The chief difficulty is that the content must eventually pass through an "analog hole" to be rendered in a form that the consumer can enjoy. Additionally, many playback devices expose digital plain-text content internally, where it is available to anyone able to probe or eavesdrop inside the device. An internal physical security perimeter can be implemented to deter probing, but it adds significant cost.

These protection mechanisms are categorized as preventative security controls, in that they inhibit consumer behaviors associated with content misuse or piracy. Such controls, if rigidly implemented, can frustrate even legitimate consumer activities while ultimately failing to deter a determined adversary. Consequently, interest has arisen in an alternative "investigative" control called forensic watermarking, as a complement to DRM technology. Investigative controls tend to be more palatable, since they do not interfere with consumer activity but instead expose violations of the consumer's agreement with the content provider.

Forensic watermarking embeds information into the content to facilitate tracing unauthorized copies back to the last legitimate handling of the content, that is, the point where the forensic watermark was applied. Forensic watermarks may carry such information as a content purchase transaction identifier, the account under which the content was delivered, the identity of the equipment enforcing the DRM, and the date and time of rendering. Broadly, the purpose of forensic watermarking is to help investigators locate leaks in the content distribution chain, and to provide evidence to support the appropriate corrective action.

Forensic watermarking has also been referred to as fingerprinting, a reasonable name in that the rendering device leaves its mark on the content that it touches. The term fingerprinting has become ambiguous, however: it also denotes the extraction of an identifying signature from content for the purpose of identifying that content. While watermarks can also be used to identify content, fingerprinting in this second sense is a fundamentally different technology.

Watermarking Concepts
In the broadest sense, watermarking is the incorporation of some data into content "essence" in such a way that the data can subsequently be read (recovered) from copies of the content. By essence we mean the data representing the picture and/or sound, and not some other side channel. The process of creating a watermark in a content item is called watermark embedding, and the additional data, in its embedded form, is called the watermark. The process of reading the watermark from a copy of the content is called watermark recovery.

Watermark technology is applicable to many media, but is commonly discussed in terms of motion video. This paper will take that perspective, although most of the concepts are directly translatable to sound or even to still images. Content will be consumed (viewed) using a variety of equipment types, commonly set-top boxes. But for the sake of generality, this paper refers to the content playback equipment as a "rendering device".

It is useful to conceptualize watermarking as a communications or signal processing problem. In this paradigm, the watermark information becomes the data to be communicated, and the content becomes the carrier signal. Indeed, the content carrying the watermark is often referred to as the host signal. From a communications standpoint, the perceptible features of the content constitute noise that interferes with the watermark's information signal. Alternatively, the watermark may be viewed as a set of changes to the features of the host signal, that is, as modulating the host signal. Under either view, the host signal constitutes a very noisy carrier for the watermark.

Common attributes of watermarks and watermark systems are:
Perceptible vs. Imperceptible Watermarks - Watermarks may either be apparent to the consumer when the content is rendered, or disguised in such a way that the viewer is unlikely to notice the watermark's presence. Perceptible watermarks are commonly used to designate ownership, exemplified by the quite visible logo appearing in many network broadcasts. Forensic watermarks, on the other hand, are typically designed to be imperceptible, in order to secure the watermark information and to avoid degrading the content. In general, watermarks fall along a continuum of perceptibility, according to the needs of the users and the capabilities of the technology. The field of steganography, the technology of hiding messages in content such that the casual observer is unaware of the message's existence, includes imperceptible watermarking.

Readable vs. Detectable Watermarks - A watermark may carry only a single bit of information, that is, the watermark is significant only in its presence or absence. Such watermarks are classified as detectable. A readable watermark, by contrast, contains a more complex message, typically many bits of information. Mathematically, a readable watermark with N bits of information could be conceptualized as having been chosen from a set of 2^N detectable watermarks. For a message of useful length, the number of marks in the set becomes unmanageable, so a practical readable watermark implementation must include a means of decomposing the watermark to reconstruct the message from independent parts.

Forensic watermarks must carry a message, so either a readable watermark or a series of detectable watermarks is required. In the latter case, the message is treated as a series of independent bits, each of which is represented by a single detectable watermark.
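The "series of detectable watermarks" approach can be illustrated with a minimal sketch. It assumes a one-dimensional stand-in for content (Gaussian noise), one pseudorandom ±1 pattern per message bit seeded from a hypothetical integer key, and simple correlation detection: a 1 bit is signalled by the presence of its pattern, a 0 bit by its absence. This is a generic textbook illustration, not the technique of any particular product.

```python
import random

def make_pattern(seed: int, length: int) -> list[float]:
    """Pseudorandom +/-1 pattern for one detectable watermark."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed_bits(host: list[float], bits: list[int], strength: float,
               seg_len: int, key: int) -> list[float]:
    """Bit i occupies host segment i: a 1 adds that segment's pattern, a 0 leaves it unmarked."""
    marked = list(host)
    for i, bit in enumerate(bits):
        if bit:
            pattern = make_pattern(key + i, seg_len)  # hypothetical key-derived seed
            start = i * seg_len
            for j, p in enumerate(pattern):
                marked[start + j] += strength * p
    return marked

def detect_bits(signal: list[float], n_bits: int, seg_len: int,
                key: int, threshold: float) -> list[int]:
    """Each bit is an independent presence/absence (detectable) test."""
    bits = []
    for i in range(n_bits):
        pattern = make_pattern(key + i, seg_len)
        segment = signal[i * seg_len:(i + 1) * seg_len]
        corr = sum(s * p for s, p in zip(segment, pattern)) / seg_len
        bits.append(1 if corr > threshold else 0)
    return bits

rng = random.Random(1)
message = [1, 0, 1, 1, 0, 0, 1, 0]
seg_len = 4000
host = [rng.gauss(0.0, 1.0) for _ in range(len(message) * seg_len)]  # stand-in for content
marked = embed_bits(host, message, strength=0.2, seg_len=seg_len, key=1234)
recovered = detect_bits(marked, len(message), seg_len, key=1234, threshold=0.1)
print(recovered)
```

Only N patterns are needed for an N-bit message, rather than one pattern for each of the 2^N possible messages.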

Other important attributes of a watermark system include:
Bandwidth - Bandwidth refers to the amount of data conveyed by the watermark, in proportion to the amount of content carrying the watermark. For multimedia content, bandwidth is typically expressed in message bits per second. Forensic watermarking makes only modest bandwidth demands: the DCI (Digital Cinema Initiatives) specification requires only 35 message bits in each 5-minute segment of a motion picture (35 bits / 300 s ≈ 0.117 bit/s).¹

In interpreting bandwidth, however, it is important to distinguish between the original message and the encoded message. Forensic watermarking implementers may apply multiple layers of error control coding (ECC) to the message to compensate for the "noise" in the channel. Such coding can expand the message severalfold, reducing the effective bandwidth of the watermarking technique by the same factor. It is also common to embed several copies of the message in the content. For a robust implementation, the required embedding bandwidth is therefore many times that implied by the message length alone.
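The effect of coding and repetition on the required embedding rate is simple arithmetic. The sketch below starts from the DCI figure quoted above; the 3x expansion (for example, a rate-1/3 code) and the four embedded copies are illustrative assumptions, not values from any specification.

```python
def effective_rate(message_bits: int, segment_seconds: float,
                   expansion: int = 1, copies: int = 1) -> float:
    """Embedded bits per second once ECC expansion and repetition are counted."""
    return message_bits * expansion * copies / segment_seconds

# DCI figure quoted in the text: 35 message bits per 5-minute (300 s) segment
print(round(effective_rate(35, 300), 3))  # 0.117
# Hypothetical: 3x ECC expansion, with the codeword embedded 4 times per segment
print(round(effective_rate(35, 300, expansion=3, copies=4), 3))  # 1.4
```

Even this modest amount of protection raises the embedding rate twelvefold over the raw message rate.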

Robustness - Robustness is the degree to which the watermark can remain viable against the various transformations the content undergoes before reaching the recovery process. An effective forensic watermarking system must be robust against operations such as rescaling, resampling, recompression, cropping, rotation, resolution changes, deinterlacing, gamma changes, and temporal averaging, all of which may occur in the course of pirating the content. Additionally, a pirate may undertake attacks to directly suppress the watermark by filtering, noise addition, collusion or other signal processing techniques.

Although no watermarking technique can be unconditionally robust, the most effective techniques should require the adversary either to apply an unreasonable amount of effort or to unacceptably degrade the content in order to suppress the watermark. Viewed as a signal processing problem, robustness tends to increase with the energy of the watermark signal, which depends on the watermark's intensity, size, and duration. Paradoxically, however, if the signal intensity reaches the threshold of perceptibility, the watermark's nature and location become apparent to the attacker, thereby compromising robustness. Consequently, the watermark strength must be carefully calibrated: as strong as possible while remaining below the perceptibility threshold.

As mentioned previously, error control coding is important to attaining robustness. Alterations to the content may erase or distort significant portions of the watermark signal. Effective recovery must include mechanisms to compensate for missing or erroneous signal segments. In signal processing terms, the watermark system constitutes an extremely noisy channel, requiring aggressive error control.
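The idea of compensating for missing or erroneous segments can be shown with the simplest possible error-control scheme, a repetition code decoded by majority vote, where an erased segment (marked `None`) simply casts no vote. This is only an illustrative sketch; practical forensic systems would use far more efficient codes, and nothing here reflects a particular product's coding layers.

```python
from typing import Optional

def repeat_encode(bits: list[int], n: int) -> list[int]:
    """Repeat each message bit n times (the simplest possible error-control code)."""
    return [b for b in bits for _ in range(n)]

def repeat_decode(received: list[Optional[int]], n: int) -> list[Optional[int]]:
    """Majority vote per bit; None marks an erased (unrecoverable) segment."""
    decoded: list[Optional[int]] = []
    for i in range(0, len(received), n):
        survivors = [b for b in received[i:i + n] if b is not None]
        ones = sum(survivors)
        if 2 * ones > len(survivors):
            decoded.append(1)
        elif 2 * ones < len(survivors):
            decoded.append(0)
        else:
            decoded.append(None)  # all copies erased, or a tie: report as unknown
    return decoded

codeword = repeat_encode([1, 0, 1], 5)
received = list(codeword)
received[0] = received[1] = None      # bit 0: two copies erased...
received[2] = 0                       # ...and one flipped
received[5], received[6] = 1, 1       # bit 1: two copies flipped
received[10:13] = [None, None, None]  # bit 2: three copies erased
print(repeat_decode(received, 5))     # [1, 0, 1]
```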

Renewability - Content pirates have unfailingly adapted to media security technology. Watermarking will not be spared. As watermarking technology is deployed, adversaries will build tools to suppress the watermark. As such tools become widely available, the targeted watermark technology is rendered ineffective. Thus the ability to evolve to more robust watermarking techniques, and to vary the watermark signal, is a hallmark of an effective system.

Flexibility - Analogous to renewability, flexibility describes the ease of adapting watermarking to the needs presented by specific content items. The universe of content includes a broad spectrum of exposures to piracy, as well as sensitivity to quality. Thus content providers may favor watermark perceptibility-robustness tradeoffs that differ by content item and according to content provider policy. Ideally, a watermark system should provide a means of control, to effect the content provider's preferences.

SINGLE-ENDED EMBEDDING ARCHITECTURE
The process of generating a watermark signal and incorporating it into the content is called embedding. From a very high level, the watermark embedding reference model is depicted in Figure 2:

Figure 2: Basic Embedding Model

Of course, this reference model tells us very little about what is actually happening. The more detailed model in Figure 3, set in the context of the rendering device, is more informative:

Figure 3: Detailed Blind Embedding Model

The protected content passes through the DRM, where usage rights are verified and the content is decrypted. The plain-text content then passes through a decoder to produce the baseband signal. The DRM supplies information, such as the transaction, account, and device identifiers described earlier, to a Forensic Message Generation process. The Forensic Message is encoded for error control to form a Forensic Message codeword. Finally, the codeword is transformed into a watermark signal suitable for modulating the content.

This model is called blind embedding because the watermark generator is insensitive to the nature of the content. Blind embedding is of relatively low complexity, but it does not exploit local characteristics of the content to make the watermark less perceptible. The watermark must be weak enough not to damage sensitive areas of the content (low-texture objects), yet strong enough to be recovered from a degraded copy. It is thus challenging to strike a satisfactory balance between perceptibility and robustness in the blind embedding model.
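The blind embedding path can be sketched as follows: pack DRM-supplied fields into a message, then add a keyed pseudorandom pattern of fixed strength to every pixel without ever looking at the pixel values. The field widths, the key, the per-pixel bit assignment, and the omission of the error-control stage are all hypothetical simplifications for illustration.

```python
import random

def pack_message(fields: list[tuple[int, int]]) -> list[int]:
    """Pack (value, width) fields into an MSB-first bit list."""
    bits: list[int] = []
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        bits.extend((value >> (width - 1 - k)) & 1 for k in range(width))
    return bits

def blind_embed(frame: list[list[int]], codeword: list[int],
                strength: int, key: int) -> list[list[int]]:
    """Add a fixed-strength pseudorandom pattern; pixel values are never examined."""
    rng = random.Random(key)
    out = []
    idx = 0
    for row in frame:
        new_row = []
        for px in row:
            sign = 1 if codeword[idx % len(codeword)] else -1  # bit -> +/-1
            chip = rng.choice((-1, 1))                          # keyed spreading chip
            new_row.append(min(255, max(0, px + strength * sign * chip)))
            idx += 1
        out.append(new_row)
    return out

# Hypothetical fields: 12-bit device ID and 8-bit transaction counter (ECC omitted)
codeword = pack_message([(0x5A3, 12), (42, 8)])
dark = [[100] * 16 for _ in range(4)]
bright = [[200] * 16 for _ in range(4)]
mark_dark = blind_embed(dark, codeword, strength=2, key=99)
mark_bright = blind_embed(bright, codeword, strength=2, key=99)
```

Because the generator ignores the content, a dark frame and a bright frame receive exactly the same pixel offsets (barring clipping), which is precisely the limitation the text describes.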

A more sophisticated approach, termed informed embedding, analyzes the content and modulates the watermark signal so as to take advantage of the masking properties of the content. In the video domain, masking is a function of spatial and temporal frequency, brightness, contrast, and edge orientation. By choosing propitious locations, and by tailoring watermark energy and other characteristics to the local content, informed techniques can embed a much more robust watermark than can a blind embedding technique, without exceeding perceptibility thresholds. The detailed model for informed embedding is shown in Figure 4:

Figure 4: Informed Embedding
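A minimal sketch of the informed idea follows: estimate how "busy" each block of a frame is and scale the watermark strength accordingly. Local standard deviation is used here as a crude, assumed stand-in for the perceptual masking models the text mentions (spatial and temporal frequency, brightness, contrast, edge orientation). The block size, gains, and reference level are illustrative parameters only.

```python
import statistics

def block_strengths(frame: list[list[int]], block: int, base: float,
                    ref_std: float, min_gain: float,
                    max_gain: float) -> dict[tuple[int, int], float]:
    """Per-block watermark strength keyed by the block's top-left (row, col)."""
    h, w = len(frame), len(frame[0])
    strengths = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            pixels = [frame[y][x]
                      for y in range(by, min(by + block, h))
                      for x in range(bx, min(bx + block, w))]
            activity = statistics.pstdev(pixels)  # crude texture measure
            gain = min(max_gain, max(min_gain, activity / ref_std))
            strengths[(by, bx)] = base * gain
    return strengths

# Left half flat grey, right half a high-contrast checkerboard
frame = [[128 if x < 4 else (100 if (x + y) % 2 == 0 else 156) for x in range(8)]
         for y in range(8)]
s = block_strengths(frame, block=4, base=2.0, ref_std=10.0, min_gain=0.25, max_gain=3.0)
print(s[(0, 0)], s[(0, 4)])  # 0.5 5.6
```

The textured blocks receive roughly ten times the watermark energy of the flat blocks, which is how informed embedding gains robustness without exceeding perceptibility thresholds where the eye is most sensitive.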

The preceding models are termed single-ended because their implementation lies entirely within the rendering device. The single-ended architecture presents the implementer with several challenges. First, the content analysis block must perform complex calculations on the baseband video signal. Uncompressed data rates exceed a hundred megabits per second even for standard-definition video, and are several times greater for HD content. The need to examine entire frames, together with the intensive computation needed to generate the watermark signal, is likely to drive a requirement for significant buffering where the watermark and content are merged. The processing required for the content analysis step is also likely to add significant expense to the device.
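To give a sense of the baseband data rates a single-ended embedder must process, the arithmetic below computes uncompressed rates for the active picture only. It assumes 8-bit 4:2:2 sampling (16 bits per pixel on average) at 29.97 frames per second, and ignores blanking intervals and higher frame rates.

```python
def baseband_mbps(width: int, height: int, fps: float, bits_per_pixel: int) -> float:
    """Uncompressed (baseband) video data rate in megabits per second."""
    return width * height * bits_per_pixel * fps / 1e6

# 8-bit 4:2:2 sampling averages 16 bits per pixel; 29.97 frames/s
sd = baseband_mbps(720, 480, 29.97, 16)
hd = baseband_mbps(1920, 1080, 29.97, 16)
print(f"SD: {sd:.1f} Mbit/s, HD: {hd:.1f} Mbit/s")  # SD: 165.7 Mbit/s, HD: 994.3 Mbit/s
```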

Figure 5: Security Perimeter

Second, security issues arise with the single-ended architecture, as shown in Figure 5. Typical DRM implementations protect only the DRM itself, exposing unmarked plain-text content to eavesdropping. A much larger security envelope is required to ensure that only watermarked content is accessible, and thereby protect the investment in watermarking. In many consumer devices, the security envelope is defined by a smart card, ASIC, or similar component. Integrating complex processes, such as video decoding and content analysis, into such components is impractical.

A third aspect to consider is renewal. Renewal can be accomplished by downloading new software or firmware to the device, within the limits of its capabilities. The renewal process can become burdensome if ASICs are needed to handle high baseband video data rates, particularly those associated with high-definition content. The security of the renewal process must be considered as well, so that pirates cannot exploit outdated, less secure devices.

THE REPLACEMENT MODEL
A refinement of the single-ended architecture relieves some of the constraints associated with informed embedding. The content can be analyzed once, prior to distribution, and the results of the analysis forwarded, as metadata within the DRM security envelope, to the watermark embedding process. The embedding process is thus relieved of a significant processing burden. The cost of content analysis is incurred only once, when the content is authored, rather than many times, at each point of consumption.

The price of the up-front content analysis is the overhead of packaging or transmitting the resulting watermark metadata along with the content. The watermark metadata must be protected, along with the content, by the DRM or some equivalent mechanism. Protecting watermark metadata is vital, since access to the plain-text would greatly aid an adversary in removing the watermark or suppressing its embedding. In the architecture shown in Figure 6, the combined content and metadata are decrypted by the DRM in the rendering device, and then separated into the encoded content and metadata streams. The metadata stream provides the watermark modulation process with essential information about the content.

Figure 6: Replacement Model for Informed Embedding

This architecture has been carried a step further in the consumer electronics market. The pre-distribution preparation process not only performs the content analysis, but also watermarks fragments of the content, and includes these watermarked fragments in the metadata. This approach is termed the replacement model, since the embedding process needs only replace fragments of the original content with the watermarked fragments. The embedding process is thereby vastly simplified, and becomes much less challenging to implement in consumer electronics devices.

Another notable feature of the replacement model, as depicted in Figure 6, is the embedding of the watermark in the encoded (compressed) content stream. Embedding watermarks in the encoded domain is practical, provided the watermarked fragments in the metadata are compatibly encoded. This approach permits content decode to take place outside of the security envelope without compromising system security. The content decode process becomes much less sensitive, from a security standpoint, since it operates on already watermarked content.

These prepared watermarks fall into the detectable category. Clearly, it is not possible to provision the embedding process with every possible readable watermark required to represent even a short forensic message. It is, however, entirely possible to treat the forensic message as a series of data bits, with each bit assigned to a set of detectable, prepared watermarked content fragments. The watermark embedding process can then incorporate the forensic message into the content by simply choosing which of the prepared watermarked content fragments to insert into the content stream.
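The embedding step of the replacement model reduces to a table lookup, as this minimal sketch shows. It assumes a hypothetical metadata layout in which each message bit has a slot naming the stream position to replace and two prepared fragments (one marked for 0, one for 1). Real metadata formats, fragment granularity, and encryption are outside the scope of the sketch, and the fragment names are placeholders.

```python
from dataclasses import dataclass

@dataclass
class Slot:
    """Hypothetical metadata entry for one message bit."""
    position: int                  # index of the content fragment to replace
    variants: tuple[bytes, bytes]  # prepared fragments marked for bit 0 and bit 1

def replace_embed(stream: list[bytes], slots: list[Slot], bits: list[int]) -> list[bytes]:
    """Insert the forensic message by choosing one prepared fragment per bit."""
    if len(slots) != len(bits):
        raise ValueError("need exactly one slot per message bit")
    out = list(stream)
    for slot, bit in zip(slots, bits):
        out[slot.position] = slot.variants[bit]
    return out

stream = [b"GOP0", b"GOP1", b"GOP2", b"GOP3"]
slots = [Slot(1, (b"GOP1/a", b"GOP1/b")), Slot(3, (b"GOP3/a", b"GOP3/b"))]
print(replace_embed(stream, slots, [1, 0]))
# [b'GOP0', b'GOP1/b', b'GOP2', b'GOP3/a']
```

The device performs no signal analysis at all; every watermark decision was made when the variants were prepared, which is why the embedder is so simple and why the metadata must carry two fragments per message bit.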

The replacement model architecture offers an attractive paradigm for renewal. The algorithms controlling the placement, strength, and nature of the watermarks are implemented in the mastering or authoring stage, prior to distribution. Consequently, most or all enhancements to the forensic watermarking system can be effected through changes to the mastering or authoring process. Because the rendering device needs only a low level of "watermark awareness", such enhancements are transparent to it. Under the replacement model, a content distributor can react quickly to new attacks by changing the nature of the watermark, without waiting for new code or security hardware to be rolled out to deployed rendering devices.

The replacement model also offers a degree of flexibility not provided by the single-ended approach. Just as the essential algorithms can be renewed at the authoring or mastering stage, it is possible to manipulate parameters controlling the watermark process at that point. For example, a content owner might choose to accept a more perceptible watermark in exchange for greater robustness on certain piracy-prone content, or move in the opposite direction for quality-sensitive content. It is entirely possible for a single rendering device to watermark each of several content items in distinct ways, as dictated by the watermark metadata supplied with the content. The replacement model ensures that all renderings of a particular content item are marked consistently and as expected by the content provider, since the only watermarks permitted are those supplied in the metadata.

Finally, note that the security envelope in the rendering device does not contain heavyweight processes such as decoding or content analysis in the replacement model. Consequently, it is much easier to secure access to the unmarked plain-text content.

The replacement model does, however, offer its own engineering challenges. The content preparation and metadata generation must be coordinated with the embedding process in the rendering devices. Thus the format and semantics of the watermark metadata must be strictly defined. In practice, the replacement model is easiest to implement in environments where a compatible watermark embedding process can be assured in each rendering device.

Some provision must be made for the secure delivery of the watermark metadata, along with the content. As mentioned above, the most reasonable approach would be for the DRM to encrypt both the content and the metadata, as depicted in Figure 6.

Implementations of the replacement model must consider the bandwidth available for metadata. Each prepared watermark is a discrete representation of a single message bit, prepared for insertion at a specific point in the content, so the metadata must supply a significant number of distinct and independent watermarked content fragments. Large fragments may exceed the capacity of some channels. Techniques that embed a continuous signal, reflecting the entire message over a large spatial or temporal interval, are not compatible with replacement embedding.

WATERMARK RECOVERY
Of course, a watermarking technique is only effective insofar as the forensic message can be recovered from a subject content instance. In general, recovery consists of searching the content instance for the watermark signal(s), extracting the forensic message codeword, and applying the inverse error control algorithms, if used, to estimate the original forensic message. Ideally, the recovery process provides a confidence estimate for the accuracy of the recovered forensic message. This conceptually simple model is shown in Figure 7:

Figure 7: Forensic Watermark Recovery
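The final stages of Figure 7, combining evidence and attaching a confidence estimate, can be sketched as below. The sketch assumes a hypothetical demodulator that has already produced one soft correlation score per codeword bit for each embedded copy of the message. Scores are summed across copies, and the reported confidence (the worst per-bit fraction of copies agreeing with the decision) is an illustrative measure only, not a statistically calibrated one.

```python
def recover_message(soft_copies: list[list[float]]) -> tuple[list[int], float]:
    """Combine soft scores from several embedded copies of the codeword.

    Each score is a correlation value: positive favours 1, negative favours 0.
    Returns the decided bits and a crude confidence: the worst per-bit fraction
    of copies agreeing with the combined decision.
    """
    n_copies = len(soft_copies)
    bits, worst_agreement = [], 1.0
    for scores in zip(*soft_copies):
        bit = 1 if sum(scores) > 0 else 0
        agree = sum(1 for s in scores if (s > 0) == bool(bit))
        bits.append(bit)
        worst_agreement = min(worst_agreement, agree / n_copies)
    return bits, worst_agreement

copies = [[0.9, -0.8, 0.5],
          [0.7, -0.6, -0.1],
          [0.8, -0.9, 0.4]]
print(recover_message(copies))  # ([1, 0, 1], 0.6666666666666666)
```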

Informed vs. Blind Recovery - Recovery scenarios are typically classified according to the availability of the original unmarked content to the demodulation process. In the blind scenario, the demodulator has no knowledge of the original content, and thus must be able to separate a weak watermark signal from the strong host signal. In communications terminology, the demodulator must handle very poor signal-to-noise ratios (SNR).

In the informed recovery scenario, the demodulator greatly improves the SNR by removing the original (unmarked) host signal from the subject content instance, ideally leaving a residual containing only the watermark signal. The remaining noise arises from errors in matching the original host signal to the recovered content. A weaker watermark signal, relative to the blind recovery scenario, can thereby achieve the same degree of robustness. Consequently, watermarking for the informed recovery scenario permits less perceptible watermarks and corresponding image quality improvements.
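The SNR argument can be made concrete with an idealized sketch. The same weak watermark is correlated against its pattern twice: once blind, with the strong host still present, and once after subtracting the original. The host statistics, strength, and length are illustrative assumptions, and the original is assumed to be perfectly aligned with the copy; real recovery must first match the two, as discussed below.

```python
import random

def correlate(signal: list[float], pattern: list[float]) -> float:
    return sum(s * p for s, p in zip(signal, pattern)) / len(pattern)

rng = random.Random(7)
n = 1000
host = [rng.gauss(128.0, 20.0) for _ in range(n)]      # strong host signal
pattern = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # watermark pattern
strength = 0.5                                          # weak watermark
marked = [h + strength * p for h, p in zip(host, pattern)]

blind = correlate(marked, pattern)                # host energy leaks into the statistic
residual = [m - h for m, h in zip(marked, host)]  # informed: remove the original
informed = correlate(residual, pattern)           # recovers the strength almost exactly
print(f"blind: {blind:.3f}  informed: {informed:.3f}")
```

The blind statistic is typically swamped by host leakage several times larger than the 0.5 watermark strength, whereas the informed statistic recovers it to within floating-point rounding.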

Certain implementations of the replacement model carry informed recovery a step further by using the previously described metadata to anticipate the location and exact content of each watermark. With this additional information, a very subtle watermark can be used, without sacrificing robustness.

Figure 8: Channel Model

In practice, the subject content instance will often have undergone some decidedly unhelpful transformations. These transformations can be modeled at a high level as shown in Figure 8. The content may be altered during capture, particularly if the pirate is not able to (or chooses not to) directly record the digital representation of the content. Resampling of analog signals typically changes quantization, resolution, and alignment. Camcorder-captured video may exhibit substantial rotation, keystone and pincushion distortion, and jitter.

The pirate may also attempt to attack the watermark directly, to disguise the origin of the content. Filters can be designed to obscure watermarks which are confined to narrow frequency bands. Collusion attacks attempt to obscure watermarks by combining several separately marked copies of the content in some manner. The pirate may also edit the content for his intended distribution channel by changing frame rate, aspect ratio, interlace, resolution, gamma, and color mapping.
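The averaging form of collusion is easy to quantify. In this sketch each recipient's copy carries its own pseudorandom pattern (seeded from a hypothetical recipient number), and the host is omitted so that only the watermarks interact. Averaging K copies scales each recipient's correlation from 1 to roughly 1/K. This illustrates one simple collusion strategy, not collusion attacks in general.

```python
import random

def pattern_for(recipient: int, n: int) -> list[float]:
    """Hypothetical per-recipient +/-1 watermark pattern."""
    rng = random.Random(recipient)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def correlate(signal: list[float], pattern: list[float]) -> float:
    return sum(s * p for s, p in zip(signal, pattern)) / len(pattern)

n = 10_000
recipients = [11, 22, 33, 44]
# Host omitted (all zeros) so each copy is just its recipient's watermark
copies = [pattern_for(r, n) for r in recipients]
colluded = [sum(samples) / len(copies) for samples in zip(*copies)]  # average the copies

single = correlate(copies[0], pattern_for(recipients[0], n))          # intact copy
after = [correlate(colluded, pattern_for(r, n)) for r in recipients]  # each near 1/4
print(single, [round(a, 2) for a in after])
```

Each mark survives, but at a quarter of its original strength, which is why robust forensic designs must anticipate collusion when setting detection thresholds.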

All of these transformations distort both the watermark signal and the host signal. In the informed recovery scenario, the demodulator must closely match the recovered instance to the original content, in order to accurately remove the host signal. The demodulator can also use the original content to accurately model the transformations that distort the watermark and host signals, and then invert these transformations before searching for the watermark. In the blind recovery scenario, some model of the transformations must also be constructed, so as to arrive at an estimate of the target watermark signal. Much of the art in an effective watermarking system is in the design of a watermark signal that is resistant to the channel transformations, while facilitating successful recovery.
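Matching the recovered instance to the original can be sketched for the simplest possible channel transformation: a temporal offset, such as samples lost at the start of a pirated copy. The search below picks the shift that minimizes mean squared error against the original, then subtracts the aligned original to expose the watermark. Real registration must also handle scaling, rotation, frame-rate changes, and the other distortions listed above; the signal sizes and the 5-sample offset are illustrative assumptions.

```python
import random

def estimate_shift(original: list[float], copy: list[float], max_shift: int) -> int:
    """Find s in [0, max_shift] minimizing mean squared error of copy[i] vs original[i + s]."""
    best_shift, best_err = 0, float("inf")
    for s in range(max_shift + 1):
        n = min(len(copy), len(original) - s)
        if n <= 0:
            break
        err = sum((copy[i] - original[i + s]) ** 2 for i in range(n)) / n
        if err < best_err:
            best_shift, best_err = s, err
    return best_shift

rng = random.Random(3)
original = [rng.gauss(0.0, 10.0) for _ in range(500)]
pattern = [rng.choice((-0.5, 0.5)) for _ in range(495)]
# Pirated copy: first 5 samples lost, plus a weak watermark
copy = [o + w for o, w in zip(original[5:], pattern)]

shift = estimate_shift(original, copy, max_shift=20)
residual = [c - original[i + shift] for i, c in enumerate(copy)]  # host removed
print(shift)  # 5
```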

Watermark literature also defines a semi-informed recovery scenario, where the recovery process is provided some embedding information, but not the entire original source. Such information may consist of keys or equivalent secret information which aids the recovery process in determining the location of watermarks or interpreting their meaning. Semi-informed recovery scenarios can improve robustness by depriving the attacker of essential information about the watermarks.
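One way a secret key can tell the recovery process where to look, without handing it the full original, is to derive embedding positions from the key with a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; the key, the content identifier format, and the position scheme are hypothetical illustrations, and the small modulo bias is ignored for simplicity.

```python
import hashlib
import hmac

def keyed_positions(key: bytes, content_id: str, count: int, n_slots: int) -> list[int]:
    """Derive `count` distinct embedding positions from a secret key."""
    if count > n_slots:
        raise ValueError("more positions requested than slots available")
    positions: list[int] = []
    counter = 0
    while len(positions) < count:
        msg = f"{content_id}:{counter}".encode()
        digest = hmac.new(key, msg, hashlib.sha256).digest()
        pos = int.from_bytes(digest[:4], "big") % n_slots  # slight modulo bias; fine for a sketch
        if pos not in positions:
            positions.append(pos)
        counter += 1
    return positions

secret = b"distributor-secret"  # hypothetical key shared only with the recovery process
print(keyed_positions(secret, "title-0042", count=8, n_slots=10_000))
```

The embedder and a key-holding recovery process derive the same positions, while an attacker without the key cannot tell which of the 10,000 slots to attack.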

It is important to note that the recovery process must include a means of determining the nature of the watermarks in the subject content. This determination can become problematic in either of the architectures described. The relatively static single-ended embedding architecture is straightforward, but only if the distribution system can ensure that all rendering devices have equivalent watermarking technology. The roll-out of an upgrade may also introduce ambiguity, since in many recovery scenarios there is no a priori knowledge of whether the subject content was rendered before or after the upgrade took effect on the particular device. In either case, if the watermarking technology is not known to the recovery process, recovery must search through a repertoire of techniques. Since most techniques are proprietary, several software implementations may have to be tried to achieve a successful recovery.
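Searching a repertoire of techniques can be organized as a loop over candidate decoders that keeps the most confident result above a threshold. The decoder interface (content in, bits and a confidence in [0, 1] out) and the stub decoders are hypothetical placeholders; in practice each entry would wrap a separate, often proprietary, recovery implementation.

```python
from typing import Callable, Optional

# Hypothetical decoder interface: returns (message bits, confidence in [0, 1])
Decoder = Callable[[bytes], tuple[list[int], float]]

def search_repertoire(content: bytes, decoders: dict[str, Decoder],
                      min_confidence: float) -> Optional[tuple[str, list[int], float]]:
    """Try every known technique; keep the most confident result above threshold."""
    best = None
    for name, decode in decoders.items():
        bits, confidence = decode(content)
        if confidence >= min_confidence and (best is None or confidence > best[2]):
            best = (name, bits, confidence)
    return best

# Stub decoders standing in for real implementations
decoders = {
    "technique-A": lambda c: ([0, 1, 1, 0], 0.31),
    "technique-B": lambda c: ([1, 0, 1, 1], 0.97),
    "technique-C": lambda c: ([1, 1, 1, 1], 0.55),
}
print(search_repertoire(b"subject-content", decoders, min_confidence=0.9))
# ('technique-B', [1, 0, 1, 1], 0.97)
```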

The problem takes on a different form in the replacement scenarios. Since the nature of the watermark is determined prior to distribution, all subject instances associated with the distribution will have been watermarked consistently. If, however, the distributor chooses to vary the nature of the watermark between content titles, or separate distributions of the same content, it may become necessary to track what sort of watermark was supplied in the metadata for each title or distribution.

CONCLUSIONS
Two distinct architectural models exist for forensic watermarking. The single-ended approach is conceptually straightforward and appropriate for less demanding or unstructured environments. Where rendering device cost is a significant factor (large deployments), and imperceptibility (quality), robustness, and long term viability (renewability) are at a premium, the replacement model offers compelling advantages.

Footnotes:
¹ Digital Cinema Systems Specification V1.0, July 20, 2005

About Cinea
Cinea, Inc., a Dolby company, develops and commercializes a broad variety of content-protection solutions for the motion picture and television industries. Current customers include many of the major film studios as well as leading service vendors in the entertainment industry. The company's forensic watermarking technology, Running Marks™, uniquely allows content owners to track pirated content to specific devices: from set-tops to PCs to portable video devices to mobile phones. The company is a founding member of the Digital Watermarking Alliance. For more information about Cinea, Inc. or Cinea technologies, please visit www.cinea.com.

About the author
Joseph Oren, CISSP, is Security Architect for Cinea, Inc., a Dolby company. For the past three years he has participated in the design and refinement of anti-piracy technology, and worked to advance the technology in SMPTE and similar organizations. Previously, he worked for Circuit City Stores, Inc. and McDonnell Douglas handling software engineering, system design, and analysis. He received his BS in Mathematics from Virginia Tech in 1970. He can be reached at [email protected].