Generating Images from Templates

The question of image recreation from templates is a complicated one. The industry has long held that one of the primary security and privacy benefits of template-based storage is that the image (or, more broadly, identifiable data) cannot be recreated or regenerated from the template. Since images cannot be recreated, the logic goes, a hacked database cannot be used to manufacture a fingerprint or other biometric for hostile purposes such as placing a fingerprint at a crime scene or logging into a private network.

However, a recent story from the Canberra Times (Australia) indicated that a student was able to access an unencrypted template, determine how the vendor encoded features, and rebuild an image capable of being fed back into the system to gain access. This calls into question claims regarding non-recreation of image data from templates, just as recent liveness testing reports call into question claims of resistance to spoofing.

As we've often seen in biometrics, what is positioned as a black and white issue - 'can templates be used to recreate images?' - is more complicated than it appears. The short answer is that under certain circumstances it is very likely that some type of image or visual representation can be recreated from some, if not all, biometric templates. It seems that there is no conceptual impediment to some type of image recreation or, at least, some type of meaningful analysis and representation of template data. Such recreation may be extremely difficult, may require access to highly confidential information, and in the end may have little to no negative impact on system security or personal privacy. However, biometric security may need to be reconceived if it can be demonstrated that images are recreatable from templates.

To frame the discussion:

1. Templates are generated by algorithms which locate and encode distinctive features from an identifiable physiological or behavioral characteristic such as a fingerprint image. In today's biometric industry, algorithms are proprietary to each vendor and in many cases represent key components of a vendor's intellectual property. Templates vary widely from sample to sample, such that in theory only the vendor's algorithm can determine whether two templates match. (A simplified sketch of what such a template might contain appears after this list.)

2. In order to analyze or reverse engineer a template, one must have access to an unencrypted template. This would involve either defeating the encryption used to protect the template or attacking a biometric system which does not utilize encryption (most, but not all, biometric systems encrypt data at various stages of transmission and storage). The discussion below therefore assumes unfettered access to a biometric template stored in a database or intercepted in transmission.

3. Vendors may mean different things when claiming that images cannot be regenerated from templates. There may be an inherent quality of their template generation algorithm that prevents images from being recreated. On the other hand, the 'secret' nature of the algorithm may be the rationale for the inability to recreate images.

4. An important question is "who is attempting to regenerate the image?" The normal assumption is that an external agent is trying to recreate an image utilizing a captured template. The agent would need to reverse engineer the algorithm in order to determine the method of template generation, which suggests a high degree of trial-and-error experimentation. However, one must also assume a second scenario - that the company which created the algorithm is coerced into attempting to regenerate an image from one of its templates. This removes the need to reverse engineer the algorithm, as the proprietary algorithm is in this case known. From a privacy perspective, the fear is that a government agency could compel a vendor to recreate images from a template database, or that a knowledgeable employee could be bribed into attempting image regeneration. In order for the industry to claim that images cannot be regenerated, both scenarios (external and internal) must be addressed.
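To make the discussion concrete, the sketch below shows one hypothetical way a minutiae-style fingerprint template could be encoded. It is illustrative only: the `Minutia` fields, the quantization, and the packing scheme are assumptions and do not reflect any vendor's proprietary format.

```python
import struct
from dataclasses import dataclass
from typing import List

@dataclass
class Minutia:
    x: int        # horizontal position on the fingerprint image (pixels)
    y: int        # vertical position (pixels)
    angle: int    # ridge direction, quantized to 0-255
    kind: int     # 0 = ridge ending, 1 = bifurcation

def pack_template(minutiae: List[Minutia]) -> bytes:
    """Serialize a list of minutiae into a compact binary template."""
    header = struct.pack(">H", len(minutiae))            # 2-byte minutia count
    body = b"".join(
        struct.pack(">HHBB", m.x, m.y, m.angle, m.kind)  # 6 bytes per minutia
        for m in minutiae
    )
    return header + body

# Example: a fabricated three-minutia template occupies only 20 bytes -
# a sparse set of points, not the image itself.
template = pack_template([Minutia(112, 84, 40, 0),
                          Minutia(57, 190, 203, 1),
                          Minutia(140, 152, 17, 0)])
print(len(template))   # -> 20
```

The point of the sketch is simply that a template of this kind records a limited set of extracted features rather than the full sample.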

Furthermore, 'recreation' needs to be defined: there are at least three types of 'recreation' possible. 

1. "Feature recreation": recreating an image that bears little resemblance to any piece of identifiable data but which still suffices to fool a biometric system. An image may be meaningful to a biometric system but blatantly unrepresentative of a physiological characteristic to the naked eye. In this case the risk is not of images being "placed" at a crime scene but instead being used to fool a biometric system into granting access. Such a scheme assumes that any liveness detection capabilities are defeated or nonexistent. 

2. "Generic image recreation": recreating an image which resembles an actual fingerprint, but which is clearly not the 'same' fingerprint that was used to create the template. In this scenario, a fake biometric sample could be recreated which looks like a fingerprint or face, but which to an informed observer is clearly not the same fingerprint or face. This is a type of recreation - the question is how to distinguish between recreation of "an" image and recreation of "the" image. 

3. "Total image recreation": recreating an image which can pass as the original to an informed observer, such that one cannot reasonably distinguish between the original data and the recreated data. This is a worst-case scenario in which a perfect fingerprint could be recreated and placed at a crime scene or used for a forensic investigation. Clearly there is gray area between generic and total image recreation, just as gray area exists in manual forensic fingerprint matching. 

Of the three types of 'recreation', the first is the most likely to be achievable, the second is very likely to be achievable, and the third would seem to be very difficult though perhaps not impossible. 

Biometric algorithms generally locate a substantial amount of distinctive data, but do not locate each distinctive feature in the same fashion with every extraction. Algorithms can erroneously fail to locate features which are present, can erroneously locate 'false' features, and may view a large portion of a biometric sample without locating any distinctive characteristics. Because of this, it is very unlikely in any biometric system that a full set of data resides within the template from which a perfect image can be regenerated. From a biometric matching perspective, a 50% correlation between two templates may suffice as a strong match. Even assuming that the method of template generation is cracked, using this data to recreate an image would result in an image 'missing' multiple features.
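The following sketch illustrates why a partial feature set can still suffice. It assumes a match score is simply the fraction of enrolled minutiae with a close counterpart in the probe, and an illustrative acceptance threshold of 0.5; real matchers must also handle rotation, translation, distortion, and quality weighting.

```python
def match_score(enrolled, probe, dist_tol=10, angle_tol=20):
    """Fraction of enrolled minutiae with a close counterpart in the probe.

    Each minutia is an (x, y, angle) tuple. Alignment and distortion,
    which real matchers must handle, are ignored here.
    """
    matched = 0
    for (ex, ey, ea) in enrolled:
        for (px, py, pa) in probe:
            if (abs(ex - px) <= dist_tol and abs(ey - py) <= dist_tol
                    and min(abs(ea - pa), 360 - abs(ea - pa)) <= angle_tol):
                matched += 1
                break
    return matched / len(enrolled) if enrolled else 0.0

# With an illustrative threshold of 0.5, a probe that reproduces only half of
# the enrolled features can still be accepted as a strong match.
ACCEPT_THRESHOLD = 0.5
```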

Once one has located some percentage of legitimate features, generating a reasonable facsimile of an image should not be difficult. Even if distinctive features are placed at arbitrary locations, trial and error may allow an imposter to break into a biometric system based on the fact that some of the characteristics are legitimate.
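The trial-and-error approach alluded to here is often described in the research literature as a "hill-climbing" attack: an attacker who can observe match scores perturbs a candidate feature set and keeps only the changes that raise the score. The sketch below assumes score feedback is available to the attacker (which a well-designed system should not expose); the perturbation sizes and round count are arbitrary.

```python
import random

def hill_climb(partial_features, score_fn, rounds=1000, accept=0.5):
    """Iteratively perturb a candidate feature set, keeping score improvements.

    partial_features: list of (x, y, angle) guesses, some of them legitimate.
    score_fn: returns the system's match score for a candidate feature set.
    """
    candidate = list(partial_features)
    best = score_fn(candidate)
    for _ in range(rounds):
        trial = list(candidate)
        i = random.randrange(len(trial))
        x, y, a = trial[i]
        trial[i] = (x + random.randint(-5, 5),           # nudge one feature
                    y + random.randint(-5, 5),
                    (a + random.randint(-15, 15)) % 360)
        score = score_fn(trial)
        if score > best:                                 # keep only improvements
            candidate, best = trial, score
        if best >= accept:                               # crossed the accept threshold
            break
    return candidate, best
```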

Ironically, the robustness of the underlying biometric feature extraction algorithm drives the level of risk. A perfect extraction algorithm which never misses or mislabels a feature, and is totally resistant to environmental factors, would most likely increase the viability of total image recreation. Such an algorithm does not exist today.

This issue also varies from technology to technology. Biometrics which rely on pattern matching as opposed to minutiae-based algorithms may be less susceptible to image recreation, as characteristics are drawn from areas as opposed to points. To our knowledge, this has not been studied in depth. Facial-scan technologies may utilize algorithms which ignore most of the face and focus on the eyes and nose, or may use the entire face; the susceptibility here may vary as well. The potential risks (but not necessarily susceptibility) of image recreation seem to be most strongly linked to facial-scan, in which an unsuspecting person may have a facial image regenerated, and fingerprint, in which a fingerprint could in theory be placed at a crime scene. Recreating hand shape or retinal pattern would seem to bear less risk, unless used to enroll fraudulently in a system.

Building from this point, what protections can be implemented to reduce both the susceptibility to and the risks of image regeneration?

First, there are the obvious protections provided by encryption. Common sense dictates that sensitive biometric data must be protected wherever possible during transmission and storage. Trusted devices which generate secure sessions are an example of a technology solution which renders template interception very difficult. 
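As a sketch of this first protection, the snippet below encrypts a serialized template with AES-256-GCM using the widely available Python `cryptography` package. Key management and the trusted-device session itself are out of scope and simply assumed; the function names are illustrative.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def protect_template(template: bytes, key: bytes) -> bytes:
    """Encrypt a serialized template for storage or transmission (AES-256-GCM)."""
    nonce = os.urandom(12)                        # unique nonce per encryption
    ciphertext = AESGCM(key).encrypt(nonce, template, b"biometric-template")
    return nonce + ciphertext                     # store nonce alongside ciphertext

def recover_template(blob: bytes, key: bytes) -> bytes:
    """Decrypt a protected template; raises if the data has been tampered with."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, b"biometric-template")

# In practice the key would be held in a trusted device or hardware security module.
key = AESGCM.generate_key(bit_length=256)
```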

Second, vendors can develop algorithms which generate templates whose composition is such that the 'location' of distinctive points is difficult to extrapolate. Many vendors have already taken this step, such that there is no discernible correlation between manipulated data points, for example, and the resultant change in the template. One may assume that even highly complex algorithms may eventually be reverse engineered, but there may be a point of diminishing returns where the level of effort is higher than the potential reward.
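One generic way to make feature locations hard to extrapolate - a textbook illustration, not any particular vendor's method - is to store only a randomly projected version of the feature vector. Without the secret projection, recovering individual point locations becomes an underdetermined inverse problem. The dimensions and the use of a seed as the secret parameter below are assumptions.

```python
import numpy as np

def project_features(feature_vector: np.ndarray, seed: int) -> np.ndarray:
    """Map a feature vector through a secret random projection before storage.

    Only the projected vector is stored as the template; matching is performed
    in the projected space by projecting the live sample with the same seed.
    """
    rng = np.random.default_rng(seed)            # seed acts as the secret parameter
    projection = rng.standard_normal((32, feature_vector.size))
    return projection @ feature_vector
```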

Third, some vendors have claimed a one-way hash functionality, such that the template effectively exists only in a hashed, non-reversible form. An agent would need to know the hashing mechanism to begin to attack the binary data for the purposes of deducing the algorithm.
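A crude illustration of the one-way idea: quantize the features coarsely and store only a salted SHA-256 digest. Real schemes of this kind must tolerate sample-to-sample variation, which a plain hash does not, so this is purely a sketch of the non-reversibility claim; the grid size and encoding are arbitrary.

```python
import hashlib

def hashed_template(minutiae, salt: bytes, grid: int = 16) -> bytes:
    """Store a one-way digest of coarsely quantized features instead of the features.

    minutiae: iterable of (x, y, angle) tuples.
    """
    cells = sorted((x // grid, y // grid, angle // 45) for x, y, angle in minutiae)
    encoded = ",".join(f"{cx}:{cy}:{ca}" for cx, cy, ca in cells).encode()
    return hashlib.sha256(salt + encoded).digest()

# An attacker holding only the digest cannot read feature locations out of it;
# they would have to guess candidate feature sets and test them against the hash.
```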

These protections do not address the 'hostile vendor' issue. This is an area where liveness detection can play a major role: The risks associated with regeneration of images are reduced when fake images cannot spoof systems. However, the challenges involved in liveness detection are becoming well-known. 

There is an additional speculative wrinkle: the development of 'open' algorithms which are published, vetted, and used to generate enrollment and verification templates. If template generation methods cease to be proprietary, which may eventually come to pass due to government prodding, then stronger protections will be necessary to prevent template loss and to resist manufactured images.

On a final note, it is interesting that the student mentioned above studied under Roger Clarke, an Australian privacy expert strongly skeptical of the use of biometrics in large-scale identification. As suggested in the liveness paper, one hopes that the biometric industry can take it upon itself to address security and system integrity issues without relying on third parties to demonstrate vulnerabilities.