Human image synthesis

Human image synthesis is technology that can be applied to make believable and even photorealistic renditions of human-likenesses, moving or still. It has effectively existed since the early 2000s. Many films using computer generated imagery have featured synthetic images of human-like characters digitally composited onto the real or other simulated film material. Towards the end of the 2010s deep learning artificial intelligence has been applied to synthesize images and video that look like humans, without need for human assistance, once the training phase has been completed, whereas the old school 7D-route required massive amounts of human work.

Timeline of human image synthesis

In 1971 Henri Gouraud made the first CG geometry capture and representation of a human face. Modeling was his wife Sylvie Gouraud. The 3D model was a simple wire-frame model and he applied the Gouraud shader he is most known for to produce the first known representation of human-likeness on computer .
The 1972 short film A Computer Animated Hand by Edwin Catmull and Fred Parke was the first time that computer-generated imagery was used in film to simulate moving human appearance. The film featured a computer simulated hand and face .
The 1976 film Futureworld reused parts of A Computer Animated Hand on the big screen.
The 1983 music video for song Musique Non-Stop by German band Kraftwerk aired in 1986. Created by the artist Rebecca Allen, it features non-realistic looking, but clearly recognizable computer simulations of the band members.
The 1994 film The Crow was the first film production to make use of digital compositing of a computer simulated representation of a face onto scenes filmed using a body double. Necessity was the muse as the actor Brandon Lee portraying the protagonist was tragically killed accidentally on-stage.
In 1999 Paul Debevec et al. of USC captured the reflectance field of a human face with their first version of a light stage. They presented their method at the SIGGRAPH 2000
In 2003 audience debut of photo realistic human-likenesses in the 2003 films The Matrix Reloaded in the burly brawl sequence where up-to-100 Agent Smiths fight Neo and in The Matrix Revolutions where at the start of the end showdown Agent Smith's cheekbone gets punched in by Neo leaving the digital look-alike unnaturally unhurt. The Matrix Revolutions bonus DVD documents and depicts the process in some detail and the techniques used, including facial motion capture and limbal motion capture, and onto models.
In 2003 The Animatrix: Final Flight of the Osiris a state-of-the-art want-to-be human likenesses not quite fooling the watcher made by Square Pictures.
In 2003 digital likeness of Tobey Maguire was made for movies Spider-man 2 and Spider-man 3 by Sony Pictures Imageworks.
In 2009 Debevec et al. presented new digital likenesses, made by Image Metrics, this time of actress Emily O'Brien whose reflectance was captured with the USC light stage 5 Motion looks fairly convincing contrasted to the clunky run in the Animatrix: Final Flight of the Osiris which was state-of-the-art in 2003 if photorealism was the intention of the animators.
In 2009 a digital look-alike of a younger Arnold Schwarzenegger was made for the movie Terminator Salvation though the end result was critiqued as unconvincing. Facial geometry was acquired from a 1984 mold of Schwarzenegger.
In 2010 Walt Disney Pictures released a sci-fi sequel entitled ' with a digitally rejuvenated digital look-alike of actor Jeff Bridges playing the antagonist CLU.
In SIGGGRAPH 2013 Activision and USC presented a real time "Digital Ira" a digital face look-alike of Ari Shapiro, an ICT USC research scientist, utilizing the USC light stage X by Ghosh et al. for both reflectance field and motion capture. The end result both precomputed and real-time rendering with the modernest game GPU shown and looks fairly realistic.
In 2014 The Presidential Portrait by USC ICT in conjunction with the Smithsonian Institution was made using the latest USC mobile light stage wherein President Barack Obama had his geometry, textures and reflectance captured.
In 2014 Ian Goodfellow et al. presented the principles of a generative adversarial network. GANs made the headlines in early 2018 with the deepfakes controversies.
For the 2015 film Furious 7 a digital look-alike of actor Paul Walker who died in an accident during the filming was done by Weta Digital to enable the completion of the film.
In 2016 techniques which allow near real-time counterfeiting of facial expressions in existing 2D video have been believably demonstrated.
In 2016 a digital look-alike of Peter Cushing was made for the Rogue One film where its appearance would appear to be of same age as the actor was during the filming of the original 1977 Star Wars film.
In SIGGRAPH 2017 an audio driven digital look-alike of upper torso of Barack Obama was presented by researchers from University of Washington. It was driven only by a voice track as source data for the animation after the training phase to acquire lip sync and wider facial information from training material consisting 2D videos with audio had been completed.
Late 2017 and early 2018 saw the surfacing of the deepfakes controversy where porn videos were doctored utilizing deep machine learning so that the face of the actress was replaced by the software's opinion of what another persons face would look like in the same pose and lighting.
In 2018 GDC Epic Games and Tencent Games demonstrated "Siren", a digital look-alike of the actress Bingjie Jiang. It was made possible with the following technologies: CubicMotion's computer vision system, 3Lateral's facial rigging system and Vicon's motion capture system. The demonstration ran in near real time at 60 frames per second in the Unreal Engine 4.
In 2018 at the World Internet Conference in Wuzhen the Xinhua News Agency presented two digital look-alikes made to the resemblance of its real news anchors Qiu Hao and Zhang Zhao. The digital look-alikes were made in conjunction with Sogou. Neither the speech synthesis used nor the gesturing of the digital look-alike anchors were good enough to deceive the watcher to mistake them for real humans imaged with a TV camera.
In September 2018 Google added “involuntary synthetic pornographic imagery” to its ban list, allowing anyone to that falsely depict them as “nude or in a sexually explicit situation.”
In February 2019 Nvidia open sources StyleGAN, a novel generative adversarial network. Right after this Phillip Wang made the website with StyleGAN to demonstrate that unlimited amounts of often photo-realistic looking facial portraits of no-one can be made automatically using a GAN. Nvidia's StyleGAN was presented in a not yet peer reviewed paper in late 2018.
At the June 2019 CVPR the MIT CSAIL presented that synthesizes likely faces based on just a recording of a voice. It was trained with massive amounts of video of people speaking.
Since July 1 2019 Virginia has criminalized the sale and dissemination of unauthorized synthetic pornography, but not the manufacture., as became part of the Code of Virginia. The law text states: "Any person who, with the intent to coerce, harass, or intimidate, maliciously disseminates or sells any videographic or still image created by any means whatsoever that depicts another person who is totally nude, or in a state of undress so as to expose the genitals, pubic area, buttocks, or female breast, where such person knows or has reason to know that he is not licensed or authorized to disseminate or sell such videographic or still image is guilty of a Class 1 misdemeanor.". The identical bills were presented by Delegate Marcus Simon to the Virginia House of Delegates on January 14 2019 and three day later an identical was introduced to the Senate of Virginia by Senator Adam Ebbin.
Since September 1 2019 Texas senate bill amendments to the election code came into effect, giving candidates in elections a 30-day protection period to the elections during which making and distributing digital look-alikes or synthetic fakes of the candidates is an offense. The law text defines the subject of the law as "a video, created with the intent to deceive, that appears to depict a real person performing an action that did not occur in reality"
In September 2019 Yle, the Finnish public broadcasting company, aired a result of experimental journalism, of the President in office Sauli Niinistö in its main news broadcast for the purpose of highlighting the advancing disinformation technology and problems that arise from it.
January 1 2020 California the state law came into effect banning the manufacturing and distribution of synthetic pornography without the consent of the people depicted. AB-602 provides victims of synthetic pornography with injunctive relief and poses legal threats of statutory and punitive damages on criminals making or distributing synthetic pornography without consent. The bill AB-602 was signed into law by California Governor Gavin Newsom on October 3 2019 and was authored by California State Assembly member Marc Berman.
January 1 2020, Chinese law requiring that synthetically faked footage should bear a clear notice about its fakeness came into effect. Failure to comply could be considered a crime the Cyberspace Administration of China stated on its website. China announced this new law in November 2019. The Chinese government seems to be reserving the right to prosecute both users and online video platforms failing to abide by the rules.
In July 2020''' the project by MIT's Center for Advanced Virtuality publishes a synthetic human-like fake in the appearance and almost in the sound of Nixon.
Key breakthrough to photorealism: reflectance capture

In 1999 Paul Debevec et al. of USC did the first known reflectance capture over the human face with their extremely simple light stage. They presented their method and results in SIGGRAPH 2000.
for human skin likeness requires both BRDF and special case of BTDF where light enters the skin, is transmitted and exits the skin.
The scientific breakthrough required finding the subsurface light component which can be found using knowledge that light that is reflected from the oil-to-air layer retains its polarization and the subsurface light loses its polarization. So equipped only with a movable light source, movable video camera, 2 polarizers and a computer program doing extremely simple math and the last piece required to reach photorealism was acquired.
For a believable result both light reflected from skin and within the skin which together make up the BSDF must be captured and simulated.

Capture

The 3D geometry and textures are captured onto a 3D model by a 3D reconstruction method, such as sampling the target by means of 3D scanning with an RGB XYZ scanner such as Arius3d or Cyberware, stereophotogrammetrically from synchronized photos or even from enough repeated non-simultaneous photos. Digital sculpting can be used to make up models of the body parts for which data cannot be acquired e.g. parts of the body covered by clothing.
For believable results also the reflectance field must be captured or an approximation must be picked from the libraries to form a 7D reflectance model of the target.
Synthesis

The whole process of making digital look-alikes i.e. characters so lifelike and realistic that they can be passed off as pictures of humans is a very complex task as it requires photorealistically modeling, animating, cross-mapping, and rendering the soft body dynamics of the human appearance.
Synthesis with an actor and suitable algorithms is applied using powerful computers. The actor's part in the synthesis is to take care of mimicking human expressions in still picture synthesizing and also human movement in motion picture synthesizing. Algorithms are needed to simulate laws of physics and physiology and to map the models and their appearance, movements and interaction accordingly.
Often both physics/physiology based and image-based modeling and rendering are employed in the synthesis part. Hybrid models employing both approaches have shown best results in realism and ease-of-use. Morph target animation reduces the workload by giving higher level control, where different facial expressions are defined as deformations of the model, which facial allows expressions to be tuned intuitively. Morph target animation can then morph the model between different defined facial expressions or body poses without much need for human intervention.
Using displacement mapping plays an important part in getting a realistic result with fine detail of skin such as pores and wrinkles as small as 100 µm.

Machine learning approach

In the late 2010s, machine learning, and more precisely generative adversarial networks, were used by NVIDIA to produce random yet photorealistic human-like portraits.
The system, named StyleGAN, was trained on a database of 70,000 images from the images depository website Flickr.
Outputs of the generator network from random input were made publicly available on a website.
The source code was made public on GitHub in 2019.
Similarly, since 2018, deepfake technology has allowed GANs to swap faces between actors; combined with the ability to fake voices, GANs can thus generate fake videos that seem convincing.

Applications

Main applications fall within the domains of virtual cinematography, computer and video games and covert disinformation attacks.
Furthermore, some research suggests that it can have therapeutic effects as "psychologists and counselors have also begun using avatars to deliver therapy to clients who have phobias, a history of trauma, addictions, Asperger’s syndrome or social anxiety." The strong memory imprint and brain activation effects caused by watching a digital look-alike avatar of yourself is dubbed the doppelgänger effect.
The doppelgänger effect can heal when covert disinformation attack is exposed as such to the targets of the attack.

Related issues

The speech synthesis has been verging on being completely indistinguishable from a recording of a real human's voice since the 2016 introduction of the voice editing and generation software Adobe Voco, a prototype slated to be a part of the Adobe Creative Suite and DeepMind WaveNet, a prototype from Google.
Ability to steal and manipulate other peoples voices raises obvious ethical concerns.
At the 2018 Conference on Neural Information Processing Systems researchers from Google presented the work , which transfers learning from speaker verification to achieve text-to-speech synthesis, that can be made to sound almost like anybody from a speech sample of only 5 seconds .
Digital sound-alikes technology found its way to the hands of criminals as in 2019 Symantec researchers knew of 3 cases where technology has been used for crime.
This coupled with the fact that techniques which allow near real-time counterfeiting of facial expressions in existing 2D video have been believably demonstrated increases the stress on the disinformation situation.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...