How Getty Images Built a Generative AI Model Without Scraping the Web

Apr 18, 2025 - 03:37

Generative AI can conjure up just about any image, but it rarely tells you where that image came from or who deserves credit. The recent controversy surrounding Studio Ghibli and OpenAI offered a glimpse of what’s at stake, as AI-generated images mimicking the studio’s distinctive animation style went viral, despite having no connection to Hayao Miyazaki and no authorization to imitate his work.

At a time when AI models are often trained on scraped, unlicensed content, Getty Images is offering a different kind of tool: an image generation model custom-built entirely on licensed, human-created content, with a royalty system that ensures contributors are paid for their work.

To learn more about how this works in practice, AIwire spoke with Andrea Gagliano, Getty Images’ head of AI/ML. Her team oversees the company’s search and generative AI efforts, which are rooted in the Creative side of Getty Images: the images, illustrations, and videos used in advertising and marketing campaigns. Unlike the company’s Editorial content, which covers celebrities, politics, and current events, the Creative library provides a foundation that’s free of copyright concerns, drawn entirely from licensed contributor content.

Getty Images has built strict safeguards into its AI generator: it will not generate known likenesses or recognizable trademarks, ensuring the content is safe for commercial use. Gagliano says customers need visuals they can use freely without worrying about legal risk. The goal is to support creativity on both ends: empowering users to push boundaries while continuing to invest in the artists who make it all possible. 

“We really think that it can elevate creativity and allow our customers, creatives, and artists to be more conceptual or to push the boundaries in terms of creativity, but we want to harness that power while also making sure that we do so in a way that protects creators and is done in a commercially safe way,” Gagliano said. 

Rather than training its model on public data scraped from the internet, Getty relies entirely on its licensed creative library of about 200 million images, and contributors are compensated through a revenue-share model that rewards them for the life of the product. Gagliano says the content is “licensed from photographers and contributors, and it gives compensation back to those contributors on a recurring basis, so not just a one-time fee, but as a percentage of revenue here into eternity, based on how much that generative tool makes.”

AIwire tested Getty Images’ AI generation model to create this image of an artist and his AI assistant. The user interface was highly intuitive, and the built-in prompt builder was effective and simple to use. We were also able to fine-tune the image with a tool that let us highlight the areas we wanted to refine and then describe the changes in additional prompts.

Unlike many generative tools, Getty Images’ model offers something concrete: legal assurance and commercial usability. Generated visuals come with automatic legal protection of up to $50,000 per image, and the company offers uncapped indemnification as part of its enterprise solutions, along with perpetual and worldwide usage rights, and no limits on print runs or digital impressions. Additionally, user outputs are never added to Getty’s searchable creative library, and prompt safeguards are in place to prevent the generation of known brands, logos, or celebrity likenesses. “Safe for commercial use” isn’t just a claim but a foundation of the tool. 

Promising a truly copyright-free image is not an easy task. To meet that standard, Getty Images did not adapt its generative model from any existing foundation model. Instead, it built the model from scratch in partnership with NVIDIA using NVIDIA Edify, a multimodal architecture for developing visual generative AI. Getty Images trained and customized the model using the NVIDIA AI Foundry, an end-to-end platform for building custom models. That approach gives the company control not only over the data pipeline but also over how the model evolves, sidestepping the legal and creative risks that come with pre-trained, publicly sourced models.

The company also avoids common technical shortcuts that could compromise quality or originality over time. Getty Images does not use reinforcement learning or train on the model’s own outputs. The aim is to prevent a phenomenon known as model collapse, in which generated images gradually narrow into a repetitive, homogeneous style.

“Basically, the outputs of the model begin to converge to a very small sort of distribution of pixels,” Gagliano explained. “It’s really important to us that our model stays more generalized, so that it can produce a lot of different pixels and a lot of different things.”

To counteract model collapse, Getty feeds in roughly 10 million new creative images each quarter, all contributed by its global network of artists and photographers. The result is a system that not only reflects current visual trends, from fashion to cultural aesthetics, but also preserves the diversity and novelty essential for storytelling through images. 
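
The narrowing Gagliano describes can be illustrated with a toy simulation. The sketch below is not Getty's pipeline or anything built on NVIDIA Edify: a one-dimensional Gaussian stands in for the image model, "training" is simply re-estimating a mean and a spread, and the function `simulate` and its parameters are hypothetical names chosen for illustration. It contrasts a closed loop, where each generation learns only from the previous generation's outputs, with a loop in which half of every training set is fresh real data.

```python
import random
import statistics


def simulate(generations, sample_size, fresh_fraction, seed=0):
    """Refit a toy Gaussian "model" each generation and return its final spread.

    fresh_fraction is the share of each training set drawn from the real
    distribution N(0, 1); the rest is sampled from the previous model.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0 matches the real data
    n_fresh = round(sample_size * fresh_fraction)
    n_synthetic = sample_size - n_fresh
    for _ in range(generations):
        # Outputs of the previous generation's model.
        synthetic = [rng.gauss(mean, std) for _ in range(n_synthetic)]
        # New real data, standing in for newly contributed images.
        fresh = [rng.gauss(0.0, 1.0) for _ in range(n_fresh)]
        training_set = synthetic + fresh
        # "Training" here is just re-estimating the mean and the spread.
        mean = statistics.fmean(training_set)
        std = statistics.pstdev(training_set)
    return std


closed_loop = simulate(generations=500, sample_size=20, fresh_fraction=0.0)
with_fresh = simulate(generations=500, sample_size=20, fresh_fraction=0.5)
print(f"closed loop spread: {closed_loop:.6f}")
print(f"with fresh data:    {with_fresh:.6f}")
```

In the closed loop, small estimation errors compound from one generation to the next and the spread shrinks toward zero, a one-dimensional analogue of outputs converging on "a very small sort of distribution." With a steady supply of real data the spread stays on the order of the original. Real image models are vastly more complex, so this shows only the mechanism, not its magnitude.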

“We have a large team of people that work with our photographers and our contributors that are constantly doing research, quantitative and qualitative, into finding the gaps in our library,” Gagliano said. The content team works with contributors to address the gaps, adding new subjects, styles, and underrepresented perspectives, supporting both the company’s core licensing business and the health of the generative model.

That emphasis on freshness and diversity helps keep the model relevant and expansive, but it also points to a deeper challenge in the generative AI field, one that Gagliano believes hasn’t been fully addressed: a dependence on ever-expanding volumes of data. “These models are hungry,” she said. “Just feed them more and more data. And that’s the power that gets you better outputs, which is true. But I think there’s a whole area of research that hasn’t really been tapped into yet, which is, how do we make these models more efficient to work with less data?” 

That question is central to Getty’s approach. Because the company is committed to licensing content and compensating creators, it cannot take shortcuts that rely on massive, indiscriminate datasets. Instead, Gagliano said, the focus is on developing model architectures that can do more with high-value, curated content. 

“In a world where we want to compensate creators, sometimes we have to do that with less data,” she said.

While synthetic data is often pitched as the solution, Gagliano cautioned that it is not always a clean fix. “Synthetic data can be great,” she said, “but only if the synthetic data itself is trained on models that are trained on licensed content.” Otherwise, the artists are not being compensated, and models are just generating more data from unlicensed sources. 

This delicate balance between innovation and artistic integrity is something Gagliano understands from both sides. She was a visual artist before she led AI efforts at Getty, and she still is, which gives her a practitioner’s view of these challenges.

“It gives me an appreciation for what makes a good visual versus a less good visual,” she said. “And it gives me empathy and understanding for both sides: for the technical drive to innovate, and for protecting artists and creators. I really try to think hard about how we find a more nuanced solution, one that isn’t a polarized all or nothing.”