This article is contributed by Rick Hao, lead deep tech partner at pan-European VC Speedinvest.
With an annual growth rate of 44%, the market for AI and machine learning is drawing continued interest from business leaders across every industry. With some projections estimating that AI will boost the GDP of some local economies by 26% by 2030, it’s easy to see the rationale for the investment and hype.
Among AI researchers and data scientists, one of the major steps in ensuring AI delivers on the promise of enhanced growth and productivity is expanding the range and capabilities of the models available for organizations to use. Top of the agenda is the development, training and deployment of Deep Generative Models (DGMs), which I consider to be some of the most exciting models set for use in industry. But why?
What are DGMs?
You’ve likely already seen the results of a DGM in action: they’re the same type of AI model that produces deepfakes or impressionistic art. DGMs have long excited academics and researchers because they sit at the confluence of deep learning and probabilistic modeling, bringing together two important techniques: the generative model paradigm and neural networks.
A generative model is one of two major categories of AI models and, as its name suggests, it can learn from a dataset and generate new data points that resemble the data it has seen. This contrasts with the more commonly used, and far easier to develop, discriminative models, which look at a data point in a dataset and then label or classify it.
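To make the contrast concrete, here is a minimal sketch in Python. It is a toy illustration rather than a real DGM: the one-dimensional data, the Gaussian fit and the helper names (`fit_generative`, `fit_discriminative` and so on) are all assumptions made for this example. The generative side learns what the data looks like and can produce new points; the discriminative side learns only where to draw the line between two classes.

```python
import random
import statistics

def fit_generative(points):
    """Fit a 1-D Gaussian to the data by estimating its mean and standard deviation."""
    return statistics.mean(points), statistics.stdev(points)

def sample_generative(model, n, rng):
    """Draw n new data points from the fitted Gaussian."""
    mean, std = model
    return [rng.gauss(mean, std) for _ in range(n)]

def fit_discriminative(points_a, points_b):
    """Pick the threshold that misclassifies the fewest training points."""
    def errors(t):
        # Points at or above t are labeled 'b'; points below t are labeled 'a'.
        return sum(x >= t for x in points_a) + sum(x < t for x in points_b)
    return min(sorted(points_a + points_b), key=errors)

def classify(threshold, x):
    """Label a point 'a' if it falls below the threshold, otherwise 'b'."""
    return "a" if x < threshold else "b"

# Toy data (an assumption for this sketch): two classes centered at 0 and 5.
rng = random.Random(0)
class_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
class_b = [rng.gauss(5.0, 1.0) for _ in range(200)]

# Generative: learn the shape of class 'a', then create new points like it.
gen_model = fit_generative(class_a)
new_points = sample_generative(gen_model, 5, rng)

# Discriminative: learn only the boundary, then label existing points.
threshold = fit_discriminative(class_a, class_b)
labels = [classify(threshold, x) for x in (-1.0, 6.0)]
```

The discriminative model here can only answer “which class is this?”, while the generative model can be asked for fresh examples. A DGM replaces the simple Gaussian with a deep neural network so it can do the same for images, text or molecular structures.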
The “D” in “DGM” refers to the fact that, alongside being generative models, they leverage deep neural networks. Neural networks are computing architectures that give programs the ability to learn new patterns over time. What makes a neural network “deep” is the added complexity of multiple hidden “layers” of computation between a model’s input and its output. This depth gives deep neural networks the ability to operate with extremely complex datasets with many variables at play.
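As a rough sketch of what “hidden layers between input and output” means in practice, the Python below passes an input through a small stack of layers. Everything here is an assumption chosen for illustration: the layer sizes, the random untrained weights, the tanh activation and the helper names `make_layer` and `forward`. A real network would learn its weights from data.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Create one fully connected layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layers, x):
    """Pass input x through every layer in turn, applying tanh after each one."""
    activation = x
    for weights, biases in layers:
        activation = [
            math.tanh(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation

rng = random.Random(0)
# 4 inputs, three hidden layers of 8 units each, 2 outputs.
sizes = [4, 8, 8, 8, 2]
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

output = forward(layers, [0.1, -0.2, 0.3, 0.4])
```

Adding entries to `sizes` makes the network deeper. Each extra layer transforms the previous layer’s output again, which is what lets deep networks represent more intricate relationships between many variables.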
Put together, this means that DGMs are models that can generate new data points based on data fed into them, and that can handle particularly complex datasets and subjects.
The opportunities of DGMs
As mentioned above, DGMs already have some notable creative and imaginative uses, such as deepfakes or art generation. However, the potential full range of commercial and industrial applications for DGMs is vast and promises to up-end a variety of sectors.
For example, consider the problem of protein folding. Working out the 3D structure that a protein folds into allows us to find out which medicines and compounds interact with various types of human tissue, and how. This is essential to drug discovery and medical innovation, but determining how proteins fold experimentally is very difficult: scientists must dissolve and crystallize proteins before analyzing them, so the whole process for a single protein can take weeks or months. Traditional deep learning models are also insufficient to tackle the protein folding problem, as their focus is primarily on classifying existing data sets rather than generating outputs of their own.
By contrast, last year the DeepMind team’s AlphaFold model succeeded in reliably predicting how proteins would fold based solely on data about their chemical composition. By generating results in hours or minutes, AlphaFold has the potential to save months of lab work and vastly accelerate research in just about every field of biology.
We’re also seeing DGMs emerge in other domains. Last month, DeepMind announced AlphaCode, a code-generating AI model that performed at roughly the level of the median human participant in competitive programming contests. The applicability of DGMs can also be seen in fields as far-flung as physics, financial modelling and logistics: by learning subtle and complex patterns that humans and other deep learning networks are unable to spot, DGMs promise to generate surprising and insightful results in just about every field.
DGMs face some notable technical challenges, such as the difficulty in training them optimally (especially with limited data sets) and ensuring that they can yield consistently accurate outputs in real applications. This is a major driver of the need for further investment to ensure DGMs can be widely deployed in production environments and thus deliver on their economic and social promises.
Beyond the technical hurdles, however, a big challenge for DGMs is in ethics and compliance. Owing to their complexity, the decision-making process of DGMs is very difficult to understand or explain, especially for those who don’t understand their architecture or operations. This lack of explainability creates a risk that an AI model develops unjustified or unethical biases without the knowledge of its operators, in turn generating outputs that are inaccurate or discriminatory.
In addition, the high complexity at which DGMs operate means their results can be difficult to reproduce. This difficulty with reproducibility can make it hard for researchers, regulators, or the general public to have confidence in the results provided by a model.
To mitigate risks around explainability and reproducibility, DevOps teams and data scientists looking to leverage DGMs need to ensure they follow best practices in formatting their models and employ recognized explainability tools in their deployments.
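One widely used reproducibility habit, offered here as an illustrative assumption rather than a practice the article prescribes, is to fix and record the random seed behind any run that involves sampling. The sketch below uses a hypothetical `sample_run` function as a stand-in for a generative model’s sampling step.

```python
import random

def sample_run(seed, n=5):
    """Stand-in for a generative model: draw n values from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

run_1 = sample_run(seed=42)
run_2 = sample_run(seed=42)
run_3 = sample_run(seed=7)

# The same seed reproduces the same outputs; a different seed does not.
same_seed_matches = run_1 == run_2
different_seed_matches = run_1 == run_3
```

Real deep learning stacks have more sources of randomness than a single generator, so a fixed seed is a starting point rather than a guarantee. Logging it alongside the model version and the data used at least gives reviewers something concrete to rerun.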
While only just beginning to enter production environments at scale, DGMs represent some of the most promising developments in the AI world. Ultimately, through being able to look at some of the most subtle and fundamental patterns in society and nature, these models will prove transformative in just about every industry. And despite the challenges of ensuring compliance and transparency, there’s every reason to be optimistic and excited about the future DGMs promise for technology, our economy and society as a whole.