# What Are Hypernetworks, Where AI Builds AI?

### Introduction to Hypernetworks and GHN-2

At AiSupremacy we pride ourselves (well it’s just me for now) on trying to cover some of the academic news in AI.

There are so many AI research papers and Ph.D. students doing incredible things in the space; it’s a very exciting time. It’s also realistically nearly impossible to cover it all, which is why I’m literally trying to write every day.

For more academic insights into A.I. I recommend Synced, if you have a more technical frame of reference. Here at AiSupremacy we try to share A.I. news for everyone.

Quanta Magazine first broke the story. Boris Knyazev of the University of Guelph in Ontario and his colleagues have designed and trained a “hypernetwork” — a kind of overlord of other neural networks — that could speed up the training process.

Because it automates part of neural network optimization, you could even make the case that hypernetworks represent a world where AI is building other AI.

Today’s neural networks are growing ever hungrier for data and computing power. Training them requires carefully tuning the values of millions or even billions of parameters that characterize these networks, representing the strengths of the connections between artificial neurons.

#### Hypernetworks are Sort of a Big Deal, Here’s Why

Given a new, untrained deep neural network designed for some task, the hypernetwork predicts the parameters for the new network in fractions of a second, and in theory could make training unnecessary.

Because the hypernetwork learns the extremely complex patterns in the designs of deep neural networks, the work may also have deeper theoretical implications.

Soon we might have AI training and building AI at scale, as we learn more about how to implement this.

For now, the hypernetwork performs surprisingly well in certain settings, but there’s still room for it to grow — which is only natural given the magnitude of the problem. If they can solve it, “this will be pretty impactful across the board for machine learning,” said Petar Veličković, a staff research scientist at DeepMind in London.


If A.I. can skip these training steps, we can build A.I. faster, and it can be more involved in the optimization process.

### Hypernetworks in AI Training

Currently, the best methods for training and optimizing deep neural networks are variations of a technique called stochastic gradient descent (SGD).
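To make that concrete, here’s a minimal toy sketch of the SGD idea in pure Python — a single made-up weight fit to synthetic data, no real framework involved. All the names and numbers are mine, for illustration only:

```python
import random

# Toy illustration of stochastic gradient descent (SGD):
# fit a single weight w so that y ≈ w * x on synthetic data.
random.seed(0)
true_w = 3.0
data = [(x, true_w * x) for x in [random.uniform(-1, 1) for _ in range(100)]]

w = 0.0    # initial parameter value
lr = 0.1   # learning rate
for step in range(500):
    x, y = random.choice(data)     # "stochastic": one random sample at a time
    grad = 2 * (w * x - y) * x     # gradient of the squared error (w*x - y)^2
    w -= lr * grad                 # step downhill along the gradient

print(round(w, 2))  # w should approach true_w = 3.0
```

Real training does the same thing, just with millions or billions of parameters instead of one — which is exactly why it’s so expensive.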

One can, in theory, start with lots of architectures, then optimize each one and pick the best. However, this can be a slow, time-consuming process.

In 2018, Mengye Ren, now a visiting researcher at Google Brain, along with his former University of Toronto colleague Chris Zhang and their adviser Raquel Urtasun, tried a different approach. They designed what they called a graph hypernetwork (GHN) to find the best deep neural network architecture to solve some task, given a set of candidate architectures.

The name outlines their approach. “Graph” refers to the idea that the architecture of a deep neural network can be thought of as a mathematical graph — a collection of points, or nodes, connected by lines, or edges.

Here the nodes represent computational units (usually, an entire layer of a neural network), and edges represent the way these units are interconnected.
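As a rough illustration, a network’s architecture can be written down as exactly that kind of graph. The layer names below are made up for the example; they just stand in for computational units:

```python
# A tiny neural network architecture written as a graph:
# nodes are computational units (layers), edges are the connections.
nodes = {
    0: "input",
    1: "conv3x3",
    2: "relu",
    3: "linear",
    4: "output",
}
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # a simple chain of layers

# Successors of each node, i.e. where its output flows next.
successors = {n: [dst for src, dst in edges if src == n] for n in nodes}
print(successors[1])  # the conv layer feeds into the relu node: [2]
```

A graph hypernetwork operates on structures like this rather than on the network’s weights directly.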

A graph hypernetwork starts with any architecture that needs optimizing (let’s call it the candidate). It then does its best to predict the ideal parameters for the candidate. The team then sets the parameters of an actual neural network to the predicted values and tests it on a given task. Ren’s team showed that this method could be used to rank candidate architectures and select the top performer.
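Here’s a toy sketch of that ranking loop. The “hypernetwork” and the scoring function below are stand-ins I made up purely for illustration — they are not from the paper — but the shape of the loop (predict parameters, evaluate, rank) is the idea described above:

```python
# Toy stand-in for the ranking loop: a "hypernetwork" predicts parameters
# for each candidate architecture, each candidate is evaluated with those
# predicted parameters, and the candidates are ranked by score.

def fake_hypernetwork(arch):
    """Stand-in: predict one weight per layer of the candidate."""
    return [0.5 for _ in arch]  # a real GHN would learn to predict these

def evaluate(arch, params):
    """Stand-in score: pretend depth-4 candidates suit the task best."""
    return sum(params) / (1 + abs(len(arch) - 4))

candidates = [["conv"] * depth for depth in (2, 4, 6)]  # three architectures
scores = [(evaluate(a, fake_hypernetwork(a)), len(a)) for a in candidates]
best_score, best_depth = max(scores)
print(best_depth)  # the top-ranked candidate's depth
```

The point isn’t the fake math; it’s that no candidate ever gets trained — they are all judged on predicted parameters alone.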

When Knyazev and his colleagues came upon the graph hypernetwork idea, they realized they could build upon it. In their new paper, the team shows how to use GHNs not just to find the best architecture from some set of samples, but also to predict the parameters for the best network such that it performs well in an absolute sense.

### AI Building AI Metaphor

In essence, AI itself is becoming more useful in its own training and doing more and more of the optimization process itself. Read the paper here.

https://arxiv.org/abs/2110.13100

Deep learning has been successful in automating the design of features in machine learning pipelines. However, the algorithms optimizing neural network parameters remain largely hand-designed and computationally inefficient.

By leveraging advances in graph neural networks, they propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU.

The proposed model achieves surprisingly good performance on unseen and diverse networks. Their model also learns a strong representation of neural architectures enabling their analysis.

The paper was submitted on October 25th, 2021.

**Parameter Prediction for Unseen Deep Architectures**

Boris Knyazev, Michal Drozdzal, Graham W. Taylor, Adriana Romero-Soriano

I consider this a fairly important breakthrough in Machine Learning.

Anything that demonstrates AI involved in its own training or learning is greatly interesting, even if here it’s more just an optimization mechanism that speeds up training.

If you enjoy articles like this one, you might enjoy my Datascience Newsletter as well, where I will also be covering programming trends.

My goal is to create a Network of Newsletters that **inspires and informs** on a range of topics I’m interested in.

## What is GHN-2

**Training the Trainer**

Knyazev and his team call their hypernetwork GHN-2, and it improves upon two important aspects of the graph hypernetwork built by Ren and colleagues.

This optimization of AI by AI will only get better. Eventually, we can assume A.I. could be trained to completely train other A.I.

First, they relied on Ren’s technique of depicting the architecture of a neural network as a graph.

The second idea they drew on was the method of training the hypernetwork to make predictions for new candidate architectures.

This requires two other neural networks. The first enables computations on the original candidate graph, resulting in updates to information associated with each node, and the second takes the updated nodes as input and predicts the parameters for the corresponding computational units of the candidate neural network.
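A very rough toy sketch of that two-stage pipeline follows — first a graph network updates each node’s information using its neighbours, then a decoder turns each updated node into predicted parameters. Every function and number here is illustrative, not the paper’s actual model:

```python
# Toy sketch of the two-network pipeline: (1) a graph network updates each
# node's feature by mixing in its neighbours' features, then (2) a decoder
# maps each updated node feature to predicted parameters for that unit.

nodes = {0: 1.0, 1: 2.0, 2: 3.0}   # node id -> initial feature
edges = [(0, 1), (1, 2)]           # undirected connections between units

def message_pass(features, edges):
    """One round of neighbour averaging (a minimal graph-network step)."""
    neighbours = {n: [] for n in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return {
        n: (features[n] + sum(features[m] for m in neighbours[n]))
           / (1 + len(neighbours[n]))
        for n in features
    }

def decode(feature):
    """Stand-in decoder: map a node feature to a tiny parameter vector."""
    return [feature, -feature]

updated = message_pass(nodes, edges)                # network 1: graph updates
params = {n: decode(f) for n, f in updated.items()} # network 2: prediction
print(params[0])
```

In the real system, both stages are themselves neural networks with learned parameters — which is exactly why they must be trained before the hypernetwork can make useful predictions.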

These two networks also have their own parameters, which must be optimized before the hypernetwork can correctly predict parameter values.

###### Image Credit: Samuel Velasco/Quanta Magazine; source: arxiv.org/abs/2110.13100

For now all articles on AiSupremacy are free and I’m asking for community contributions, however the ratio of free to paid posts may change since currently this isn’t able to fund my writing or my (basic needs) lifestyle.

So why is this exciting? This isn’t really AI that builds other AIs, but it’s a headline for sure. Because the **hypernetwork learns** the extremely complex patterns in the designs of deep neural networks, the work may also have deeper theoretical implications.


Research like this in the early 2020s displays how nascent A.I. and its training truly are, and where they can go from here. The way we train artificial neural network (ANN) architectures is changing, and AI is automating some of the steps, improving the speed of optimization.

Knyazev’s team took these ideas and wrote their own software from scratch, since Ren’s team didn’t publish their source code. Then Knyazev and colleagues improved upon it. Academic institutions mostly in China, the U.K. and the U.S. (and Canada) are at the bleeding edge of AI research. Toronto, Canada may soon have the highest concentration of A.I. researchers in the world outside of the top Ivy League schools.

Boris Knyazev of the University of Guelph in Ontario has helped build a hypernetwork that’s designed to predict the parameters for an untrained neural network.

Read the original Quanta Magazine article if you want to break down the steps better.

**Beyond GHN-2**

Despite these successes, Knyazev thinks the machine learning community will at first resist using graph hypernetworks. He likens it to the resistance faced by deep neural networks before 2012.

I find that position pretty fascinating since these are the young minds supposedly at the forefront of AI research.

GHN-2 can only be trained to predict parameters to solve a given task, such as classifying either CIFAR-10 or ImageNet images, but not both at the same time. In the future, he imagines training graph hypernetworks on a greater diversity of architectures and on different types of tasks (image recognition, speech recognition and natural language processing, for instance).

Where have we heard that before? Ah yes, with Data2vec as well. There’s a potentially big problem if hypernetworks like GHN-2 ever do become the standard method for optimizing neural networks. With graph hypernetworks, he said, “you have a neural network — essentially a black box — predicting the parameters of another neural network.” AI bias could become so complex, it would be difficult to correct.

GHN-2 showcases the ability of graph neural networks to find patterns in complicated data.

GHN-2 finds patterns in the graphs of completely random neural network architectures. “That’s very complicated data.”

And yet, GHN-2 can generalize — meaning it can make reasonable predictions of parameters for unseen and even out-of-distribution network architectures. Black box AI is about to get an upgrade.

Thanks for reading!

If you think I write (curate, collect, format, aggregate) quality articles that many people would enjoy, by supporting me you enable me to seek out a greater audience.

My hope is to build a Discord around A.I. and further a dynamic indie media startup publisher. Obviously I wouldn’t be able to write this much if I didn’t love what I do.
