
Task representations in neural networks trained to perform many cognitive tasks (Nature Neuroscience)

It is well documented that the left primary motor cortex of the cerebrum controls the movement of the fingers of the right hand and the right primary motor cortex controls the fingers of the left hand, i.e., the somatomotor representations11. It is also well documented that the cerebrocerebellar circuit decussates, i.e., the left cerebral cortex is connected to the right cerebellar cortex and the right cerebral cortex to the left cerebellar cortex. Accordingly, tapping the fingers of the right hand should activate the cerebrocerebellar circuit that is contralateral with respect to the cerebrum, as evidenced in both resting-state and task-fMRI studies6,14,15,16,17. Further studies are needed to replicate this finding and explore the functional role of this ipsilateral cerebrocerebellar circuit. For example, we can consider multi-edge graphs, or multigraphs, in which a pair of nodes can share multiple types of edges; this is useful when we want to model the interactions between nodes differently based on their type.
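As an illustration, here is a minimal sketch of one way to store such a multigraph, keeping one adjacency list per edge type; the edge-type names and the small API are illustrative, not taken from any particular library.

```python
from collections import defaultdict

class MultiGraph:
    """Toy multigraph: a pair of nodes can share several typed edges."""

    def __init__(self):
        # one adjacency list per edge type, e.g. "bond" vs. "spatial"
        self.edges = defaultdict(list)

    def add_edge(self, u, v, edge_type):
        self.edges[edge_type].append((u, v))

    def neighbors(self, node, edge_type):
        return [v for u, v in self.edges[edge_type] if u == node]

g = MultiGraph()
g.add_edge("A", "B", "bond")        # first relation between A and B
g.add_edge("A", "B", "spatial")     # a second edge type on the same pair of nodes
print(g.neighbors("A", "spatial"))  # ['B']
```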

Task area of neural networks

Hard parameter sharing is the most commonly used approach to MTL in neural networks and goes back to [6]. It is generally applied by sharing the hidden layers between all tasks while keeping several task-specific output layers. In this paper, towards NeuroAI, we have proposed a roadmap for task-based neurons via symbolic regression, a new frontier of neural network research compared with architecture design.
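As a concrete illustration, here is a minimal sketch of hard parameter sharing in PyTorch, assuming a shared trunk of hidden layers and one linear head per task; the layer sizes and task output dimensions are illustrative.

```python
import torch
import torch.nn as nn

class HardSharingMTL(nn.Module):
    """Hard parameter sharing: one shared trunk, one output head per task."""

    def __init__(self, in_dim=64, hidden=128, task_dims=(10, 1)):
        super().__init__()
        # hidden layers shared by all tasks
        self.shared = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # task-specific output layers
        self.heads = nn.ModuleList(nn.Linear(hidden, d) for d in task_dims)

    def forward(self, x):
        h = self.shared(x)
        return [head(h) for head in self.heads]

model = HardSharingMTL()
outputs = model(torch.randn(8, 64))  # one output tensor per task
```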

Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings

When focusing on one node, after k layers the updated node representation has a limited view of its neighbors up to distance k, essentially a subgraph representation. We can notice that models with higher dimensionality tend to have better mean and lower-bound performance, but the same trend is not found for the maximum. Since higher dimensionality also involves a larger number of parameters, these observations go hand in hand with the previous figure. Note that in this simplest GNN formulation, we're not using the connectivity of the graph at all inside the GNN layer.
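To see why stacking layers enlarges this viewpoint, here is a minimal sketch that computes the k-hop neighborhood of a node, which corresponds to the receptive field after k message-passing layers; the toy adjacency list is illustrative.

```python
from collections import deque

def k_hop_neighborhood(adj, start, k):
    """Nodes reachable within k hops: the receptive field of `start` after k layers."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # a simple path graph
print(k_hop_neighborhood(adj, 0, 2))            # {0, 1, 2}: two layers see two hops
```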

Task area of neural networks

As we loosely saw, the more graph attributes communicate with one another, the better the models tend to be. In this particular case, we could consider making molecular graphs more feature-rich by adding additional spatial relationships between nodes, adding edges that are not bonds, or adding explicit learnable relationships between subgraphs. When exploring the architecture choices above, you might have found that some models perform better than others. Are there some clear GNN design choices that will give us better performance? The answer depends on the data, and even different ways of featurizing and constructing graphs can give different answers.
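As one possible example of such enrichment, here is a minimal sketch that adds non-bond "spatial" edges between any pair of atoms closer than a distance cutoff; the coordinates, cutoff value, and edge-type label are illustrative.

```python
import numpy as np

def add_spatial_edges(coords, bond_edges, cutoff=2.0):
    """Augment bond edges with 'spatial' edges between atoms within `cutoff` of each other."""
    edges = [(i, j, "bond") for i, j in bond_edges]
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) not in bond_edges and np.linalg.norm(coords[i] - coords[j]) < cutoff:
                edges.append((i, j, "spatial"))
    return edges

# a bent three-atom fragment: atoms 0 and 2 are close in space but not bonded
coords = np.array([[0.0, 0.0, 0.0], [1.4, 0.0, 0.0], [0.7, 1.2, 0.0]])
print(add_spatial_edges(coords, bond_edges={(0, 1), (1, 2)}))
```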


This GNN uses a separate multilayer perceptron (MLP) (or your favorite differentiable model) on each component of a graph; we call this a GNN layer. For each node vector, we apply the MLP and get back a learned node vector. We do the same for each edge, learning a per-edge embedding, and also for the global-context vector, learning a single embedding for the entire graph. The structure of real-world graphs can vary greatly between different types of data; some graphs have many nodes with few connections between them, or vice versa.
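Here is a minimal sketch of such a layer in PyTorch, with one small MLP each for node, edge, and global embeddings; the dimensions and MLP depth are illustrative, and the graph connectivity is deliberately unused, as in this simplest formulation.

```python
import torch
import torch.nn as nn

def mlp(dim):
    return nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

class SimpleGNNLayer(nn.Module):
    """Applies an independent MLP to node, edge, and global embeddings; connectivity is not used."""

    def __init__(self, node_dim=16, edge_dim=8, global_dim=4):
        super().__init__()
        self.node_mlp = mlp(node_dim)
        self.edge_mlp = mlp(edge_dim)
        self.global_mlp = mlp(global_dim)

    def forward(self, nodes, edges, global_ctx):
        # each graph component is transformed separately
        return self.node_mlp(nodes), self.edge_mlp(edges), self.global_mlp(global_ctx)

layer = SimpleGNNLayer()
nodes, edges, g = layer(torch.randn(5, 16), torch.randn(7, 8), torch.randn(1, 4))
```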

Third, we build a modern GNN, walking through each of the parts of the model, starting with historic modeling innovations in the field. We move gradually from a bare-bones implementation to a state-of-the-art GNN model. Fourth and finally, we provide a GNN playground where you can play around with a real-world task and dataset to build a stronger intuition of how each component of a GNN model contributes to the predictions it makes. Over the course of this blog post, I will try to give a general overview of the current state of multi-task learning, in particular when it comes to MTL with deep neural networks. I will then introduce the two most frequently employed methods for MTL in deep learning. Subsequently, I will describe mechanisms that together illustrate why MTL works in practice.

Dynamic activity of human brain task-specific networks

In brief, the strong static synapses preserve the temporal variations in the presynaptic activity, resulting in choice-selective inputs that are on the order of the spike threshold. The recurrent inhibition then cancels the strong mean excitatory input, leading to total excitatory and inhibitory inputs that are both on the order of the spike threshold and choice-selective (see Prediction 4 in Methods). This is similar to the mechanism that explains how, without training or functional structure, orientation-selective neurons can emerge in the primary visual cortex with a 'salt-and-pepper' organization50,51. We found that, in the untrained excitatory neurons, the variance explained by the cortical-like activity increased monotonically with the modulation strength of the trained inhibitory neurons.

Task area of neural networks

These weights help determine the importance of any given variable, with larger weights contributing more significantly to the output than other inputs. The weighted sum of the inputs is then passed through an activation function, which determines the node's output. If that output exceeds a given threshold, the node "fires" (or activates), passing data to the next layer in the network.
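To make this concrete, here is a minimal sketch of a single threshold unit of this kind; the weights, bias, threshold, and the choice of a sigmoid activation are illustrative.

```python
import numpy as np

def neuron(inputs, weights, bias, threshold=0.5):
    """Weighted sum -> sigmoid activation -> fire if above threshold."""
    z = np.dot(inputs, weights) + bias          # larger weights contribute more to z
    activation = 1.0 / (1.0 + np.exp(-z))       # sigmoid activation function
    fired = activation > threshold              # the node "fires" only above the threshold
    return activation, fired

print(neuron(np.array([0.2, 0.9]), np.array([0.8, 0.4]), bias=-0.1))
```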

1 Vectorized Symbolic Regression

Since these are high-dimensional vectors, we reduce them to 2D via principal component analysis (PCA). A perfect model would visibly separate the labeled data, but since we are reducing dimensionality and also have imperfect models, this boundary might be harder to see. We represent each molecule as a graph, where atoms are nodes containing a one-hot encoding of their atomic identity (Carbon, Nitrogen, Oxygen, Fluorine) and bonds are edges containing a one-hot encoding of their bond type (single, double, triple, or aromatic). We can incorporate the information from neighboring edges in the same way we used neighboring node information earlier: by first pooling the edge information, transforming it with an update function, and storing it. Just as pooling can be applied to either nodes or edges, message passing can occur between either nodes or edges.
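As an illustration of this featurization, here is a minimal sketch for a toy three-atom fragment; the molecule, index conventions, and vocabularies are illustrative.

```python
import numpy as np

ATOMS = ["C", "N", "O", "F"]
BONDS = ["single", "double", "triple", "aromatic"]

def one_hot(item, vocab):
    """One-hot vector marking the position of `item` in `vocab`."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(item)] = 1.0
    return vec

# toy fragment: a carbon double-bonded to an oxygen and single-bonded to a nitrogen
node_features = np.stack([one_hot(a, ATOMS) for a in ["C", "O", "N"]])
edge_index = [(0, 1), (0, 2)]
edge_features = np.stack([one_hot(b, BONDS) for b in ["double", "single"]])

print(node_features.shape, edge_features.shape)   # (3, 4) (2, 4)
```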

Task area of neural networks

These networks can be incredibly complex and consist of millions of parameters to classify and recognize the input they receive. In this section, we have discussed different auxiliary tasks that can be used to leverage MTL even if we only care about one task. We still do not know, though, which auxiliary task will be useful in practice. Finding an auxiliary task is largely based on the assumption that it should be related to the main task in some way and should be helpful for predicting the main task. In order to encourage similarity between different tasks, they propose to make the mean task-dependent and introduce a clustering of the tasks using a mixture distribution.

Machine learning vs. deep learning

These results suggested that the stronger rate modulations in the fast-spiking ALM neurons enabled the trained inhibitory neurons in the model to spread their activity patterns to the untrained neurons more effectively. Accumulating evidence shows that inhibition in cortex is highly plastic (e.g., see the review in ref. 46). We found that the fidelity of spreading the activity was higher when the inhibitory neurons were trained instead of the excitatory ones. We speculate that this is a characteristic of the operating regime of cortical networks, in which the baseline spiking rates of inhibitory neurons are typically higher than those of excitatory neurons. Inhibitory neurons can thus support stronger rate modulations (Fig. 4B), which in turn improves the fidelity of the spread (Fig. 4A, Fig. 5, Methods).

  • Looking at the weights of individual connections won't answer that question.
  • Then we analyzed the loadings of the dominant PC mode in the ALM data, which were the slopes of the ramping activity of the synaptic inputs.
  • The fraction equals 1 (left) if all the neurons in the trained subnetwork are trained.
  • To address this question, we considered two training scenarios where either the excitatory or the inhibitory subnetwork (but not both) was trained to generate the target activity patterns (Fig. 4A, right).
  • We set a fixed random seed to ensure that the symbolic regression can be repeated.
  • This result suggested that the leading PC modes, due to their strong modulations, can spread more robustly to the rest of the neurons, promoting low-dimensional neural dynamics across a strongly coupled network.

Selecting and designing optimal aggregation operations is an open research topic. A desirable property of an aggregation operation is that similar inputs provide similar aggregated outputs, and vice versa. Some very simple candidate permutation-invariant operations are sum, mean, and max.
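Here is a minimal sketch of these candidate aggregations over a variable-size set of neighbor embeddings, showing that the result does not depend on the ordering of the neighbors; the shapes and values are illustrative.

```python
import numpy as np

def aggregate(neighbor_embeddings, how="sum"):
    """Permutation-invariant pooling over a (num_neighbors, dim) array."""
    ops = {"sum": np.sum, "mean": np.mean, "max": np.max}
    return ops[how](neighbor_embeddings, axis=0)

x = np.array([[1.0, 2.0], [3.0, 0.0], [0.0, 1.0]])
# the result is the same for any ordering of the rows
assert np.allclose(aggregate(x, "mean"), aggregate(x[::-1], "mean"))
print(aggregate(x, "sum"), aggregate(x, "max"))
```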

So around the turn of the century, neural networks were supplanted by support vector machines, an alternative approach to machine learning that's based on some very clean and elegant mathematics. A neural network, by contrast, tries to simulate the human brain, so it has many layers of "neurons", loosely analogous to the neurons in our brain. The first layer of neurons receives inputs like images, video, sound, or text. This input data passes through all the layers, as the output of one layer is fed into the next layer.


