IBM Research recently announced a memory-augmented neural network (MANN) AI system consisting of a neural network controller and phase-change memory (PCM) hardware. By performing analog in-memory computation on high-dimensional (HD) binary vectors, the system learns few-shot classification tasks on the Omniglot benchmark with only a 2.7% drop in accuracy compared to 32-bit software implementations.
The team described the system and a set of experiments in a paper published in Nature Communications. The neural network component of the system learns an attention mechanism that maps inputs to keys, which are then used to query the most similar value from memory. Inspired by vector-symbolic computing, the researchers chose to represent keys as high-dimensional binary vectors, that is, vectors whose elements are only 1 or 0. The keys are stored in a content-addressable memory, and their binary representation allows an efficient hardware implementation of the similarity query with O(1) complexity. In a few-shot image classification experiment using the Omniglot dataset, the system achieved 91.83% accuracy, compared to 94.53% for a software implementation using 32-bit real-valued vectors. According to the authors:
The critical insight provided by our work, namely directed engineering of HD vector representations as explicit memory for MANNs, facilitates efficient few-shot learning tasks using in-memory computing. It could also allow applications beyond classification, such as merging, compressing, and reasoning at the symbolic level.
While deep learning models are often quite good at generalizing from example data, they sometimes exhibit catastrophic forgetting: a loss of previously learned information that can occur as the model learns new information. To address this problem, MANN systems incorporate an external, content-addressable memory that stores keys, or learned patterns, and values, which are the outputs associated with those patterns. The neural network component of a MANN system then learns to map an input to a query key that is matched against the keys in memory; an attention mechanism then uses the results of the query to produce the final output. In particular, the network learns to map dissimilar input items to nearly orthogonal keys, i.e., keys with low cosine similarity. However, to query the memory, the query key must be compared, using cosine similarity, to every stored key to find the best match. Because the cost of this comparison grows with the number of stored keys, it creates a performance bottleneck in the system.
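The query step described above can be illustrated with a minimal software sketch. This is not the authors' implementation: the function names, the toy three-dimensional keys, and the softmax `sharpness` parameter are all assumptions made for illustration. It only shows why a query has to touch every stored key when similarity is computed in software.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores, sharpness=10.0):
    """Turn similarity scores into attention weights that sum to 1.

    The sharpness factor is a hypothetical parameter chosen for this sketch.
    """
    peak = max(scores)
    exps = [math.exp(sharpness * (s - peak)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def query_memory(query_key, keys, values):
    """Compare the query to every stored key, then blend the stored values."""
    # One similarity per stored key: the cost grows with the memory size.
    sims = [cosine_similarity(query_key, k) for k in keys]
    weights = softmax(sims)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    best = max(range(len(keys)), key=lambda i: sims[i])
    return output, best

# Toy memory: three nearly orthogonal keys, each paired with a one-hot label.
keys = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]
values = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
output, best = query_memory([0.9, 0.1, 0.0], keys, values)
```

In this toy example the query lies closest to the first key, so `best` is index 0 and the blended `output` is dominated by the first value.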
To resolve this performance bottleneck, the IBM researchers used a key representation derived from vector-symbolic architectures (VSA). In these systems, concepts or symbols, such as "blue color" or "square shape", are represented as vectors of very high dimension, on the order of a thousand. This high dimension means that randomly chosen vectors are very likely to be nearly orthogonal, which makes the cosine similarity computation robust to noise. Because of this, the team showed that vector components can be "clipped" to be just +1 or -1, and that these can in turn be stored as 1 or 0. The vectors are then stored in a crossbar array of memristive devices, which provides an efficient hardware implementation of cosine similarity.
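The properties described above can be checked with a small software sketch. It is an illustration under stated assumptions, not the authors' code and not a model of the PCM hardware: the helper names, the dimension of 1,000, the fixed random seed, and the 10% noise level are all chosen for this example. It relies on a standard identity: for +1/-1 vectors of dimension d, the dot product equals d minus twice the Hamming distance between their 1/0 encodings.

```python
import random

def random_bipolar(dim, rng):
    """Draw a vector whose components are +1 or -1 with equal probability."""
    return [rng.choice((-1, 1)) for _ in range(dim)]

def clip_to_bipolar(vector):
    """Keep only the sign of each real-valued component."""
    return [1 if x >= 0 else -1 for x in vector]

def to_binary(bipolar):
    """Store +1 as 1 and -1 as 0."""
    return [(x + 1) // 2 for x in bipolar]

def bipolar_cosine(a, b):
    """Cosine similarity of +1/-1 vectors; each has norm sqrt(dim)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

def binary_cosine(a_bits, b_bits):
    """Recover the same cosine from the stored bits via Hamming distance."""
    hamming = sum(x != y for x, y in zip(a_bits, b_bits))
    return 1.0 - 2.0 * hamming / len(a_bits)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
DIM = 1000               # assumed dimension, "on the order of a thousand"

a = random_bipolar(DIM, rng)
b = random_bipolar(DIM, rng)

# A real-valued version of `a` with varying magnitudes clips back to `a`.
real_a = [x * rng.uniform(0.5, 1.5) for x in a]
clipped_a = clip_to_bipolar(real_a)

# Corrupt `a` by flipping 10% of its components.
noisy_a = list(a)
for i in rng.sample(range(DIM), DIM // 10):
    noisy_a[i] = -noisy_a[i]

sim_unrelated = bipolar_cosine(a, b)    # close to 0 for random vectors
sim_noisy = bipolar_cosine(a, noisy_a)  # 0.8 after flipping 10% of entries
sim_noisy_bits = binary_cosine(to_binary(a), to_binary(noisy_a))
```

Two unrelated random vectors end up with a similarity near zero, while a copy of `a` with a tenth of its components flipped still scores 0.8, so the noisy key remains easy to tell apart from unrelated ones. The binary encoding gives the same similarity as the bipolar one.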
To demonstrate the performance of the system, the researchers used it to perform several few-shot image classification tasks. The network was trained on a subset of the Omniglot dataset, which contains images of handwritten characters from 50 different alphabets. The training data consisted of five examples each of 100 image classes, which the authors say is the “biggest problem ever tried” on Omniglot.
In a discussion of the work on Hacker News, users noted the trend for “hybrid” AI systems that combine neural networks with other AI techniques:
I think the old AI led to the first AI winter because it had bad mechanisms for dealing with uncertainty. However, many mechanisms in old expert systems will make a comeback once we know how to combine symbolic systems with neural and probabilistic systems. Take a look at The Art of Prolog. Many ideas are reused there in modern inductive logic and answer set programming systems.
InfoQ recently covered several of these hybrid systems, as well as more efficient hardware implementations of AI computation. Earlier this year, Baidu announced its ERNIE model, which combines deep learning on unstructured text with structured knowledge graph data to produce more consistent generated responses, and Facebook open-sourced its BlenderBot chatbot, which incorporates a long-term memory that can follow the context of a conversation over several weeks or even months. MIT recently announced a prototype photonic device for deep learning inference that reduces power consumption compared to traditional electronic devices.