Research Lecture Notes

Research talk 10 - Structured Prediction and Deep Learning

Andrew McCallum is a professor at the University of Massachusetts Amherst. His well-known work on Conditional Random Fields (CRFs) has been widely used in Natural Language Processing as well as other domains. His recent research focuses on the intersection of structured prediction and deep learning, with the hope of extending the current success of deep learning to more NLP applications.

Structured prediction covers many common tasks in natural language processing, such as spam filtering, Chinese word segmentation and multi-label document/image classification. In all these prediction tasks, long-range dependencies play a critical role and should be considered carefully. So far, most successes in structured prediction have been achieved with graphical models, and the CRF is one particular graphical model that has proved to be quite powerful. Graphical models are in fact closely related to neural networks, and in some ways they can be improved by incorporating ideas from deep learning.

There are multiple ways to apply deep learning to structured prediction. The simplest is to use only the feature representations of deep neural networks, for example by feeding the deep features directly into a model that predicts each target independently. However, such methods have shortcomings: they do not consider global dependencies among the outputs, or they are hard to optimize. So in this lecture he mainly introduced their work called “structured prediction energy networks”, which uses a deep architecture to learn rich representations of output dependencies. Most importantly, their model replaces the factors in the factor graph with a neural network that yields a scalar energy. Interestingly, this idea also suits multiple-output scenarios like keyphrase prediction.
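
A minimal sketch of the inference step may make the energy idea more concrete. It assumes a toy energy with hand-set parameters: a local term `-b·y` (in a real model the scores `b` would come from a feature network over the input) plus a single softplus unit over the labels, standing in for the learned network that scores label interactions. All names and numbers are hypothetical, and nothing is learned here. Prediction is done by projected gradient descent on labels relaxed to the interval [0, 1]:

```python
import math

def softplus(s):
    # Numerically stable log(1 + exp(s)).
    return max(s, 0.0) + math.log1p(math.exp(-abs(s)))

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def energy(y, b, w, c):
    # Scalar energy: local fit term plus one "global" unit over all labels.
    local = -sum(bi * yi for bi, yi in zip(b, y))
    s = sum(wi * yi for wi, yi in zip(w, y)) + c
    return local + softplus(s)

def infer(b, w, c, steps=200, lr=0.1):
    # Start from an uninformative relaxed labeling and descend the energy.
    y = [0.5] * len(b)
    for _ in range(steps):
        s = sum(wi * yi for wi, yi in zip(w, y)) + c
        g = sigmoid(s)  # derivative of softplus at s
        grad = [-bi + g * wi for bi, wi in zip(b, w)]
        # Gradient step, then project back onto [0, 1].
        y = [min(1.0, max(0.0, yi - lr * gi)) for yi, gi in zip(y, grad)]
    return y

# Hypothetical local scores for three labels; the global unit penalizes
# switching on many labels at once.
B, W, C = [2.0, -1.0, 0.5], [1.0, 1.0, 1.0], -1.0
y_hat = infer(B, W, C)
labels = [round(v) for v in y_hat]
```

In this toy setting the third label has a positive local score but is still switched off, because the global term outweighs it. That is the kind of output dependency an independent per-label predictor cannot express.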

Basically, considering the dependencies among multiple outputs is a good idea borrowed from graphical models, and the results demonstrate its power on the most important structured prediction tasks. It’s very exciting to see graphical models and neural networks combined, and this may bring breakthroughs in explainable deep models.

Research talk 9 - The Next Frontier in AI Unsupervised Learning

Yann LeCun is one of the most famous scientists in the Deep Learning domain. He is the creator of Convolutional Neural Networks and the director of AI Research at Facebook.

Deep learning has made great progress in several domains such as autonomous transportation and medical image understanding. But so far the most influential achievements in Deep Learning have come from supervised learning, in which the machine is trained with inputs labeled by humans. However, labeling is time-consuming and expensive, and supervised learning is not the primary way in which real human intelligence develops. In contrast, unsupervised learning is the more promising route to building real artificial intelligence.

AI systems today do not possess “common sense”, which humans and animals acquire by observing the world, acting in it, and understanding its physical constraints. Unsupervised learning, which allows machines to learn from raw, unlabeled data such as video or text, is therefore believed to be the key to machines with common sense.

The cake image from this talk is very interesting. Yann likened the artificial intelligence problem to a cake. In this image he pointed out which part can be solved by which method, and he assigned the main body of the cake to unsupervised learning. Reinforcement learning and supervised learning can each solve only a very small part of the whole problem, as these two methods are limited to problems where direct guidance (e.g. labeled data or judgment) is available.

In this talk, the two major research outputs he showed are Memory Networks and Generative Adversarial Networks (GANs). The former provides a new architecture which implements a key-value function, enabling neural networks to recall very long-term memories. The latter is described by Yann as the most promising idea in Deep Learning so far. In this work, there are an image generator and a discriminator. The generator tries to fake new images to confuse the discriminator, and the discriminator aims to distinguish the faked images from real ones. By training in an adversarial way, both models improve their performance. No labeled data is needed for a GAN, yet it can generate very impressive results.
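
To illustrate what a key-value memory read can look like, here is a minimal sketch. It assumes the common soft-attention formulation, in which a query is compared with every key and the values are averaged by the resulting softmax weights. The function name and the toy vectors are hypothetical, and this is not necessarily the exact architecture shown in the talk:

```python
import math

def read_memory(query, keys, values):
    # Similarity of the query to each stored key (dot product).
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Softmax over the scores, shifted by the maximum for stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The read-out is the weighted average of the stored values.
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

# Hypothetical memory with two slots; the query matches the first key.
weights, output = read_memory(
    query=[5.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
```

Because the read is a differentiable weighted sum rather than a hard lookup, it can be trained end to end together with the rest of the network.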

Research talk 8 - Explain and answer - Intelligent systems which communicate about what they see

Marcus Rohrbach is a postdoc at UC Berkeley, and his work won the best paper award at NAACL 2016. In this lecture, he presented a very exciting combination of state-of-the-art Computer Vision and Natural Language Processing. We are already quite familiar with another CV-NLP task, generating a descriptive caption for an image. In this work, instead of generating descriptive text, their system answers questions about a given image.

In real life, most information is gained by interacting with the visual world, and language is the most important channel for humans to communicate about what they see. However, for now language and vision are only loosely connected. Thanks to recent developments in computer vision and natural language processing, we are able to achieve a better integration of these two parts. Marcus’s work mainly focuses on two components. One component of successful communication is the ability to answer natural language questions about the visual world. A second component is the ability of the system to explain in natural language why it gave a certain answer, allowing a human to trust and understand it.

Basically, the vision part is implemented with Convolutional Neural Networks and language is processed by Recurrent Neural Networks. They build models which answer questions while highlighting the semantically relevant parts of the image, which exposes the models’ semantic reasoning structure. Besides, a number of other tasks are addressed, e.g. video caption generation and automatic role matching.

Research talk 7 - Artificial intelligence for Wall Street

Dr. James Zhang is a senior engineer on the R&D Machine Learning team at Bloomberg. In this talk he spoke about applications of Machine Learning and Natural Language Processing in finance, which add value to the huge streams of real-time data generated by the global financial markets.

James talked about some real-world examples which reflect the characteristics of fintech. As stock prices are very sensitive to the media, a highly responsive system is needed to capture any potentially important news and extract the information of interest. In one example James gave, a stock surged more than 20% just two minutes after an important report was announced. What is also worth noting is that the key content is actually hidden within only a few sentences, which requires strong natural language processing techniques to pinpoint.

He also showed some other interesting problems at Bloomberg, such as analyzing and making predictions from millions of news stories and social media messages each day.

Research talk 6 - Active Optimization for Embedded Learning Systems

An excellent research talk from Jeff Schneider at CMU. He presented his research on active learning as well as some amazing applications in embedded systems, including the recently launched Uber self-driving car.

The Outline of the lecture

Active optimization mainly concerns the problem of learning and optimizing in an ever-changing environment, so it’s a perpetual online activity rather than a one-off task. He started with the development of autonomous vehicle technologies, in which CMU has been making great progress. Changes in cars and technology, e.g. more data, more computing power, and more precise and less costly sensors, bring opportunities for developing a real self-driving car.

Uber autonomous vehicle

Active optimization methods are the main topic of this talk, and also among the most important algorithms driving the Uber self-driving car. Jeff introduced the basic idea of active optimization and an interesting application, a robot snake. However, the classic models cannot handle high-dimensional and multi-fidelity environments. So he presented two algorithms which focus on scaling up the dimensionality and managing multi-fidelity evaluations. These algorithms work well on different tasks and are beneficial for autopilot systems.

Active Optimization for Embedded Learning Systems

Research talk 5 - A Computational Model and Theory of Cortex

Related paper: What must a global theory of cortex explain?

Related video (YouTube): What Should a Computational Theory of Cortex Explain?

This is a keynote from the Turing Award winner Leslie Valiant at WI2016. It’s a really lucky and exciting chance to see a keynote from a giant of computer science, especially on such a hot topic as human intelligence! This is a quite high-level theory and requires a number of prerequisites from neuroscience. Though I cannot fully understand the content, hopefully it may help inspire new ideas in the future.

A Computational Model and Theory of Cortex, keynote at WI2016

He notes that there are four basic model tasks that are supported by neural circuits and associated algorithms, each of which requires some circuit modification: memory allocation, association, supervised memorization, and inductive learning of threshold functions. The capability of these circuits comes from the cumulative efficacy of many past acts of learning. Thus the earlier acts of learning need to be retained without undue interference from the more recent ones.
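
The last of these tasks, inductive learning of a threshold function, can be illustrated with the classic perceptron rule. This is a stand-in chosen for brevity, not the neuroidal algorithm from the keynote; the function names and the toy AND data are hypothetical. Each mistake triggers a small local modification of the weights and the threshold:

```python
def fires(weights, threshold, x):
    # A threshold unit fires when the weighted input reaches the threshold.
    return sum(w * xi for w, xi in zip(weights, x)) >= threshold

def learn_threshold_function(examples, max_epochs=100):
    weights = [0.0] * len(examples[0][0])
    threshold = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in examples:
            if fires(weights, threshold, x) != label:
                mistakes += 1
                # Move towards firing on positives, away on negatives.
                sign = 1.0 if label else -1.0
                weights = [w + sign * xi for w, xi in zip(weights, x)]
                threshold -= sign
        if mistakes == 0:
            break  # every example is classified correctly
    return weights, threshold

# Hypothetical target: the unit should fire only when both inputs are on.
AND_EXAMPLES = [((0, 0), False), ((0, 1), False), ((1, 0), False), ((1, 1), True)]
weights, threshold = learn_threshold_function(AND_EXAMPLES)
```

The rule converges whenever the examples are linearly separable, as they are for AND.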

A basic prerequisite for this endeavor is that of devising an appropriate model of computation that reflects the gross quantitative parameters of cortex. The goals of this revised model are shown below:

  1. Use a neural model that underestimates the brain (don’t over-interpret).
  2. Specify some challenging set of multiple task types.
  3. Show that these task types can be executed on the model.
  4. Show that sequences of thousands of interleaved tasks of these types can be supported without degradation of the earliest ones.

He mentioned a number of important concepts and hypotheses that underlie his computational theory, including positive representations, random access, the neuroidal model, and vicinal algorithms. Though these hypotheses have some support, they are not perfect. For example, he mentioned that the hierarchical memorization mechanism is unstable, but the hippocampus can work as a stable memory allocator for cortex. Thus he proposed a theory of cortex mediated by the hippocampus, which may explain the memory mechanism.

Research talk 4 - Context-Awareness In Information Retrieval and Recommender Systems

Related slides: Tutorial: Context-Awareness In Information Retrieval and Recommender Systems

This is a tutorial from Yong Zheng at WI2016.

Context-Awareness In Information Retrieval and Recommender Systems, tutorial at WI2016

Currently, most information systems are used independently of their users’ environmental factors; that is, systems disregard the context of users. However, these factors can serve as a critical information supplement, especially when the system is unable to make users’ needs explicit based merely on queries and profile information. Information retrieval and recommender systems are the two most natural applications for contextual information. The main difference between the two is that a query is explicitly required by information retrieval, but not by a recommender system.

So what’s context? A well-recognized definition is: “Context is any information that can be used to characterize the situation of an entity.” Context can be collected from multiple sources, like sensors, user inputs (surveys or user interactions) and inference from user reviews. The most common contextual variables include:

  • Time and Location
  • User intent or purpose
  • User emotional states
  • Devices
  • Topics of interest, e.g., apple vs. Apple
  • Others: companion, weather, budget, etc.

Five contextual modeling methods are highlighted here:

  • Tensor Factorization, 2010
  • Context-aware Matrix Factorization, 2011
  • Factorization Machines, 2011
  • Deviation-Based Contextual Modeling, 2014
  • Similarity-Based Contextual Modeling, 2015
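
As a rough illustration of how context can enter a factorization model, the sketch below follows the general idea behind context-aware matrix factorization and deviation-based modeling: a standard biased matrix-factorization prediction is shifted by one rating deviation per active context condition. The function name, parameter names and all numbers are hypothetical, and in practice the parameters would be learned from rating data:

```python
def predict_rating(global_mean, user_bias, item_bias,
                   user_factors, item_factors,
                   context_deviations, active_contexts):
    # Context-free part: biases plus the user-item latent interaction.
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    base = global_mean + user_bias + item_bias + interaction
    # Contextual part: one deviation per active context condition;
    # unseen conditions contribute nothing.
    shift = sum(context_deviations.get(c, 0.0) for c in active_contexts)
    return base + shift

# Hypothetical parameters for one user, one item and two context conditions.
rating = predict_rating(
    global_mean=3.5, user_bias=0.2, item_bias=-0.1,
    user_factors=[0.5, 1.0], item_factors=[1.0, 0.5],
    context_deviations={"weekend": 0.3, "alone": -0.2},
    active_contexts=["weekend", "alone"],
)
```

With an empty list of active contexts the prediction falls back to the plain matrix-factorization score.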

And a few challenges worth noting:

  • Numeric vs. Categorical Context Information: various kinds of context features.
  • Explanations by Context: hard to explain the recommended results.
  • New User Interfaces and Interactions: design new UIs to help collect, interact with, and explain context information.
  • User Intent Predictions or References in IR and RecSys
  • Cold-start and Data Sparsity Problems

Research talk 3 - Recommending Remedial Learning Materials to the Students by Filling their Knowledge Gaps

Related paper: Recommending Remedial Learning Materials to the Students by Filling their Knowledge Gaps

This is an invited talk from Konstantin Bauman at WI2016.

Recommending Remedial Learning Materials to the Students by Filling their Knowledge Gaps, invited talk at WI2016

Their research mainly concerns the online education scenario. They proposed a method to identify gaps in students’ knowledge of online courses and recommend helpful remedial materials to the students. Specifically, the knowledge they try to capture is based on a knowledge tree, in which the leaves are topics from a domain taxonomy (automatically built from the syllabus). Knowledge gaps are identified from the results of course quiz questions. If a student has a low performance score on a topic, we may presume that this student has a knowledge gap in this topic. Remedial materials can then be recommended to him/her by email to help fill the gap.
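
The gap-detection step described above can be sketched in a few lines. The sketch assumes that each quiz answer is already tagged with a leaf topic, that the performance score is simply the fraction of correct answers, and that a fixed cut-off of 0.6 marks a gap. The cut-off, the function names and the sample data are all hypothetical and not taken from the paper:

```python
from collections import defaultdict

def find_knowledge_gaps(quiz_answers, threshold=0.6):
    # quiz_answers: list of (topic, answered_correctly) pairs for one student.
    correct = defaultdict(int)
    total = defaultdict(int)
    for topic, is_correct in quiz_answers:
        total[topic] += 1
        if is_correct:
            correct[topic] += 1
    # A topic counts as a gap when its score falls below the threshold.
    return sorted(t for t in total if correct[t] / total[t] < threshold)

def recommend_materials(gaps, materials_by_topic):
    # Look up remedial materials for each gap topic, if any exist.
    return {topic: materials_by_topic.get(topic, []) for topic in gaps}

# Hypothetical quiz results and material catalogue.
answers = [("recursion", False), ("recursion", False), ("recursion", True),
           ("sorting", True), ("sorting", True)]
catalogue = {"recursion": ["lecture 4 video", "recursion exercises"]}
gaps = find_knowledge_gaps(answers)
recommendations = recommend_materials(gaps, catalogue)
```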

An A/B test on 910 students was carried out to evaluate this approach. The improvement in the average exam score indicates the effectiveness of the proposed recommendation method.

Research talk 2 - Modelling Interactive IR by Norbert Fuhr

It’s a wonderful talk given by the Salton Award winner Norbert Fuhr. Speaking of interactive IR, my first reaction was to think of HCI. As an expert in probabilistic information retrieval, Norbert interprets this problem in a probabilistic manner.

Modelling Interactive IR by Norbert Fuhr, lecture at ASIRF 2016

He assumes that the search behavior of a user consists of a series of actions chosen from those offered by the system. The user evaluates these possible actions and performs one of them, as only positive decisions are of benefit to the user. In this way, users’ search behavior can be modeled as a Markov Decision Process (MDP), and accordingly we can embed users’ session information into ranking.
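
To make the MDP view concrete, here is a minimal value-iteration sketch over a toy search session. The states, actions, probabilities and rewards are invented for illustration (positive rewards stand for benefit, negative ones for effort) and do not reproduce the specific model from the lecture:

```python
def action_value(outcomes, values, gamma):
    # Expected reward plus discounted value of the next state.
    return sum(p * (reward + gamma * values[nxt]) for p, nxt, reward in outcomes)

def value_iteration(mdp, gamma=0.9, sweeps=200):
    # mdp[state][action] is a list of (probability, next_state, reward).
    values = {state: 0.0 for state in mdp}
    for _ in range(sweeps):
        values = {
            state: max((action_value(o, values, gamma) for o in actions.values()),
                       default=0.0)  # terminal states have no actions
            for state, actions in mdp.items()
        }
    policy = {
        state: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for state, actions in mdp.items() if actions
    }
    return values, policy

# Hypothetical session: opening a document satisfies the need half the time,
# reformulating the query only costs effort.
SEARCH_MDP = {
    "results": {
        "open_document": [(0.5, "done", 10.0), (0.5, "results", -1.0)],
        "reformulate": [(1.0, "results", -2.0)],
    },
    "done": {},
}
values, policy = value_iteration(SEARCH_MDP)
```

The resulting policy says which offered action has the highest expected benefit in each state, which is the quantity a system could use when ordering the choices it presents to the user.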

Research talk 1 - Expert Search by Andreas Henrich

Andreas Henrich is a professor at the University of Bamberg. In this talk, he mainly introduced an important industrial application of Information Retrieval: Expert Search. Unlike most academic talks, he didn’t spend much time on novel scientific ideas. Instead, he reviewed many mature techniques in expert search and emphasized how to implement a real system.

Expert Search by Andreas Henrich, lecture at ASIRF 2016

The primary models used in expert search include:

  • Candidate Generation Model: as in language modeling, an expert’s score is estimated from the query and his/her documents. This method may fail because the query carries little information.
  • Topic Generation Model: improves on the Candidate Generation Model by introducing prior knowledge and avoiding query-based generation.
  • Voting Model: the ranking of documents with respect to the query is treated as a set of votes for candidate experts.
  • Other models, such as graph-based models, which measure experts’ influence based on their centrality in the network.
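
The voting idea can be sketched as follows. The sketch assumes that a document ranking with retrieval scores is already available and that each document votes for its authors with its score, summed per candidate. This is one simple aggregation choice among several, and the function name and sample data are hypothetical:

```python
from collections import defaultdict

def rank_experts(ranked_documents, authors_by_document):
    # ranked_documents: list of (document_id, retrieval_score) for one query.
    votes = defaultdict(float)
    for doc_id, score in ranked_documents:
        # Each retrieved document votes for all of its authors.
        for author in authors_by_document.get(doc_id, []):
            votes[author] += score
    # Highest total first; ties are broken alphabetically.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical retrieval result and authorship data.
ranking = rank_experts(
    ranked_documents=[("d1", 3.0), ("d2", 2.0), ("d3", 1.5)],
    authors_by_document={"d1": ["alice"], "d2": ["bob"], "d3": ["bob", "carol"]},
)
```

Here bob ends up ahead of alice even though alice wrote the top-ranked document, because two of his documents were retrieved.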

Unlike general document retrieval, entity retrieval primarily concerns the problem of finding relevant items (entities), of which expert retrieval is a representative example. However, how to represent an entity is still a difficult problem.