Photos: Talk at Geneva Motor Show 2019: a, b, c, d. Talk in Football Stadium 2019: a, b, c, d. WEF, Davos (2018): a, b. National Geographic (2017): a, d. Others: e, f, g. More: 2015, 2015, 2015, 2010, more pics (1963-2007).
MEN who left their mark: Einstein (general relativity, 1915), Zuse (first computer, 1935-41), Gödel (limits of math and computation, 1931), Turing (Turing machine, 1936: Nature 429 p 501), Gauss (mathematician of the millennium), Leibniz (1st computer scientist), Schickard (father of the computer age), Solomonoff (theory of optimal prediction), Darwin (Nature 452 p 530), Haber & Bosch (1913: most influential invention of the 20th century), Archimedes (greatest scientist ever?)
China and former empires (letters in Newsweek, 2004-05). The European Union - A New Kind of Empire? (2009)
FAMILY: Ulrike Krommer (wife), Julia & Leonie (kids). Schmidhuber's little brother Christof, a theoretical physicist turned finance guru (see interview). His papers: most famous / most readable / best / craziest; his wife: Prof. Beliakova, a topologist.
Artificial Recurrent Neural Networks (1989-2014). Most work in machine learning focuses on machines with reactive behavior. RNNs, however, are more general sequence processors inspired by human brains. They have adaptive feedback connections and are in principle as powerful as any computer. The first RNNs could not learn to look far back into the past. But our "Long Short-Term Memory" (LSTM) RNN overcomes this fundamental problem and efficiently learns to solve many previously unlearnable tasks. It can be used for speech recognition, time series prediction, music composition, etc. In 2009, our LSTM RNNs became the first recurrent Deep Learning systems to win official international competitions (with secret test sets known only to the organisers) - they outperformed all other known methods on the difficult problem of recognizing unsegmented cursive handwriting, and also on aspects of speech recognition. They learn through gradient descent, evolution, or both. Compare the RNN Book Preface. LSTM has become popular: Google, Apple, Microsoft, Facebook, IBM, Baidu, and many other companies used LSTM RNNs to improve large vocabulary speech recognition, machine translation, language identification, time series prediction, text-to-speech synthesis, etc. (A minimal LSTM cell sketch in code appears a few paragraphs below.)

Deep Learning & Computer Vision with Fast Deep Neural Nets. The future of search engines and robotics lies in image and video recognition. Since 2009, our Deep Learning team has won 9 (nine) first prizes in important and highly competitive international contests (with secret test sets known only to the organisers), far more than any other team. Our neural nets also set numerous world records, and were the first Deep Learners to win pattern recognition contests in general (2009), the first to win object detection contests (2012), the first to win a pure image segmentation contest (2012), and the first machine learning methods to reach superhuman visual recognition performance in a contest (2011). Compare this Google Tech Talk (2011) and JS' first Deep Learning system of 1991, with a Deep Learning timeline 1962-2013. See also the history of computer vision contests won by deep CNNs on GPU since 2011. And check out the amazing Highway Networks (2015), the deepest of them all.

Gödel machine: An old dream of computer scientists is to build an optimally efficient universal problem solver. The Gödel machine can be implemented on a traditional computer and solves any given computational problem in an optimal fashion inspired by Kurt Gödel's celebrated self-referential formulas (1931). It starts with an axiomatic description of itself, and we may plug in any utility function, such as the expected future reward of a robot. Using an efficient proof searcher, the Gödel machine will rewrite any part of its software (including the proof searcher) as soon as it has found a proof that this will improve its future performance, given the utility function and the typically limited computational resources. Self-rewrites are globally optimal (no local maxima!) since provably none of the alternative rewrites and proofs (those that could be found by continuing the proof search) are worth waiting for. The Gödel machine formalizes I. J. Good's informal remarks (1965) on an "intelligence explosion" through self-improving "super-intelligences". Summary. FAQ.
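To make the LSTM paragraph above concrete, here is a minimal sketch of one LSTM cell step in plain NumPy. The gate equations are the standard ones with a forget gate; all variable names, sizes, and the random toy input are illustrative assumptions, not taken from any particular implementation or from the cited papers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell (with forget gate).

    x: input vector; h: previous hidden state; c: previous cell state.
    W, U, b hold the stacked input, recurrent, and bias parameters for the
    input gate, forget gate, output gate, and cell candidate (in that order).
    """
    n = h.shape[0]
    z = W @ x + U @ h + b                 # all four pre-activations at once
    i = sigmoid(z[0*n:1*n])               # input gate
    f = sigmoid(z[1*n:2*n])               # forget gate
    o = sigmoid(z[2*n:3*n])               # output gate
    g = np.tanh(z[3*n:4*n])               # candidate cell update
    c_new = f * c + i * g                 # cell state carries long-term memory
    h_new = o * np.tanh(c_new)            # hidden state passed on to the next step
    return h_new, c_new

# Toy usage: 3 inputs, 4 hidden units, random parameters (illustrative only).
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):      # run over a short input sequence
    h, c = lstm_step(x, h, c, W, U, b)
```

The cell state c is the memory that lets error signals flow across many time steps, which is what allows LSTM to learn to look far back into the past.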
Optimal Ordered Problem Solver. OOPS solves one task after another through search for solution-computing programs. The incremental method optimally exploits solutions to earlier tasks when possible - compare the principles of Levin's optimal universal search. OOPS can temporarily rewrite its own search procedure, efficiently searching for faster search methods (metasearching or metalearning). It is applicable to problems of optimization or prediction. Talk slides.

Super Omegas and Generalized Kolmogorov Complexity and Algorithmic Probability. Kolmogorov's complexity K(x) of a bitstring x is the length of the shortest program that computes x and halts. Solomonoff's algorithmic probability of x is the probability of guessing a program for x. Chaitin's Omega is the halting probability of a Turing machine with random input (Omega is known as the "number of wisdom" because it compactly encodes all mathematical truth). Schmidhuber generalized all of this to non-halting but converging programs. This led to the shortest possible formal descriptions and to non-enumerable but limit-computable measures and Super Omegas, and even has consequences for computable universes and optimal inductive inference. Slides. (Standard formal definitions of K(x), algorithmic probability, and Omega are collected in the formula block a few paragraphs below.)

Universal Learning Algorithms. There is a theoretically optimal way of predicting the future, given the past. It can be used to define an optimal (though noncomputable) rational agent that maximizes its expected reward in almost arbitrary environments sampled from computable probability distributions. This work represents the first mathematically sound theory of universal artificial intelligence - most previous work on AI was either heuristic or very limited.

Speed Prior. Occam's Razor: prefer simple solutions to complex ones. But what exactly does "simple" mean? According to tradition, something is simple if it has a short description or program, that is, low Kolmogorov complexity. This leads to Solomonoff's & Levin's miraculous probability measure, which yields optimal though noncomputable predictions, given past observations. The Speed Prior is different, though: it is a new simplicity measure based on the fastest way of describing objects, not the shortest. Unlike the traditional one, it leads to near-optimal computable predictions, and provokes unusual prophecies concerning the future of our universe. Talk slides. Transcript of TEDx talk.

In the Beginning was the Code. In 1996 Schmidhuber wrote the first paper about all possible computable universes. His "Great Programmer" is consistent with Zuse's thesis (1967) of computable physics, against which there is no physical evidence, contrary to common belief. If everything is computable, then which exactly is our universe's program? It turns out that the simplest program computes all universes, not just ours. Later work (2000) on Algorithmic Theories of Everything analyzed all the universes with limit-computable probabilities as well as the very limits of formal describability. This paper led to the above-mentioned generalizations of algorithmic information and probability and Super Omegas, as well as the Speed Prior. See comments on Wolfram's 2002 book and the letter on randomness in physics (Nature 439, 2006). Talk slides, TEDx video, transcript.
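For reference, here are textbook-style formal definitions of the three quantities named in the Super Omegas paragraph above. The notation is the conventional one (U a universal prefix Turing machine, p a binary program of length l(p)), not copied from the cited papers.

```latex
% Standard definitions; U is a universal prefix Turing machine,
% p ranges over binary programs of length l(p).
\begin{align*}
  K(x)   &= \min\{\, l(p) : U(p) = x \,\}
         &&\text{Kolmogorov complexity: shortest halting program for } x,\\
  M(x)   &= \sum_{p \,:\, U(p)\ \text{starts with}\ x} 2^{-l(p)}
         &&\text{Solomonoff's algorithmic probability of } x,\\
  \Omega &= \sum_{p \,:\, U(p)\ \text{halts}} 2^{-l(p)}
         &&\text{Chaitin's halting probability.}
\end{align*}
```

Schmidhuber's generalizations mentioned above replace halting programs by non-halting but converging ones, which yields the limit-computable Super Omegas and the non-enumerable measures referred to in that paragraph.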
Learning Robots. Some hardwired robots achieve impressive feats. But they do not learn like babies do. Traditional reinforcement learning algorithms are limited to simple reactive behavior and do not work well for realistic robots. Hence robot learning requires novel methods for learning to identify important past events and memorize them until needed. Our group is focusing on the above-mentioned recurrent neural networks, RNN evolution, Compressed Network Search, and policy gradients. Collaborations: with UniBW on robot cars, with TUM-AM on humanoids learning to walk, with DLR on artificial hands. New IDSIA projects on developmental robotics with curious adaptive humanoids started in 2009. See AAAI 2013 Best Student Video.

Financial Forecasting. Our most lucrative neural network application employs a second-order method for finding the simplest model of stock market training data.

Learning attentive vision. Humans and other biological systems use sequential gaze shifts for pattern recognition. This can be much more efficient than fully parallel approaches to vision. In 1990 we built an artificial fovea controlled by an adaptive neural controller. Without a teacher, it learns to find targets in a visual scene, and to track moving targets.
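As a small illustration of the policy-gradient idea mentioned under Learning Robots, the following toy sketch lets a softmax policy learn, from delayed reward alone, to steer a simulated one-dimensional "fovea" onto a target position. The task, parameters, and update rule (plain REINFORCE without a baseline) are illustrative assumptions, not the group's actual robot or fovea setup.

```python
import numpy as np

# Toy REINFORCE sketch: a softmax policy learns, from end-of-episode reward
# only, to move a simulated "fovea" onto a target cell on a 1-D strip.
rng = np.random.default_rng(0)
n_positions, n_actions = 5, 3               # positions 0..4; actions: left, stay, right
theta = np.zeros((n_positions, n_actions))  # policy logits, one row per position

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

for episode in range(2000):
    pos, target = int(rng.integers(n_positions)), 2   # target fixed at cell 2
    states, actions = [], []
    for t in range(4):                                # short episode
        probs = softmax(theta[pos])
        a = int(rng.choice(n_actions, p=probs))
        states.append(pos)
        actions.append(a)
        pos = int(np.clip(pos + (a - 1), 0, n_positions - 1))
    reward = 1.0 if pos == target else 0.0            # delayed reward at episode end
    for s, a in zip(states, actions):                 # REINFORCE update
        grad = -softmax(theta[s])
        grad[a] += 1.0                                # d log pi(a|s) / d theta[s]
        theta[s] += 0.1 * reward * grad
```

The update nudges the logits of the chosen actions in proportion to the reward they eventually led to; with more realistic robots, the tabular policy would be replaced by a recurrent network so that important past events can be memorized until needed, as described above.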