TY - JOUR
TI - A Predicate/State Transformer Semantics for Bayesian Learning
AU - Jacobs, Bart
AU - Zanasi, Fabio
T2 - Electronic Notes in Theoretical Computer Science
T3 - The Thirty-second Conference on the Mathematical Foundations of Programming Semantics (MFPS XXXII)
AB - This paper establishes a link between Bayesian inference (learning) and predicate and state transformer operations from programming semantics and logic. Specifically, a very general definition of backward inference is given via first applying a predicate transformer and then conditioning. Analogously, forward inference involves first conditioning and then applying a state transformer. These definitions are illustrated in many examples in discrete and continuous probability theory and also in quantum theory.
DA - 2016/10/05/
PY - 2016
DO - 10/ggdgbb
DP - ScienceDirect
VL - 325
SP - 185
EP - 200
J2 - Electronic Notes in Theoretical Computer Science
LA - en
SN - 1571-0661
UR - http://www.sciencedirect.com/science/article/pii/S1571066116300883
Y2 - 2019/11/24/12:04:12
KW - Bayesianism
KW - Categorical ML
KW - Categorical probability theory
KW - Effectus theory
KW - Programming language theory
KW - Semantics
ER -
TY - JOUR
TI - A Tutorial on Learning With Bayesian Networks
AU - Heckerman, David
AB - A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can …
DA - 1995/03/01/
PY - 1995
DP - www.microsoft.com
LA - en-US
UR - https://www.microsoft.com/en-us/research/publication/a-tutorial-on-learning-with-bayesian-networks/
Y2 - 2019/11/22/19:09:15
KW - Bayesianism
KW - Classical ML
KW - Machine learning
ER -
TY - JOUR
TI - Adversarial examples in the physical world
AU - Kurakin, Alexey
AU - Goodfellow, Ian
AU - Bengio, Samy
T2 - arXiv:1607.02533 [cs, stat]
AB - Most existing machine learning classifiers are highly vulnerable to adversarial examples. An adversarial example is a sample of input data which has been modified very slightly in a way that is intended to cause a machine learning classifier to misclassify it. In many cases, these modifications can be so subtle that a human observer does not even notice the modification at all, yet the classifier still makes a mistake. Adversarial examples pose security concerns because they could be used to perform an attack on machine learning systems, even if the adversary has no access to the underlying model. Up to now, all previous work has assumed a threat model in which the adversary can feed data directly into the machine learning classifier. This is not always the case for systems operating in the physical world, for example those using signals from cameras and other sensors as input. This paper shows that even in such physical-world scenarios, machine learning systems are vulnerable to adversarial examples. We demonstrate this by feeding adversarial images obtained from a cell-phone camera to an ImageNet Inception classifier and measuring the classification accuracy of the system. We find that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera.
DA - 2017/02/10/
PY - 2017
DP - arXiv.org
UR - http://arxiv.org/abs/1607.02533
Y2 - 2019/11/23/14:08:43
KW - Adversarial attacks
KW - Classical ML
KW - Machine learning
ER -
TY - JOUR
TI - Adversarial Patch
AU - Brown, Tom B.
AU - Mané, Dandelion
AU - Roy, Aurko
AU - Abadi, Martín
AU - Gilmer, Justin
T2 - arXiv:1712.09665 [cs]
AB - We present a method to create universal, robust, targeted adversarial image patches in the real world. The patches are universal because they can be used to attack any scene, robust because they work under a wide variety of transformations, and targeted because they can cause a classifier to output any target class. These adversarial patches can be printed, added to any scene, photographed, and presented to image classifiers; even when the patches are small, they cause the classifiers to ignore the other items in the scene and report a chosen target class. To reproduce the results from the paper, our code is available at https://github.com/tensorflow/cleverhans/tree/master/examples/adversarial_patch
DA - 2018/05/16/
PY - 2018
DP - arXiv.org
UR - http://arxiv.org/abs/1712.09665
Y2 - 2019/11/23/14:10:12
KW - Adversarial attacks
KW - Classical ML
KW - Machine learning
ER -
TY - SLIDE
TI - Algebra and Artificial Intelligence
A2 - Murfet, Daniel
LA - en
KW - Algebra
KW - Classical ML
KW - Machine learning
KW - Sketchy
ER -
TY - CONF
TI - Algebraic classifiers: a generic approach to fast cross-validation, online training, and parallel training
AU - Izbicki, Michael
AB - We use abstract algebra to derive new algorithms for fast cross-validation, online learning, and parallel learning. To use these algorithms on a classification model, we must show that the model has appropriate algebraic structure. It is easy to give algebraic structure to some models, and we do this explicitly for Bayesian classifiers and a novel variation of decision stumps called HomStumps. But not all classifiers have an obvious structure, so we introduce the Free HomTrainer. This can be used to give a "generic" algebraic structure to any classifier. We use the Free HomTrainer to give algebraic structure to bagging and boosting. In so doing, we derive novel online and parallel algorithms, and present the first fast cross-validation schemes for these classifiers.
C3 - ICML
DA - 2013///
PY - 2013
DP - Semantic Scholar
ST - Algebraic classifiers
KW - Algebra
KW - Categorical ML
KW - Machine learning
ER -
TY - ELEC
TI - Algebraic Geometry and Statistical Learning Theory
AU - Watanabe, Sumio
T2 - Cambridge Core
DA - 2009/08//
PY - 2009
LA - en
UR - https://www.cambridge.org/core/books/algebraic-geometry-and-statistical-learning-theory/9C8FD1BDC817E2FC79117C7F41544A3A
Y2 - 2019/11/22/18:05:57
KW - Algebra
KW - Bayesianism
KW - Purely theoretical
KW - Statistical learning theory
ER -
TY - JOUR
TI - Algebraic Machine Learning
AU - Martin-Maroto, Fernando
AU - de Polavieja, Gonzalo G.
T2 - arXiv:1803.05252 [cs, math]
AB - Machine learning algorithms use error function minimization to fit a large set of parameters in a preexisting model. However, error minimization eventually leads to a memorization of the training dataset, losing the ability to generalize to other datasets. To achieve generalization something else is needed, for example a regularization method or stopping the training when error in a validation dataset is minimal. Here we propose a different approach to learning and generalization that is parameter-free, fully discrete and that does not use function minimization. We use the training data to find an algebraic representation with minimal size and maximal freedom, explicitly expressed as a product of irreducible components. This algebraic representation is shown to directly generalize, giving high accuracy in test data, more so the smaller the representation. We prove that the number of generalizing representations can be very large and the algebra only needs to find one. We also derive and test a relationship between compression and error rate. We give results for a simple problem solved step by step, hand-written character recognition, and the Queens Completion problem as an example of unsupervised learning. As an alternative to statistical learning, algebraic learning may offer advantages in combining bottom-up and top-down information, formal concept derivation from data and large-scale parallelization.
DA - 2018/03/14/
PY - 2018
DP - arXiv.org
UR - http://arxiv.org/abs/1803.05252
Y2 - 2019/10/10/11:42:39
ER -
TY - COMP
TI - amzn/milan
AU - Borchert, Tom
AB - Milan is a Scala API and runtime infrastructure for building data-oriented systems, built on top of Apache Flink.
DA - 2019/11/25/T14:52:44Z
PY - 2019
DP - GitHub
LA - Scala
PB - Amazon
UR - https://github.com/amzn/milan
Y2 - 2019/11/27/19:46:21
KW - Implementation
KW - Machine learning
KW - Probabilistic programming
ER -
TY - JOUR
TI - Analogues of mental simulation and imagination in deep learning
AU - Hamrick, Jessica B
T2 - Current Opinion in Behavioral Sciences
T3 - SI: 29: Artificial Intelligence (2019)
AB - Mental simulation—the capacity to imagine what will or what could be—is a salient feature of human cognition, playing a key role in a wide range of cognitive abilities. In artificial intelligence, the last few years have seen the development of methods which are analogous to mental models and mental simulation. This paper outlines recent methods in deep learning for constructing such models from data and learning to use them via reinforcement learning, and compares such approaches to human mental simulation. Model-based methods in deep learning can serve as powerful tools for building and scaling cognitive models. However, a number of challenges remain in matching the capacity of human mental simulation for efficiency, compositionality, generalization, and creativity.
DA - 2019/10/01/
PY - 2019
DO - 10.1016/j.cobeha.2018.12.011
DP - ScienceDirect
VL - 29
SP - 8
EP - 16
J2 - Current Opinion in Behavioral Sciences
SN - 2352-1546
UR - http://www.sciencedirect.com/science/article/pii/S2352154618301670
Y2 - 2019/10/10/19:15:54
ER -
TY - JOUR
TI - Attention and Augmented Recurrent Neural Networks
AU - Olah, Chris
AU - Carter, Shan
T2 - Distill
AB - A visual overview of neural attention, and the powerful extensions of neural networks being built on top of it.
DA - 2016/09/08/
PY - 2016
DO - 10/gf33sg
DP - distill.pub
VL - 1
IS - 9
SP - e1
J2 - Distill
LA - en
SN - 2476-0757
UR - http://distill.pub/2016/augmented-rnns
Y2 - 2019/11/22/20:09:48
KW - Classical ML
KW - Machine learning
ER -
TY - JOUR
TI - Automatic differentiation in machine learning: a survey
AU - Baydin, Atilim Gunes
AU - Pearlmutter, Barak A.
AU - Radul, Alexey Andreyevich
AU - Siskind, Jeffrey Mark
T2 - arXiv:1502.05767 [cs, stat]
AB - Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply "autodiff", is a family of techniques similar to but more general than backpropagation for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. AD is a small but established field with applications in areas including computational fluid dynamics, atmospheric sciences, and engineering design optimization. Until very recently, the fields of machine learning and AD have largely been unaware of each other and, in some cases, have independently discovered each other's results. Despite its relevance, general-purpose AD has been missing from the machine learning toolbox, a situation slowly changing with its ongoing adoption under the names "dynamic computational graphs" and "differentiable programming". We survey the intersection of AD and machine learning, cover applications where AD has direct relevance, and address the main implementation techniques. By precisely defining the main differentiation techniques and their interrelationships, we aim to bring clarity to the usage of the terms "autodiff", "automatic differentiation", and "symbolic differentiation" as these are encountered more and more in machine learning settings.
DA - 2018/02/05/
PY - 2018
DP - arXiv.org
ST - Automatic differentiation in machine learning
UR - http://arxiv.org/abs/1502.05767
Y2 - 2019/11/22/22:28:45
KW - Automatic differentiation
KW - Classical ML
KW - Differentiation
KW - Machine learning
ER -
TY - JOUR
TI - Backprop as Functor: A compositional perspective on supervised learning
AU - Fong, Brendan
AU - Spivak, David I.
AU - Tuyéras, Rémy
T2 - arXiv:1711.10455 [cs, math]
AB - A supervised learning algorithm searches over a set of functions $A \to B$ parametrised by a space $P$ to find the best approximation to some ideal function $f\colon A \to B$. It does this by taking examples $(a,f(a)) \in A\times B$, and updating the parameter according to some rule. We define a category where these update rules may be composed, and show that gradient descent---with respect to a fixed step size and an error function satisfying a certain property---defines a monoidal functor from a category of parametrised functions to this category of update rules. This provides a structural perspective on backpropagation, as well as a broad generalisation of neural networks.
DA - 2019/05/01/
PY - 2019
DP - arXiv.org
ST - Backprop as Functor
UR - http://arxiv.org/abs/1711.10455
Y2 - 2019/11/23/14:42:07
KW - Categorical ML
KW - Machine learning
KW - Purely theoretical
ER -
TY - JOUR
TI - Bayesian machine learning via category theory
AU - Culbertson, Jared
AU - Sturtz, Kirk
T2 - arXiv:1312.1445 [math]
AB - From the Bayesian perspective, the category of conditional probabilities (a variant of the Kleisli category of the Giry monad, whose objects are measurable spaces and arrows are Markov kernels) gives a nice framework for conceptualization and analysis of many aspects of machine learning. Using categorical methods, we construct models for parametric and nonparametric Bayesian reasoning on function spaces, thus providing a basis for the supervised learning problem. In particular, stochastic processes are arrows to these function spaces which serve as prior probabilities. The resulting inference maps can often be analytically constructed in this symmetric monoidal weakly closed category. We also show how to view general stochastic processes using functor categories and demonstrate the Kalman filter as an archetype for the hidden Markov model.
DA - 2013/12/05/
PY - 2013
DP - arXiv.org
UR - http://arxiv.org/abs/1312.1445
Y2 - 2019/11/22/17:32:35
KW - Bayesianism
KW - Categorical ML
KW - Categorical probability theory
KW - Purely theoretical
ER -
TY - JOUR
TI - Categorical Aspects of Parameter Learning
AU - Jacobs, Bart
T2 - arXiv:1810.05814 [cs]
AB - Parameter learning is the technique for obtaining the probabilistic parameters in conditional probability tables in Bayesian networks from tables with (observed) data --- where it is assumed that the underlying graphical structure is known. There are basically two ways of doing so, referred to as maximum likelihood estimation (MLE) and as Bayesian learning. This paper provides a categorical analysis of these two techniques and describes them in terms of basic properties of the multiset monad M, the distribution monad D and the Giry monad G. In essence, learning is about the relationships between multisets (used for counting) on the one hand and probability distributions on the other. These relationships will be described as suitable natural transformations.
DA - 2018/10/13/
PY - 2018
DP - arXiv.org
UR - http://arxiv.org/abs/1810.05814
Y2 - 2019/11/21/20:38:28
KW - Bayesianism
KW - Categorical ML
KW - Categorical probability theory
KW - Machine learning
ER -
TY - JOUR
TI - Characterizing the invariances of learning algorithms using category theory
AU - Harris, Kenneth D.
T2 - arXiv:1905.02072 [cs, math, stat]
AB - Many learning algorithms have invariances: when their training data is transformed in certain ways, the function they learn transforms in a predictable manner. Here we formalize this notion using concepts from the mathematical field of category theory. The invariances that a supervised learning algorithm possesses are formalized by categories of predictor and target spaces, whose morphisms represent the algorithm's invariances, and an index category whose morphisms represent permutations of the training examples. An invariant learning algorithm is a natural transformation between two functors from the product of these categories to the category of sets, representing training datasets and learned functions respectively. We illustrate the framework by characterizing and contrasting the invariances of linear regression and ridge regression.
DA - 2019/05/06/
PY - 2019
DP - arXiv.org
UR - http://arxiv.org/abs/1905.02072
Y2 - 2019/10/10/11:53:28
ER -
TY - JOUR
TI - Deep Probabilistic Programming
AU - Tran, Dustin
AU - Hoffman, Matthew D.
AU - Saurous, Rif A.
AU - Brevdo, Eugene
AU - Murphy, Kevin
AU - Blei, David M.
T2 - arXiv:1701.03757 [cs, stat]
AB - We propose Edward, a Turing-complete probabilistic programming language. Edward defines two compositional representations---random variables and inference. By treating inference as a first class citizen, on a par with modeling, we show that probabilistic programming can be as flexible and computationally efficient as traditional deep learning. For flexibility, Edward makes it easy to fit the same model using a variety of composable inference methods, ranging from point estimation to variational inference to MCMC. In addition, Edward can reuse the modeling representation as part of inference, facilitating the design of rich variational models and generative adversarial networks. For efficiency, Edward is integrated into TensorFlow, providing significant speedups over existing probabilistic systems. For example, we show on a benchmark logistic regression task that Edward is at least 35x faster than Stan and 6x faster than PyMC3. Further, Edward incurs no runtime overhead: it is as fast as handwritten TensorFlow.
DA - 2017/03/07/
PY - 2017
DP - arXiv.org
UR - http://arxiv.org/abs/1701.03757
Y2 - 2019/11/27/23:15:14
KW - Bayesian inference
KW - Implementation
KW - Machine learning
KW - Probabilistic programming
ER -
TY - JOUR
TI - Derivatives of Turing machines in Linear Logic
AU - Murfet, Daniel
AU - Clift, James
T2 - arXiv:1805.11813 [math]
AB - We calculate denotations under the Sweedler semantics of the Ehrhard-Regnier derivatives of various encodings of Turing machines into linear logic. We show that these derivatives calculate the rate of change of probabilities naturally arising in the Sweedler semantics of linear logic proofs. The resulting theory is applied to the problem of synthesising Turing machines by gradient descent.
DA - 2019/01/28/
PY - 2019
DP - arXiv.org
UR - http://arxiv.org/abs/1805.11813
Y2 - 2019/11/21/20:33:27
KW - Abstract machines
KW - Categorical ML
KW - Differentiation
KW - Linear logic
KW - Machine learning
ER -
TY - CONF
TI - Differentiable Causal Computations via Delayed Trace
AU - Sprunger, David
AU - Katsumata, Shin-ya
T2 - 2019 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS)
AB - We investigate causal computations taking sequences of inputs to sequences of outputs where the nth output depends on the first n inputs only. We model these in category theory via a construction taking a Cartesian category C to another category St(C) with a novel trace-like operation called “delayed trace”, which misses yanking and dinaturality axioms of the usual trace. The delayed trace operation provides a feedback mechanism in St(C) with an implicit guardedness guarantee.
C1 - Vancouver, BC, Canada
C3 - 2019 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS)
DA - 2019/06//
PY - 2019
DO - 10/ggdf98
DP - Crossref
SP - 1
EP - 12
LA - en
PB - IEEE
SN - 978-1-72813-608-0
UR - https://ieeexplore.ieee.org/document/8785670/
Y2 - 2019/11/23/16:57:38
KW - Categorical ML
KW - Differentiation
ER -
TY - COMP
TI - dmurfet/2simplicialtransformer
AU - Murfet, Daniel
AB - Code for the 2-simplicial Transformer paper.
DA - 2019/10/14/T08:10:47Z
PY - 2019
DP - GitHub
LA - Python
UR - https://github.com/dmurfet/2simplicialtransformer
Y2 - 2019/11/22/16:50:05
KW - Abstract machines
KW - Algebra
KW - Implementation
KW - Machine learning
KW - Semantics
ER -
TY - COMP
TI - dmurfet/deeplinearlogic
AU - Murfet, Daniel
AB - Deep learning and linear logic.
DA - 2018/07/14/T01:08:44Z
PY - 2018
DP - GitHub
LA - Jupyter Notebook
UR - https://github.com/dmurfet/deeplinearlogic
Y2 - 2019/11/22/16:44:43
KW - Categorical ML
KW - Implementation
KW - Linear logic
KW - Machine learning
KW - Semantics
ER -
TY - COMP
TI - dmurfet/polysemantics
AU - Murfet, Daniel
AB - Polynomial semantics of linear logic.
DA - 2018/04/29/T20:41:43Z
PY - 2018
DP - GitHub
LA - Python
UR - https://github.com/dmurfet/polysemantics
Y2 - 2019/11/22/16:45:35
KW - Categorical ML
KW - Implementation
KW - Linear logic
KW - Machine learning
KW - Semantics
ER -
TY - JOUR
TI - Explaining and Harnessing Adversarial Examples
AU - Goodfellow, Ian J.
AU - Shlens, Jonathon
AU - Szegedy, Christian
T2 - arXiv:1412.6572 [cs, stat]
AB - Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This explanation is supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets. Moreover, this view yields a simple and fast method of generating adversarial examples. Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset.
DA - 2015/03/20/
PY - 2015
DP - arXiv.org
UR - http://arxiv.org/abs/1412.6572
Y2 - 2019/11/23/14:10:23
KW - Adversarial attacks
KW - Classical ML
KW - Machine learning
ER -
TY - JOUR
TI - Generative Adversarial Networks
AU - Goodfellow, Ian J.
AU - Pouget-Abadie, Jean
AU - Mirza, Mehdi
AU - Xu, Bing
AU - Warde-Farley, David
AU - Ozair, Sherjil
AU - Courville, Aaron
AU - Bengio, Yoshua
T2 - arXiv:1406.2661 [cs, stat]
AB - We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
DA - 2014/06/10/
PY - 2014
DP - arXiv.org
UR - http://arxiv.org/abs/1406.2661
Y2 - 2019/11/28/11:44:28
KW - Adversarial attacks
KW - Classical ML
KW - Implementation
KW - Machine learning
ER -
TY - CHAP
TI - Graphical Models: Overview
AU - Wermuth, N.
AU - Cox, D. R.
T2 - International Encyclopedia of the Social & Behavioral Sciences
A2 - Smelser, Neil J.
A2 - Baltes, Paul B.
AB - Graphical Markov models provide a method of representing possibly complicated multivariate dependencies in such a way that the general qualitative features can be understood, that statistical independencies are highlighted, and that some properties can be derived directly. Variables are represented by the nodes of a graph. Pairs of nodes may be joined by an edge. Edges are directed if one variable is a response to the other variable considered as explanatory, but are undirected if the variables are on an equal footing. Absence of an edge typically implies statistical independence, conditional, or marginal depending on the kind of graph. The need for a number of types of graph arises because it is helpful to represent a number of different kinds of dependence structures. Of special importance are chain graphs in which variables are arranged in a sequence or chain of blocks, the variables in any one block being on an equal footing, some being possibly joint responses to variables in the past and some being jointly explanatory to variables in the future of the block considered. Some main properties of such systems are outlined, and recent research results are sketched. Suggestions for further reading are given. As an illustrative example, some analysis of data on the treatment of chronic pain is presented.
CY - Oxford
DA - 2001/01/01/
PY - 2001
DP - ScienceDirect
SP - 6379
EP - 6386
LA - en
PB - Pergamon
SN - 978-0-08-043076-8
ST - Graphical Models
UR - http://www.sciencedirect.com/science/article/pii/B008043076700440X
Y2 - 2019/11/22/19:12:23
KW - Bayesianism
KW - Classical ML
KW - Machine learning
ER -
TY - JOUR
TI - Is There Something Out There? Inferring Space from Sensorimotor Dependencies
AU - Philipona, David
AU - O’Regan, J.
AU - Nadal, Jean-Pierre
T2 - Neural computation
AB - This letter suggests that in biological organisms, the perceived structure of reality, in particular the notions of body, environment, space, object, and attribute, could be a consequence of an effort on the part of brains to account for the dependency between their inputs and their outputs in terms of a small number of parameters. To validate this idea, a procedure is demonstrated whereby the brain of a (simulated) organism with arbitrary input and output connectivity can deduce the dimensionality of the rigid group of the space underlying its input-output relationship, that is, the dimension of what the organism will call physical space.
DA - 2003/10/01/
PY - 2003
DO - 10/frg7gs
DP - ResearchGate
VL - 15
SP - 2029
EP - 49
J2 - Neural computation
ST - Is There Something Out There?
KW - Algebra
KW - Neuroscience
ER -
TY - SLIDE
TI - Linear logic and deep learning
A2 - Murfet, Daniel
A2 - Hu, Huiyi
LA - en
KW - Categorical ML
KW - Linear logic
KW - Machine learning
KW - Semantics
ER -
TY - JOUR
TI - Logic and the 2-Simplicial Transformer
AU - Murfet, Daniel
AU - Clift, James
AU - Doryn, Dmitry
AU - Wallbridge, James
T2 - arXiv:1909.00668 [cs, stat]
AB - We introduce the $2$-simplicial Transformer, an extension of the Transformer which includes a form of higher-dimensional attention generalising the dot-product attention, and uses this attention to update entity representations with tensor products of value vectors. We show that this architecture is a useful inductive bias for logical reasoning in the context of deep reinforcement learning.
DA - 2019/09/02/
PY - 2019
DP - arXiv.org
UR - http://arxiv.org/abs/1909.00668
Y2 - 2019/11/21/20:31:14
KW - Abstract machines
KW - Algebra
KW - Machine learning
KW - Semantics
ER -
TY - JOUR
TI - Logic Tensor Networks: Deep Learning and Logical Reasoning from Data and Knowledge
AU - Serafini, Luciano
AU - Garcez, Artur d'Avila
T2 - arXiv:1606.04422 [cs]
AB - We propose Logic Tensor Networks: a uniform framework for integrating automatic learning and reasoning. A logic formalism called Real Logic is defined on a first-order language whereby formulas have truth-value in the interval [0,1] and semantics defined concretely on the domain of real numbers. Logical constants are interpreted as feature vectors of real numbers. Real Logic promotes a well-founded integration of deductive reasoning on a knowledge-base and efficient data-driven relational machine learning. We show how Real Logic can be implemented in deep Tensor Neural Networks with the use of Google's tensorflow primitives. The paper concludes with experiments applying Logic Tensor Networks on a simple but representative example of knowledge completion.
DA - 2016/07/07/
PY - 2016
DP - arXiv.org
ST - Logic Tensor Networks
UR - http://arxiv.org/abs/1606.04422
Y2 - 2019/11/24/16:33:44
KW - Abstract machines
KW - Machine learning
KW - Symbolic logic
ER -
TY - CONF
TI - Machine Learning Biochemical Networks from Temporal Logic Properties
AU - Fages, François
AU - Calzone, Laurence
AU - Chabrier-Rivier, Nathalie
AU - Soliman, Sylvain
A2 - Priami, Corrado
A2 - Plotkin, Gordon
T3 - Lecture Notes in Computer Science
AB - One central issue in systems biology is the definition of formal languages for describing complex biochemical systems and their behavior at different levels. The biochemical abstract machine BIOCHAM is based on two formal languages, one rule-based language used for modeling biochemical networks, at three abstraction levels corresponding to three semantics: boolean, concentration and population; and one temporal logic language used for formalizing the biological properties of the system. In this paper, we show how the temporal logic language can be turned into a specification language. We describe two algorithms for inferring reaction rules and kinetic parameter values from a temporal specification formalizing the biological data. Then, with an example of the cell cycle control, we illustrate how these machine learning techniques may be useful to the modeler.
C1 - Berlin, Heidelberg
C3 - Transactions on Computational Systems Biology VI
DA - 2006///
PY - 2006
DO - 10/dd8
DP - Springer Link
SP - 68
EP - 94
LA - en
PB - Springer
SN - 978-3-540-46236-1
KW - Abstract machines
KW - Biology
KW - Classical ML
KW - Machine learning
KW - Symbolic logic
KW - Systems biology
ER -
TY - SLIDE
TI - Mathematics of AlphaGo
A2 - Murfet, Daniel
KW - Classical ML
KW - Machine learning
ER -
TY - BOOK
TI - Model-Based Machine Learning
AU - Winn, John Michael
AB - This book is unusual for a machine learning textbook in that the authors do not review dozens of different algorithms. Instead they introduce all of the key ideas through a series of case studies involving real-world applications. Case studies play a central role because it is only in the context of applications that it makes sense to discuss modelling assumptions. Each chapter therefore introduces one case study which is drawn from a real-world application that has been solved using a model-based approach.
DA - 2019/06//
PY - 2019
DP - Google Books
SP - 400
LA - en
PB - Taylor & Francis Incorporated
SN - 978-1-4987-5681-5
KW - Bayesian inference
KW - Classical ML
KW - Implementation
ER -
TY - JOUR
TI - Neural Logic Machines
AU - Dong, Honghua
AU - Mao, Jiayuan
AU - Lin, Tian
AU - Wang, Chong
AU - Li, Lihong
AU - Zhou, Denny
T2 - arXiv:1904.11694 [cs, stat]
AB - We propose the Neural Logic Machine (NLM), a neural-symbolic architecture for both inductive learning and logic reasoning. NLMs exploit the power of both neural networks---as function approximators, and logic programming---as a symbolic processor for objects with properties, relations, logic connectives, and quantifiers. After being trained on small-scale tasks (such as sorting short arrays), NLMs can recover lifted rules, and generalize to large-scale tasks (such as sorting longer arrays). In our experiments, NLMs achieve perfect generalization in a number of tasks, from relational reasoning tasks on the family tree and general graphs, to decision making tasks including sorting arrays, finding shortest paths, and playing the blocks world. Most of these tasks are hard to accomplish for neural networks or inductive logic programming alone.
DA - 2019/04/26/
PY - 2019
DP - arXiv.org
UR - http://arxiv.org/abs/1904.11694
Y2 - 2019/11/24/16:33:13
KW - Abstract machines
KW - Machine learning
KW - Symbolic logic
ER -
TY - JOUR
TI - Neural Nets via Forward State Transformation and Backward Loss Transformation
AU - Jacobs, Bart
AU - Sprunger, David
T2 - arXiv:1803.09356 [cs]
AB - This article studies (multilayer perceptron) neural networks with an emphasis on the transformations involved --- both forward and backward --- in order to develop a semantical/logical perspective that is in line with standard program semantics. The common two-pass neural network training algorithms make this viewpoint particularly fitting. In the forward direction, neural networks act as state transformers. In the reverse direction, however, neural networks change losses of outputs to losses of inputs, thereby acting like a (real-valued) predicate transformer. In this way, backpropagation is functorial by construction, as shown earlier in recent other work. We illustrate this perspective by training a simple instance of a neural network.
DA - 2018/03/25/
PY - 2018
DP - arXiv.org
UR - http://arxiv.org/abs/1803.09356
Y2 - 2019/11/21/20:40:18
KW - Categorical ML
KW - Effectus theory
KW - Machine learning
ER -
TY - CONF
TI - Neural Networks, Knowledge and Cognition: A Mathematical Semantic Model Based upon Category Theory
AU - Healy, Michael J.
AU - Caudell, Thomas P.
AB - Category theory can be applied to mathematically model the semantics of cognitive neural systems. We discuss semantics as a hierarchy of concepts, or symbolic descriptions of items sensed and represented in the connection weights distributed throughout a neural network. The hierarchy expresses subconcept relationships, and in a neural network it becomes represented incrementally through a Hebbian-like learning process. The categorical semantic model described here explains the learning process as the derivation of colimits and limits in a concept category. It explains the representation of the concept hierarchy in a neural network at each stage of learning as a system of functors and natural transformations, expressing knowledge coherence across the regions of a multi-regional network equipped with multiple sensors. The model yields design principles that constrain neural network designs capable of the most important aspects of cognitive behavior.
DA - 2004///
PY - 2004
DP - Semantic Scholar
ST - Neural Networks, Knowledge and Cognition
ER -
TY - JOUR
TI - Neural Turing Machines
AU - Graves, Alex
AU - Wayne, Greg
AU - Danihelka, Ivo
T2 - arXiv:1410.5401 [cs]
AB - We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.
DA - 2014/12/10/
PY - 2014
DP - arXiv.org
UR - http://arxiv.org/abs/1410.5401
Y2 - 2019/11/21/21:09:35
KW - Abstract machines
KW - Classical ML
KW - Machine learning
ER -
TY - JOUR
TI - On the Computational Power of Neural Nets
AU - Siegelmann, H. T.
AU - Sontag, E. D.
T2 - Journal of Computer and System Sciences
AB - This paper deals with finite size networks which consist of interconnections of synchronously evolving processors. Each processor updates its state by applying a "sigmoidal" function to a linear combination of the previous states of all units. We prove that one may simulate all Turing machines by such nets. In particular, one can simulate any multi-stack Turing machine in real time, and there is a net made up of 886 processors which computes a universal partial-recursive function. Products (high order nets) are not required, contrary to what had been stated in the literature. Non-deterministic Turing machines can be simulated by non-deterministic rational nets, also in real time. The simulation result has many consequences regarding the decidability, or more generally the complexity, of questions about recursive nets.
DA - 1995/02/01/
PY - 1995
DO - 10/dvwtc3
DP - ScienceDirect
VL - 50
IS - 1
SP - 132
EP - 150
J2 - Journal of Computer and System Sciences
LA - en
SN - 0022-0000
UR - http://www.sciencedirect.com/science/article/pii/S0022000085710136
Y2 - 2019/11/28/17:50:06
KW - Classical ML
KW - Machine learning
ER -
TY - JOUR
TI - Probabilistic machine learning and artificial intelligence
AU - Ghahramani, Zoubin
T2 - Nature
DA - 2015/05//
PY - 2015
DO - 10/gdxwhq
DP - Crossref
VL - 521
IS - 7553
SP - 452
EP - 459
LA - en
SN - 0028-0836, 1476-4687
UR - http://www.nature.com/articles/nature14541
Y2 - 2019/11/28/12:16:49
KW - Bayesian inference
KW - Classical ML
KW - Machine learning
KW - Probabilistic programming
ER -
TY - JOUR
TI - Relational inductive biases, deep learning, and graph networks
AU - Battaglia, Peter W.
AU - Hamrick, Jessica B.
AU - Bapst, Victor
AU - Sanchez-Gonzalez, Alvaro
AU - Zambaldi, Vinicius
AU - Malinowski, Mateusz
AU - Tacchetti, Andrea
AU - Raposo, David
AU - Santoro, Adam
AU - Faulkner, Ryan
AU - Gulcehre, Caglar
AU - Song, Francis
AU - Ballard, Andrew
AU - Gilmer, Justin
AU - Dahl, George
AU - Vaswani, Ashish
AU - Allen, Kelsey
AU - Nash, Charles
AU - Langston, Victoria
AU - Dyer, Chris
AU - Heess, Nicolas
AU - Wierstra, Daan
AU - Kohli, Pushmeet
AU - Botvinick, Matt
AU - Vinyals, Oriol
AU - Li, Yujia
AU - Pascanu, Razvan
T2 - arXiv:1806.01261 [cs, stat]
AB - Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches. In particular, generalizing beyond one's experiences--a hallmark of human intelligence from infancy--remains a formidable challenge for modern AI. The following is part position paper, part review, and part unification. We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, we reject the false choice between "hand-engineering" and "end-to-end" learning, and instead advocate for an approach which benefits from their complementary strengths. We explore how using relational inductive biases within deep learning architectures can facilitate learning about entities, relations, and rules for composing them. We present a new building block for the AI toolkit with a strong relational inductive bias--the graph network--which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning. As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice.
DA - 2018/06/04/
PY - 2018
DP - arXiv.org
UR - http://arxiv.org/abs/1806.01261
Y2 - 2019/10/10/19:16:22
ER -
TY - JOUR
TI - Robust Physical-World Attacks on Deep Learning Models
AU - Eykholt, Kevin
AU - Evtimov, Ivan
AU - Fernandes, Earlence
AU - Li, Bo
AU - Rahmati, Amir
AU - Xiao, Chaowei
AU - Prakash, Atul
AU - Kohno, Tadayoshi
AU - Song, Dawn
T2 - arXiv:1707.08945 [cs]
AB - Recent studies show that the state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples, resulting from small-magnitude perturbations added to the input. Given that emerging physical systems are using DNNs in safety-critical situations, adversarial examples could mislead these systems and cause dangerous situations. Therefore, understanding adversarial examples in the physical world is an important step towards developing resilient learning algorithms. We propose a general attack algorithm, Robust Physical Perturbations (RP2), to generate robust visual adversarial perturbations under different physical conditions. Using the real-world case of road sign classification, we show that adversarial examples generated using RP2 achieve high targeted misclassification rates against standard-architecture road sign classifiers in the physical world under various environmental conditions, including viewpoints. Due to the current lack of a standardized testing method, we propose a two-stage evaluation methodology for robust physical adversarial examples consisting of lab and field tests. Using this methodology, we evaluate the efficacy of physical adversarial manipulations on real objects. With a perturbation in the form of only black and white stickers, we attack a real stop sign, causing targeted misclassification in 100% of the images obtained in lab settings, and in 84.8% of the captured video frames obtained on a moving vehicle (field test) for the target classifier.
DA - 2018/04/10/
PY - 2018
DP - arXiv.org
UR - http://arxiv.org/abs/1707.08945
Y2 - 2019/11/23/14:08:00
KW - Adversarial attacks
KW - Classical ML
KW - Machine learning
ER -
TY - BOOK
TI - The Combinatory Programme
AU - Engeler, Erwin
T2 - Progress in Theoretical Computer Science
AB - Combinatory logic started as a programme in the foundation of mathematics and in an historical context at a time when such endeavours attracted the most gifted among the mathematicians. This small volume arose under quite different circumstances, namely within the context of reworking the mathematical foundations of computer science. I have been very lucky in finding gifted students who agreed to work with me and chose, for their Ph.D. theses, subjects that arose from my own attempts [1] to create a coherent mathematical view of these foundations. The result of this collaborative work is presented here in the hope that it does justice to the individual contributor and that the reader has a chance of judging the work as a whole. E. Engeler, ETH Zurich, April 1994. [1] Collected in Chapter III, An Algebraization of Algorithmics, in Algorithmic Properties of Structures, Selected Papers of Erwin Engeler, World Scientific Publ. Co., Singapore, 1993, pp. 183-257. I. Historical and Philosophical Background, Erwin Engeler: In the fall of 1928 a young American turned up at the Mathematical Institute of Göttingen, a mecca of mathematicians at the time; he was a young man with a dream and his name was H. B. Curry. He felt that he had the tools in hand with which to solve the problem of foundations of mathematics once and for all. His was an approach that came to be called "formalist" and embodied in what later became known as Combinatory Logic.
DA - 1995///
PY - 1995
DP - www.springer.com
LA - en
PB - Birkhäuser Basel
SN - 978-0-8176-3801-6
UR - https://www.springer.com/gb/book/9780817638016
Y2 - 2019/11/26/14:23:14
KW - Algebra
KW - Programming language theory
KW - Purely theoretical
ER -
TY - CHAP
TI - Tomaso A. Poggio autobiography
AU - Poggio, Tomaso
DA - 2013///
PY - 2013
SP - 54
UR - http://poggio-lab.mit.edu/sites/default/files/cv/tomasopoggio.pdf
KW - Classical ML
KW - Compendium
KW - Machine learning
ER -
TY - JOUR
TI - Understanding deep learning requires rethinking generalization
AU - Zhang, Chiyuan
AU - Bengio, Samy
AU - Hardt, Moritz
AU - Recht, Benjamin
AU - Vinyals, Oriol
T2 - arXiv:1611.03530 [cs]
AB - Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points as it usually does in practice. We interpret our experimental findings by comparison with traditional models.
DA - 2017/02/26/
PY - 2017
DP - arXiv.org
UR - http://arxiv.org/abs/1611.03530
Y2 - 2019/11/22/20:11:42
KW - Classical ML
KW - Machine learning
ER -
TY - JOUR
TI - What is a statistical model?
AU - McCullagh, Peter
T2 - The Annals of Statistics
DA - 2002/10//
PY - 2002
DO - 10/bkts3m
DP - Crossref
VL - 30
IS - 5
SP - 1225
EP - 1310
LA - en
UR - http://projecteuclid.org/euclid.aos/1035844977
Y2 - 2019/11/22/17:39:10
KW - Bayesianism
KW - Categorical ML
KW - Categorical probability theory
KW - Compendium
KW - Purely theoretical
KW - Statistical learning theory
ER -
TY - JOUR
TI - Why does Deep Learning work? - A perspective from Group Theory
AU - Paul, Arnab
AU - Venkatasubramanian, Suresh
T2 - arXiv:1412.6621 [cs, stat]
AB - Why does Deep Learning work? What representations does it capture? How do higher-order representations emerge? We study these questions from the perspective of group theory, thereby opening a new approach towards a theory of Deep learning. One factor behind the recent resurgence of the subject is a key algorithmic step called pre-training: first search for a good generative model for the input samples, and repeat the process one layer at a time. We show deeper implications of this simple principle, by establishing a connection with the interplay of orbits and stabilizers of group actions. Although the neural networks themselves may not form groups, we show the existence of "shadow" groups whose elements serve as close approximations. Over the shadow groups, the pre-training step, originally introduced as a mechanism to better initialize a network, becomes equivalent to a search for features with minimal orbits. Intuitively, these features are in a way the "simplest", which explains why a deep learning network learns simple features first. Next, we show how the same principle, when repeated in the deeper layers, can capture higher order representations, and why representation complexity increases as the layers get deeper.
DA - 2015/02/28/
PY - 2015
DP - arXiv.org
ST - Why does Deep Learning work?
UR - http://arxiv.org/abs/1412.6621
Y2 - 2019/11/22/17:38:08
KW - Classical ML
KW - Machine learning
ER -