TY - JOUR
TI - Bayesian machine learning via category theory
AU - Culbertson, Jared
AU - Sturtz, Kirk
T2 - arXiv:1312.1445 [math]
AB - From the Bayesian perspective, the category of conditional probabilities (a variant of the Kleisli category of the Giry monad, whose objects are measurable spaces and arrows are Markov kernels) gives a nice framework for conceptualization and analysis of many aspects of machine learning. Using categorical methods, we construct models for parametric and nonparametric Bayesian reasoning on function spaces, thus providing a basis for the supervised learning problem. In particular, stochastic processes are arrows to these function spaces which serve as prior probabilities. The resulting inference maps can often be analytically constructed in this symmetric monoidal weakly closed category. We also show how to view general stochastic processes using functor categories and demonstrate the Kalman filter as an archetype for the hidden Markov model.
DA - 2013/12/05/
PY - 2013
DP - arXiv.org
UR - http://arxiv.org/abs/1312.1445
Y2 - 2019/11/22/17:32:35
KW - Bayesianism
KW - Categorical ML
KW - Categorical probability theory
KW - Purely theoretical
ER -
TY - JOUR
TI - Probabilistic machine learning and artificial intelligence
AU - Ghahramani, Zoubin
T2 - Nature
DA - 2015/05//
PY - 2015
DO - 10/gdxwhq
DP - Crossref
VL - 521
IS - 7553
SP - 452
EP - 459
LA - en
SN - 0028-0836, 1476-4687
UR - http://www.nature.com/articles/nature14541
Y2 - 2019/11/28/12:16:49
KW - Bayesian inference
KW - Classical ML
KW - Machine learning
KW - Probabilistic programming
ER -
TY - JOUR
TI - A Tutorial on Learning With Bayesian Networks
AU - Heckerman, David
AB - A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can …
DA - 1995/03/01/
PY - 1995
DP - www.microsoft.com
LA - en-US
UR - https://www.microsoft.com/en-us/research/publication/a-tutorial-on-learning-with-bayesian-networks/
Y2 - 2019/11/22/19:09:15
KW - Bayesianism
KW - Classical ML
KW - Machine learning
ER -
TY - JOUR
TI - Categorical Aspects of Parameter Learning
AU - Jacobs, Bart
T2 - arXiv:1810.05814 [cs]
AB - Parameter learning is the technique for obtaining the probabilistic parameters in conditional probability tables in Bayesian networks from tables with (observed) data --- where it is assumed that the underlying graphical structure is known. There are basically two ways of doing so, referred to as maximal likelihood estimation (MLE) and as Bayesian learning. This paper provides a categorical analysis of these two techniques and describes them in terms of basic properties of the multiset monad M, the distribution monad D and the Giry monad G. In essence, learning is about the relationships between multisets (used for counting) on the one hand and probability distributions on the other. These relationships will be described as suitable natural transformations.
DA - 2018/10/13/
PY - 2018
DP - arXiv.org
UR - http://arxiv.org/abs/1810.05814
Y2 - 2019/11/21/20:38:28
KW - Bayesianism
KW - Categorical ML
KW - Categorical probability theory
KW - Machine learning
ER -
TY - JOUR
TI - A Predicate/State Transformer Semantics for Bayesian Learning
AU - Jacobs, Bart
AU - Zanasi, Fabio
T2 - Electronic Notes in Theoretical Computer Science
T3 - The Thirty-second Conference on the Mathematical Foundations of Programming Semantics (MFPS XXXII)
AB - This paper establishes a link between Bayesian inference (learning) and predicate and state transformer operations from programming semantics and logic. Specifically, a very general definition of backward inference is given via first applying a predicate transformer and then conditioning. Analogously, forward inference involves first conditioning and then applying a state transformer. These definitions are illustrated in many examples in discrete and continuous probability theory and also in quantum theory.
DA - 2016/10/05/
PY - 2016
DO - 10/ggdgbb
DP - ScienceDirect
VL - 325
SP - 185
EP - 200
J2 - Electronic Notes in Theoretical Computer Science
LA - en
SN - 1571-0661
UR - http://www.sciencedirect.com/science/article/pii/S1571066116300883
Y2 - 2019/11/24/12:04:12
KW - Bayesianism
KW - Categorical ML
KW - Categorical probability theory
KW - Effectus theory
KW - Programming language theory
KW - Semantics
ER -
TY - JOUR
TI - What is a statistical model?
AU - McCullagh, Peter
T2 - The Annals of Statistics
DA - 2002/10//
PY - 2002
DO - 10/bkts3m
DP - Crossref
VL - 30
IS - 5
SP - 1225
EP - 1310
LA - en
UR - http://projecteuclid.org/euclid.aos/1035844977
Y2 - 2019/11/22/17:39:10
KW - Bayesianism
KW - Categorical ML
KW - Categorical probability theory
KW - Compendium
KW - Purely theoretical
KW - Statistical learning theory
ER -
TY - JOUR
TI - Deep Probabilistic Programming
AU - Tran, Dustin
AU - Hoffman, Matthew D.
AU - Saurous, Rif A.
AU - Brevdo, Eugene
AU - Murphy, Kevin
AU - Blei, David M.
T2 - arXiv:1701.03757 [cs, stat]
AB - We propose Edward, a Turing-complete probabilistic programming language. Edward defines two compositional representations---random variables and inference. By treating inference as a first class citizen, on a par with modeling, we show that probabilistic programming can be as flexible and computationally efficient as traditional deep learning. For flexibility, Edward makes it easy to fit the same model using a variety of composable inference methods, ranging from point estimation to variational inference to MCMC. In addition, Edward can reuse the modeling representation as part of inference, facilitating the design of rich variational models and generative adversarial networks. For efficiency, Edward is integrated into TensorFlow, providing significant speedups over existing probabilistic systems. For example, we show on a benchmark logistic regression task that Edward is at least 35x faster than Stan and 6x faster than PyMC3. Further, Edward incurs no runtime overhead: it is as fast as handwritten TensorFlow.
DA - 2017/03/07/
PY - 2017
DP - arXiv.org
UR - http://arxiv.org/abs/1701.03757
Y2 - 2019/11/27/23:15:14
KW - Bayesian inference
KW - Implementation
KW - Machine learning
KW - Probabilistic programming
ER -
TY - ELEC
TI - Algebraic Geometry and Statistical Learning Theory
AU - Watanabe, Sumio
T2 - Cambridge Core
AB - Cambridge Core - Pattern Recognition and Machine Learning - Algebraic Geometry and Statistical Learning Theory - by Sumio Watanabe
DA - 2009/08//
PY - 2009
LA - en
UR - /core/books/algebraic-geometry-and-statistical-learning-theory/9C8FD1BDC817E2FC79117C7F41544A3A
Y2 - 2019/11/22/18:05:57
KW - Algebra
KW - Bayesianism
KW - Purely theoretical
KW - Statistical learning theory
ER -
TY - CHAP
TI - Graphical Models: Overview
AU - Wermuth, N.
AU - Cox, D. R.
T2 - International Encyclopedia of the Social & Behavioral Sciences
A2 - Smelser, Neil J.
A2 - Baltes, Paul B.
AB - Graphical Markov models provide a method of representing possibly complicated multivariate dependencies in such a way that the general qualitative features can be understood, that statistical independencies are highlighted, and that some properties can be derived directly. Variables are represented by the nodes of a graph. Pairs of nodes may be joined by an edge. Edges are directed if one variable is a response to the other variable considered as explanatory, but are undirected if the variables are on an equal footing. Absence of an edge typically implies statistical independence, conditional, or marginal depending on the kind of graph. The need for a number of types of graph arises because it is helpful to represent a number of different kinds of dependence structures. Of special importance are chain graphs in which variables are arranged in a sequence or chain of blocks, the variables in any one block being on an equal footing, some being possibly joint responses to variables in the past and some being jointly explanatory to variables in the future of the block considered. Some main properties of such systems are outlined, and recent research results are sketched. Suggestions for further reading are given. As an illustrative example, some analysis of data on the treatment of chronic pain is presented.
CY - Oxford
DA - 2001/01/01/
PY - 2001
DP - ScienceDirect
SP - 6379
EP - 6386
LA - en
PB - Pergamon
SN - 978-0-08-043076-8
ST - Graphical Models
UR - http://www.sciencedirect.com/science/article/pii/B008043076700440X
Y2 - 2019/11/22/19:12:23
KW - Bayesianism
KW - Classical ML
KW - Machine learning
ER -
TY - BOOK
TI - Model-Based Machine Learning
AU - Winn, John Michael
AB - This book is unusual for a machine learning textbook in that the authors do not review dozens of different algorithms. Instead they introduce all of the key ideas through a series of case studies involving real-world applications. Case studies play a central role because it is only in the context of applications that it makes sense to discuss modelling assumptions. Each chapter therefore introduces one case study which is drawn from a real-world application that has been solved using a model-based approach.
DA - 2019/06//
PY - 2019
DP - Google Books
SP - 400
LA - en
PB - Taylor & Francis Incorporated
SN - 978-1-4987-5681-5
KW - Bayesian inference
KW - Classical ML
KW - Implementation
ER -