TY - JOUR
TI - Bayesian machine learning via category theory
AU - Culbertson, Jared
AU - Sturtz, Kirk
T2 - arXiv:1312.1445 [math]
AB - From the Bayesian perspective, the category of conditional probabilities (a variant of the Kleisli category of the Giry monad, whose objects are measurable spaces and arrows are Markov kernels) gives a nice framework for conceptualization and analysis of many aspects of machine learning. Using categorical methods, we construct models for parametric and nonparametric Bayesian reasoning on function spaces, thus providing a basis for the supervised learning problem. In particular, stochastic processes are arrows to these function spaces which serve as prior probabilities. The resulting inference maps can often be analytically constructed in this symmetric monoidal weakly closed category. We also show how to view general stochastic processes using functor categories and demonstrate the Kalman filter as an archetype for the hidden Markov model.
DA - 2013/12/05/
PY - 2013
DP - arXiv.org
UR - http://arxiv.org/abs/1312.1445
Y2 - 2019/11/22/17:32:35
KW - Bayesianism
KW - Categorical ML
KW - Categorical probability theory
KW - Purely theoretical
ER -
TY - JOUR
TI - A Tutorial on Learning With Bayesian Networks
AU - Heckerman, David
AB - A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can …
DA - 1995/03/01/
PY - 1995
DP - www.microsoft.com
LA - en-US
UR - https://www.microsoft.com/en-us/research/publication/a-tutorial-on-learning-with-bayesian-networks/
Y2 - 2019/11/22/19:09:15
KW - Bayesianism
KW - Classical ML
KW - Machine learning
ER -
TY - JOUR
TI - Categorical Aspects of Parameter Learning
AU - Jacobs, Bart
T2 - arXiv:1810.05814 [cs]
AB - Parameter learning is the technique for obtaining the probabilistic parameters in conditional probability tables in Bayesian networks from tables with (observed) data --- where it is assumed that the underlying graphical structure is known. There are basically two ways of doing so, referred to as maximal likelihood estimation (MLE) and as Bayesian learning. This paper provides a categorical analysis of these two techniques and describes them in terms of basic properties of the multiset monad M, the distribution monad D and the Giry monad G. In essence, learning is about the relationships between multisets (used for counting) on the one hand and probability distributions on the other. These relationships will be described as suitable natural transformations.
DA - 2018/10/13/
PY - 2018
DP - arXiv.org
UR - http://arxiv.org/abs/1810.05814
Y2 - 2019/11/21/20:38:28
KW - Bayesianism
KW - Categorical ML
KW - Categorical probability theory
KW - Machine learning
ER -
TY - JOUR
TI - A Predicate/State Transformer Semantics for Bayesian Learning
AU - Jacobs, Bart
AU - Zanasi, Fabio
T2 - Electronic Notes in Theoretical Computer Science
T3 - The Thirty-second Conference on the Mathematical Foundations of Programming Semantics (MFPS XXXII)
AB - This paper establishes a link between Bayesian inference (learning) and predicate and state transformer operations from programming semantics and logic. Specifically, a very general definition of backward inference is given via first applying a predicate transformer and then conditioning. Analogously, forward inference involves first conditioning and then applying a state transformer. These definitions are illustrated in many examples in discrete and continuous probability theory and also in quantum theory.
DA - 2016/10/05/
PY - 2016
DO - 10/ggdgbb
DP - ScienceDirect
VL - 325
SP - 185
EP - 200
J2 - Electronic Notes in Theoretical Computer Science
LA - en
SN - 1571-0661
UR - http://www.sciencedirect.com/science/article/pii/S1571066116300883
Y2 - 2019/11/24/12:04:12
KW - Bayesianism
KW - Categorical ML
KW - Categorical probability theory
KW - Effectus theory
KW - Programming language theory
KW - Semantics
ER -
TY - JOUR
TI - What is a statistical model?
AU - McCullagh, Peter
T2 - The Annals of Statistics
DA - 2002/10//
PY - 2002
DO - 10/bkts3m
DP - Crossref
VL - 30
IS - 5
SP - 1225
EP - 1310
LA - en
UR - http://projecteuclid.org/euclid.aos/1035844977
Y2 - 2019/11/22/17:39:10
KW - Bayesianism
KW - Categorical ML
KW - Categorical probability theory
KW - Compendium
KW - Purely theoretical
KW - Statistical learning theory
ER -
TY - ELEC
TI - Algebraic Geometry and Statistical Learning Theory
AU - Watanabe, Sumio
T2 - Cambridge Core
DA - 2009/08//
PY - 2009
LA - en
UR - https://www.cambridge.org/core/books/algebraic-geometry-and-statistical-learning-theory/9C8FD1BDC817E2FC79117C7F41544A3A
Y2 - 2019/11/22/18:05:57
KW - Algebra
KW - Bayesianism
KW - Purely theoretical
KW - Statistical learning theory
ER -
TY - CHAP
TI - Graphical Models: Overview
AU - Wermuth, N.
AU - Cox, D. R.
T2 - International Encyclopedia of the Social & Behavioral Sciences
A2 - Smelser, Neil J.
A2 - Baltes, Paul B.
AB - Graphical Markov models provide a method of representing possibly complicated multivariate dependencies in such a way that the general qualitative features can be understood, that statistical independencies are highlighted, and that some properties can be derived directly. Variables are represented by the nodes of a graph. Pairs of nodes may be joined by an edge. Edges are directed if one variable is a response to the other variable considered as explanatory, but are undirected if the variables are on an equal footing. Absence of an edge typically implies statistical independence, conditional, or marginal depending on the kind of graph. The need for a number of types of graph arises because it is helpful to represent a number of different kinds of dependence structures. Of special importance are chain graphs in which variables are arranged in a sequence or chain of blocks, the variables in any one block being on an equal footing, some being possibly joint responses to variables in the past and some being jointly explanatory to variables in the future of the block considered. Some main properties of such systems are outlined, and recent research results are sketched. Suggestions for further reading are given. As an illustrative example, some analysis of data on the treatment of chronic pain is presented.
CY - Oxford
DA - 2001/01/01/
PY - 2001
DP - ScienceDirect
SP - 6379
EP - 6386
LA - en
PB - Pergamon
SN - 978-0-08-043076-8
ST - Graphical Models
UR - http://www.sciencedirect.com/science/article/pii/B008043076700440X
Y2 - 2019/11/22/19:12:23
KW - Bayesianism
KW - Classical ML
KW - Machine learning
ER -