Stochastic chains with variable length memory and the algorithm Context
Stochastic chains with variable length memory define an interesting family of stochastic chains of infinite order on a finite alphabet. The idea is that, for each past, only a finite suffix of the past, called the "context", is enough to predict the next symbol. The set of contexts can be represented by a rooted tree with finite labeled branches. The law of the chain is characterized by its tree of contexts and by an associated family of transition probabilities indexed by the tree. These models were first introduced in the information theory literature by Rissanen (1983) as a universal tool for data compression. More recently, they have been used to model scientific data in areas as diverse as biology, linguistics and music. Originally called "finite memory sources" or "tree machines", these models became quite popular in the statistics literature under the name "Variable Length Markov Chains", coined by Bühlmann and Wyner (1999). In my talk I will present some of the basic ideas, problems and examples of application in the field. I will focus on the algorithm Context, which estimates the tree of contexts and the associated family of transition probabilities defining the chain.
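To make the notions of context, tree of contexts and transition probabilities concrete, here is a minimal Python sketch for a binary alphabet. It stores a tree of contexts as a set of strings (most recent symbol last), finds the context of a past as its unique suffix in the tree, and estimates a tree from data by counting transitions and then pruning bottom-up with a log-likelihood-ratio cut-off. This pruning is a simplified variant in the spirit of the algorithm Context, not necessarily the exact procedure presented in the talk; the function names, the maximal depth `max_depth` and the choice of threshold (here `log n`) are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict
from math import log


def find_context(past, tree):
    """Return the shortest suffix of `past` that belongs to the tree of contexts."""
    for k in range(len(past) + 1):
        w = past[len(past) - k:]  # suffix of length k (k = 0 gives "")
        if w in tree:
            return w
    raise ValueError("no context of the tree is a suffix of this past")


def simulate(p_one, n, seed=0):
    """Simulate n binary symbols; p_one maps each context to P(next symbol = '1')."""
    rng = random.Random(seed)
    depth = max(len(w) for w in p_one)
    x = ["1"] * depth  # arbitrary initial past, long enough to contain a context
    while len(x) < n:
        w = find_context("".join(x[-depth:]), p_one)
        x.append("1" if rng.random() < p_one[w] else "0")
    return "".join(x)


def count_transitions(x, max_depth):
    """counts[w][b] = number of times symbol b follows the string w, for |w| <= max_depth."""
    counts = defaultdict(Counter)
    for t in range(max_depth, len(x)):  # same positions at every depth, so counts add up
        for k in range(max_depth + 1):
            counts[x[t - k:t]][x[t]] += 1
    return counts


def llr(counts, w, alphabet):
    """Log-likelihood gain from replacing node w by its observed children a + w."""
    parent = counts[w]
    n_w = sum(parent.values())
    gain = 0.0
    for a in alphabet:
        child = counts.get(a + w)  # .get does not create entries in a defaultdict
        if not child:
            continue
        n_aw = sum(child.values())
        for b, n in child.items():
            gain += n * log((n / n_aw) / (parent[b] / n_w))
    return gain


def estimate_tree(x, alphabet, max_depth, threshold):
    """Grow the full tree up to max_depth, then prune it from the leaves up."""
    counts = count_transitions(x, max_depth)

    def prune(w):
        if len(w) == max_depth:
            return [w]
        # Only children that occur in the data are kept (the tree may be incomplete).
        kids = [a + w for a in alphabet if a + w in counts]
        if not kids:
            return [w]
        leaves = [leaf for kid in kids for leaf in prune(kid)]
        # Keep the split if a child was itself split or if it improves the fit enough.
        if any(len(leaf) > len(w) + 1 for leaf in leaves) or llr(counts, w, alphabet) > threshold:
            return leaves
        return [w]

    tree = prune("")
    # Empirical transition probabilities p(b | w) = N(wb) / N(w) for each estimated context.
    probs = {w: {b: n / sum(counts[w].values()) for b, n in counts[w].items()} for w in tree}
    return tree, probs


true_model = {"0": 0.5, "01": 0.9, "11": 0.1}  # hypothetical tree: P(next = '1' | context)
x = simulate(true_model, 20000, seed=1)
tree, probs = estimate_tree(x, "01", max_depth=3, threshold=log(len(x)))
print(sorted(tree), probs)
```

In this hypothetical example the past only matters beyond the last symbol when that symbol is `1`, so, with several thousand occurrences of each context, the pruning should keep the split below `1` and collapse the other branches. The threshold governs this trade-off: larger values give smaller trees.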