*SPatt* (Statistic for Patterns) is a suite of C++ programs designed to compute p-values for pattern occurrences in text.
Assuming the text is generated by a Markov model, the p-value of an observation is the probability, under that model, of a pattern count at least as extreme as the one observed.
The lower the p-value, the more unlikely the observation.
For example, these tools can be used to find patterns with unusual behaviour in DNA or protein sequences.
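
To make the notion of p-value concrete, here is a minimal sketch of the simplest possible case: a single-letter pattern in a text of independent, uniformly distributed letters, where the count follows a binomial distribution. The function name and the numbers are illustrative assumptions and are not part of *SPatt*.

```python
from math import comb

def over_pvalue_single_letter(n, n_obs, p_letter=0.25):
    """Over-representation p-value P(N >= n_obs) for one letter.

    Assumes an i.i.d. text of length n in which the letter has
    probability p_letter at each position, so N ~ Binomial(n, p_letter).
    """
    return sum(
        comb(n, k) * p_letter**k * (1 - p_letter) ** (n - k)
        for k in range(n_obs, n + 1)
    )

# Hypothetical observation: the letter "A" seen 10 times in a
# 20-letter uniform DNA text (expected count: 5).
p = over_pvalue_single_letter(n=20, n_obs=10)
print(f"P(N >= 10) = {p:.4f}")
```

Longer patterns can overlap themselves and the text may follow a higher-order Markov model, so the count is no longer binomial; handling that is the hard part of the computation.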

The DNA motif/pattern

Here is the command to run ("-S" for the provided sequence, "-p" for the pattern, "-a" for the alphabet descriptor, "-m" for the Markov model order, where "-1" means independent and identically uniformly distributed letters, and "--over" to compute the over-representation p-value):

spatt -S phage_lambda.fasta -p "GCTGG|CCAGC" -a "ACGT" -m -1 --over

and here is its (truncated) result:

distribution: P(N=0)=9.698565e-42 P(N=1)=9.130414e-40 P(N=2)=4.298067e-38 P(N=3)=1.348945e-36 P(N=4)=3.175459e-35 [...] P(N=206)=2.629107e-23 P(N=207)=1.210827e-23 P(N=208)=5.549911e-24 P(N=209)=2.531807e-24 P(N>=210)=2.090885e-24 pattern=GCTGG|CCAGC Nobs=210 P(N>=Nobs)=2.090885e-24

This result indicates that the observation of "at least 210 occurrences of

Computing the distribution of pattern counts in random sequences is a challenging and computationally intensive
task for which many competing approaches exist. The goal of *SPatt* is to implement the most relevant ones and make
them available in a single easy-to-use package. Here is a list of the features currently
implemented in *SPatt*:

- arbitrary alphabet (DNA, protein, binary, others);
- automatic detection of case sensitive alphabets;
- regex-like syntax allowing for complex patterns;
- homogeneous Markov model of arbitrary order;
- exact computations for a single sequence or a set of sequences;
- Gaussian approximations;
- overlapping or renewal counting;
- presence/absence counting when dealing with datasets with several sequences;
- efficient implementation using optimal Markov chain embedding through deterministic finite automata;
- output of Scilab source code for the Markov chain embedding parameters (mostly for educational purposes);
- optional output of dot (graphviz package) files for representing automata.
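
The idea behind Markov chain embedding through a deterministic finite automaton can be sketched in a few lines of Python. This is an illustration of the general technique under simplifying assumptions, not *SPatt*'s actual implementation: it handles only independent, uniformly distributed letters, counts overlapping occurrences, and makes no attempt to minimize the automaton. The function names `build_dfa` and `count_distribution` are hypothetical.

```python
from collections import deque

def build_dfa(words, alphabet):
    """Aho-Corasick-style DFA recognizing texts that end with one of `words`.

    Returns (goto, final): goto[s][c] is the next state, final[s] is True
    when at least one word ends at the current position.
    """
    goto, final = [{}], [False]
    for w in words:                      # build the trie of the words
        s = 0
        for c in w:
            if c not in goto[s]:
                goto.append({})
                final.append(False)
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        final[s] = True
    fail = [0] * len(goto)
    queue = deque()
    for c in alphabet:                   # complete the root transitions
        if c in goto[0]:
            queue.append(goto[0][c])
        else:
            goto[0][c] = 0
    while queue:                         # breadth-first completion
        s = queue.popleft()
        final[s] = final[s] or final[fail[s]]
        for c in alphabet:
            if c in goto[s]:
                t = goto[s][c]
                fail[t] = goto[fail[s]][c]
                queue.append(t)
            else:
                goto[s][c] = goto[fail[s]][c]
    return goto, final

def count_distribution(words, alphabet, n):
    """Exact [P(N=0), ..., P(N=n)] for a uniform i.i.d. text of length n.

    N is the number of positions at which a word ends (overlapping count).
    """
    goto, final = build_dfa(words, alphabet)
    p = 1.0 / len(alphabet)
    dist = {(0, 0): 1.0}                 # (DFA state, count so far) -> probability
    for _ in range(n):
        nxt = {}
        for (s, k), pr in dist.items():
            for c in alphabet:
                t = goto[s][c]
                key = (t, k + 1 if final[t] else k)
                nxt[key] = nxt.get(key, 0.0) + pr * p
        dist = nxt
    out = [0.0] * (n + 1)
    for (_, k), pr in dist.items():      # marginalize over the DFA state
        out[k] += pr
    return out

# Toy run on a short text; real genomes are far longer.
dist = count_distribution(["GCTGG", "CCAGC"], "ACGT", n=50)
n_obs = 1
print("P(N >= %d) = %e" % (n_obs, sum(dist[n_obs:])))
```

Tracking the pair (automaton state, count) turns the pattern count into a quantity carried by an ordinary Markov chain, which is why the distribution can be obtained exactly by propagating probabilities one letter at a time. A higher-order Markov model would require the state to also remember the last few letters.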

