Marina M.-C. Vidovic, Nico Görnitz, Klaus-Robert Müller, Gunnar Rätsch, Marius Kloft. Opening the Black Box: Revealing Interpretable Sequence Motifs in Kernel-Based Learning Algorithms. In Machine Learning and Knowledge Discovery in Databases (ECML PKDD), 2015.
Download preprint: not available
Download from publisher: http://link.springer.com/chapter/10.1007/978-3-319-23525-7_9
Related web page: not available
Bibliography entry: BibTeX
Abstract:
This work addresses kernel-based learning algorithms for sequence data. We present a probabilistic approach to automatically extract, from the output of such string-kernel-based learning algorithms, the subsequences (or motifs) truly underlying the machine's predictions. The proposed framework views motifs as free parameters in a probabilistic model, which is solved through a global optimization approach. In contrast to prevalent approaches, the proposed method can discover even difficult, long motifs, and can be combined with any kernel-based learning algorithm that is based on an adequate sequence kernel. We show that, by using a discriminative kernel machine such as a support vector machine, the approach can reveal discriminative motifs underlying the kernel predictor. We demonstrate the efficacy of our approach through a series of experiments on synthetic and real data, including problems from handwritten digit recognition and a large-scale human splice site data set from the domain of computational biology.