A Bootstrap Based Neyman-Pearson Test for Identifying Variable Importance

Gregory Ditzler, Robi Polikar, Gail Rosen

    Research output: Contribution to journal › Article

    15 Citations (Scopus)

    Abstract

    Selecting the most informative features, those that lead to a small loss on future data, is arguably one of the most important steps in classification, data analysis, and model selection. Several feature selection (FS) algorithms are available; however, due to the noise present in any data set, FS algorithms are typically accompanied by an appropriate cross-validation scheme. In this brief, we propose a statistical hypothesis test derived from the Neyman-Pearson lemma for determining whether a feature is statistically relevant. The proposed approach can be applied as a wrapper to any FS algorithm, regardless of the FS criterion used by that algorithm, to determine whether a feature belongs in the relevant set. Perhaps more importantly, this procedure efficiently determines the number of relevant features given an initial starting point. We provide freely available software implementations of the proposed methodology.
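
    The abstract describes the method only at a high level, so the sketch below is an illustrative reconstruction of how a bootstrap-based relevance test of this general shape could work, not the authors' exact procedure. The correlation-based FS criterion, the binomial null hypothesis with p0 = k/d (an irrelevant feature lands in a random size-k subset of d features with probability k/d), and all parameter values (B, alpha, the synthetic data) are assumptions made for demonstration.

    ```python
    import math
    import random

    def binom_sf(c, B, p0):
        """One-sided tail P(X >= c) for X ~ Binomial(B, p0)."""
        return sum(math.comb(B, i) * p0**i * (1 - p0)**(B - i)
                   for i in range(c, B + 1))

    def select_top_k(X, y, k):
        """Toy FS criterion (an assumption, not the paper's): rank features
        by absolute Pearson correlation with the label, keep the top k."""
        n, d = len(X), len(X[0])
        my = sum(y) / n
        scores = []
        for j in range(d):
            col = [row[j] for row in X]
            mx = sum(col) / n
            cov = sum((col[i] - mx) * (y[i] - my) for i in range(n))
            sx = math.sqrt(sum((v - mx) ** 2 for v in col)) or 1e-12
            sy = math.sqrt(sum((v - my) ** 2 for v in y)) or 1e-12
            scores.append(abs(cov / (sx * sy)))
        return set(sorted(range(d), key=lambda j: -scores[j])[:k])

    def bootstrap_relevance_test(X, y, k, B=200, alpha=0.01, seed=0):
        """Count how often each feature enters the selected set over B
        bootstrap resamples, then flag features whose selection count is
        improbably high under the null that an irrelevant feature enters
        a random size-k subset with probability p0 = k/d."""
        rng = random.Random(seed)
        n, d = len(X), len(X[0])
        counts = [0] * d
        for _ in range(B):
            idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
            chosen = select_top_k([X[i] for i in idx], [y[i] for i in idx], k)
            for j in chosen:
                counts[j] += 1
        p0 = k / d
        return [binom_sf(counts[j], B, p0) < alpha for j in range(d)]

    # Synthetic demo: feature 0 tracks the label; features 1-4 are pure noise.
    gen = random.Random(1)
    y = [gen.choice([0, 1]) for _ in range(100)]
    X = [[y[i] + 0.1 * gen.random()] + [gen.random() for _ in range(4)]
         for i in range(100)]
    relevant = bootstrap_relevance_test(X, y, k=1)
    print(relevant)  # feature 0 is flagged as relevant, the noise features are not
    ```

    Because the wrapper only queries the FS algorithm for its selected subset on each resample, `select_top_k` could be swapped for any other selector without changing the test itself, which matches the FS-criterion-agnostic claim in the abstract.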

    Original language: English (US)
    Article number: 6823119
    Pages (from-to): 880-886
    Number of pages: 7
    Journal: IEEE Transactions on Neural Networks and Learning Systems
    Volume: 26
    Issue number: 4
    DOI: 10.1109/TNNLS.2014.2320415
    State: Published - Apr 1 2015

    Fingerprint

    Feature extraction
    Statistical tests
    Data structures

    All Science Journal Classification (ASJC) codes

    • Software
    • Computer Science Applications
    • Computer Networks and Communications
    • Artificial Intelligence

    Cite this

    @article{6dc4f749d6334bdd87efa7d391dafa68,
    title = "A Bootstrap Based Neyman-Pearson Test for Identifying Variable Importance",
    author = "Gregory Ditzler and Robi Polikar and Gail Rosen",
    year = "2015",
    month = "4",
    day = "1",
    doi = "10.1109/TNNLS.2014.2320415",
    language = "English (US)",
    volume = "26",
    pages = "880--886",
    journal = "IEEE Transactions on Neural Networks and Learning Systems",
    issn = "2162-237X",
    publisher = "IEEE Computational Intelligence Society",
    number = "4",

    }
