napsu/SPaiK

SPaiK - Scalable Pairwise Kernel Learning Software

SPaiK is a scalable software package for pairwise kernel learning. It combines the stochastic inexact limited-memory bundle method (StoILMBM) for optimization, the stochastic generalized vec trick (sGVT) for efficient computation with pairwise Kronecker kernels, and a rich set of kernel functions provided by RLScore.
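The identity behind the vec trick can be illustrated with a small NumPy sketch. This is not the sGVT routine in objfun.f95 (which is written in Fortran and is stochastic); it is a dense, deterministic illustration under the assumption that each sample is a pair of indices into two kernel matrices. The names `K`, `G`, `out_pairs`, and `in_pairs` are hypothetical.

```python
import numpy as np

def pairwise_product_naive(K, G, out_pairs, in_pairs, v):
    """Build the sampled Kronecker kernel matrix explicitly, then multiply."""
    d_out, t_out = out_pairs
    d_in, t_in = in_pairs
    # Entry (i, j) equals K[d_out[i], d_in[j]] * G[t_out[i], t_in[j]].
    P = K[np.ix_(d_out, d_in)] * G[np.ix_(t_out, t_in)]
    return P @ v

def pairwise_product_vec_trick(K, G, out_pairs, in_pairs, v):
    """Same product without forming the pairwise kernel matrix."""
    d_out, t_out = out_pairs
    d_in, t_in = in_pairs
    # Scatter the coefficients into a matrix indexed by the two object types.
    V = np.zeros((K.shape[1], G.shape[1]))
    np.add.at(V, (d_in, t_in), v)
    # W[a, b] = sum over (d, t) of K[a, d] * V[d, t] * G[b, t].
    W = K @ V @ G.T
    # Read off only the entries that correspond to the requested output pairs.
    return W[d_out, t_out]

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(4, 3))
K = A @ A.T  # hypothetical kernel for the first object type (5 objects)
G = B @ B.T  # hypothetical kernel for the second object type (4 objects)

in_pairs = (rng.integers(0, 5, size=7), rng.integers(0, 4, size=7))
out_pairs = (rng.integers(0, 5, size=6), rng.integers(0, 4, size=6))
v = rng.normal(size=7)

u_naive = pairwise_product_naive(K, G, out_pairs, in_pairs, v)
u_fast = pairwise_product_vec_trick(K, G, out_pairs, in_pairs, v)
```

Both functions return the same vector; the second avoids the (number of output pairs) × (number of input pairs) matrix, which is what makes Kronecker kernels tractable for large pair sets.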

The included loss functions for the pairwise kernel problem are:

  • squared loss,
  • squared epsilon-insensitive loss,
  • epsilon-insensitive squared loss,
  • epsilon-insensitive absolute loss,
  • absolute loss.

Note that only the epsilon-insensitive squared loss has been tested for functionality.
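For orientation, the five losses can be written out in NumPy as below. The formulas are the common textbook forms and are an assumption here: in particular, the sketch takes "squared epsilon-insensitive" to mean max(0, |r| - eps)^2 and "epsilon-insensitive squared" to mean max(0, r^2 - eps), where r is the residual (prediction minus label). The definitions actually used by SPaiK are in objfun.f95 and may differ.

```python
import numpy as np

def squared_loss(r):
    return r ** 2

def absolute_loss(r):
    return np.abs(r)

def eps_insensitive_absolute_loss(r, eps):
    # Residuals inside the tube [-eps, eps] cost nothing.
    return np.maximum(0.0, np.abs(r) - eps)

def squared_eps_insensitive_loss(r, eps):
    # Assumed form: square of the epsilon-insensitive absolute loss.
    return np.maximum(0.0, np.abs(r) - eps) ** 2

def eps_insensitive_squared_loss(r, eps):
    # Assumed form: squared loss with an epsilon tube applied afterwards.
    return np.maximum(0.0, r ** 2 - eps)

residuals = np.array([-2.0, -0.5, 0.0, 1.5])
eps = 1.0
loss_values = {
    "squared": squared_loss(residuals),
    "absolute": absolute_loss(residuals),
    "eps_abs": eps_insensitive_absolute_loss(residuals, eps),
    "sq_eps": squared_eps_insensitive_loss(residuals, eps),
    "eps_sq": eps_insensitive_squared_loss(residuals, eps),
}
```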

Files included

  • spaik.py

    • Main Python file. Includes RLScore calls.
  • pkl_utility.py

    • Python utility programs.
  • spaik.f95

    • Building block between Python and Fortran for pairwise learning software.
  • slmba.f95

    • StoLMBM - a stochastic limited memory bundle method for nonsmooth optimization (specially developed for SPaiK).
  • objfun.f95

    • Computes the function and subgradient values for the different loss functions. The loss function is selected in spaik.py. Includes sGVT.
  • initpkl.f95

    • Initialization of parameters and variables for SPaiK and StoLMBM. Includes modules:
      • initpkl - Initialization of parameters for SPaiK.
      • initslmba - Initialization of StoLMBM.
  • parameters.f95

    • Parameters for Fortran. Includes modules:
      • r_precision - Precision for reals,
      • param - Parameters.
  • subpro.f95

    • Subprograms for StoLMBM.
  • data.py

    • Contains functions to load the example datasets in SPaiK. The data files are assumed to be located in a folder named "data". This repository does not include the datasets themselves; links to all example datasets are provided in the repository github.com/TurkuML.

    • Contains functions to create train-test-validation splits. Splits are created for every experimental setting IDIT, IDOT, ODIT, and ODOT (see the references below).

  • Makefile

    • Builds a shared library that allows StoLMBM (Fortran 95 code) to be called from the Python program SPaiK. Uses f2py and Python 3.7, and requires a Fortran compiler (gfortran) to be installed.
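The four experimental settings handled by data.py can be sketched as follows. This is not the code in data.py. It assumes the usual reading of the abbreviations in the pairwise learning literature: I/O stands for in-sample/out-of-sample, and D/T for the two object types in a pair (drugs and targets in drug-target data). The function name `split_pairs` and its parameters are hypothetical, and the validation split is omitted for brevity.

```python
import numpy as np

def split_pairs(d_idx, t_idx, test_frac=0.25, seed=0):
    """Split pair indices into train and the four test settings."""
    rng = np.random.default_rng(seed)
    d_all = np.unique(d_idx)
    t_all = np.unique(t_idx)
    # Hold out whole objects of each type (at least one of each).
    held_d = rng.choice(d_all, size=max(1, int(test_frac * len(d_all))), replace=False)
    held_t = rng.choice(t_all, size=max(1, int(test_frac * len(t_all))), replace=False)
    d_out = np.isin(d_idx, held_d)
    t_out = np.isin(t_idx, held_t)
    # Pairs whose objects are both in-sample; a random part becomes IDIT test.
    both_in = np.flatnonzero(~d_out & ~t_out)
    idit = rng.choice(both_in, size=int(test_frac * len(both_in)), replace=False)
    return {
        "train": np.setdiff1d(both_in, idit),
        "IDIT": np.sort(idit),                 # both objects seen in training
        "IDOT": np.flatnonzero(~d_out & t_out),  # second object unseen
        "ODIT": np.flatnonzero(d_out & ~t_out),  # first object unseen
        "ODOT": np.flatnonzero(d_out & t_out),   # both objects unseen
    }

# Toy data: every combination of 8 first-type and 6 second-type objects.
d_idx, t_idx = np.meshgrid(np.arange(8), np.arange(6), indexing="ij")
d_idx, t_idx = d_idx.ravel(), t_idx.ravel()
splits = split_pairs(d_idx, t_idx)
```

With sparse pair data, holding out IDIT pairs can leave an object without any training pairs; a real implementation would need to check for that.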

Installation and usage

The source uses f2py and Python 3.7, and requires a Fortran compiler (gfortran by default) and RLScore to be installed.

To use the code:

  1. Select the data and the loss function in the spaik.py file.
  2. Run the Makefile (by typing "make") to build a shared library that allows spaik.f95 (Fortran 95 code) to be called from the Python program spaik.py.
  3. Finally, just type "python3.7 spaik.py".

The program writes a CSV file with performance measures (C-index, IC-index, and MSE) computed on the test set under the experimental settings IDIT, IDOT, ODIT, and ODOT. The best results are selected on a separate validation set with respect to the C-index. In addition, separate CSV files with the predictions under each experimental setting are written.
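As a reminder of what two of the reported measures compute, here is a minimal NumPy sketch of the C-index (concordance index) and MSE. It uses the standard definitions with an O(n²) pairwise comparison and is not taken from spaik.py or RLScore; the IC-index is not covered. The function names are hypothetical.

```python
import numpy as np

def c_index(y_true, y_pred):
    """Fraction of label-ordered pairs that the predictions order the same way."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff_true = y_true[:, None] - y_true[None, :]
    diff_pred = y_pred[:, None] - y_pred[None, :]
    comparable = diff_true > 0  # ordered pairs (i, j) with y_true[i] > y_true[j]
    n_pairs = comparable.sum()
    if n_pairs == 0:
        return float("nan")  # undefined when all labels are equal
    concordant = ((diff_pred > 0) & comparable).sum()
    tied = ((diff_pred == 0) & comparable).sum()  # tied predictions count half
    return (concordant + 0.5 * tied) / n_pairs

def mse(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 3.0, 2.0, 4.0]
c_value = c_index(y_true, y_pred)  # 5 of the 6 comparable pairs are concordant
mse_value = mse(y_true, y_pred)
```

A C-index of 0.5 corresponds to random ordering and 1.0 to a perfect ranking, which is why it is a natural selection criterion when the ranking of pairs matters more than the exact predicted values.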

References:

Acknowledgements

The work was financially supported by the Research Council of Finland, Project Nos. 340182 and 345804 led by Tapio Pahikkala and Project Nos. 340140 and 345805 led by Antti Airola.
