Documentation

General

This web platform provides a user interface for the analysis of Multi Capillary Column - Ion Mobility Spectrometry (MCC-IMS) data. In particular, it allows automated supervised feature extraction and class prediction from user-supplied measurements and provides visualizations. By applying a selection of pre-processing and peak detection methods, it extracts peak intensities. These features are then used to build a prediction model, which can be used to assign learned class labels to new measurements. The creation of such a prediction model is available in an automated and a guided fashion: either standard model parameters are selected for you, or you are guided through choosing between a range of pre-processing and evaluation methods.

For your convenience we provide a selection of sample datasets from various backgrounds.

Datasets

You can select a sample dataset or upload your own zip-archive. When uploading your own dataset, make sure to include a class_labels.tsv (tab-separated values) or class_labels.csv (comma-separated values) file. The first row should be a header row. After the header, we expect the first column to hold the names of the MCC-IMS measurements and the second column to hold the class labels. If you want to perform peak detection using the VISUALNOWLAYER approach, you need to include the Visualnow-Layer file in the zip-archive. For more details see the sections VISUALNOWLAYER and Visualnow-Layer file format.

File formats

When uploading your own dataset for training or classification, we expect a zip-archive containing the following:

MCC/IMS measurements (each filename ending in '_ims.csv')
Class labels referencing the measurements' names in the archive (filename ending in 'class_labels')
(optional) A Visualnow-Layer file (filename ending in 'layer')

When making use of the "Existing Results" route, you are able to upload your own peak detection results or a feature matrix. Right now you are limited to one peak detection method per analysis in this route.

Class labels referencing the measurements' names in the peak detection result / feature matrix (filename ending in 'class_labels')
(either) Peak detection results with each filename ending in '_peak_detection_result.csv'
(or) Feature matrix ending in '_feature_matrix.csv'

MCC/IMS measurements

Make sure to follow the correct MCC/IMS measurement format, as used in the sample measurement or described in:

Vautz et al., 2008, Recommendation of a standard format for data sets from GC/IMS with sensor-controlled sampling, International Journal for Ion Mobility Spectrometry

This also means your measurements should be named according to the scheme "device-serial-number_YYMMDDhhmm_ims.csv", e.g. BD18_1408280826_ims.csv.
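
To check filenames against this scheme before building the archive, a regular expression can help. The following is only a minimal sketch: the function name is_valid_measurement_name is hypothetical, and the pattern assumes that the device serial number consists of letters and digits only.

```python
import re

# Assumed pattern: alphanumeric device serial number, underscore,
# 10-digit YYMMDDhhmm timestamp, then the fixed suffix "_ims.csv".
MEASUREMENT_NAME = re.compile(r"^[A-Za-z0-9]+_\d{10}_ims\.csv$")


def is_valid_measurement_name(filename):
    """Return True if filename follows device-serial-number_YYMMDDhhmm_ims.csv."""
    return MEASUREMENT_NAME.match(filename) is not None


print(is_valid_measurement_name("BD18_1408280826_ims.csv"))  # True
print(is_valid_measurement_name("BD18_140828_ims.csv"))      # False, timestamp too short
```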

Class labels

The class labels file should end with the suffix class_labels.csv, class_labels.tsv or class_labels.txt, e.g. candy_class_labels.csv. The first row of the class labels file should be a header row. The first column should reference all measurement names in the zip-archive, while the second column assigns the class label, for example:

Name	Label
BD18_1408280826_ims.csv	menthol
BD18_1408280834_ims.csv	citrus

See example files for .csv, .tsv and .txt.
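
If you want to verify a class labels file locally before uploading, it can be read with a few lines of Python. This is a sketch under assumptions, not the platform's own parser: the function name read_class_labels is hypothetical, and the delimiter has to be chosen to match your file (tab for .tsv, comma for .csv).

```python
import csv
import io


def read_class_labels(text, delimiter="\t"):
    """Map measurement name -> class label, skipping the header row."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(rows)  # header row, e.g. Name / Label
    return {row[0]: row[1] for row in rows if len(row) >= 2}


example = (
    "Name\tLabel\n"
    "BD18_1408280826_ims.csv\tmenthol\n"
    "BD18_1408280834_ims.csv\tcitrus\n"
)
labels = read_class_labels(example)
print(labels["BD18_1408280826_ims.csv"])  # menthol
```

Comparing the keys of the resulting dictionary with the measurement filenames in the archive is a simple way to spot missing or misspelled names.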

Visualnow-Layer file format

The Visualnow-Layer file defines the peak positions to extract intensities from and can be used as a peak detection method. When uploading your own dataset, make sure the file is named with the suffix "_layer.csv" or "_layer.xls" and is included in the archive. The annotation file should follow one of the following two schemes:

Scheme1

(original from VisualNow): Export your peak layer from VisualNow and separately save the "layer" sheet.

3 Comment lines
1/K0	1/K0 radius	RT	RT radius
0.3	0.15	25	3
...	...	...	...

See Layer Example 1 (xls) and Layer Example 2 (csv).

Scheme2

(custom):

3 Comment lines
inverse_reduced_mobility	radius_inverse_reduced_mobility	retention_time	radius_retention_time
0.3	0.15	25	3
...	...	...	...

Make sure to choose utf-8 as encoding scheme when saving your layer to prevent encoding issues.
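
A layer file in the custom scheme can be read as follows. This is a sketch for local checking only: the function name read_layer is hypothetical, the comment lines in the example are placeholders, and the delimiter is assumed to be a tab (pass another delimiter if your file uses one).

```python
import csv
import io


def read_layer(text, delimiter="\t"):
    """Return the peak windows as dicts of floats.

    Skips the 3 comment lines and uses the following line as header.
    """
    lines = text.splitlines()[3:]  # drop the 3 comment lines
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter=delimiter)
    return [{key: float(value) for key, value in row.items()} for row in reader]


example = (
    "# comment 1\n"
    "# comment 2\n"
    "# comment 3\n"
    "inverse_reduced_mobility\tradius_inverse_reduced_mobility\t"
    "retention_time\tradius_retention_time\n"
    "0.3\t0.15\t25\t3\n"
)
windows = read_layer(example)
print(windows[0]["retention_time"])  # 25.0
```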

Peak Detection Results

Each peak detection result file should end with the suffix _peak_detection_result.csv. If the filename contains one of the peak detection method names TOPHAT, PEAX, WATERSHED, JIBB or VISUALNOWLAYER, that method is assigned as the peak detection method. The first line should be the header. Each following row describes one peak: the first column should hold the measurement's name, while the other columns should be as follows:

measurement_name	peak_id	retention_time	inverse_reduced_mobility	intensity
BD99_1905190837_ims.csv	1	3.062	0.410	0.005
BD99_1905190837_ims.csv	2	...	...	...
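
The assignment of a peak detection method from the filename can be pictured as a simple substring search. The sketch below is an illustration, not the platform's code: the function name infer_method is hypothetical, and matching regardless of upper or lower case is an assumption.

```python
PEAK_DETECTION_METHODS = ("TOPHAT", "PEAX", "WATERSHED", "JIBB", "VISUALNOWLAYER")


def infer_method(filename):
    """Return the first method name contained in filename, or None."""
    upper = filename.upper()  # assumption: the match ignores case
    for method in PEAK_DETECTION_METHODS:
        if method in upper:
            return method
    return None


print(infer_method("candy_PEAX_peak_detection_result.csv"))  # PEAX
print(infer_method("candy_peak_detection_result.csv"))       # None
```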

Feature matrix

Each feature matrix file should end with the suffix _feature_matrix.csv. If the filename contains one of the peak detection method names TOPHAT, PEAX, WATERSHED, JIBB or VISUALNOWLAYER, that method is assigned as the peak detection method. The first line should be the header. Each following row holds one measurement: the first column should hold the measurement name, while each of the other columns holds one peak, as follows:

Measurement	Peak_0067	Peak_0072	Peak_0122	...
BD99_1905190837_ims.csv	0.021	0.059	0.086	...
BD99_1906190842_ims.csv	0.040	0.051	0.000	...

Preprocessing

Preprocessing aims to increase the signal-to-noise ratio in the samples and to improve the accuracy of the peak detection methods. It mainly involves compensation for artifacts of the MCC/IMS, normalization and smoothing procedures.

BASELINE_CORRECTION

To correct for the Reactant Ion Peak (RIP), we apply baseline correction. This method reduces the effect of RIP tailing and lowers the baseline of the affected spectra.

INTENSITY_NORMALIZATION

Normalizes the spectra intensities with respect to the maximum intensity, which leads to intensities between 0 and 1.
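
As an illustration, normalization to the maximum can be sketched as below. The function name normalize_intensities and the representation of a measurement as a list of rows are assumptions; the result lies between 0 and 1 only if all input intensities are non-negative.

```python
def normalize_intensities(spectrum):
    """Divide every intensity by the global maximum of the measurement."""
    peak = max(max(row) for row in spectrum)
    if peak == 0:
        return [row[:] for row in spectrum]  # nothing to scale
    return [[value / peak for value in row] for row in spectrum]


print(normalize_intensities([[0.0, 2.0], [4.0, 1.0]]))
# [[0.0, 0.5], [1.0, 0.25]]
```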

CROP_INVERSE_REDUCED_MOBILITY

As practically no peaks occur at inverse reduced ion mobilities < 0.4 Vs/cm^2, we remove the majority of each spectrum prior to the RIP.

NOISE_SUBTRACTION

To reduce noise, we subtract a constant value from all intensities. To determine this noise level, we average the intensities at inverse reduced ion mobility values < 0.4 Vs/cm^2.
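
The noise estimate can be sketched as follows. The names subtract_noise and irm_axis are hypothetical, and the layout (rows are retention times, columns are inverse reduced mobility values) is an assumption. Whether negative values are clipped afterwards is not stated here, so the sketch does not clip.

```python
def subtract_noise(spectrum, irm_axis, cutoff=0.4):
    """Subtract the mean intensity of the region with irm < cutoff from all values."""
    noise_values = [
        value
        for row in spectrum
        for value, irm in zip(row, irm_axis)
        if irm < cutoff
    ]
    if not noise_values:
        raise ValueError("no data points below the cutoff")
    noise_level = sum(noise_values) / len(noise_values)
    return [[value - noise_level for value in row] for row in spectrum]


irm_axis = [0.2, 0.3, 0.5, 0.6]
spectrum = [
    [1.0, 3.0, 10.0, 6.0],
    [2.0, 2.0, 8.0, 4.0],
]
# Noise level is the mean of 1, 3, 2 and 2, i.e. 2.0.
print(subtract_noise(spectrum, irm_axis))
# [[-1.0, 1.0, 8.0, 4.0], [0.0, 0.0, 6.0, 2.0]]
```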

DISCRETE_WAVELET_TRANSFORMATION

Applies a compression algorithm to the spectra. This algorithm decomposes the signal of the spectra similarly to a Fourier transformation and applies a high-pass and a low-pass filter to the spectra. All signals smaller than a cutoff threshold are removed and the signal is reconstructed without the noise. We make use of the Daubechies 8 wavelet and the implementation of PyWavelets.

GAUSSIAN_FILTER

Removes noise by applying a Gaussian kernel, which blends each intensity with its neighboring signals. A two-dimensional Gaussian filter is applied with a fixed kernel size.

MEDIAN_FILTER

Removes noise by replacing intensities with the median of neighboring signals using a fixed window size.
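
To illustrate the principle, here is a one-dimensional median filter. It is a simplified sketch: the window size of 3 and the truncation of the window at the edges are assumptions, and the function name median_filter is hypothetical.

```python
from statistics import median


def median_filter(signal, window=3):
    """Replace each value with the median of its neighborhood.

    The window is truncated at both ends of the signal.
    """
    half = window // 2
    return [
        median(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]


# The isolated spike of 9 is removed completely.
print(median_filter([1, 1, 9, 1, 1]))
```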

SAVITZKY_GOLAY_FILTER

Removes noise by replacing intensities with a weighted average of neighboring signals using a fixed window size.
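
The weighted average can be illustrated with the classic 5-point quadratic Savitzky-Golay weights (-3, 12, 17, 12, -3) / 35. This is a one-dimensional sketch; the window size and polynomial order used by the platform are not stated here, so both are assumptions, as is leaving the edge points unchanged.

```python
# 5-point quadratic/cubic Savitzky-Golay smoothing weights.
SG_COEFFS = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def savitzky_golay_5(signal):
    """Smooth a 1D signal with a 5-point Savitzky-Golay weighted average."""
    smoothed = list(signal)  # edge points are left unchanged in this sketch
    for i in range(2, len(signal) - 2):
        window = signal[i - 2: i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_COEFFS, window))
    return smoothed


# A quadratic signal is reproduced (up to rounding), since the weights
# correspond to a local least-squares fit of a quadratic polynomial.
quadratic = [float(x * x) for x in range(7)]
print(savitzky_golay_5(quadratic))
```

Unlike a plain moving average, these weights preserve the height of smooth peaks better, which matters when peak intensities are used as features.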

Peak Detection

PEAX

PEAX is a non-commercial automated peak extraction method for MCC/IMS measurements. Its core idea is to first extract lower-dimensional peak models from the spectra and then merge them into two-dimensional peak models. For more details see (2014, D’Addario et al.).

VISUALNOWLAYER

Extracts peaks in rectangles based on the positions provided in the layer/annotation file. We support both .xls and .csv format. See the section Visualnow-Layer file format for reference.
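
The rectangle extraction can be pictured as follows. This is a sketch under assumptions: the function name extract_layer_peak is hypothetical, the rectangle spans position plus/minus radius in both dimensions, and the maximum inside the rectangle is taken as the peak intensity (the platform may aggregate differently).

```python
def extract_layer_peak(spectrum, irm_axis, rt_axis, window):
    """Return the maximum intensity inside one rectangular layer window.

    spectrum[r][c] is the intensity at rt_axis[r] and irm_axis[c].
    """
    irm_lo = window["inverse_reduced_mobility"] - window["radius_inverse_reduced_mobility"]
    irm_hi = window["inverse_reduced_mobility"] + window["radius_inverse_reduced_mobility"]
    rt_lo = window["retention_time"] - window["radius_retention_time"]
    rt_hi = window["retention_time"] + window["radius_retention_time"]
    values = [
        spectrum[r][c]
        for r, rt in enumerate(rt_axis) if rt_lo <= rt <= rt_hi
        for c, irm in enumerate(irm_axis) if irm_lo <= irm <= irm_hi
    ]
    return max(values) if values else 0.0


irm_axis = [0.4, 0.5, 0.6, 0.7]
rt_axis = [10, 20, 30]
spectrum = [
    [1, 2, 3, 4],
    [5, 9, 6, 2],
    [1, 1, 1, 8],
]
window = {
    "inverse_reduced_mobility": 0.5,
    "radius_inverse_reduced_mobility": 0.15,
    "retention_time": 20,
    "radius_retention_time": 5,
}
print(extract_layer_peak(spectrum, irm_axis, rt_axis, window))  # 9
```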

TOPHAT

Extracts peaks in a two-step process: first tophat filtering, then local maxima extraction. In the first step, a noise threshold is applied that removes all intensities below it, and a mask is created highlighting the areas of high intensity. In the second step, the local maximum of each connected component is extracted and saved as the intensity value.
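
The two steps as described above (thresholding, then one maximum per connected component) can be sketched in plain Python. Note that this sketch covers only the threshold mask and the component search, not a morphological top-hat transform; the function name and the use of 4-connectivity are assumptions.

```python
def threshold_component_peaks(spectrum, noise_threshold):
    """Return (row, column, intensity) of the maximum of each connected
    region whose intensities lie above noise_threshold."""
    n_rows, n_cols = len(spectrum), len(spectrum[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    peaks = []
    for r in range(n_rows):
        for c in range(n_cols):
            if seen[r][c] or spectrum[r][c] <= noise_threshold:
                continue
            # Flood-fill one 4-connected component and track its maximum.
            stack = [(r, c)]
            seen[r][c] = True
            best = (r, c, spectrum[r][c])
            while stack:
                y, x = stack.pop()
                if spectrum[y][x] > best[2]:
                    best = (y, x, spectrum[y][x])
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and not seen[ny][nx]
                            and spectrum[ny][nx] > noise_threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            peaks.append(best)
    return peaks


spectrum = [
    [0, 0, 0, 0, 0],
    [0, 5, 7, 0, 0],
    [0, 0, 0, 0, 3],
    [0, 0, 0, 0, 4],
]
print(threshold_component_peaks(spectrum, noise_threshold=1))
# [(1, 2, 7), (3, 4, 4)]
```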

JIBB

Naive peak extraction approach. An area is considered a peak if its intensity is 1.5 times above the noise level, 5 consecutive signal points rise continuously towards the local maximum in the inverse reduced mobility direction and 7 consecutive signal points rise in the retention time direction, with the inverse behavior when moving away from the maximum.

WATERSHED

This approach resembles a falling water level that is lowered from the maximum intensity value until it reaches the noise level. Local maxima that emerge above the water level are labeled as peaks. A similar approach is used in the iPHEx software (2011, Bunkowski).
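
The falling water level idea can be illustrated in one dimension. This is a strongly simplified sketch, not the platform's implementation, which works on two-dimensional spectra; the function name watershed_peaks_1d is hypothetical. Points are visited from highest to lowest intensity: a point with no already emerged neighbor forms a new island and is labeled as a peak, otherwise it joins an existing island.

```python
def watershed_peaks_1d(signal, noise_level):
    """Return the indices of local maxima found by lowering a water level."""
    order = sorted(range(len(signal)), key=lambda i: signal[i], reverse=True)
    emerged = set()
    peaks = []
    for i in order:
        if signal[i] <= noise_level:
            break  # the water level has reached the noise level
        if (i - 1) not in emerged and (i + 1) not in emerged:
            peaks.append(i)  # a new island appears: local maximum
        emerged.add(i)
    return sorted(peaks)


print(watershed_peaks_1d([0, 2, 5, 2, 1, 4, 1, 0], noise_level=0.5))
# [2, 5]
```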

Peak Alignment

The peak alignment method is applied after peak detection to merge peaks that are close to each other and to reduce the number of peaks. We support two clustering algorithms: DBSCAN and PROBE_CLUSTERING.

DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise; 1996, Ester et al.) is a frequently used clustering algorithm, popular for its intuitive density-based approach.

PROBE_CLUSTERING

Clustering method that merges groups of peaks using fixed grid positions. This ensures unique and consistent PeakIds when the same grid parameters are used, which is especially important for predicting class labels from raw data.
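
The effect of a fixed grid can be sketched as follows. All names and numbers here are hypothetical (grid_peak_id, the step sizes and the number of cells per row are not the platform's parameters); the point is only that a peak position always maps to the same id as long as the grid stays the same.

```python
def grid_peak_id(irm, rt, irm_step=0.02, rt_step=5.0, n_irm_cells=100):
    """Map a peak position to the id of the fixed grid cell that contains it.

    irm_step, rt_step and n_irm_cells are illustrative grid parameters.
    """
    irm_cell = int(irm // irm_step)
    rt_cell = int(rt // rt_step)
    return rt_cell * n_irm_cells + irm_cell


# Two nearby peaks from different measurements fall into the same cell ...
print(grid_peak_id(0.410, 3.062))  # 20
print(grid_peak_id(0.412, 3.5))    # 20
# ... while a distant peak gets a different id.
print(grid_peak_id(0.55, 12.0))    # 227
```

Because the ids depend only on the grid and not on the other peaks in the dataset, a model trained on one set of measurements sees the same features when new measurements are processed later.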