Documentation
General
This web platform provides a user interface for the analysis of Multi Capillary Column - Ion Mobility Spectrometry (MCC-IMS) data.
In particular, it allows automated supervised feature extraction and class prediction from user supplied measurements and provides visualizations.
By applying a selection of pre-processing and peak detection methods, it extracts peak intensities.
These features are then used to build a prediction model, which can assign learned class labels to new measurements.
A prediction model can be created either automatically, using standard model parameters, or in a guided fashion,
in which the user chooses between a range of pre-processing and evaluation methods.
For your convenience we provide a selection of sample datasets from various backgrounds.
Datasets
You can either select a sample dataset or upload your own zip-archive.
When uploading your own dataset, make sure to include a class_labels.tsv (tab separated values) or class_labels.csv (comma separated values) file.
The first row should be a header row. In the following rows, the first column should hold the names of the MCC-IMS measurements,
while the second column should hold the class labels. If you want to perform peak detection using the VISUALNOWLAYER approach,
you need to include the Visualnow-Layer file in the .zip-archive. For more details see #visualnowlayer and #visualnowlayer_format.
File formats
When uploading your own dataset for training or classification, we expect a zip-archive with the following contents:
- MCC/IMS measurements (each filename ending in '_ims.csv')
- Class labels referencing the measurements' names in the archive (filename ending in 'class_labels')
- (optional) A Visualnow-Layer file (filename ending in 'layer')
For the layer file, see Layer Example 1 (xls) and Layer Example 2 (csv).
When making use of the "Existing Results" route, you are able to upload your own peak detection results or a feature matrix. Right now you are limited to one peak detection method per analysis in this route.
- Class labels referencing the measurements' names in the peak detection result / feature matrix (filename ending in 'class_labels')
- (either) Peak detection results with each filename ending in '_peak_detection_result.csv'
- (or) Feature matrix ending in '_feature_matrix.csv'
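As a rough illustration of the naming requirements above, the following Python sketch sorts the member names of an archive by suffix and reports what is missing for an upload of raw measurements. The function names and the returned structure are hypothetical and not part of the platform; only the suffixes are taken from this documentation.

```python
# Suffixes as documented on this page. Everything else in this sketch
# (function names, dictionary keys, messages) is an illustrative assumption.
CLASS_LABEL_SUFFIXES = ("class_labels.csv", "class_labels.tsv", "class_labels.txt")
LAYER_SUFFIXES = ("_layer.csv", "_layer.xls")


def classify_archive_members(filenames):
    """Sort archive member names into the file categories described above."""
    found = {
        "measurements": [],
        "class_labels": [],
        "layer": [],
        "peak_detection_results": [],
        "feature_matrices": [],
    }
    for name in filenames:
        if name.endswith("_ims.csv"):
            found["measurements"].append(name)
        elif name.endswith(CLASS_LABEL_SUFFIXES):
            found["class_labels"].append(name)
        elif name.endswith(LAYER_SUFFIXES):
            found["layer"].append(name)
        elif name.endswith("_peak_detection_result.csv"):
            found["peak_detection_results"].append(name)
        elif name.endswith("_feature_matrix.csv"):
            found["feature_matrices"].append(name)
    return found


def check_raw_archive(filenames):
    """Return a list of problems for an archive of raw measurements."""
    found = classify_archive_members(filenames)
    problems = []
    if not found["measurements"]:
        problems.append("no measurement file ending in '_ims.csv'")
    if not found["class_labels"]:
        problems.append("no class labels file")
    return problems
```

The member names could be obtained with zipfile.ZipFile(path).namelist() from the Python standard library and passed to check_raw_archive before uploading.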
MCC/IMS measurements
Make sure to follow the correct MCC/IMS measurement format, as used in the sample measurement or described in:
Vautz et al., 2008, Recommendation of a standard format for data sets from GC/IMS with sensor-controlled sampling, International Journal for Ion Mobility Spectrometry.
This also means your measurements should be named in accordance with the scheme "device-serial-number_YYMMDDhhmm_ims.csv", e.g. BD18_1408280826_ims.csv.
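The naming scheme can be checked before uploading. The following is a minimal Python sketch; the regular expression, in particular the assumption that the device serial number consists of letters and digits only (as in BD18), and the function name are illustrative and not taken from the platform.

```python
import re
from datetime import datetime

# "device-serial-number_YYMMDDhhmm_ims.csv"; the character class used for
# the serial number is an assumption based on the example BD18.
MEASUREMENT_NAME = re.compile(
    r"^(?P<serial>[A-Za-z0-9]+)_(?P<stamp>\d{10})_ims\.csv$"
)


def parse_measurement_name(filename):
    """Return (device serial, timestamp), or None if the name does not fit."""
    match = MEASUREMENT_NAME.match(filename)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match.group("stamp"), "%y%m%d%H%M")
    except ValueError:
        return None  # ten digits that do not form a valid date and time
    return match.group("serial"), stamp
```

For BD18_1408280826_ims.csv this yields the serial BD18 and the timestamp 28 August 2014, 08:26.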
Class labels
The class labels file should end with the suffix class_labels.csv, class_labels.tsv or class_labels.txt, e.g. candy_class_labels.csv.
The first row should be a header row in the class labels file. The first column should reference all measurement names in the zip-archive, while the second column assigns the class label such as:
Name | Label |
BD18_1408280826_ims.csv | menthol |
BD18_1408280834_ims.csv | citrus |
See example files for .csv, .tsv and .txt.
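To illustrate the class labels layout, here is a small Python sketch that reads such a file into a dictionary. The function name is hypothetical, and treating .txt files as tab separated is an assumption; check the example files for the delimiter actually expected.

```python
import csv
import io

# Delimiter per file extension. The .txt entry is an assumption.
DELIMITERS = {".csv": ",", ".tsv": "\t", ".txt": "\t"}


def read_class_labels(text, extension=".csv"):
    """Map measurement name -> class label, skipping the header row."""
    reader = csv.reader(io.StringIO(text), delimiter=DELIMITERS[extension])
    next(reader, None)  # header row, e.g. Name,Label
    labels = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete lines
        labels[row[0].strip()] = row[1].strip()
    return labels
```

Comparing the keys of the returned dictionary with the measurement file names in the archive is a quick way to spot labels that reference missing measurements.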
Visualnow-Layer file format
The Visualnowlayer file defines the peak positions to extract intensities from and can be used as a peak detection method. When uploading your own dataset, make sure the file is named with the suffix "_layer.csv" or "_layer.xls" and that it is included in the archive. The annotation file should feature one of the following two schemes:
Scheme1
3 Comment lines | |||
1/K0 | 1/K0 radius | RT | RT radius |
0.3 | 0.15 | 25 | 3 |
... | ... | ... | ... |
Scheme2
3 Comment lines | |||
inverse_reduced_mobility | radius_inverse_reduced_mobility | retention_time | radius_retention_time |
0.3 | 0.15 | 25 | 3 |
... | ... | ... | ... |
Make sure to choose utf-8 as encoding scheme when saving your layer to prevent encoding issues.
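The following Python sketch shows one way to read a layer in .csv form into rectangular windows. It assumes a comma as delimiter, exactly three comment lines followed by one header row, and that each rectangle spans the center position plus or minus the given radius; these are assumptions for illustration, and the function name and dictionary keys are hypothetical.

```python
import csv
import io


def read_layer(text, delimiter=","):
    """Turn layer rows into rectangles in mobility / retention time space."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    windows = []
    for row in rows[4:]:  # skip the 3 comment lines and the header row
        if len(row) < 4:
            continue
        try:
            irm, irm_radius, rt, rt_radius = (float(v) for v in row[:4])
        except ValueError:
            continue  # skip rows that do not hold four numbers
        windows.append({
            # Assumption: center +/- radius defines the rectangle borders.
            "irm_min": irm - irm_radius,
            "irm_max": irm + irm_radius,
            "rt_min": rt - rt_radius,
            "rt_max": rt + rt_radius,
        })
    return windows
```

When reading the layer from disk, opening the file with encoding="utf-8" matches the encoding recommended above.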
Peak Detection Results
Each peak detection result file should end with the suffix _peak_detection_result.csv. If the name contains any of the peak detection method names TOPHAT, PEAX, WATERSHED, JIBB or VISUALNOWLAYER, the file will be assigned to that peak detection method. The first line should be the header. In each following row, the first column should hold the measurement's name, while the other columns should be as follows:
measurement_name | peak_id | retention_time | inverse_reduced_mobility | intensity |
BD99_1905190837_ims.csv | 1 | 3.062 | 0.410 | 0.005 |
BD99_1905190837_ims.csv | 2 | ... | ... | ... |
Feature matrix
Each feature matrix file should end with the suffix _feature_matrix.csv. If the name contains any of the peak detection method names TOPHAT, PEAX, WATERSHED, JIBB or VISUALNOWLAYER, the file will be assigned to that peak detection method. The first line should be the header. Each following row should hold one measurement, with the measurement name in the first column, while the other columns should be as follows:
Measurement | Peak_0067 | Peak_0072 | Peak_0122 | ... |
BD99_1905190837_ims.csv | 0.021 | 0.059 | 0.086 | ... |
BD99_1906190842_ims.csv | 0.040 | 0.051 | 0.000 | ... |
Preprocessing
Preprocessing aims to increase the signal to noise ratio in the samples and improve the accuracy of peak detection methods. It mainly involves compensation for artifacts of the MCC/IMS, normalization and smoothing procedures.
BASELINE_CORRECTION
To correct for the Reactant Ion Peak we apply baseline correction. This method reduces the effect of the RIP-tailing and lowers the baseline of the affected spectra.
INTENSITY_NORMALIZATION
Normalizes the spectra intensities relative to the maximum intensity, which leads to intensities between 0 and 1.
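As a minimal sketch of this step, assuming a measurement is held as a nested list of non-negative intensities, the normalization could look as follows. The function name and the data layout are illustrative and not the platform's implementation.

```python
def normalize_intensities(spectra):
    """Scale a 2D list of intensities so that the maximum becomes 1."""
    peak = max(max(row) for row in spectra)
    if peak <= 0:
        raise ValueError("maximum intensity must be positive")
    # With non-negative input, all results lie between 0 and 1.
    return [[value / peak for value in row] for row in spectra]
```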
CROP_INVERSE_REDUCED_MOBILITY
As practically no peaks occur at inverse reduced ion mobilities < 0.4 Vs/cm^2, we remove most of the region of the spectra prior to the RIP.
NOISE_SUBTRACTION
To reduce noise we subtract a constant value from all intensities. To determine the noise level we average the intensities with inverse reduced ion mobility values < 0.4 Vs/cm^2.
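The noise estimate described above can be sketched in Python as follows. The data layout (one row per retention time, one column per inverse reduced mobility value) and the function name are assumptions; whether negative results are clipped afterwards is not stated here, so the sketch leaves them unchanged.

```python
def subtract_noise(spectra, inverse_reduced_mobilities, cutoff=0.4):
    """Subtract the mean intensity of the region below the mobility cutoff.

    spectra[i][j] is assumed to be the intensity at retention time index i
    and inverse reduced mobility inverse_reduced_mobilities[j].
    """
    noise_columns = [
        j for j, irm in enumerate(inverse_reduced_mobilities) if irm < cutoff
    ]
    if not noise_columns:
        raise ValueError("no data points below the mobility cutoff")
    values = [row[j] for row in spectra for j in noise_columns]
    noise_level = sum(values) / len(values)
    cleaned = [[value - noise_level for value in row] for row in spectra]
    return cleaned, noise_level
```

Because the estimate relies on the region below 0.4 Vs/cm^2, this step only works on data that still contains that region.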
DISCRETE_WAVELET_TRANSFORMATION
Apply a compression algorithm to the spectra.
This algorithm decomposes the signal of the spectra similarly to a Fourier transformation
and applies a high and a low pass filter to the spectra.
All signals smaller than a cutoff threshold are removed and the signal is reconstructed without the noise. We make use of the Daubechies 8 wavelet and the implementation of PyWavelets.
GAUSSIAN_FILTER
Removes noise by applying a Gaussian kernel, which merges intensities with neighboring signals. A two-dimensional Gaussian filter is applied with a fixed kernel size.
MEDIAN_FILTER
Removes noise by replacing intensities with the median of neighboring signals using a fixed window size.
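To show the idea on a small scale, here is a one-dimensional median filter in Python. The window size of 3 and the shortened windows at the borders are assumptions made for this sketch; the filter used on the platform operates on full spectra and may handle borders differently.

```python
from statistics import median


def median_filter(signal, window=3):
    """Replace each value by the median of its neighborhood."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    # At the borders the window is shortened instead of padded.
    return [
        median(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]
```

A single spike such as the 9 in [1, 1, 9, 1, 1] is removed completely, which is why median filtering is well suited for isolated outliers.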
SAVITZKY_GOLAY_FILTER
Removes noise by replacing intensities with a weighted average of neighboring signals using a fixed window size.
Peak Detection
PEAX
PEAX is a non-commercial automated peak extraction method for MCC/IMS measurements. Its core idea is to first extract lower-dimensional peak models from the spectra and then merge them into two-dimensional peak models. For more details see (2014, D’Addario et al.).
VISUALNOWLAYER
Extract peaks in rectangles based on the positions provided in the layer/annotation file.
We support both .xls and .csv format. See #visualnowlayer_format for reference.
TOPHAT
Extracts peaks in a two-step process: first tophat filtering, then local maxima extraction. In the first step a noise threshold is applied, removing all intensities below this threshold, and a mask is created highlighting the areas of high intensity. In the second step the local maximum of each connected component is extracted and saved as intensity value.
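The masking and the per-component maximum can be sketched in plain Python as shown below. This is not the platform's implementation and it omits the morphological tophat filter itself: it only thresholds the intensities, finds 4-connected components of the mask with a flood fill, and keeps the maximum of each component. The function name and the output format are hypothetical.

```python
def extract_component_maxima(spectra, noise_threshold):
    """Return the maximum of each connected area above the threshold."""
    n_rows = len(spectra)
    n_cols = len(spectra[0]) if spectra else 0
    # Step 1: mask of all points above the noise threshold.
    mask = [[value > noise_threshold for value in row] for row in spectra]
    seen = [[False] * n_cols for _ in range(n_rows)]
    peaks = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Step 2: flood fill one component and track its maximum.
            stack = [(r, c)]
            seen[r][c] = True
            best = (spectra[r][c], r, c)
            while stack:
                y, x = stack.pop()
                best = max(best, (spectra[y][x], y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            peaks.append({"row": best[1], "column": best[2], "intensity": best[0]})
    return peaks
```

Using 4-connectivity is a design choice of this sketch; with 8-connectivity, diagonally touching areas would be merged into one peak.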
JIBB
Naive peak extraction approach. Considers an area a peak if its intensity is 1.5 times above noise level, 5 consecutive signal points are rising continuously towards the local maximum in the inverse reduced mobility dimension and 7 consecutive signal points are rising in the retention time dimension, while showing the inverse behavior when moving away from the maximum.
WATERSHED
Approach resembling a falling water level that is lowered from the maximum intensity value until it reaches the noise level. Local maxima rising above the water level are labeled as peaks. A similar approach is used in the iPHEx software (2011, Bunkowski).
Peak Alignment
The Peak Alignment Method is applied after Peak Detection to merge peaks close to each other and reduce the number of peaks. We support 2 clustering algorithms: DBSCAN and PROBE_CLUSTERING.
DBSCAN
DBSCAN - Density Based Spatial Clustering of Applications with Noise (1996, Ester et al.) is a frequently used clustering algorithm, popular for its intuitive density-based approach.
PROBE_CLUSTERING
Clustering method to reduce groups of peaks using fixed grid positions. This ensures unique and consistent PeakIds when using the same grid parameters, which is especially important for the prediction of class labels from raw data.
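The idea of deriving an identifier from a fixed grid can be sketched as follows. The grid step sizes, the number of grid cells per row and the way the identifier is numbered are assumptions chosen for illustration; only the "Peak_" naming pattern is modeled on the feature matrix example on this page.

```python
def grid_peak_id(inverse_reduced_mobility, retention_time,
                 irm_step, rt_step, irm_cells):
    """Map a peak position to the identifier of its grid cell.

    irm_step and rt_step are the (hypothetical) cell sizes, and irm_cells
    is the number of cells along the inverse reduced mobility axis.
    """
    column = int(inverse_reduced_mobility // irm_step)
    row = int(retention_time // rt_step)
    return "Peak_{:04d}".format(row * irm_cells + column)
```

With irm_step=0.1, rt_step=10 and irm_cells=20, peaks at (0.41, 3.062) and (0.43, 5.0) fall into the same cell and share one identifier, while a peak at (0.57, 25.0) gets a different one. Since the identifier depends only on the position and the grid parameters, the same peak position is named identically in training and prediction data.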