Usage

Installation

pip install msfiddle

PyTorch must be installed separately following the official PyTorch installation guide. Alternatively, install the optional inference extra:

pip install "msfiddle[inference]"

Downloading Pre-trained Models

Model weights must be downloaded before running predictions:

# Download to the default location (~/.msfiddle/check_point)
msfiddle-download-models

# Download specific models to a custom location
msfiddle-download-models --destination /path/to/models \
                          --models fiddle_tcn_qtof fiddle_rescore_qtof

To inspect current model paths:

msfiddle-checkpoint-paths

Running Predictions

Demo data:

msfiddle --demo --result_path ./output_demo.csv --device 0

Custom data:

msfiddle --test_data /path/to/data.mgf \
         --instrument_type orbitrap \
         --result_path /path/to/results.csv \
         --device 0

--instrument_type accepts orbitrap (default) or qtof.

Custom model paths:

msfiddle --test_data /path/to/data.mgf \
         --config_path /path/to/config.yml \
         --resume_path /path/to/tcn_model.pt \
         --rescore_resume_path /path/to/rescore_model.pt \
         --result_path /path/to/results.csv \
         --device 0

Integration with BUDDY and SIRIUS

Candidate formulas from the BUDDY/msbuddy command-line tool and the SIRIUS command-line interface can be incorporated to improve refinement results. --buddy_path and --sirius_path accept the tools' native output files or directories. The older msfiddle-normalized CSV files are still accepted, but they are deprecated and support for them will be removed in msfiddle 3.0.0.

First, run BUDDY/msbuddy with the same MGF file:

msbuddy -mgf /path/to/data.mgf \
        -output /path/to/buddy_output \
        -ms orbitrap \
        -d

msbuddy writes msbuddy_result_summary.tsv to the output directory. With -d, it also writes per-spectrum formula_results.tsv files that contain per-candidate FDR scores. Passing the whole output directory to msfiddle is preferred because msfiddle can then use those detailed scores directly. If only msbuddy_result_summary.tsv is passed, only the rank-1 candidate has an FDR score, so lower-ranked candidates are not used by the FDR threshold. See the msbuddy command-line API for the full option list.
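
Before passing a path to --buddy_path, it can help to check which of the two cases above applies. The following is a minimal sketch, not part of msfiddle: describe_buddy_output is a hypothetical helper, and it assumes the per-spectrum formula_results.tsv files sit somewhere below the msbuddy output directory.

```python
from pathlib import Path


def describe_buddy_output(path):
    """Classify a candidate --buddy_path value (hypothetical helper)."""
    path = Path(path)
    if path.is_file():
        # A lone summary file carries an FDR score for rank 1 only.
        if path.name == "msbuddy_result_summary.tsv":
            return "summary_file"
        return "unknown_file"
    if path.is_dir():
        has_summary = (path / "msbuddy_result_summary.tsv").is_file()
        # Assumption: detailed files live in subdirectories of the output directory.
        has_details = next(path.rglob("formula_results.tsv"), None) is not None
        if has_summary and has_details:
            return "directory_with_details"
        if has_summary:
            return "directory_summary_only"
    return "not_msbuddy_output"
```

A result of "directory_summary_only" suggests msbuddy was run without -d, so rerunning it with -d would make the per-candidate FDR scores available.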

Next, run SIRIUS and export formula summaries:

sirius --input /path/to/data.mgf \
       --project /path/to/sirius_project \
       formulas --profile orbitrap

sirius --project /path/to/sirius_project \
       summaries --top-k-summary=5 \
       --output /path/to/sirius_output

SIRIUS writes formula summary files such as formula_identifications.tsv or formula_identifications_top-5.tsv. SIRIUS 6 may require sirius login before formula computation. See the SIRIUS command-line interface for workflow details and the full option list.
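
Because the summary file name depends on the options used, a small lookup can confirm that the export produced something usable. This is a sketch only: find_sirius_formula_summaries is a hypothetical helper, and it assumes the summary files are written to the top level of the --output directory.

```python
from pathlib import Path


def find_sirius_formula_summaries(output_dir):
    """Return SIRIUS formula summary files in output_dir, sorted by name."""
    # The pattern matches formula_identifications.tsv as well as
    # variants such as formula_identifications_top-5.tsv.
    return sorted(Path(output_dir).glob("formula_identifications*.tsv"))
```

An empty list means no formula summary was found at that location, in which case the summaries step may need to be rerun.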

Then pass those native outputs to msfiddle:

msfiddle --test_data /path/to/data.mgf \
         --buddy_path /path/to/buddy_output \
         --sirius_path /path/to/sirius_output \
         --result_path /path/to/results.csv \
         --device 0

You can also pass /path/to/buddy_output/msbuddy_result_summary.tsv or an individual SIRIUS formula summary file directly. If only one external tool is available, omit the other option. See File Formats for the native formats and the deprecated msfiddle-normalized schemas.
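
When msfiddle is driven from a script, the optional --buddy_path and --sirius_path arguments can be added only when the corresponding output exists. The sketch below uses only the flags shown above; build_msfiddle_command is a hypothetical helper, not part of msfiddle.

```python
import shlex


def build_msfiddle_command(test_data, result_path,
                           buddy_path=None, sirius_path=None, device=0):
    """Assemble an msfiddle argument list, omitting unavailable tools."""
    cmd = ["msfiddle", "--test_data", str(test_data)]
    if buddy_path is not None:
        cmd += ["--buddy_path", str(buddy_path)]
    if sirius_path is not None:
        cmd += ["--sirius_path", str(sirius_path)]
    cmd += ["--result_path", str(result_path), "--device", str(device)]
    return cmd


# Example: only msbuddy output is available, so --sirius_path is omitted.
cmd = build_msfiddle_command(
    "/path/to/data.mgf",
    "/path/to/results.csv",
    buddy_path="/path/to/buddy_output",
)
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) to execute the command.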

Python API

For a single native MS/MS spectrum:

from msfiddle import predict_from_spectrum

candidates = predict_from_spectrum(
    mz_array=[60.0, 85.0, 100.0, 125.0, 150.0],
    intensity_array=[10.0, 50.0, 20.0, 35.0, 15.0],
    precursor_mz=180.063,
    adduct="[M+H]+",
    top_k=5,
    instrument_type="orbitrap",
    collision_energy="Unknown",
    device="cpu",
)

For repeated or batched use, instantiate a predictor once so model checkpoints are loaded once and reused:

from msfiddle import MsFiddlePredictor

predictor = MsFiddlePredictor(instrument_type="orbitrap", device="cpu")
results = predictor.predict_batch(
    [
        {
            "id": "sample-1",
            "mz_array": [60.0, 85.0, 100.0, 125.0, 150.0],
            "intensity_array": [10.0, 50.0, 20.0, 35.0, 15.0],
            "precursor_mz": 180.063,
            "adduct": "[M+H]+",
            "collision_energy": "Unknown",
        }
    ]
)
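
When the batch is assembled from other data structures, a small constructor keeps the records consistent. The sketch below produces dictionaries with the same keys as the example above; make_spectrum_record is a hypothetical helper, and its validation rules (equal-length, non-empty arrays) are assumptions rather than documented msfiddle requirements.

```python
def make_spectrum_record(spectrum_id, mz_array, intensity_array, precursor_mz,
                         adduct="[M+H]+", collision_energy="Unknown"):
    """Build one input record using the keys shown in the batch example."""
    mz = [float(x) for x in mz_array]
    intensity = [float(x) for x in intensity_array]
    # Assumed sanity checks; msfiddle may apply its own validation.
    if not mz:
        raise ValueError(f"{spectrum_id}: spectrum has no peaks")
    if len(mz) != len(intensity):
        raise ValueError(
            f"{spectrum_id}: {len(mz)} m/z values but {len(intensity)} intensities"
        )
    return {
        "id": spectrum_id,
        "mz_array": mz,
        "intensity_array": intensity,
        "precursor_mz": float(precursor_mz),
        "adduct": adduct,
        "collision_energy": collision_energy,
    }


records = [
    make_spectrum_record(
        "sample-1",
        [60.0, 85.0, 100.0, 125.0, 150.0],
        [10.0, 50.0, 20.0, 35.0, 15.0],
        180.063,
    )
]
```

The resulting list can then be passed to predictor.predict_batch(records).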

MGF files can also be used from Python:

from msfiddle import predict_from_mgf

df = predict_from_mgf(
    "/path/to/data.mgf",
    instrument_type="orbitrap",
    device="cpu",
)

The Python APIs are quiet by default and do not download checkpoints unless download_models=True is passed. The CLI likewise requires checkpoints to be downloaded before prediction; if they are missing, it prints a checkpoint error that includes the msfiddle-download-models command.
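
Since the Python APIs do not download checkpoints on their own, a script may want to check for them before calling a predictor. This is a rough sketch: checkpoints_look_present is a hypothetical helper that only tests whether the default download location (~/.msfiddle/check_point) is a non-empty directory. It makes no assumption about the checkpoint file names, and msfiddle-checkpoint-paths remains the authoritative way to inspect model paths.

```python
from pathlib import Path

# Default download location used by msfiddle-download-models.
DEFAULT_CHECKPOINT_DIR = Path.home() / ".msfiddle" / "check_point"


def checkpoints_look_present(checkpoint_dir=DEFAULT_CHECKPOINT_DIR):
    """Return True if checkpoint_dir exists and contains at least one entry."""
    checkpoint_dir = Path(checkpoint_dir)
    if not checkpoint_dir.is_dir():
        return False
    return next(checkpoint_dir.iterdir(), None) is not None
```

If this returns False, run msfiddle-download-models first, or pass download_models=True to the Python API.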