Usage
=====
Installation
------------
.. code-block:: bash

    pip install msfiddle

PyTorch must be installed separately following the
`official PyTorch installation guide `_.
Alternatively, install the optional inference extra:

.. code-block:: bash

    pip install "msfiddle[inference]"

Downloading Pre-trained Models
-------------------------------
Model weights must be downloaded before running predictions:

.. code-block:: bash

    # Download to the default location (~/.msfiddle/check_point)
    msfiddle-download-models

    # Download specific models to a custom location
    msfiddle-download-models --destination /path/to/models \
        --models fiddle_tcn_qtof fiddle_rescore_qtof

To inspect current model paths:

.. code-block:: bash

    msfiddle-checkpoint-paths

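Before a long run it can be convenient to confirm from Python that the checkpoint directory is populated. The following is a minimal sketch that assumes the default location quoted above (``~/.msfiddle/check_point``). The helper name ``checkpoint_files`` is hypothetical and not part of ``msfiddle``; the authoritative paths remain those printed by ``msfiddle-checkpoint-paths``.

.. code-block:: python

    from pathlib import Path

    # Default download location mentioned above (an assumption if
    # --destination was used when downloading).
    DEFAULT_CHECKPOINT_DIR = Path.home() / ".msfiddle" / "check_point"


    def checkpoint_files(directory=DEFAULT_CHECKPOINT_DIR):
        """Return the sorted files found under ``directory`` (empty if absent)."""
        directory = Path(directory)
        if not directory.is_dir():
            return []
        return sorted(p for p in directory.rglob("*") if p.is_file())


    if __name__ == "__main__":
        files = checkpoint_files()
        if files:
            for path in files:
                print(path)
        else:
            print("No checkpoints found; run msfiddle-download-models first.")

An empty result only means that nothing was found at the assumed location; models downloaded with ``--destination`` live wherever that option pointed.
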
Running Predictions
--------------------
**Demo data:**

.. code-block:: bash

    msfiddle --demo --result_path ./output_demo.csv --device 0

**Custom data:**

.. code-block:: bash

    msfiddle --test_data /path/to/data.mgf \
        --instrument_type orbitrap \
        --result_path /path/to/results.csv \
        --device 0

``--instrument_type`` accepts ``orbitrap`` (default) or ``qtof``.
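When predictions are launched from a pipeline script, the same command can be assembled programmatically. The sketch below uses only the flags documented in this section and rejects any instrument type other than ``orbitrap`` or ``qtof``. The helper names ``build_msfiddle_command`` and ``run_msfiddle`` are hypothetical and not part of ``msfiddle``.

.. code-block:: python

    import subprocess


    def build_msfiddle_command(test_data, result_path,
                               instrument_type="orbitrap", device="0"):
        """Assemble the msfiddle CLI call from flags shown in this section."""
        if instrument_type not in ("orbitrap", "qtof"):
            raise ValueError(f"unsupported instrument_type: {instrument_type!r}")
        return [
            "msfiddle",
            "--test_data", str(test_data),
            "--instrument_type", instrument_type,
            "--result_path", str(result_path),
            "--device", str(device),
        ]


    def run_msfiddle(test_data, result_path, **kwargs):
        """Run the command; subprocess raises if the exit code is non-zero."""
        command = build_msfiddle_command(test_data, result_path, **kwargs)
        return subprocess.run(command, check=True)

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.
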
**Custom model paths:**

.. code-block:: bash

    msfiddle --test_data /path/to/data.mgf \
        --config_path /path/to/config.yml \
        --resume_path /path/to/tcn_model.pt \
        --rescore_resume_path /path/to/rescore_model.pt \
        --result_path /path/to/results.csv \
        --device 0

Integration with BUDDY and SIRIUS
-----------------------------------
Candidate formulas from the `BUDDY/msbuddy command-line tool
`_ and the
`SIRIUS command-line interface `_ can be
incorporated to improve refinement results. ``--buddy_path`` and
``--sirius_path`` accept each tool's native, unmodified output. The older
msfiddle-normalized CSV files are still accepted, but they are deprecated and
support for them will be removed in ``msfiddle`` 3.0.0.
First, run BUDDY/msbuddy with the same MGF file:

.. code-block:: bash

    msbuddy -mgf /path/to/data.mgf \
        -output /path/to/buddy_output \
        -ms orbitrap \
        -d

``msbuddy`` writes ``msbuddy_result_summary.tsv`` in the output directory.
When ``-d`` is used, it also writes per-spectrum ``formula_results.tsv`` files
with per-candidate FDR scores. Passing the full output directory to
``msfiddle`` is preferred because those detailed scores can be used directly.
If only ``msbuddy_result_summary.tsv`` is passed, only rank 1 has an FDR score
and lower ranks are not used by the FDR threshold. See the
`msbuddy command-line API `_
for the full option list.
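The preference described above (full output directory when detailed results exist, summary file otherwise) can be expressed as a small helper. This is a sketch under one assumption: that the per-spectrum ``formula_results.tsv`` files sit somewhere beneath the msbuddy output directory. The function name ``choose_buddy_path`` is hypothetical and not part of ``msfiddle`` or ``msbuddy``.

.. code-block:: python

    from pathlib import Path


    def choose_buddy_path(output_dir):
        """Pick the most informative msbuddy output to pass to --buddy_path."""
        output_dir = Path(output_dir)
        summary = output_dir / "msbuddy_result_summary.tsv"
        # Per-spectrum files are only written when msbuddy was run with -d.
        has_details = any(output_dir.rglob("formula_results.tsv"))
        if has_details:
            return output_dir
        if summary.is_file():
            return summary
        raise FileNotFoundError(f"no msbuddy results found in {output_dir}")

The returned path can then be supplied as the value of ``--buddy_path``.
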
Next, run SIRIUS and export formula summaries:

.. code-block:: bash

    sirius --input /path/to/data.mgf \
        --project /path/to/sirius_project \
        formulas --profile orbitrap

    sirius --project /path/to/sirius_project \
        summaries --top-k-summary=5 \
        --output /path/to/sirius_output

SIRIUS writes formula summary files such as
``formula_identifications.tsv`` or ``formula_identifications_top-5.tsv``.
SIRIUS 6 may require ``sirius login`` before formula computation. See the
`SIRIUS command-line interface `_ for
workflow details and the full option list.
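To see which summary files a SIRIUS run actually produced, the output directory can be listed from Python. The sketch below assumes the files are written directly into the ``--output`` directory and follow the ``formula_identifications*.tsv`` naming mentioned above. The helper ``find_sirius_summaries`` is hypothetical and not part of ``msfiddle`` or SIRIUS.

.. code-block:: python

    from pathlib import Path


    def find_sirius_summaries(output_dir):
        """List formula summary files, e.g. formula_identifications_top-5.tsv."""
        return sorted(Path(output_dir).glob("formula_identifications*.tsv"))

An empty list suggests that the ``summaries`` step has not been run yet or that it wrote to a different directory.
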
Then pass those native/original outputs to ``msfiddle``:

.. code-block:: bash

    msfiddle --test_data /path/to/data.mgf \
        --buddy_path /path/to/buddy_output \
        --sirius_path /path/to/sirius_output \
        --result_path /path/to/results.csv \
        --device 0

You can also pass ``/path/to/buddy_output/msbuddy_result_summary.tsv`` or an
individual SIRIUS formula summary file directly. If only one external tool is
available, omit the other option. See :doc:`formats` for the native formats
and the deprecated msfiddle-normalized schemas.
Python API
----------
For a single native MS/MS spectrum:

.. code-block:: python

    from msfiddle import predict_from_spectrum

    candidates = predict_from_spectrum(
        mz_array=[60.0, 85.0, 100.0, 125.0, 150.0],
        intensity_array=[10.0, 50.0, 20.0, 35.0, 15.0],
        precursor_mz=180.063,
        adduct="[M+H]+",
        top_k=5,
        instrument_type="orbitrap",
        collision_energy="Unknown",
        device="cpu",
    )

For repeated or batched use, instantiate a predictor once so model checkpoints
are loaded once and reused:

.. code-block:: python

    from msfiddle import MsFiddlePredictor

    predictor = MsFiddlePredictor(instrument_type="orbitrap", device="cpu")
    results = predictor.predict_batch(
        [
            {
                "id": "sample-1",
                "mz_array": [60.0, 85.0, 100.0, 125.0, 150.0],
                "intensity_array": [10.0, 50.0, 20.0, 35.0, 15.0],
                "precursor_mz": 180.063,
                "adduct": "[M+H]+",
                "collision_energy": "Unknown",
            }
        ]
    )

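Spectra often arrive as lists of ``(m/z, intensity)`` pairs instead of two parallel arrays. The sketch below converts such data into dictionaries with the same keys as the batch example above. The helper ``to_batch_records`` is hypothetical and not part of ``msfiddle``, and sorting the peaks by m/z is a convenience assumed here, not a documented requirement.

.. code-block:: python

    def to_batch_records(spectra, adduct="[M+H]+", collision_energy="Unknown"):
        """Convert (id, peaks, precursor_mz) tuples into batch-style dicts.

        ``peaks`` is a sequence of (mz, intensity) pairs.
        """
        records = []
        for spectrum_id, peaks, precursor_mz in spectra:
            if not peaks:
                raise ValueError(f"spectrum {spectrum_id!r} has no peaks")
            # Sort by m/z, then split the pairs into two aligned sequences.
            mz_array, intensity_array = zip(*sorted(peaks))
            records.append(
                {
                    "id": spectrum_id,
                    "mz_array": [float(mz) for mz in mz_array],
                    "intensity_array": [float(i) for i in intensity_array],
                    "precursor_mz": float(precursor_mz),
                    "adduct": adduct,
                    "collision_energy": collision_energy,
                }
            )
        return records

The resulting list has the same shape as the argument passed to ``predict_batch`` in the example above.
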
MGF files can also be used from Python:

.. code-block:: python

    from msfiddle import predict_from_mgf

    df = predict_from_mgf(
        "/path/to/data.mgf",
        instrument_type="orbitrap",
        device="cpu",
    )

The Python APIs are quiet by default and do not download checkpoints unless
``download_models=True`` is passed. The CLI also requires checkpoints to be
downloaded before prediction and prints a checkpoint error with the
``msfiddle-download-models`` command if they are missing.