File Formats
Input: MGF
msfiddle accepts tandem mass spectra in .mgf format. Four fields are
required per spectrum; all others are ignored.
| Field | Description |
|---|---|
| | Unique spectrum identifier, propagated to the |
| | Observed precursor m/z. |
| | Adduct type (e.g. |
| | Collision energy in eV. |
Example:
BEGIN IONS
TITLE=EMBL_MCF_2_0_HRMS_Library000529
PEPMASS=111.02016
CHARGE=1-
PRECURSOR_TYPE=[M-H]-
PRECURSOR_MZ=111.02016
COLLISION_ENERGY=50.0
41.0148 0.329893
68.0258 0.402906
111.0203 100.0
END IONS
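A minimal sketch of how such a file could be read and checked before it is handed to msfiddle. The reader below is illustrative only and is not msfiddle's own parser. The required key names are an assumption inferred from the example block; in particular, whether the precursor m/z is read from PRECURSOR_MZ or PEPMASS is not confirmed here.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: key names inferred from the example spectrum above.
REQUIRED_KEYS = ("TITLE", "PRECURSOR_MZ", "PRECURSOR_TYPE", "COLLISION_ENERGY")

Spectrum = Tuple[Dict[str, str], List[Tuple[float, float]]]


def read_mgf(lines: Iterable[str]) -> Iterator[Spectrum]:
    """Yield (headers, peaks) for each BEGIN IONS ... END IONS block."""
    headers: Dict[str, str] = {}
    peaks: List[Tuple[float, float]] = []
    inside = False
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            headers, peaks, inside = {}, [], True
        elif line == "END IONS":
            if inside:
                yield headers, peaks
            inside = False
        elif inside and line:
            if "=" in line and not line[0].isdigit():
                # KEY=VALUE header line
                key, value = line.split("=", 1)
                headers[key.strip().upper()] = value.strip()
            else:
                # "m/z intensity" peak line
                mz, intensity = line.split()[:2]
                peaks.append((float(mz), float(intensity)))


def missing_keys(headers: Dict[str, str], required=REQUIRED_KEYS) -> List[str]:
    """Return the required keys that a spectrum header lacks."""
    return [key for key in required if key not in headers]


EXAMPLE = """BEGIN IONS
TITLE=EMBL_MCF_2_0_HRMS_Library000529
PEPMASS=111.02016
CHARGE=1-
PRECURSOR_TYPE=[M-H]-
PRECURSOR_MZ=111.02016
COLLISION_ENERGY=50.0
41.0148 0.329893
68.0258 0.402906
111.0203 100.0
END IONS""".splitlines()

for headers, peaks in read_mgf(EXAMPLE):
    print(headers["TITLE"], len(peaks), missing_keys(headers))
```

Passing a different `required` tuple adapts the check if the actual field names differ from the assumed ones.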
Supported precursor types:
[M+H]+, [M+2H]2+, [M+Na]+, [M-H]-, [M+H-H2O]+,
[M-H2O+H]+, [2M+H]+, [2M-H]-, [M+H-2H2O]+, [M+H-NH3]+,
[M+H+NH3]+, [M+NH4]+, [M+H-CH2O2]+, [M+H-CH4O2]+,
[M-H-CO2]-, [M-CHO2]-, [M-H-H2O]-
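The list above can be turned into a quick pre-flight check. This is a sketch under the assumption that precursor types are compared as exact strings; msfiddle may normalize them differently, and `is_supported` is a hypothetical helper name.

```python
# Precursor types copied from the list above.
SUPPORTED_PRECURSOR_TYPES = frozenset({
    "[M+H]+", "[M+2H]2+", "[M+Na]+", "[M-H]-", "[M+H-H2O]+",
    "[M-H2O+H]+", "[2M+H]+", "[2M-H]-", "[M+H-2H2O]+", "[M+H-NH3]+",
    "[M+H+NH3]+", "[M+NH4]+", "[M+H-CH2O2]+", "[M+H-CH4O2]+",
    "[M-H-CO2]-", "[M-CHO2]-", "[M-H-H2O]-",
})


def is_supported(precursor_type: str) -> bool:
    """Exact-match check (an assumption) against the supported list."""
    return precursor_type.strip() in SUPPORTED_PRECURSOR_TYPES


print(is_supported("[M-H]-"))
print(is_supported("[M+K]+"))
```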
Output: msfiddle CSV
One row is produced per input spectrum.
| Column | Description |
|---|---|
| | Spectrum identifier from the MGF |
| | Raw TCN output: semicolon-separated atom-count vector (C, H, O, N, F, S, Cl, P, B, Br, I, Na, K). |
| | Neutral monoisotopic mass derived from |
| | Top formula from the TCN model prior to refinement. |
| | Monoisotopic mass of |
| | Total atom count predicted by the model. |
| | H/C ratio predicted by the model. |
| | Wall time per spectrum in seconds (prediction + refinement). |
| | The k-th best refined formula (0-indexed), ranked by rescore score. |
| | Monoisotopic mass of |
| | Rescore model confidence score for |
The number of ranked columns is set by top_k in the configuration file
(default: 5).
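To illustrate how the atom-count vector relates to a formula and a neutral monoisotopic mass, here is a small sketch. It assumes the vector values may be non-integer and rounds them to the nearest whole atom, which may not match msfiddle's own post-processing. The function names are hypothetical, and the formula is written in the vector's element order rather than Hill order.

```python
# Element order as documented for the atom-count vector.
ELEMENTS = ("C", "H", "O", "N", "F", "S", "Cl", "P", "B", "Br", "I", "Na", "K")

# Monoisotopic masses (Da) of the most abundant isotopes, rounded to 6 decimals.
MONOISOTOPIC_MASS = {
    "C": 12.000000, "H": 1.007825, "O": 15.994915, "N": 14.003074,
    "F": 18.998403, "S": 31.972071, "Cl": 34.968853, "P": 30.973762,
    "B": 11.009305, "Br": 78.918338, "I": 126.904472, "Na": 22.989769,
    "K": 38.963706,
}


def parse_atom_counts(vector: str) -> dict:
    """Split a semicolon-separated vector into {element: count}."""
    values = [float(v) for v in vector.split(";")]
    if len(values) != len(ELEMENTS):
        raise ValueError(f"expected {len(ELEMENTS)} values, got {len(values)}")
    # Assumption: round model output to the nearest whole atom.
    return {el: int(round(v)) for el, v in zip(ELEMENTS, values)}


def to_formula(counts: dict) -> str:
    """Write the formula in vector order, omitting absent elements."""
    return "".join(
        el + (str(n) if n > 1 else "") for el, n in counts.items() if n > 0
    )


def monoisotopic_mass(counts: dict) -> float:
    """Sum the monoisotopic masses of all atoms (neutral molecule)."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in counts.items())


counts = parse_atom_counts("6;12;6;0;0;0;0;0;0;0;0;0;0")
print(to_formula(counts), round(monoisotopic_mass(counts), 4))
```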
Native/Original BUDDY and SIRIUS Outputs
The commands below use the same MGF input file used by msfiddle. They
produce native BUDDY/msbuddy and SIRIUS outputs that can be passed directly to
msfiddle with --buddy_path or --sirius_path.
BUDDY / msbuddy
BUDDY is available through the msbuddy command-line tool. It writes a
result-summary TSV file to the output directory; -d also writes detailed
per-query candidate results.
msbuddy -mgf /path/to/data.mgf \
-output /path/to/buddy_output \
-ms orbitrap \
-p -n_cpu 12 \
-d -hal
Use -ms qtof or -ms fticr instead of -ms orbitrap when appropriate.
Native summary fields include identifier, mz, rt, adduct,
formula_rank_1 through formula_rank_5, and estimated_fdr. Keep the
MGF TITLE values aligned with the identifiers reported by msbuddy.
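One way to check that alignment is to compare the MGF TITLE values with the `identifier` column of the summary TSV. The helper below is a sketch with a hypothetical name; it only assumes a tab-separated file with an `identifier` header, as described above.

```python
import csv
import io
from typing import Iterable, List, TextIO


def unmatched_titles(mgf_titles: Iterable[str], summary_tsv: TextIO) -> List[str]:
    """Return MGF titles that do not appear in the summary's identifier column."""
    reader = csv.DictReader(summary_tsv, delimiter="\t")
    reported = {row["identifier"] for row in reader}
    return sorted(set(mgf_titles) - reported)


# Small in-memory stand-in for a real summary file.
summary = io.StringIO(
    "identifier\tmz\tadduct\tformula_rank_1\testimated_fdr\n"
    "spec_1\t111.02016\t[M-H]-\tC4H4N2O2\t0.01\n"
)
print(unmatched_titles(["spec_1", "spec_2"], summary))
```

For a real run, open the summary file with `open(path, newline="")` and pass the handle in place of the `StringIO` object.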
SIRIUS
The SIRIUS command-line interface first writes results to a project space.
The summaries / write-summaries step exports tabular summary files from that
project space.
sirius --input /path/to/data.mgf \
--project /path/to/sirius_project \
formulas --profile orbitrap
sirius --project /path/to/sirius_project \
summaries --top-k-summary=5 \
--output /path/to/sirius_output
Use --profile qtof instead of --profile orbitrap for Q-TOF data. The
molecular formula summary is exported as formula_identifications.tsv by
default, or as formula_identifications.[tsv|csv|xlsx] depending on the
selected export format. SIRIUS summary columns include formula/adduct
identifiers and scores such as molecularFormula, adduct, rank or
rankingScore, SiriusScore, TreeScore, IsotopeScore,
mass-error fields, and explained-peak fields. SIRIUS 6 may require
sirius login before formula computation.
If the project space already exists, run only the summaries command. The
resulting top-k or all-hit SIRIUS summaries can be passed to --sirius_path
directly.
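The "run only the summaries command if the project space exists" rule can be sketched as a small command builder. It only assembles the argument lists shown above and does not call SIRIUS itself; `sirius_commands` is a hypothetical helper, and treating "exists" as a plain path check is an assumption.

```python
from pathlib import Path
from typing import List


def sirius_commands(mgf, project, output, profile="orbitrap", top_k=5) -> List[List[str]]:
    """Return the SIRIUS invocations needed to produce summaries."""
    summaries = [
        "sirius", "--project", str(project),
        "summaries", f"--top-k-summary={top_k}",
        "--output", str(output),
    ]
    if Path(project).exists():
        # Project space already computed: export summaries only.
        return [summaries]
    formulas = [
        "sirius", "--input", str(mgf),
        "--project", str(project),
        "formulas", "--profile", profile,
    ]
    return [formulas, summaries]


for cmd in sirius_commands("data.mgf", "sirius_project", "sirius_output"):
    print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order.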
Optional BUDDY/msbuddy Results
--buddy_path accepts:
- a native/original msbuddy output directory containing msbuddy_result_summary.tsv;
- the native/original msbuddy_result_summary.tsv file itself; or
- an msfiddle-normalized CSV with the columns below (deprecated).
When a native/original msbuddy output directory includes detailed
formula_results.tsv files from the -d option, msfiddle uses their
per-candidate FDR scores. Passing the full output directory is preferred.
When only the summary TSV is available, msfiddle can use the top-ranked formula
and its estimated_fdr score; lower ranks are retained without scores and
are therefore ignored by the configured FDR threshold.
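The summary-only case can be pictured as follows: each row yields up to five candidates, but only the first carries a score. This is an illustrative sketch, not msfiddle's loader; the function name is hypothetical, and representing a missing score as `None` is an assumption.

```python
from typing import List, Optional, Tuple


def candidates_from_summary_row(row: dict, n_ranks: int = 5) -> List[Tuple[str, Optional[float]]]:
    """Collect (formula, fdr) pairs from one summary row.

    Only formula_rank_1 is paired with estimated_fdr; lower ranks get None.
    """
    candidates = []
    for rank in range(1, n_ranks + 1):
        formula = (row.get(f"formula_rank_{rank}") or "").strip()
        if not formula:
            continue
        fdr = None
        if rank == 1 and row.get("estimated_fdr") not in (None, ""):
            fdr = float(row["estimated_fdr"])
        candidates.append((formula, fdr))
    return candidates


row = {
    "identifier": "spec_1",
    "formula_rank_1": "C4H4N2O2",
    "formula_rank_2": "C5H4O3",
    "formula_rank_3": "",
    "estimated_fdr": "0.02",
}
print(candidates_from_summary_row(row))
```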
msfiddle-normalized BUDDY CSV schema (deprecated)
Deprecated since version 2.1.0: Use native msbuddy output instead. The msfiddle-normalized BUDDY CSV format
will be removed in msfiddle 3.0.0; loading it emits a
DeprecationWarning.
Required columns:
| Column | Description |
|---|---|
| | Spectrum identifier matching the MGF |
| | Precursor type string. |
| | Top-5 candidate formulas from BUDDY. |
| | Confidence scores; candidates below the configured threshold are excluded. |
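To find out whether a pipeline still relies on the deprecated format, the DeprecationWarning can be captured with Python's standard `warnings` module. In the sketch below, `load_normalized_csv` is a hypothetical stand-in for whatever msfiddle function loads the file; only the `warnings` calls are real API.

```python
import warnings


def load_normalized_csv(path: str) -> str:
    """Hypothetical stand-in for a loader that warns on the deprecated format."""
    warnings.warn(
        "msfiddle-normalized CSV input is deprecated; use native output instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return path


def uses_deprecated_input(path: str) -> bool:
    """Report whether loading `path` triggers a DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is not filtered out
        load_normalized_csv(path)
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


print(uses_deprecated_input("buddy_normalized.csv"))
```

Running Python with `-W error::DeprecationWarning` instead turns the warning into an exception, which makes remaining uses fail loudly.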
Optional SIRIUS Results
--sirius_path accepts:
- a SIRIUS summary output directory containing formula_identifications files;
- a native/original SIRIUS formula_identifications summary file in TSV, CSV, or XLSX form; or
- an msfiddle-normalized CSV with the columns below (deprecated).
For native SIRIUS summaries, msfiddle reads identifier, molecular formula, adduct, rank, and score columns from the formula-identification output, then keeps the top five candidates per spectrum.
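The "keep the top five candidates per spectrum" step might look like the sketch below. It is not msfiddle's implementation: the helper name is hypothetical, the identifier column name is supplied by the caller because it is not specified here, and ranking by the `rank` column (rather than a score column) is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def top_candidates(rows: Iterable[dict], id_column: str, k: int = 5) -> Dict[str, List[dict]]:
    """Group summary rows by spectrum and keep the k best-ranked rows of each."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[id_column]].append(row)
    return {
        spectrum_id: sorted(group, key=lambda r: int(r["rank"]))[:k]
        for spectrum_id, group in grouped.items()
    }


# "featureId" is a placeholder column name used only for this example.
rows = [
    {"featureId": "spec_1", "rank": str(r), "molecularFormula": f"F{r}", "adduct": "[M-H]-"}
    for r in (3, 1, 7, 2, 6, 5, 4)
]
best = top_candidates(rows, id_column="featureId")
print([row["rank"] for row in best["spec_1"]])
```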
msfiddle-normalized SIRIUS CSV schema (deprecated)
Deprecated since version 2.1.0: Use native SIRIUS output instead. The msfiddle-normalized SIRIUS CSV format
will be removed in msfiddle 3.0.0; loading it emits a
DeprecationWarning.
Required columns:
| Column | Description |
|---|---|
| | Spectrum identifier matching the MGF |
| | Top-5 candidate formulas from SIRIUS. |
| | Predicted adduct for each candidate. |
| | Log-likelihood scores; candidates below the configured threshold are excluded. |