API Reference

Prediction API

class msfiddle.api.MsFiddlePredictor(*, instrument_type='orbitrap', device='cpu', no_cuda=False, config_path=None, resume_path=None, rescore_resume_path=None, download_models=False, batch_size=32, verbose=False)

Reusable predictor for msfiddle formula inference.

Instantiate this class once in long-running Python applications to avoid loading model checkpoints for each spectrum.

predict_spectrum(mz_array, intensity_array, precursor_mz, adduct, *, top_k=5, collision_energy='Unknown')

Predict formula candidates for one native MS/MS spectrum.

Return type:

list[dict[str, Any]]

predict_batch(spectra, *, top_k=5, batch_size=None)

Predict formula candidates for multiple native MS/MS spectra.

Each spectrum mapping must contain mz_array, intensity_array, precursor_mz, and adduct. Optional keys are collision_energy and id.

Return type:

list[dict[str, Any]]
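The mapping layout described above can be checked before a batch is submitted. The following is a minimal sketch and not part of msfiddle: `REQUIRED_KEYS` and `check_spectrum` are hypothetical names, the peak values are invented for illustration, and the requirement that both arrays have the same length is an assumption.

```python
# Hypothetical pre-flight check for predict_batch inputs; not part of msfiddle.
REQUIRED_KEYS = {"mz_array", "intensity_array", "precursor_mz", "adduct"}


def check_spectrum(spectrum):
    """Return a list of problems found in one spectrum mapping (empty if none)."""
    problems = []
    missing = REQUIRED_KEYS - set(spectrum)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    # Assumption: mz_array and intensity_array are parallel arrays.
    elif len(spectrum["mz_array"]) != len(spectrum["intensity_array"]):
        problems.append("mz_array and intensity_array differ in length")
    return problems


# Illustrative values only.
spectra = [
    {
        "id": "example-1",
        "mz_array": [91.054, 119.049, 147.044],
        "intensity_array": [100.0, 45.0, 12.0],
        "precursor_mz": 165.055,
        "adduct": "[M+H]+",
        "collision_energy": "Unknown",
    },
    {"mz_array": [50.0], "intensity_array": [1.0], "precursor_mz": 100.0},
]
reports = [check_spectrum(s) for s in spectra]
```

Mappings with an empty report could then be passed as the spectra argument of predict_batch.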

predict_mgf(test_data_path, *, buddy_path='', sirius_path='', top_k=None, batch_size=None)

Predict formulas for spectra in an MGF file.

buddy_path and sirius_path may point to either native/original BUDDY/msbuddy and SIRIUS formula-identification outputs or legacy msfiddle-normalized CSV files.

The returned DataFrame uses the same result columns as the CLI CSV.

Return type:

DataFrame

msfiddle.api.predict_from_spectrum(mz_array, intensity_array, precursor_mz, adduct, *, top_k=5, instrument_type='orbitrap', collision_energy='Unknown', device='cpu', no_cuda=False, config_path=None, resume_path=None, rescore_resume_path=None, download_models=False, batch_size=32)

Predict formula candidates for one native MS/MS spectrum.

Return type:

list[dict[str, Any]]

msfiddle.api.predict_batch_from_spectra(spectra, *, top_k=5, instrument_type='orbitrap', device='cpu', no_cuda=False, config_path=None, resume_path=None, rescore_resume_path=None, download_models=False, batch_size=32)

Predict formula candidates for multiple native MS/MS spectra.

Return type:

list[dict[str, Any]]

msfiddle.api.predict_from_mgf(test_data_path, *, instrument_type='orbitrap', buddy_path='', sirius_path='', device='cpu', no_cuda=False, config_path=None, resume_path=None, rescore_resume_path=None, download_models=False, top_k=None, batch_size=32)

Predict formulas for spectra in an MGF file.

Return type:

DataFrame

Molecular Utilities

msfiddle.utils.mol_utils.monoisotopic_mass_calculator(x, mode)

Calculate the monoisotopic mass from a molecule or formula string.

Parameters:
  • x – RDKit Mol object (mode='mol') or molecular formula string (mode='f').

  • mode – 'mol' to accept an RDKit Mol, 'f' to accept a formula string.

Returns:

Monoisotopic mass in Da.

Return type:

float
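To make the formula-string mode concrete, the sketch below sums isotope masses over the element counts of a formula. It is an independent illustration, not the msfiddle implementation: `MONOISOTOPIC_MASS` and `monoisotopic_mass_from_formula` are hypothetical names, and the mass table covers only four elements.

```python
import re

# Masses (Da) of the most abundant isotope of each element; illustrative subset.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
}


def monoisotopic_mass_from_formula(formula):
    """Sum isotope masses over the element counts of a formula such as 'C6H12O6'."""
    total = 0.0
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means a single atom, as in 'CH4'.
        total += MONOISOTOPIC_MASS[symbol] * (int(digits) if digits else 1)
    return total


glucose_mass = monoisotopic_mass_from_formula("C6H12O6")  # about 180.0634 Da
```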

msfiddle.utils.mol_utils.dict_to_formula(formula_dict)

Convert an atom-count dict back to a molecular formula string.

Parameters:

formula_dict – Mapping of atom symbol to count, e.g. {'C': 10, 'H': 12}.

Returns:

Molecular formula string, e.g. 'C10H12'.

Return type:

str

msfiddle.utils.mol_utils.formula_to_dict(formula)

Parse a molecular formula string into an atom-count dict.

Parameters:

formula – Molecular formula string, e.g. 'C10H12N2O3'.

Returns:

Mapping of atom symbol to count, e.g. {'C': 10, 'H': 12, …}.

Returns {} if formula is not a string.

Return type:

dict
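The parsing behaviour described above, including the empty dict for non-string input, can be sketched with a regular expression. This is an assumption about one reasonable way to do it, not the library's code; `parse_formula` is a hypothetical name.

```python
import re


def parse_formula(formula):
    """Parse 'C10H12N2O3' into {'C': 10, 'H': 12, 'N': 2, 'O': 3}."""
    if not isinstance(formula, str):
        return {}  # mirrors the documented behaviour for non-string input
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means one atom; repeated symbols are accumulated.
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


parsed = parse_formula("C10H12N2O3")
not_a_string = parse_formula(None)
```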

msfiddle.utils.mol_utils.formula_to_vector(formula)

Convert a molecular formula to a fixed-length atom-count vector.

The vector order follows ATOMS_INDEX (C, H, O, N, F, S, Cl, P, B, Br, I, Na, K).

Parameters:

formula – Molecular formula string.

Returns:

Length-13 vector of atom counts.

Return type:

list[int]
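The fixed ordering can be illustrated with a small stand-alone sketch. The `ATOMS_INDEX` list below is copied from the order stated above; `to_vector` is a hypothetical name, and how msfiddle treats elements outside this list is not stated here (the sketch simply raises ValueError).

```python
import re

# Order as documented above.
ATOMS_INDEX = ["C", "H", "O", "N", "F", "S", "Cl", "P", "B", "Br", "I", "Na", "K"]


def to_vector(formula):
    """Count atoms of a formula string into a vector aligned with ATOMS_INDEX."""
    vec = [0] * len(ATOMS_INDEX)
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # list.index raises ValueError for symbols outside ATOMS_INDEX.
        vec[ATOMS_INDEX.index(symbol)] += int(digits) if digits else 1
    return vec


vec = to_vector("C10H12N2O3")
```

Note that oxygen precedes nitrogen in the vector, so the counts for this formula appear as 10, 12, 3, 2 followed by zeros.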

msfiddle.utils.mol_utils.vector_to_formula(vec, withH=True)

Convert a fixed-length atom-count vector back to a formula string.

Parameters:
  • vec – Sequence of atom counts aligned with ATOMS_INDEX.

  • withH – If False, hydrogen is omitted from the output string.

Returns:

Molecular formula string.

Return type:

str
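A sketch of the reverse direction, showing the effect of dropping hydrogen. `to_formula` and `with_h` are hypothetical names; emitting symbols in ATOMS_INDEX order and always writing the count (including 1) are assumptions, and the library's output format may differ.

```python
# Order as documented for formula_to_vector.
ATOMS_INDEX = ["C", "H", "O", "N", "F", "S", "Cl", "P", "B", "Br", "I", "Na", "K"]


def to_formula(vec, with_h=True):
    """Join non-zero counts into a formula string, optionally skipping hydrogen."""
    parts = []
    for symbol, count in zip(ATOMS_INDEX, vec):
        if count <= 0 or (symbol == "H" and not with_h):
            continue
        parts.append(f"{symbol}{count}")
    return "".join(parts)


counts = [10, 12, 3, 2] + [0] * 9
full = to_formula(counts)
heavy_only = to_formula(counts, with_h=False)
```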

MS/MS Utilities

msfiddle.utils.msms_utils.sdf2mgf(path, prefix)

Read an SDF file and convert each record to an MGF-style spectrum dict.

Parameters:
  • path – Path to the SDF file.

  • prefix – String prefix for spectrum titles. Each spectrum is titled <prefix>_<index>.

Returns:

Each dict has the keys params (metadata), m/z array, and intensity array. Records missing any required SDF property (MASS SPECTRAL PEAKS, PRECURSOR TYPE, PRECURSOR M/Z, SPECTRUM TYPE, COLLISION ENERGY, ION MODE) are skipped.

Return type:

list[dict]

msfiddle.utils.msms_utils.filter_spec(spectra, config, type2charge)

Filter and clean a list of spectra according to a configuration dict.

Applies sequential filters: instrument type/name, MS level, atom count/type, precursor type, peak count, m/z range, and ppm mass error.

Parameters:
  • spectra – List of spectra dicts in MGF format.

  • config – Dict of filter thresholds (keys: 'instrument_type', 'ms_level', 'atom_type', 'precursor_type', 'min_peak_num', 'min_mz', 'max_mz', 'ppm_tolerance', etc.).

  • type2charge – Dict mapping precursor type string to charge int.

Returns:

(clean_spectra, smiles_list): the filtered spectra and their corresponding SMILES strings.

Return type:

tuple

msfiddle.utils.msms_utils.simulate_experimental_mz(theoretical_mz, relative_mass_tolerance_ppm)

Simulate experimental precursor m/z by shifting the theoretical value within a Gaussian distribution of mass deviations.

Parameters:
  • theoretical_mz – Theoretical precursor m/z value.

  • relative_mass_tolerance_ppm – Relative mass tolerance in ppm.

Returns:

Simulated experimental precursor m/z value.

Return type:

float
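The idea of a Gaussian mass deviation can be sketched as follows. How msfiddle maps the ppm tolerance onto the width of the distribution is not stated above, so treating the tolerance as a 3-sigma bound is purely an assumption of this sketch; `simulate_mz` and its `rng` argument are hypothetical.

```python
import random


def simulate_mz(theoretical_mz, tolerance_ppm, rng=None):
    """Shift theoretical_mz by a Gaussian-distributed relative error in ppm."""
    rng = rng or random.Random()
    # Assumption: the tolerance corresponds to three standard deviations.
    sigma_ppm = tolerance_ppm / 3.0
    error_ppm = rng.gauss(0.0, sigma_ppm)
    return theoretical_mz * (1.0 + error_ppm * 1e-6)


rng = random.Random(0)  # seeded for reproducibility
samples = [simulate_mz(500.0, 10.0, rng) for _ in range(1000)]
mean_mz = sum(samples) / len(samples)
```

With a tolerance of 10 ppm at m/z 500, the simulated values scatter by a few thousandths of an m/z unit around the theoretical value.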

msfiddle.utils.msms_utils.ce2nce(ce, precursor_mz, charge)

Convert absolute collision energy (eV) to normalized collision energy (NCE).

Parameters:
  • ce – Collision energy in eV.

  • precursor_mz – Precursor m/z value.

  • charge – Precursor charge state (int, 1–8).

Returns:

Normalized collision energy (dimensionless).

Return type:

float
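For orientation, the sketch below uses the common convention in which NCE is referenced to m/z 500 for singly charged precursors. It is not taken from msfiddle: `nce_singly_charged` is a hypothetical name, and the charge-dependent factor that applies to higher charge states is not specified above, so the sketch deliberately omits it.

```python
def nce_singly_charged(ce_ev, precursor_mz):
    """Convert collision energy in eV to NCE, assuming a charge state of 1.

    Assumption: NCE is normalized to a reference m/z of 500.
    """
    if precursor_mz <= 0:
        raise ValueError("precursor_mz must be positive")
    return ce_ev * 500.0 / precursor_mz


nce = nce_singly_charged(20.0, 250.0)
```

Under this convention the same absolute energy corresponds to a higher NCE for lighter precursors.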

msfiddle.utils.msms_utils.precursor_mz_calculator(precursor_type, mass)

Compute the expected precursor m/z from neutral monoisotopic mass.

Parameters:
  • precursor_type – Adduct string, e.g. '[M+H]+'.

  • mass – Neutral monoisotopic mass in Da.

Returns:

Theoretical precursor m/z.

Return type:

float

Raises:

ValueError – If precursor_type is not supported.

msfiddle.utils.msms_utils.mass_calculator(precursor_type, precursor_mz)

Back-calculate neutral monoisotopic mass from observed precursor m/z.

The inverse of precursor_mz_calculator.

Parameters:
  • precursor_type – Adduct string, e.g. '[M+H]+'.

  • precursor_mz – Observed precursor m/z.

Returns:

Neutral monoisotopic mass in Da.

Return type:

float

Raises:

ValueError – If precursor_type is not supported.
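The relationship between neutral mass and precursor m/z, and the fact that the two conversions are inverses, can be sketched with a small adduct table. The table, the function names `to_precursor_mz` and `to_neutral_mass`, and the choice of three adducts are assumptions for illustration; the set of adducts that msfiddle supports is not listed above.

```python
PROTON_MASS = 1.007276466  # Da

# Hypothetical adduct table: (mass shift in Da, absolute charge).
ADDUCTS = {
    "[M+H]+": (PROTON_MASS, 1),
    "[M-H]-": (-PROTON_MASS, 1),
    "[M+Na]+": (22.98922, 1),  # sodium atom minus one electron
}


def _lookup(precursor_type):
    if precursor_type not in ADDUCTS:
        raise ValueError(f"unsupported precursor type: {precursor_type}")
    return ADDUCTS[precursor_type]


def to_precursor_mz(precursor_type, mass):
    """Neutral monoisotopic mass -> theoretical precursor m/z."""
    shift, charge = _lookup(precursor_type)
    return (mass + shift) / charge


def to_neutral_mass(precursor_type, precursor_mz):
    """Observed precursor m/z -> neutral monoisotopic mass."""
    shift, charge = _lookup(precursor_type)
    return precursor_mz * charge - shift


mz = to_precursor_mz("[M+H]+", 180.063388)
round_trip = to_neutral_mass("[M+H]+", mz)
```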

Formula Refinement

msfiddle.utils.refine_utils.passes_senior_rule(formula)

Check whether a molecular formula satisfies the SENIOR valence rules.

The three SENIOR rules are:
  1. The sum of odd-valence atom counts must be even.

  2. Total valence >= 2 * max single-atom valence.

  3. Total valence >= 2 * (number of atoms - 1).

Parameters:

formula – Molecular formula string, e.g. 'C6H12O6'.

Returns:

True if all three rules are satisfied.

Return type:

bool
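The three rules listed above can be written out directly. The sketch below is an illustration rather than the msfiddle implementation: `senior_check` is a hypothetical name, and the `VALENCE` table uses commonly assumed valences (for example 3 for N and P, 2 for S) that may differ from the library's table.

```python
import re

# Assumed valences; elements with several valence states are simplified.
VALENCE = {"C": 4, "H": 1, "O": 2, "N": 3, "F": 1, "S": 2,
           "Cl": 1, "P": 3, "Br": 1, "I": 1}


def senior_check(formula):
    """Apply the three SENIOR rules to a formula string."""
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)

    total_valence = sum(VALENCE[s] * n for s, n in counts.items())
    n_atoms = sum(counts.values())
    odd_valence_atoms = sum(n for s, n in counts.items() if VALENCE[s] % 2 == 1)
    max_valence = max(VALENCE[s] for s in counts)

    rule1 = odd_valence_atoms % 2 == 0           # even number of odd-valence atoms
    rule2 = total_valence >= 2 * max_valence     # enough bonds for the largest atom
    rule3 = total_valence >= 2 * (n_atoms - 1)   # enough bonds for a connected graph
    return rule1 and rule2 and rule3


glucose_ok = senior_check("C6H12O6")
ch5_ok = senior_check("CH5")
```

For glucose the total valence is 48 and the connectivity bound is 2 * (24 - 1) = 46, so all three rules hold; CH5 fails the first rule because it has five odd-valence atoms.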

msfiddle.utils.refine_utils.exceed_refine_atom_limit(refined_counts, formulas, refine_atom_type, refine_atom_num)

Check if the refined formula exceeds the refinement limit for a given atom type.

msfiddle.utils.refine_utils.candidate_formulas_generation(f, M, f0_list, refine_atom_type, refine_atom_num)

Generate neighboring formula candidates by adding or removing one heavy atom.

For each non-H atom in refine_atom_type, produces two candidates: one with the atom count incremented by 1 and one decremented by 1. After each change, hydrogen count is adjusted to match the target mass M. Candidates that exceed the per-atom refinement limits relative to any formula in f0_list are excluded. Results are sorted by proximity to M.

Parameters:
  • f – Current formula string to expand neighbors from.

  • M – Target monoisotopic mass in Da.

  • f0_list – List of initial predicted formula strings (used for limit checking).

  • refine_atom_type – List of atom symbols to vary, e.g. ['C', 'N', 'O', …].

  • refine_atom_num – Per-atom maximum deviation allowed; -1 means no limit.

Returns:

Candidate formula strings sorted by abs(mass - M).

Return type:

list[str]
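The neighbor-generation step can be sketched as follows. This simplified version omits the per-atom limit check against f0_list, covers only four elements, and uses hypothetical names (`MONO`, `parse`, `mass`, `to_string`, `neighbours`); rounding the hydrogen count to the nearest integer is an assumption about how hydrogen is adjusted.

```python
import re

MONO = {"C": 12.0, "H": 1.00782503223, "N": 14.00307400443, "O": 15.99491461957}


def parse(formula):
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def mass(counts):
    return sum(MONO[s] * n for s, n in counts.items())


def to_string(counts):
    # C and H first, remaining symbols alphabetically; zero counts are dropped.
    order = [s for s in ("C", "H") if s in counts]
    order += sorted(s for s in counts if s not in ("C", "H"))
    return "".join(f"{s}{counts[s] if counts[s] > 1 else ''}"
                   for s in order if counts[s] > 0)


def neighbours(formula, target_mass, atom_types):
    """Add or remove one heavy atom, then refit the hydrogen count to target_mass."""
    base = parse(formula)
    candidates = []
    for atom in atom_types:
        if atom == "H":
            continue
        for step in (+1, -1):
            cand = dict(base)
            cand[atom] = cand.get(atom, 0) + step
            if cand[atom] < 0:
                continue
            heavy = {s: n for s, n in cand.items() if s != "H"}
            n_h = round((target_mass - mass(heavy)) / MONO["H"])
            if n_h < 0:
                continue  # heavy atoms alone already exceed the target mass
            cand["H"] = n_h
            candidates.append(cand)
    candidates.sort(key=lambda c: abs(mass(c) - target_mass))
    return [to_string(c) for c in candidates]


result = neighbours("C6H12O6", 180.0634, ["C", "O"])
```

Several of the generated neighbors are chemically implausible (for instance, heavily hydrogen-saturated ones); these are the kind of candidates that the mass-tolerance and SENIOR checks described for formula_refinement would reject.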

msfiddle.utils.refine_utils.formula_refinement(f0_list, M, delta_M, ppm_mode, K, D, T, refine_atom_type, refine_atom_num)

Refine predicted formulas via a best-first search around the target mass.

Starting from one or more predicted formulas (f0_list), performs a guided search by iteratively expanding neighbors (±1 heavy atom, H adjusted to fit M). Candidates are accepted when their monoisotopic mass falls within delta_M of M and they pass the SENIOR valence rules. Search depth is bounded by D, and the search exits early if K results are found or the timeout T is exceeded.

Parameters:
  • f0_list – List of initial predicted formula strings.

  • M – Target monoisotopic mass in Da.

  • delta_M – Mass tolerance (Da if ppm_mode=False, ppm if ppm_mode=True).

  • ppm_mode – If True, interpret delta_M as ppm and convert to Da using M.

  • K – Maximum number of refined formulas to return.

  • D – Maximum search depth (number of atom-change steps from f0).

  • T – Timeout in seconds; set to 0 to disable.

  • refine_atom_type – List of atom symbols to vary during search.

  • refine_atom_num – Per-atom maximum deviation from f0; -1 means no limit.

Returns:

{'formula': list[str | None], 'mass': list[float | None]}, each of length K, padded with None if fewer than K formulas are found. Results are sorted by abs(mass - M).

Return type:

dict
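Two details of this interface lend themselves to a short illustration: converting a ppm tolerance to Da using M, and padding the result lists to length K. The sketch below covers only those two pieces and not the search itself; `tolerance_in_da` and `pad_results` are hypothetical helper names.

```python
def tolerance_in_da(delta_m, target_mass, ppm_mode):
    """Return the mass tolerance in Da, converting from ppm when requested."""
    return delta_m * target_mass * 1e-6 if ppm_mode else delta_m


def pad_results(hits, target_mass, k):
    """Rank (formula, mass) hits by distance to target_mass and pad to length k."""
    ranked = sorted(hits, key=lambda fm: abs(fm[1] - target_mass))[:k]
    ranked += [(None, None)] * (k - len(ranked))
    return {
        "formula": [f for f, _ in ranked],
        "mass": [m for _, m in ranked],
    }


tol = tolerance_in_da(5, 200.0, ppm_mode=True)  # 5 ppm at 200 Da
padded = pad_results([("C6H12O6", 180.0634)], 180.0634, 3)
```

A caller can therefore rely on both lists having exactly K entries and should skip entries that are None.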