API Reference
Prediction API
- class msfiddle.api.MsFiddlePredictor(*, instrument_type='orbitrap', device='cpu', no_cuda=False, config_path=None, resume_path=None, rescore_resume_path=None, download_models=False, batch_size=32, verbose=False)
Reusable predictor for msfiddle formula inference.
Instantiate this class once in long-running Python applications to avoid loading model checkpoints for each spectrum.
- predict_spectrum(mz_array, intensity_array, precursor_mz, adduct, *, top_k=5, collision_energy='Unknown')
Predict formula candidates for one native MS/MS spectrum.
- predict_batch(spectra, *, top_k=5, batch_size=None)
Predict formula candidates for multiple native MS/MS spectra.
Each spectrum mapping must contain mz_array, intensity_array, precursor_mz, and adduct. Optional keys are collision_energy and id.
- predict_mgf(test_data_path, *, buddy_path='', sirius_path='', top_k=None, batch_size=None)
Predict formulas for spectra in an MGF file.
buddy_path and sirius_path may point to either native/original BUDDY/msbuddy and SIRIUS formula-identification outputs or legacy msfiddle-normalized CSV files. The returned DataFrame uses the same result columns as the CLI CSV.
- Return type:
DataFrame
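A minimal usage sketch for the batch path, assuming msfiddle is installed and model checkpoints are available locally. The validate_spectrum helper, the run_batch wrapper, and the peak values are hypothetical; they only illustrate the required and optional keys listed for predict_batch. The structure of the value returned by predict_batch is not described in this reference, so the sketch does not inspect it.

```python
REQUIRED_KEYS = ("mz_array", "intensity_array", "precursor_mz", "adduct")
OPTIONAL_KEYS = ("collision_energy", "id")


def validate_spectrum(spectrum):
    """Hypothetical helper: check one spectrum mapping before batching."""
    missing = [key for key in REQUIRED_KEYS if key not in spectrum]
    if missing:
        raise KeyError(f"spectrum is missing required keys: {missing}")
    if len(spectrum["mz_array"]) != len(spectrum["intensity_array"]):
        raise ValueError("mz_array and intensity_array differ in length")
    return spectrum


def run_batch(spectra, top_k=5):
    """Hypothetical wrapper: build the predictor once, reuse it for all spectra."""
    # Imported lazily so the helper above works without msfiddle installed.
    from msfiddle.api import MsFiddlePredictor

    predictor = MsFiddlePredictor(instrument_type="orbitrap", device="cpu")
    checked = [validate_spectrum(s) for s in spectra]
    return predictor.predict_batch(checked, top_k=top_k)


# Made-up peak values, for illustration only.
example = {
    "id": "spec-1",
    "mz_array": [85.03, 127.04, 163.06],
    "intensity_array": [120.0, 950.0, 310.0],
    "precursor_mz": 181.0707,
    "adduct": "[M+H]+",
}
validate_spectrum(example)
```

Creating the predictor inside a long-lived object or wrapper, rather than calling the module-level functions per spectrum, follows the class description: checkpoints are loaded once.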
- msfiddle.api.predict_from_spectrum(mz_array, intensity_array, precursor_mz, adduct, *, top_k=5, instrument_type='orbitrap', collision_energy='Unknown', device='cpu', no_cuda=False, config_path=None, resume_path=None, rescore_resume_path=None, download_models=False, batch_size=32)
Predict formula candidates for one native MS/MS spectrum.
- msfiddle.api.predict_batch_from_spectra(spectra, *, top_k=5, instrument_type='orbitrap', device='cpu', no_cuda=False, config_path=None, resume_path=None, rescore_resume_path=None, download_models=False, batch_size=32)
Predict formula candidates for multiple native MS/MS spectra.
- msfiddle.api.predict_from_mgf(test_data_path, *, instrument_type='orbitrap', buddy_path='', sirius_path='', device='cpu', no_cuda=False, config_path=None, resume_path=None, rescore_resume_path=None, download_models=False, top_k=None, batch_size=32)
Predict formulas for spectra in an MGF file.
- Return type:
DataFrame
Molecular Utilities
- msfiddle.utils.mol_utils.monoisotopic_mass_calculator(x, mode)
Calculate the monoisotopic mass from a molecule or formula string.
- Parameters:
x – RDKit Mol object (mode='mol') or molecular formula string (mode='f').
mode – 'mol' to accept an RDKit Mol, 'f' to accept a formula string.
- Returns:
Monoisotopic mass in Da.
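To illustrate what the formula-string mode computes, here is a standalone sketch that sums monoisotopic masses over a formula. It is not the library's implementation: the function name, the small mass table, and the regular-expression parser are assumptions chosen for the example.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes; a small subset
# chosen for illustration, not the table used by msfiddle.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def monoisotopic_mass_from_formula(formula):
    """Sum isotope masses over the atoms of a formula such as 'C6H12O6'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means a single atom, e.g. the 'N' in 'CHN'.
        total += MONOISOTOPIC_MASS[symbol] * (int(count) if count else 1)
    return total


glucose_mass = monoisotopic_mass_from_formula("C6H12O6")
```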
- msfiddle.utils.mol_utils.dict_to_formula(formula_dict)
Convert an atom-count dict back to a molecular formula string.
- Parameters:
formula_dict – Mapping of atom symbol to count, e.g. {'C': 10, 'H': 12}.
- Returns:
Molecular formula string, e.g. 'C10H12'.
- msfiddle.utils.mol_utils.formula_to_dict(formula)
Parse a molecular formula string into an atom-count dict.
- Parameters:
formula – Molecular formula string, e.g. 'C10H12N2O3'.
- Returns:
Mapping of atom symbol to count, e.g. {'C': 10, 'H': 12, …}. Returns {} if formula is not a string.
- msfiddle.utils.mol_utils.formula_to_vector(formula)
Convert a molecular formula to a fixed-length atom-count vector.
The vector order follows ATOMS_INDEX (C, H, O, N, F, S, Cl, P, B, Br, I, Na, K).
- msfiddle.utils.mol_utils.vector_to_formula(vec, withH=True)
Convert a fixed-length atom-count vector back to a formula string.
- Parameters:
vec – Sequence of atom counts aligned with ATOMS_INDEX.
withH – If False, hydrogen is omitted from the output string.
- Returns:
Molecular formula string.
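The conversions in this section can be sketched as a round trip between a formula string, an atom-count dict, and a fixed-length vector. This is an independent illustration, not the msfiddle code: the function names are hypothetical, the element order is copied from the ATOMS_INDEX description above, and writing a count of 1 as a bare symbol is an assumption.

```python
import re

# Order copied from the formula_to_vector description.
ATOMS_INDEX = ["C", "H", "O", "N", "F", "S", "Cl", "P", "B", "Br", "I", "Na", "K"]


def parse_formula(formula):
    """Formula string -> {symbol: count}; non-string input gives {}."""
    if not isinstance(formula, str):
        return {}
    counts = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(count) if count else 1)
    return counts


def to_vector(formula):
    """Formula string -> counts aligned with ATOMS_INDEX."""
    counts = parse_formula(formula)
    return [counts.get(atom, 0) for atom in ATOMS_INDEX]


def from_vector(vec, withH=True):
    """Counts aligned with ATOMS_INDEX -> formula string in ATOMS_INDEX order."""
    parts = []
    for atom, count in zip(ATOMS_INDEX, vec):
        if count <= 0 or (atom == "H" and not withH):
            continue
        # Assumption: a count of 1 is written as the bare symbol.
        parts.append(atom if count == 1 else f"{atom}{count}")
    return "".join(parts)


vec = to_vector("C10H12N2O3")
```

Because the vector order is C, H, O, N, the round trip in this sketch yields the elements in that order rather than in the order of the input string.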
MS/MS Utilities
- msfiddle.utils.msms_utils.sdf2mgf(path, prefix)
Read an SDF file and convert each record to an MGF-style spectrum dict.
- Parameters:
path – Path to the SDF file.
prefix – String prefix for spectrum titles. Each spectrum is titled <prefix>_<index>.
- Returns:
Each dict has keys params (metadata), m/z array, and intensity array. Records missing required SDF properties (MASS SPECTRAL PEAKS, PRECURSOR TYPE, PRECURSOR M/Z, SPECTRUM TYPE, COLLISION ENERGY, ION MODE) are skipped.
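The record-to-dict step described above might look like the following sketch. It works on a plain mapping of SDF property names to strings rather than on an SDF file, and it is not the library's code: the helper name, the keys inside params, and the one-pair-per-line peak layout are assumptions. Only the required property names, the three top-level keys, and the <prefix>_<index> title come from this reference.

```python
REQUIRED_PROPS = (
    "MASS SPECTRAL PEAKS",
    "PRECURSOR TYPE",
    "PRECURSOR M/Z",
    "SPECTRUM TYPE",
    "COLLISION ENERGY",
    "ION MODE",
)


def record_to_spectrum(props, prefix, index):
    """Return an MGF-style dict, or None when a required property is missing."""
    if any(name not in props for name in REQUIRED_PROPS):
        return None  # such records are skipped
    mz, intensity = [], []
    # Assumption: one "m/z intensity" pair per line.
    for line in props["MASS SPECTRAL PEAKS"].splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mz.append(float(fields[0]))
            intensity.append(float(fields[1]))
    params = {
        "title": f"{prefix}_{index}",  # <prefix>_<index>
        # The key names below are hypothetical.
        "precursor_type": props["PRECURSOR TYPE"],
        "precursor_mz": float(props["PRECURSOR M/Z"]),
        "collision_energy": props["COLLISION ENERGY"],
    }
    return {"params": params, "m/z array": mz, "intensity array": intensity}


record = {
    "MASS SPECTRAL PEAKS": "85.03 120\n127.04 950",
    "PRECURSOR TYPE": "[M+H]+",
    "PRECURSOR M/Z": "181.0707",
    "SPECTRUM TYPE": "MS2",
    "COLLISION ENERGY": "30",
    "ION MODE": "P",
}
spectrum = record_to_spectrum(record, "demo", 0)
```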
- msfiddle.utils.msms_utils.filter_spec(spectra, config, type2charge)
Filter and clean a list of spectra according to a configuration dict.
Applies sequential filters: instrument type/name, MS level, atom count/type, precursor type, peak count, m/z range, and ppm mass error.
- Parameters:
spectra – List of spectra dicts in MGF format.
config – Dict of filter thresholds (keys: 'instrument_type', 'ms_level', 'atom_type', 'precursor_type', 'min_peak_num', 'min_mz', 'max_mz', 'ppm_tolerance', etc.).
type2charge – Dict mapping precursor type string to charge int.
- Returns:
(clean_spectra, smiles_list): the filtered spectra and their SMILES strings.
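As a rough illustration of three of the sequential filters (peak count, m/z range, ppm mass error), the sketch below checks a single spectrum dict. It is a simplification under stated assumptions, not filter_spec itself: the helper names are hypothetical, the m/z range is assumed to apply to fragment peaks, the observed precursor is assumed to live under params['precursor_mz'], and the theoretical m/z is passed in directly.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6


def keep_spectrum(spectrum, theoretical_mz, config):
    """Apply peak-count, m/z-range, and ppm filters in sequence."""
    mz = spectrum["m/z array"]
    if not mz or len(mz) < config["min_peak_num"]:
        return False
    # Assumption: the range is checked against fragment peaks.
    if min(mz) < config["min_mz"] or max(mz) > config["max_mz"]:
        return False
    observed = spectrum["params"]["precursor_mz"]  # hypothetical key
    return ppm_error(observed, theoretical_mz) <= config["ppm_tolerance"]


config = {"min_peak_num": 3, "min_mz": 50, "max_mz": 1500, "ppm_tolerance": 10}
good = {
    "params": {"precursor_mz": 181.0707},
    "m/z array": [85.03, 127.04, 163.06],
    "intensity array": [120.0, 950.0, 310.0],
}
bad = {**good, "params": {"precursor_mz": 181.08}}
```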
- msfiddle.utils.msms_utils.simulate_experimental_mz(theoretical_mz, relative_mass_tolerance_ppm)
Simulate experimental precursor m/z by shifting the theoretical value within a Gaussian distribution of mass deviations.
- Parameters:
theoretical_mz – Theoretical precursor m/z value.
relative_mass_tolerance_ppm – Relative mass tolerance in ppm.
- Returns:
Simulated experimental precursor m/z value.
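One way to realize such a Gaussian shift is sketched below. The function name and the choice of treating the tolerance as three standard deviations are assumptions; this reference does not state how the library maps the tolerance to the width of the distribution.

```python
import random


def simulate_mz(theoretical_mz, relative_mass_tolerance_ppm, rng=random):
    """Shift a theoretical m/z by a Gaussian-distributed ppm deviation."""
    # Assumption: the tolerance corresponds to three standard deviations.
    sigma_ppm = relative_mass_tolerance_ppm / 3.0
    deviation_ppm = rng.gauss(0.0, sigma_ppm)
    return theoretical_mz * (1.0 + deviation_ppm * 1e-6)


rng = random.Random(0)  # seeded generator for reproducibility
simulated = simulate_mz(181.070665, 5.0, rng)
```

Passing a seeded random.Random instance keeps simulated data reproducible across runs.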
- msfiddle.utils.msms_utils.ce2nce(ce, precursor_mz, charge)
Convert absolute collision energy (eV) to normalized collision energy (NCE).
- Parameters:
ce – Collision energy in eV.
precursor_mz – Precursor m/z value.
charge – Precursor charge state (int, 1–8).
- Returns:
Normalized collision energy (dimensionless).
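For orientation, a relation commonly cited for Orbitrap-type instruments scales the collision energy by 500 / m/z. The sketch below covers only the singly charged case, where no charge correction applies; whether msfiddle uses exactly this relation, and how it scales higher charge states, is not stated in this reference, so both are left as assumptions. The function name is hypothetical.

```python
def ce_to_nce_singly_charged(ce, precursor_mz):
    """NCE for charge 1 under the assumed relation NCE = CE * 500 / m/z."""
    if precursor_mz <= 0:
        raise ValueError("precursor_mz must be positive")
    return ce * 500.0 / precursor_mz


nce = ce_to_nce_singly_charged(30.0, 500.0)
```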
- msfiddle.utils.msms_utils.precursor_mz_calculator(precursor_type, mass)
Compute the expected precursor m/z from neutral monoisotopic mass.
- Parameters:
precursor_type – Adduct string, e.g. '[M+H]+'.
mass – Neutral monoisotopic mass in Da.
- Returns:
Theoretical precursor m/z.
- Raises:
ValueError – If precursor_type is not supported.
- msfiddle.utils.msms_utils.mass_calculator(precursor_type, precursor_mz)
Back-calculate neutral monoisotopic mass from observed precursor m/z.
The inverse of precursor_mz_calculator.
- Parameters:
precursor_type – Adduct string, e.g. '[M+H]+'.
precursor_mz – Observed precursor m/z.
- Returns:
Neutral monoisotopic mass in Da.
- Raises:
ValueError – If precursor_type is not supported.
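The inverse relationship between precursor_mz_calculator and mass_calculator can be sketched with a small table of mass shifts for singly charged adducts. The table, the function names, and the set of adducts are illustrative assumptions; the adducts that msfiddle actually supports are not listed in this reference.

```python
PROTON_MASS = 1.007276466    # Da
ELECTRON_MASS = 0.000548580  # Da
SODIUM_MASS = 22.98976928    # Da, monoisotopic Na atom

# Mass shift for singly charged adducts: m/z = mass + shift.
ADDUCT_SHIFT = {
    "[M+H]+": PROTON_MASS,
    "[M-H]-": -PROTON_MASS,
    "[M+Na]+": SODIUM_MASS - ELECTRON_MASS,
}


def mass_to_mz(precursor_type, mass):
    """Neutral monoisotopic mass -> theoretical precursor m/z."""
    if precursor_type not in ADDUCT_SHIFT:
        raise ValueError(f"unsupported precursor type: {precursor_type}")
    return mass + ADDUCT_SHIFT[precursor_type]


def mz_to_mass(precursor_type, precursor_mz):
    """Observed precursor m/z -> neutral monoisotopic mass (inverse of above)."""
    if precursor_type not in ADDUCT_SHIFT:
        raise ValueError(f"unsupported precursor type: {precursor_type}")
    return precursor_mz - ADDUCT_SHIFT[precursor_type]


mz = mass_to_mz("[M+H]+", 180.06339)
recovered = mz_to_mass("[M+H]+", mz)
```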
Formula Refinement
- msfiddle.utils.refine_utils.passes_senior_rule(formula)
Check whether a molecular formula satisfies the SENIOR valence rules.
- The three SENIOR rules are:
The sum of odd-valence atom counts must be even.
Total valence >= 2 * max single-atom valence.
Total valence >= 2 * (number of atoms - 1).
- Parameters:
formula – Molecular formula string, e.g. 'C6H12O6'.
- Returns:
True if all three rules are satisfied.
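The three rules listed above can be checked directly from atom counts. The sketch below is a standalone illustration, not the library's implementation: the function name is hypothetical and the valence table (for example S = 2, P = 3, N = 3) is an assumption, since msfiddle's table is not given here and other conventions use maximum valences.

```python
import re

# Assumed valences; the table used by msfiddle may differ.
VALENCE = {
    "C": 4, "H": 1, "O": 2, "N": 3, "F": 1, "S": 2, "Cl": 1,
    "P": 3, "B": 3, "Br": 1, "I": 1, "Na": 1, "K": 1,
}


def senior_check(formula):
    """Return True if the formula satisfies the three SENIOR rules."""
    counts = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(count) if count else 1)
    if not counts:
        return False
    total_valence = sum(VALENCE[a] * n for a, n in counts.items())
    odd_valence_atoms = sum(n for a, n in counts.items() if VALENCE[a] % 2 == 1)
    max_valence = max(VALENCE[a] for a in counts)
    n_atoms = sum(counts.values())
    return (
        odd_valence_atoms % 2 == 0                 # rule 1
        and total_valence >= 2 * max_valence       # rule 2
        and total_valence >= 2 * (n_atoms - 1)     # rule 3
    )
```

For example, C6H12O6 has 12 odd-valence atoms (even), a total valence of 48 against thresholds of 8 and 46, and so passes, while CH5 fails rule 1 and CH6 fails rule 3.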
- msfiddle.utils.refine_utils.exceed_refine_atom_limit(refined_counts, formulas, refine_atom_type, refine_atom_num)
Check if the refined formula exceeds the refinement limit for a given atom type.
- msfiddle.utils.refine_utils.candidate_formulas_generation(f, M, f0_list, refine_atom_type, refine_atom_num)
Generate neighboring formula candidates by adding or removing one heavy atom.
For each non-H atom in refine_atom_type, produces two candidates: one with the atom count incremented by 1 and one decremented by 1. After each change, hydrogen count is adjusted to match the target mass M. Candidates that exceed the per-atom refinement limits relative to any formula in f0_list are excluded. Results are sorted by proximity to M.
- Parameters:
f – Current formula string to expand neighbors from.
M – Target monoisotopic mass in Da.
f0_list – List of initial predicted formula strings (used for limit checking).
refine_atom_type – List of atom symbols to vary, e.g. ['C', 'N', 'O', …].
refine_atom_num – Per-atom maximum deviation allowed; -1 means no limit.
- Returns:
Candidate formula strings sorted by abs(mass - M).
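The neighbor expansion described above can be sketched on atom-count dicts. This simplified version is an illustration only: it omits the per-atom refinement limits checked against f0_list, uses a four-element mass table, works on dicts rather than formula strings, and rounds the hydrogen count to the nearest integer, all of which are assumptions. The names are hypothetical.

```python
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def mass_of(counts):
    """Monoisotopic mass of an atom-count dict."""
    return sum(MONO[atom] * n for atom, n in counts.items())


def neighbors(counts, target_mass, atom_types=("C", "N", "O")):
    """Candidates with one heavy atom added or removed, H refitted to the mass."""
    out = []
    for atom in atom_types:
        if atom == "H":
            continue
        for step in (+1, -1):
            cand = dict(counts)
            cand[atom] = cand.get(atom, 0) + step
            if cand[atom] < 0:
                continue  # cannot remove an atom that is not present
            heavy = mass_of({a: n for a, n in cand.items() if a != "H"})
            # Refit hydrogens so the total mass lands as close to the target as possible.
            cand["H"] = max(0, round((target_mass - heavy) / MONO["H"]))
            out.append(cand)
    out.sort(key=lambda c: abs(mass_of(c) - target_mass))
    return out


start = {"C": 6, "H": 12, "O": 6}
candidates = neighbors(start, 180.0634)
```

Starting from C6H12O6, the removal of N is skipped because the formula contains no nitrogen, so five candidates remain.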
- msfiddle.utils.refine_utils.formula_refinement(f0_list, M, delta_M, ppm_mode, K, D, T, refine_atom_type, refine_atom_num)
Refine predicted formulas via a best-first search around the target mass.
Starting from one or more predicted formulas (f0_list), performs a guided search by iteratively expanding neighbors (±1 heavy atom, H adjusted to fit M). Candidates are accepted when their monoisotopic mass falls within delta_M of M and they pass the SENIOR valence rules. Search depth is bounded by D, and the search exits early if K results are found or the timeout T is exceeded.
- Parameters:
f0_list – List of initial predicted formula strings.
M – Target monoisotopic mass in Da.
delta_M – Mass tolerance (Da if ppm_mode=False, ppm if ppm_mode=True).
ppm_mode – If True, interpret delta_M as ppm and convert to Da using M.
K – Maximum number of refined formulas to return.
D – Maximum search depth (number of atom-change steps from f0).
T – Timeout in seconds; set to 0 to disable.
refine_atom_type – List of atom symbols to vary during search.
refine_atom_num – Per-atom maximum deviation from f0; -1 means no limit.
- Returns:
{'formula': list[str | None], 'mass': list[float | None]}, each of length K, padded with None if fewer than K formulas are found. Results are sorted by abs(mass - M).
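Two details of this interface lend themselves to a short sketch: converting a ppm tolerance to Da using M, and shaping the output into lists of length K padded with None. The helper names are hypothetical and the search itself is not reproduced; the accepted (formula, mass) pairs are supplied by hand.

```python
def tolerance_in_da(M, delta_M, ppm_mode):
    """Interpret delta_M as ppm of M when ppm_mode is True, else as Da."""
    return delta_M * M * 1e-6 if ppm_mode else delta_M


def pad_results(found, M, K):
    """Sort accepted (formula, mass) pairs by abs(mass - M) and pad to length K."""
    ranked = sorted(found, key=lambda pair: abs(pair[1] - M))[:K]
    padding = [None] * (K - len(ranked))
    return {
        "formula": [formula for formula, _ in ranked] + padding,
        "mass": [mass for _, mass in ranked] + padding,
    }


tol = tolerance_in_da(200.0, 5.0, True)
result = pad_results(
    [("C7O6", 179.9695), ("C6H12O6", 180.0634)], M=180.0634, K=3
)
```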