Stochasticity is a dominant feature of natural dynamical phenomena. Maximum likelihood estimation (MLE) is the standard approach to inferring stochastic models, but it converges to the true parameter values only in the large-sample limit. Alternatives to MLE are required when the data are insufficient, as in neuronal dynamics, or when the observed stochastic dynamics are modulated by time-varying deterministic trends, as in biology or financial markets. Success in modeling complex phenomena such as human perception hinges critically on the availability of data and computational power. Significant progress has been made in modeling such phenomena with probabilistic methods, particularly in image analysis and speech recognition. MLE combined with Bayesian model selection underlies much of this progress, because MLE converges to the true model when data are copious. In the sciences, however, sufficiently large datasets are rarae aves, so alternatives to MLE must be developed for small sample sizes. We introduce a data-driven statistical physics approach to model inference based on minimizing a free energy of data, and we show that it yields superior model recovery for small sample sizes. We demonstrate inference of coupling strengths in non-equilibrium kinetic Ising models, including in the difficult regime of large coupling variability, and show that the approach scales to systems of arbitrary size. As applications, we infer a functional connectivity network in the salamander retina from time-series data of neuronal spiking, and a currency exchange rate network from time-series data of currency exchange rates. As a proof of principle, we show that accurate small-sample inference is critical for devising a profitable currency hedging strategy.