Bayesian Statistics — Applied

Summer School for AstroStatistics in Sharjah 2025
This notebook contains original work by the authors unless stated otherwise. Any external material is properly credited to its sources.
References to papers, datasets, and software are acknowledged. Original content is licensed under the GNU General Public License v3.0 (GNU GPLv3).

Bayesian Statistics — Applied#

import requests

url = "https://raw.githubusercontent.com/AstroStat-Academy/assets-public/main/styles/plot_style.py"
style = requests.get(url).text
exec(style)

import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

\[ P(\textrm{hypothesis} | \textrm{data}, I) = \dfrac{P(\textrm{data} | \textrm{hypothesis}, I) P(\textrm{hypothesis} | I)}{P(\textrm{data} | I)} \]

1. Fitting a line#

In astronomy, a lot of information about the physical and emission properties of astronomical objects is accessed through spectroscopic measurements, i.e., the emission intensity as a function of wavelength/frequency. Interesting spectral features are emission or absorption lines. Naturally, the lines are not… lines, but narrow peaks! A true line is impossible due to quantum effects, but in addition, local/extended effects tend to “broaden” or alter the shape of spectal lines (e.g., thermal doppler, pressure, rotation of object,

\[ y(x) = \dfrac{A}{1 + \left(\dfrac{x - x_0}{w}\right)^2} \qquad \text{(Lorentzian or Cauchy model)} \]

\[ y(x) = A \exp{\left[-\frac{\left(x - x_0\right)^2}{2 w^2}\right]} \qquad \text{(Gaussian or Normal model)} \]

where

$A$ is the amplitude (notice that $y(x_0) = A$ in both cases),
$x_0$ is the location parameter (the center of the line), and
$w$ is the “spread” or width of the line.

Notice that here we use the word Cauchy and Gaussian to refer to non-linear models rather than distributions. Our data follows the shape of the PDFs of these distributions, not the distributions themselves!

Now, let’s assume we have measured the intensitities $y_i$ at given (unitless) wavelengths $x_i$, and that the errors $e_i$ are normally distributed with standard deviation $\sigma$:

\[ y_i = y(x_i) + \epsilon_i, \qquad \epsilon_i \sim \text{Norm}(0, \sigma)\]

1.1. The data from an unknown distribution#

np.random.seed(2024)

def make_data(x, model_dist, amplitude, location, width, error_scale=0.0):
    """Make a spectral line following the PDF of a given distribution.
    
    x           : the wavelength
    model_dist  : the distribution of which the PDF will be used
    amplitude   : the maximum height of the spectral line
    location    : the center of the spectral line
    width       : the width of the spectral line
    error_scale : the standard deviation of the observational uncertainties
    
    NOTE: use default `error_scale` (0.0) to get a model instead of an observational sample.
    
    """
    distribution = model_dist(loc=location, scale=width)
    y = distribution.pdf(x)
    y = amplitude * y / np.max(y)
    if error_scale > 0:
        y = np.random.normal(y, scale=error_scale)
    y_err = np.ones_like(y) * error_scale
    return y, y_err


n_data = 20

# two potential distributions
model_distributions = [st.cauchy, st.norm]
model_distributions_names = ["Cauchy", "Normal"]

# select pseudo-randomly the true distribution
random_index = 2023 % 17 % 2
true_model_distribution = model_distributions[random_index]
true_model_distribution_name = model_distributions_names[random_index]

# Permitted ranges for the parameters (because we processed the data and we have some intuition)
MIN_AMPLITUDE = 0.9
MAX_AMPLITUDE = 1.1
MIN_LOCATION = 0.45
MAX_LOCATION = 0.55
MIN_WIDTH = 0.05
MAX_WIDTH = 0.15

# Select the true parameters (these are supposed to be hidden from us)
true_amplitude = np.random.uniform(MIN_AMPLITUDE, MAX_AMPLITUDE)
true_location = np.random.uniform(MIN_LOCATION, MAX_LOCATION)
true_width = np.random.uniform(MIN_WIDTH, MAX_WIDTH)

# Make the new data according to the true model
x_data = np.linspace(0.0, 1.0, n_data) + np.random.uniform(-0.5/n_data, 0.5/n_data, size=n_data)
y_data, e_data = make_data(x_data, true_model_distribution, amplitude=true_amplitude, location=true_location, width=true_width, error_scale=0.1)

# Plot them!
plt.figure()
plt.errorbar(x_data, y_data, yerr=e_data, fmt="k.", capsize=4, label="Data")
plt.legend(loc="upper right")
plt.show()

1.2. Overplotting two potential models using fiducial parameters#

# use the midpoints of the permitted ranges of the parameters
fiducial_amplitude = (MIN_AMPLITUDE + MAX_AMPLITUDE) / 2.0
fiducial_location = (MIN_LOCATION + MAX_LOCATION) / 2.0
fiducial_width = (MIN_WIDTH + MAX_WIDTH) / 2.0

plt.figure()
plt.plot(x_data, y_data, "ks", mfc="none", label="Data")

for model_distribution in model_distributions:
    x_plot = np.linspace(0, 1, 100)
    y_plot, _ = make_data(x_plot, 
                          model_distribution,            # try this model
                          amplitude=fiducial_amplitude,  # fiducial...
                          location=fiducial_location,    # ...parameters...
                          width=fiducial_width,          # ...from ranges
                          error_scale=0.0                # the prediction does not have uncertainty
                         )
    plt.plot(x_plot, y_plot, label=model_distribution.name)

plt.legend(loc="upper right")
plt.show()

In-class discussion: The plotted models use fiducial values for the parameters. However to you think one fits best than the other?

Discuss with your teammate, then report.

[Spoiler]

It's purely subjective at this point!

1.3. Defining the prior, likelihood and posterior assuming Gaussian profile…#

Reminder: we operate in log-space (log-prior, log-likelihood, log-posterion) for numerical reasons.

Task: Complete the functions. Hint: the make_data function can be used to get model predictions as well!

Warning: the function returns two things!

def ln_prior(amplitude, location, width):
    ...

def ln_likelihood_norm(amplitude, location, width):
    ...

def ln_posterior_norm(amplitude, location, width):
    ...

[Solution below]

def ln_prior(amplitude, location, width):
    if MIN_AMPLITUDE < amplitude < MAX_AMPLITUDE and MIN_LOCATION < location < MAX_LOCATION and MIN_WIDTH < width < MAX_WIDTH:
        return 0.0
    return -np.inf

def ln_likelihood_norm(amplitude, location, width):
    y_pred, _ = make_data(x_data, model_dist=st.norm, amplitude=amplitude, location=location, width=width)
    chi2 = np.sum((y_data - y_pred) ** 2.0 / e_data ** 2.0)
    return -chi2 / 2.0

def ln_posterior_norm(amplitude, location, width):
    return ln_prior(amplitude, location, width) + ln_likelihood_norm(amplitude, location, width)

[End of solution]

(Reminder) From cell above:

# Permitted ranges for the parameters (because we processed the data and we have some intuition)
MIN_AMPLITUDE = 0.9
MAX_AMPLITUDE = 1.1
MIN_LOCATION = 0.45
MAX_LOCATION = 0.55
MIN_WIDTH = 0.05
MAX_WIDTH = 0.15

# Our observed data
x_data = np.linspace(0.0, 1.0, n_data) + np.random.uniform(-0.5/n_data, 0.5/n_data, size=n_data)
y_data, e_data = make_data(x_data, true_model_distribution, amplitude=true_amplitude, location=true_location, width=true_width, error_scale=0.1)

(Reminder): From MLE notebook:

Under the assumption of Gaussian uncertainties and independence of data, $$ \ln L = \text{constant} - \frac{1}{2} \sum_{i=1}^{N} {\dfrac{(y_i-f(x_i))^2}{\sigma_i^2}} = \text{constant} - \frac{\chi^2}{2} $$

1.4. Maximizing the posterior#

Tasks:

Define the function to be minimized in order to maximize the posterior.
Choose appropriate starting values for the minimization routine.

def neg_ln_posterior_norm(theta):
    amplitude, location, width = theta
    return ...

min_result_norm = minimize(neg_ln_posterior_norm, x0=[..., ..., ...], method='Nelder-Mead')
est_amplitude, est_location, est_width = min_result_norm.x


print(min_result_norm)
print()
print("| PARAMETER  |  ESTIMATION  |  TRUTH  |")
print(f"| amplitude  | {est_amplitude:11.3f}  | {true_amplitude:6.3f}  |")
print(f"| location   | {est_location:11.3f}  | {true_location:6.3f}  |")
print(f"| width      | {est_width:11.3f}  | {true_width:6.3f}  |")
print()
print("At best-fitting values...")
lnL_norm = ln_likelihood_norm(*min_result_norm.x)
lnP_norm = ln_posterior_norm(*min_result_norm.x)
print(f"  * log-prior      : {ln_prior(*min_result_norm.x):.6f}")
print(f"  * log-likelihood : {lnL_norm:.6f}")
print(f"  * log-posterior  : {lnP_norm:.6f}")

[Solution below]

def neg_ln_posterior_norm(theta):
    amplitude, location, width = theta
    return -ln_posterior_norm(amplitude=amplitude, location=location, width=width)

min_result_norm = minimize(neg_ln_posterior_norm, x0=[fiducial_amplitude, fiducial_location, fiducial_width], method='Nelder-Mead')
est_amplitude, est_location, est_width = min_result_norm.x


print(min_result_norm)
print()
print("| PARAMETER  |  ESTIMATION  |  TRUTH  |")
print(f"| amplitude  | {est_amplitude:11.3f}  | {true_amplitude:6.3f}  |")
print(f"| location   | {est_location:11.3f}  | {true_location:6.3f}  |")
print(f"| width      | {est_width:11.3f}  | {true_width:6.3f}  |")
print()
print("At best-fitting values...")
lnL_norm = ln_likelihood_norm(*min_result_norm.x)
lnP_norm = ln_posterior_norm(*min_result_norm.x)
print(f"  * log-prior      : {ln_prior(*min_result_norm.x):.6f}")
print(f"  * log-likelihood : {lnL_norm:.6f}")
print(f"  * log-posterior  : {lnP_norm:.6f}")

[End of solution]

In-class discussion: The fit was successful and we got parameters close to the truth. Is the Gaussian model validated?

Discuss with your teammate, then report.

[Spoiler]

As in hypothesis testing, the model is assumed to be true. The fitting process does not validate the model. The value of the log-posterior does not convey any information regarding the validity of the model.

1.5. Repeating using a Cauchy profile#

Tasks: Do the same steps for the Cauchy profile.

Complete the functions.
Choose appropriate starting values for the minimization routine.

def ln_likelihood_cauchy(amplitude, location, width):
    ...

def ln_posterior_cauchy(amplitude, location, width):
    ...

def neg_ln_posterior_cauchy(theta):
    amplitude, location, width = theta
    return ...

min_result_cauchy = minimize(neg_ln_posterior_cauchy, x0=[..., ..., ...], method='Nelder-Mead')
est_amplitude, est_location, est_width = min_result_cauchy.x

print(min_result_cauchy)
print()
print("| PARAMETER  |  ESTIMATION  |  TRUTH  |")
print(f"| amplitude  | {est_amplitude:11.3f}  | {true_amplitude:6.3f}  |")
print(f"| location   | {est_location:11.3f}  | {true_location:6.3f}  |")
print(f"| width      | {est_width:11.3f}  | {true_width:6.3f}  |")
print()
print("At best-fitting values...")
lnL_cauchy = ln_likelihood_cauchy(*min_result_cauchy.x)
lnP_cauchy = ln_posterior_cauchy(*min_result_cauchy.x)
print(f"  * log-prior      : {ln_prior(*min_result_cauchy.x):.6f}")
print(f"  * log-likelihood : {lnL_cauchy:.6f}")
print(f"  * log-posterior  : {lnP_cauchy:.6f}")

[Solution below]

def ln_likelihood_cauchy(amplitude, location, width):
    y_pred, _ = make_data(x_data, model_dist=st.cauchy, amplitude=amplitude, location=location, width=width)
    chi2 = np.sum((y_data - y_pred) ** 2.0 / e_data ** 2.0)
    return -chi2 / 2.0

def ln_posterior_cauchy(amplitude, location, width):
    return ln_prior(amplitude, location, width) + ln_likelihood_cauchy(amplitude, location, width)

def neg_ln_posterior_cauchy(theta):
    amplitude, location, width = theta
    return -ln_posterior_cauchy(amplitude=amplitude, location=location, width=width)

min_result_cauchy = minimize(neg_ln_posterior_cauchy, x0=[fiducial_amplitude, fiducial_location, fiducial_width], method='Nelder-Mead')
est_amplitude, est_location, est_width = min_result_cauchy.x

print(min_result_cauchy)
print()
print("| PARAMETER  |  ESTIMATION  |  TRUTH  |")
print(f"| amplitude  | {est_amplitude:11.3f}  | {true_amplitude:6.3f}  |")
print(f"| location   | {est_location:11.3f}  | {true_location:6.3f}  |")
print(f"| width      | {est_width:11.3f}  | {true_width:6.3f}  |")
print()
print("At best-fitting values...")
lnL_cauchy = ln_likelihood_cauchy(*min_result_cauchy.x)
lnP_cauchy = ln_posterior_cauchy(*min_result_cauchy.x)
print(f"  * log-prior      : {ln_prior(*min_result_cauchy.x):.6f}")
print(f"  * log-likelihood : {lnL_cauchy:.6f}")
print(f"  * log-posterior  : {lnP_cauchy:.6f}")

[End of solution]

In-class discussion: What can you infer from the comparison between the resulting when assuming normal vs. Cauchy?

Discuss with your teammate, then report.

[Spoiler]

The parameters are in both cases close to the truth. Whether they are closer or not, in practice, we cannot use in real data because we don't know the truth! The likelihood and posterior is, however, different! Maybe we can use this?

2. Selecting models#

If we don’t know the correct model for the data ($D$), then we have a model selection problem. We should come up with all potential models, or at least those expected from our prior experience with the data and the underlying mechanisms decribing them.

In the case above we have the Normal and Cuachy model. Let’s name them A and B respectively. We can compare the posteriors by taking their ratio:

Posterior odds: $ \dfrac{P(A|D)}{P(B|D)} $

Using the Bayes rule we can express it in the following way:

\[ \dfrac{P(A|D)}{P(B|D)} = \dfrac{P(D|A)P(A) / P(D)}{P(D|B)P(B) / P(D)} = \dfrac{P(D|A)P(A)}{P(D|B)P(B)}\]

2.1. Likelihood ratio (prior-independent)#

As we can see, the results depend on our prior belief on the models. In many cases, we want to be completely fair, and therefore we assign equal prior to both of them, resulting in:

\[ \dfrac{P(D|A)}{P(D|B)} \]

which is also used by frequentists under the name likelihood ratio statistic:

\[ \mathrm{LR}_{AB} = \dfrac{L_A}{L_B} = e^{l_A - l_B} \]

where the $l_A$ and $l_B$ are the log-likelihoods for convenience.

The larger the likelihood ratio, the more preferred Model A is with respect to Model B

Connection to comparison of $\chi^2$ values#

Under the assumption of Gaussian errors, then our log-likelihood is: (some constant) $-\chi^2$. Therefore, if we compute the $\chi^2$ values of the best fitting parameters for our models A and B, then the likelihood ratio is:

\[ \mathrm{LR}_{AB} = e^{l_A - l_B} = e^{-\chi_A^2+\chi_B^2} \]

so the model with smallest $\chi^2$ is preferred.

2.2. Taking into account the flexibility of the models#

In classical statistics, we don’t compare $\chi^2$ values, but the reduced-$\chi^2$ which is divided by the degrees of freedom (number of data point - number of model parameters) to penalize complicated models that can fit the data easily without necessarily being the true model. In Bayesian statistics there are varous tools:

The Akaike Information Criterion#

If we use a model to represent the data, we lose information! There is structure/noise/etc. that we have lost! AIC measures the amount of information that is lost, relative to another model.

A good model “extracts” or “represents” most of the information from a system, or… it maximizes its entropy! The AIC is the application of the Second Law of Thermodynamics on statistics using information theory (cf. Shannon’s information entropy).

\[ \text{AIC} = 2k - 2\ln L \]

where $k$ is the number of parameters of the model (if we had to estimate them from the data), and $L$ is the likelihood of the data according to the model.

The larger the AIC, the more information is lost, so the worst for our model!

k is a penalty term#

Using a 100-degree polynomial or a k-nearest neighbor interpolator, we could capture all trends in the data. However, this is just shifting all the information into parameters - it’s not fair to compare something like that against a linear model!

The Bayesian Information Criterion#

\[ \text{BIC} = k \ln N - 2\ln L \]

where $N$ is the size of the sample used to calculate the likelihood.

AIC vs. BIC: it’s complicated…

As in AIC, the larger the BIC, the more information is lost, so the worst for our model!

Bayesian Factors#

Like the likelihood ratio, but here, the likelihoods are not that of the best fit. It takes into account all possible values for the parameters (which can be different in each model), $\theta$!

\[ \text{K} = \dfrac{P(D|A)}{P(D|B)} = \dfrac{\int P(\theta_A) P(D|\theta_A, A) d\theta} {\int P(\theta_B) P(D|\theta_B, B) d\theta} \]

As in the likelihood ratio, the larger the value, the better for Model A compared to Model B!

The Jeffreys’ scale#

Bayes Factor	Strength of evidence
1 - 3.2	Not worth more than a bare mention
3.2 - 10	Substantial
10 - 100	Strong
>100	Decisive

2.3. Let’s compare the AICs, BICs#

Task: Calculate the AICs and BICs, and decide which model won!

AIC_norm = ...
AIC_cauchy = ...

BIC_norm = ...
BIC_cauchy = ...

print("AIC (norm, cauchy):", AIC_norm, AIC_cauchy)
print("BIC (norm, cauchy):", BIC_norm, BIC_cauchy)
print("Posterior ratio (norm / cauchy)  =", np.exp(lnP_norm - lnP_cauchy))
print("Posterior ratio (cauchy / norm)  =", np.exp(lnP_cauchy - lnP_norm))

print("And the truth is.... (drum roll)... The", true_model_distribution_name, "distribution!")

[Solution below]

AIC_norm = 2 * 3 - 2 * lnL_norm
AIC_cauchy = 2 * 3 - 2 * lnL_cauchy

BIC_norm = 2 * 3 * np.log(n_data) - 2 * lnL_norm
BIC_cauchy = 2 * 3 * np.log(n_data) - 2 * lnL_cauchy

print("AIC (norm, cauchy):", AIC_norm, AIC_cauchy)
print("BIC (norm, cauchy):", BIC_norm, BIC_cauchy)
print("Posterior ratio (norm / cauchy)  =", np.exp(lnP_norm - lnP_cauchy))
print("Posterior ratio (cauchy / norm)  =", np.exp(lnP_cauchy - lnP_norm))

print("And the truth is.... (drum roll)... The", true_model_distribution_name, "distribution!")

[End of solution]

3. Helping model selection#

In-class discussion: What could you change to increase the contrast between the models??

Discuss with your teammate, then report.

[Spoiler]

Since the models cannot change, data have to. Only if we get more data we can be more confident in selecting models. Increase the size of the synthetic data and try again!

4. Computing the Bayes Factor#

Bayes Factors are agnostic about the location of the posterior maximum! They use the marginal likelihood or the evidence: the integral of the likelihood over all possible values of the parameters.

This makes them hard to compute, especially in complex, multi-dimensional parameter spaces. Thankfully, there are techniques like Monte Carlo integration that can help us.

Essentially, instead of integrating a function over $x$, we use the sum of uniform samples of $x$:

\[ \int\limits_a^b f(x) dx \approx \frac{b-a}{N} \sum\limits_{i=1}^N f(x_i) \]

or in $k$ dimensions, a sum over the whole volume $V$ of the multi-dimensional parameter space $\Omega$:

\[ \int_\Omega f(\vec{x}) d\vec{x} \approx \frac{V}{N} \sum\limits_{i=1}^N f(\vec{x}_i)\]

Tasks:

Choose a sample size for the Monte Carlo calculation.
Calculate the Bayes factor.
Which model is preferred? Does this agree with AIC and BIC?

sample_size = ...
amp_samples = np.random.uniform(MIN_AMPLITUDE, MAX_AMPLITUDE, size=sample_size)
loc_samples = np.random.uniform(MIN_LOCATION, MAX_LOCATION, size=sample_size)
wid_samples = np.random.uniform(MIN_WIDTH, MAX_WIDTH, size=sample_size)

# use no normalization factor (watch out for the NaNs)
ln_norm_factor = 0.0
print("Normalization factor:", ln_norm_factor)

sum_norm = 0.0
sum_cauchy = 0.0

for amp, loc, wid in zip(amp_samples, loc_samples, wid_samples):
    sum_norm += ...
    sum_cauchy += ...
    
K = sum_norm / sum_cauchy
print(f"K (norm / cauchy): {K:.4g}")
print(f"K (cauchy / norm): {1.0/K:.4g}")

[Solution below]

sample_size = 10000
amp_samples = np.random.uniform(MIN_AMPLITUDE, MAX_AMPLITUDE, size=sample_size)
loc_samples = np.random.uniform(MIN_LOCATION, MAX_LOCATION, size=sample_size)
wid_samples = np.random.uniform(MIN_WIDTH, MAX_WIDTH, size=sample_size)

# use no normalization factor (watch out for the NaNs)
# ln_norm_factor = 0.0
# use the first random point to get the normalization factor
ln_norm_factor = ln_likelihood_norm(amp_samples[0], loc_samples[0], wid_samples[0])
print("Normalization factor:", ln_norm_factor)

sum_norm = 0.0
sum_cauchy = 0.0

for amp, loc, wid in zip(amp_samples, loc_samples, wid_samples):
    sum_norm += np.exp(ln_likelihood_norm(amp, loc, wid) - ln_norm_factor)
    sum_cauchy += np.exp(ln_likelihood_cauchy(amp, loc, wid) - ln_norm_factor)
    
K = sum_norm / sum_cauchy
print(f"K (norm / cauchy): {K:.4g}")
print(f"K (cauchy / norm): {1.0/K:.4g}")

[End of solution]

5. Discussion: Bayesian vs. Frequentist#

In-class discussion: What are the main differences between the two approaches? Which one we prefer in astronomy/cosmology?

5.1. Philosophical Foundations#

Aspect	Frequentist Perspective	Bayesian Perspective
What is probability?	Long-run relative frequency in repeated, identical experiments	Rational degree of belief, or a quantification of uncertainty, given current information
Unknown parameters	Fixed but unknown	Random variables with probability distributions
Data	Random (from repeated sampling)	Fixed (once observed)
Goal	Procedures that perform well under repetition	Reasonable inference given the data at hand

Frequentist inference asks: “How would this method perform if we repeated the experiment many times?”
Bayesian inference asks: “Given the data I observed, what do I believe about the parameters?”

5.2. Priors: The Core Divergence#

Bayesian: Introduces priors to express existing knowledge or ignorance.
Frequentist: No priors allowed—everything must come from the data.

Debate: Are priors a strength or a weakness?

Bayesian view: Priors encode assumptions transparently—and they are always present, implicitly or explicitly.
Frequentist view: Priors are subjective and make results “non-objective”.

Thought experiment: The librarian vs the farmer

You observe someone who is shy, bookish, and wears glasses. What’s the probability they are a librarian vs a farmer? Likelihood favors librarian: that description matches stereotypes. Prior tells you: there are many more farmers than librarians. ➡ A frequentist might ignore the base rate (prior). ➡ A Bayesian correctly updates prior beliefs using new data.

Sometimes data alone isn’t enough, especially with rare events.

In astronomy, priors are often available and meaningful (e.g., stellar populations, cosmological parameters, object brightness, etc.).

5.3. Interpretation of Results#

Inference Type	Frequentist Example	Bayesian Equivalent
Confidence Interval	“If we repeated the experiment many times, 95% of the intervals would contain the true value.”	“Given the observed data, there is a 95% probability that the true value lies in this interval.”
Hypothesis Testing	p-value: probability of data at least as extreme if null is true	Posterior probability of hypothesis (requires model comparison)

➡ Confidence intervals ≠ credible intervals. ➡ p-values ≠ probabilities of hypotheses.

5.4. Why Astronomy Prefers Bayesian Statistics#

Unrepeatable experiments:
- We only observe one universe, one supernova, one gravitational lens system.
- Frequentist “long-run” logic doesn’t apply.
- Bayesian reasoning is well-suited to one-off inference.
Complex models:
- Hierarchical models, forward models, simulators.
- Easy to implement in Bayesian frameworks (via priors, marginalization, MCMC).
Incomplete data and selection effects:
- Bayesian inference naturally handles censoring, missing data, and selection functions (e.g., Malmquist bias).
Combination of datasets:
- Priors from one dataset naturally become posteriors for another.
- Elegant propagation of knowledge between instruments (e.g., Gaia + JWST + Euclid).
Uncertainty quantification:
- Full posterior distributions give more insight than point estimates or CI.
- Crucial for scientific interpretation.

Summary: Why We Prefer Bayesian Methods in Astronomy#

Reason	Comment
One-shot observations	Bayesian works on the actual data, not hypothetical repeats
Prior knowledge matters	Encoded naturally and transparently
Complex, hierarchical models	Straightforward with Bayesian approach
Full uncertainty quantification	Posterior gives rich information
Model comparison and selection	Bayes factors penalize overfitting appropriately

###EOF

Bayesian Statistics — Applied

Contents

Bayesian Statistics — Applied#

1. Fitting a line#

1.1. The data from an unknown distribution#

1.2. Overplotting two potential models using fiducial parameters#

1.3. Defining the prior, likelihood and posterior assuming Gaussian profile…#

1.4. Maximizing the posterior#

1.5. Repeating using a Cauchy profile#

2. Selecting models#

2.1. Likelihood ratio (prior-independent)#

Connection to comparison of \(\chi^2\) values#

2.2. Taking into account the flexibility of the models#

The Akaike Information Criterion#

k is a penalty term#

The Bayesian Information Criterion#

Bayesian Factors#

The Jeffreys’ scale#

2.3. Let’s compare the AICs, BICs#

3. Helping model selection#

4. Computing the Bayes Factor#

5. Discussion: Bayesian vs. Frequentist#

5.1. Philosophical Foundations#

5.2. Priors: The Core Divergence#

5.3. Interpretation of Results#

5.4. Why Astronomy Prefers Bayesian Statistics#

Summary: Why We Prefer Bayesian Methods in Astronomy#