causal_falsify.utils.simulate_data module

causal_falsify.utils.simulate_data.create_polynomial_representation(X, degree)[source]

Generate polynomial features for the input array X up to a specified degree.

Parameters:
  • X (np.ndarray) – Input array of shape (n_samples, n_features).

  • degree (int) – Degree of the polynomial features.

Returns:

Array of shape (n_samples, n_features * degree) containing the polynomial features.

Return type:

np.ndarray
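
The documented output shape, (n_samples, n_features * degree), suggests that the features are element-wise powers of each column, with no bias column and no interaction terms. The sketch below illustrates that reading. The function name polynomial_powers_sketch and the column ordering (all first powers, then all second powers, and so on) are assumptions made for illustration, not the library's confirmed implementation.

```python
import numpy as np


def polynomial_powers_sketch(X: np.ndarray, degree: int) -> np.ndarray:
    """Hypothetical stand-in: stack element-wise powers X**1, ..., X**degree."""
    if degree < 1:
        raise ValueError("degree must be >= 1")
    # Each power keeps the (n_samples, n_features) shape; concatenating
    # along axis 1 yields (n_samples, n_features * degree).
    return np.hstack([X ** d for d in range(1, degree + 1)])


X = np.array([[1.0, 2.0], [3.0, 4.0]])
Z = polynomial_powers_sketch(X, degree=2)
print(Z.shape)  # (2, 4)
```

With degree=1 this sketch returns a copy of X unchanged, which matches the description of degree 1 as the linear case.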

causal_falsify.utils.simulate_data.simulate_data(n_samples, degree=1, conf_strength=1.0, transportability_violation=0.0, n_envs=50, n_observed_confounders=5, seed=None)[source]

Simulates synthetic data for causal inference experiments with multiple environments, observed confounders, and configurable treatment and outcome mechanisms.

Parameters:
  • n_samples (int) – Number of samples to generate per environment.

  • degree (int, optional) – Degree of polynomial transformation applied to observed confounders (default is 1, i.e., linear).

  • conf_strength (float, optional) – Strength of confounding between treatment and outcome (default is 1.0).

  • transportability_violation (float, optional) – Degree of violation in transportability across environments (default is 0.0).

  • n_envs (int, optional) – Number of distinct environments to simulate (default is 50).

  • n_observed_confounders (int, optional) – Number of observed confounders/features (default is 5).

  • seed (int, optional) – Random seed for reproducibility (default is None).

Returns:

A pandas DataFrame containing the simulated data, with columns:

  • ‘A’ – Treatment variable.

  • ‘Y’ – Outcome variable.

  • ‘X_0’, …, ‘X_{n_observed_confounders-1}’ – Observed confounders.

  • ‘S’ – Environment index.

Return type:

pd.DataFrame

Notes

  • The function generates data for multiple environments, each with its own parameters.

  • Observed confounders can be transformed using polynomial features.

  • Unobserved confounding and transportability violations can be controlled via parameters.
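
To make the notes above concrete, here is a toy generator that produces a frame with the documented column layout ('A', 'Y', 'X_0', …, 'S'). It is a rough sketch only: the function simulate_data_sketch, the linear Gaussian mechanisms, the continuous treatment, and the way conf_strength scales a single unobserved confounder U are all assumptions for illustration. The library's actual data-generating process may differ, and the degree and transportability_violation parameters are left out here.

```python
import numpy as np
import pandas as pd


def simulate_data_sketch(n_samples, conf_strength=1.0, n_envs=3,
                         n_observed_confounders=2, seed=None):
    """Hypothetical toy generator; NOT the library's data-generating process."""
    rng = np.random.default_rng(seed)
    frames = []
    for s in range(n_envs):
        # Each environment draws its own coefficients (assumed Gaussian).
        beta_a = rng.normal(size=n_observed_confounders)
        beta_y = rng.normal(size=n_observed_confounders)
        X = rng.normal(size=(n_samples, n_observed_confounders))
        U = rng.normal(size=n_samples)  # unobserved confounder (assumed)
        # U enters both treatment and outcome, scaled by conf_strength.
        A = X @ beta_a + conf_strength * U + rng.normal(size=n_samples)
        Y = A + X @ beta_y + conf_strength * U + rng.normal(size=n_samples)
        df = pd.DataFrame(
            X, columns=[f"X_{j}" for j in range(n_observed_confounders)]
        )
        df.insert(0, "Y", Y)
        df.insert(0, "A", A)
        df["S"] = s  # environment index
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


data = simulate_data_sketch(n_samples=100, seed=0)
print(data.columns.tolist())              # ['A', 'Y', 'X_0', 'X_1', 'S']
print(data.groupby("S").size().tolist())  # [100, 100, 100]
```

Because n_samples counts rows per environment, the resulting frame has n_samples * n_envs rows in total, and grouping on 'S' recovers the per-environment subsets.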