causal_falsify.utils.simulate_data module
- causal_falsify.utils.simulate_data.create_polynomial_representation(X, degree)
Generate polynomial features for the input array X up to a specified degree.
- Parameters:
X (np.ndarray) – Input array of shape (n_samples, n_features).
degree (int) – Degree of the polynomial features.
- Returns:
Array of shape (n_samples, n_features * degree) containing the polynomial features.
- Return type:
np.ndarray
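The documented output shape, (n_samples, n_features * degree), is consistent with stacking element-wise powers of X without a bias column or cross terms. The following is a minimal sketch under that assumption; `polynomial_features_sketch` is a hypothetical stand-in written for illustration, not the library's implementation, which may order or construct the columns differently.

```python
import numpy as np


def polynomial_features_sketch(X: np.ndarray, degree: int) -> np.ndarray:
    """Hypothetical stand-in: stack X**1, ..., X**degree column-wise."""
    # Element-wise powers only (assumption): no bias column and no cross
    # terms, which gives n_features * degree output columns.
    return np.concatenate([X ** d for d in range(1, degree + 1)], axis=1)


X = np.array([[1.0, 2.0], [3.0, 4.0]])
features = polynomial_features_sketch(X, degree=2)
print(features.shape)  # (2, 4)
```

With two input features and degree=2, the result has four columns: the original features followed by their squares.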
- causal_falsify.utils.simulate_data.simulate_data(n_samples, degree=1, conf_strength=1.0, transportability_violation=0.0, n_envs=50, n_observed_confounders=5, seed=None)
Simulate synthetic data for causal inference experiments with multiple environments, observed confounders, and configurable treatment and outcome mechanisms.
- Parameters:
n_samples (int) – Number of samples to generate per environment.
degree (int, optional) – Degree of polynomial transformation applied to observed confounders (default is 1, i.e., linear).
conf_strength (float, optional) – Strength of confounding between treatment and outcome (default is 1.0).
transportability_violation (float, optional) – Degree of violation in transportability across environments (default is 0.0).
n_envs (int, optional) – Number of distinct environments to simulate (default is 50).
n_observed_confounders (int, optional) – Number of observed confounders/features (default is 5).
seed (int, optional) – Random seed for reproducibility (default is None).
- Returns:
A pandas DataFrame containing the simulated data, with columns:
  - ‘A’: Treatment variable
  - ‘Y’: Outcome variable
  - ‘X_0’, …, ‘X_{n_observed_confounders-1}’: Observed confounders
  - ‘S’: Environment index
- Return type:
pd.DataFrame
Notes
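The function generates data for n_envs environments, each with its own parameters and n_samples samples.
Observed confounders can be transformed using polynomial features; degree sets the polynomial degree.
The strength of unobserved confounding is set by conf_strength, and the degree of transportability violation across environments by transportability_violation.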
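To make the documented output layout concrete, the following toy generator builds a DataFrame with the same columns (‘A’, ‘Y’, ‘X_0’, …, ‘S’). It is a minimal sketch, not the library's data-generating process: the function name `toy_multi_env_data`, the linear mechanisms, the continuous treatment, and the way `conf_strength` scales a hidden variable are all assumptions made for illustration, and transportability violations are not modelled here.

```python
import numpy as np
import pandas as pd


def toy_multi_env_data(n_samples, n_envs=3, n_observed_confounders=2,
                       conf_strength=1.0, seed=None):
    """Hypothetical toy generator; not the library's implementation."""
    rng = np.random.default_rng(seed)
    frames = []
    for env in range(n_envs):
        # Each environment draws its own coefficients (assumption).
        beta_a = rng.normal(size=n_observed_confounders)
        beta_y = rng.normal(size=n_observed_confounders)
        X = rng.normal(size=(n_samples, n_observed_confounders))
        U = rng.normal(size=n_samples)  # unobserved confounder, not returned
        # Linear treatment and outcome mechanisms (assumption).
        A = X @ beta_a + conf_strength * U + rng.normal(size=n_samples)
        Y = A + X @ beta_y + conf_strength * U + rng.normal(size=n_samples)
        df = pd.DataFrame(
            X, columns=[f"X_{j}" for j in range(n_observed_confounders)]
        )
        df.insert(0, "Y", Y)
        df.insert(0, "A", A)
        df["S"] = env  # environment index
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


data = toy_multi_env_data(n_samples=100, n_envs=3, seed=0)
print(list(data.columns))  # ['A', 'Y', 'X_0', 'X_1', 'S']
```

Because n_samples is counted per environment, the resulting frame has n_samples * n_envs rows, and grouping by ‘S’ recovers the individual environments.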