Simulating Data with Leaspy

This example demonstrates how to use Leaspy to simulate longitudinal data based on a fitted model.

The following imports bring in the required modules and load the synthetic Parkinson dataset from Leaspy. A logistic model will be fitted on this dataset and then used to simulate new longitudinal data.

from leaspy.datasets import load_dataset
from leaspy.io.data import Data

df = load_dataset("parkinson")

The clinical and imaging features of interest are selected and the DataFrame is converted into a Leaspy Data object that can be used for model fitting.

data = Data.from_dataframe(
    df[
        [
            "MDS1_total",
            "MDS2_total",
            "MDS3_off_total",
            "SCOPA_total",
            "MOCA_total",
            "REM_total",
            "PUTAMEN_R",
            "PUTAMEN_L",
            "CAUDATE_R",
            "CAUDATE_L",
        ]
    ]
)

A logistic model is initialized with source_dimension=2, that is, with a two-dimensional latent source space.

from leaspy.models import LogisticModel

model = LogisticModel(name="test-model", source_dimension=2)
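For intuition about the curve shape a logistic model assumes for each feature, here is a minimal sketch of a generic logistic function. It is not Leaspy's exact parametrization: the names `logistic_curve`, `midpoint`, and `rate`, as well as the numeric values, are hypothetical and chosen only for illustration.

```python
import math


def logistic_curve(t, midpoint, rate):
    """Generic logistic curve rising from 0 to 1 (illustrative only)."""
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))


# Evaluate the curve at a few ages around a hypothetical midpoint of 65.
ages = [55, 60, 65, 70, 75]
values = [logistic_curve(age, midpoint=65.0, rate=0.2) for age in ages]

for age, value in zip(ages, values):
    print(f"age {age}: {value:.3f}")
```

The curve is bounded between 0 and 1 and increases monotonically, which matches the range of the simulated feature values displayed later in this example.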

The model is fitted to the data using the MCMC-SAEM algorithm for 100 iterations. No seed is passed in the call below, so the fitted parameters may differ slightly between runs.

model.fit(
    data,
    "mcmc_saem",
    n_iter=100,
    progress_bar=False,
)
Fit with `AlgorithmName.FIT_MCMC_SAEM` took: 3.71s

The parameters for simulating patient visits are defined. They set the number of patients, the timing of the first visit, the follow-up duration, and the spacing between visits; each timing quantity is given as a mean and a standard deviation.

visit_params = {
    "patient_number": 5,
    "visit_type": "random",  # The visit type could also be "dataframe", together with "df_visits".
    # "df_visits": df_test,  # Example of a custom visit schedule.
    "first_visit_mean": 0.0,  # The mean of the first visit age/time.
    "first_visit_std": 0.4,  # The standard deviation of the first visit age/time.
    "time_follow_up_mean": 11,  # The mean follow-up time.
    "time_follow_up_std": 0.5,  # The standard deviation of the follow-up time.
    "distance_visit_mean": 2 / 12,  # The mean spacing between visits in years.
    "distance_visit_std": 0.75 / 12,  # The standard deviation of the spacing between visits in years.
    "min_spacing_between_visits": 1,  # The minimum allowed spacing between visits.
}
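To make the role of these parameters concrete, the following sketch draws one visit schedule with the standard library only. It is an illustrative assumption, not Leaspy's sampling code: the function `draw_visit_times`, the Gaussian draws, and the rule of raising each gap to the minimum spacing are all hypothetical. It also does not model how Leaspy maps the first-visit value onto a patient's age.

```python
import random


def draw_visit_times(params, rng):
    """Draw one patient's visit times under simple Gaussian assumptions."""
    first = rng.gauss(params["first_visit_mean"], params["first_visit_std"])
    follow_up = max(
        0.0,
        rng.gauss(params["time_follow_up_mean"], params["time_follow_up_std"]),
    )
    times = [first]
    while True:
        gap = rng.gauss(params["distance_visit_mean"], params["distance_visit_std"])
        # Assumed rule: a gap is never shorter than the minimum spacing.
        # A positive minimum also guarantees that the loop terminates.
        gap = max(gap, params["min_spacing_between_visits"])
        next_time = times[-1] + gap
        if next_time > first + follow_up:
            break
        times.append(next_time)
    return times


# Timing values copied from the visit parameters above.
example_params = {
    "first_visit_mean": 0.0,
    "first_visit_std": 0.4,
    "time_follow_up_mean": 11,
    "time_follow_up_std": 0.5,
    "distance_visit_mean": 2 / 12,
    "distance_visit_std": 0.75 / 12,
    "min_spacing_between_visits": 1,
}

rng = random.Random(0)  # Fixed seed so the sketch is repeatable.
schedule = draw_visit_times(example_params, rng)
gaps = [later - earlier for earlier, later in zip(schedule, schedule[1:])]
print([round(t, 2) for t in schedule])
```

Under this assumed rule, gaps drawn around 2/12 are almost always raised to the minimum spacing of 1. That would be consistent with the one-unit steps in the TIME column of the simulated output, although whether Leaspy applies the constraint in this way is not confirmed here.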

A new longitudinal dataset is simulated from the fitted model using the specified parameters.

df_sim = model.simulate(
    algorithm="simulate",
    features=[
        "MDS1_total",
        "MDS2_total",
        "MDS3_off_total",
        "SCOPA_total",
        "MOCA_total",
        "REM_total",
        "PUTAMEN_R",
        "PUTAMEN_L",
        "CAUDATE_R",
        "CAUDATE_L",
    ],
    visit_parameters=visit_params,
)
Simulate with `simulate` took: 0.03s

The simulated data is converted back to a pandas DataFrame for inspection. The first ten rows of the simulated longitudinal dataset, all belonging to patient 0, are displayed below.

   ID  TIME  MDS1_total  MDS2_total  MDS3_off_total  SCOPA_total  MOCA_total  REM_total  PUTAMEN_R  PUTAMEN_L  CAUDATE_R  CAUDATE_L
0   0  59.0    0.189837    0.108346        0.333539     0.129448    0.047005   0.420709   0.613526   0.773531   0.529728   0.343298
1   0  60.0    0.193939    0.062798        0.254718     0.114945    0.046977   0.431756   0.504392   0.703654   0.450950   0.333464
2   0  61.0    0.030214    0.171499        0.307285     0.253302    0.226160   0.430963   0.719570   0.732159   0.357568   0.451332
3   0  62.0    0.118672    0.152737        0.300467     0.437961    0.029535   0.508501   0.741038   0.903827   0.491854   0.541575
4   0  63.0    0.085761    0.217857        0.320099     0.153371    0.217154   0.630533   0.788019   0.862324   0.623736   0.523061
5   0  64.0    0.253548    0.155311        0.256859     0.272580    0.046067   0.563549   0.647653   0.839016   0.700722   0.570511
6   0  65.0    0.270736    0.280015        0.285188     0.169387    0.034665   0.700266   0.896325   0.802742   0.658602   0.734675
7   0  66.0    0.242705    0.252954        0.205068     0.376477    0.155729   0.633043   0.808777   0.911349   0.629066   0.555425
8   0  67.0    0.275077    0.150783        0.245891     0.233559    0.184104   0.795353   0.810372   0.752424   0.723899   0.673273
9   0  68.0    0.305548    0.321285        0.393659     0.368660    0.156345   0.780888   0.805010   0.954272   0.424553   0.750659
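A quick sanity check on a simulated schedule is to look at the gaps between consecutive visits. The sketch below does this with plain Python on the TIME values copied from the ten rows displayed above. It covers only those rows, not the full simulated dataset, and the variable names are illustrative.

```python
# TIME values copied from the ten displayed rows (patient ID 0).
times = [59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0]

# Gap between each pair of consecutive visits.
gaps = [later - earlier for earlier, later in zip(times, times[1:])]

# Time elapsed between the first and last displayed visit.
displayed_span = times[-1] - times[0]

print(gaps)            # nine gaps of 1.0
print(displayed_span)  # 9.0
```

Every displayed gap equals 1.0, the value given for "min_spacing_between_visits" in the visit parameters.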


This concludes the simulation example using Leaspy. Stay tuned for more examples on model fitting and analysis!

Total running time of the script: (0 minutes 4.194 seconds)
