Simulated and Synthetic Health Data: Improving Clinical Research on Rare Diseases. A Real-World Data Simulation of Autosomal Dominant Polycystic Kidney Disease (ADPKD) Trials. A Retrospective, Observational Study

NCT ID: NCT07016282

Last Updated: 2025-09-02

Study Results

Results pending

The study team has not yet published outcome measurements, participant flow, or safety data for this trial.

Basic Information

Recruitment Status

ACTIVE_NOT_RECRUITING

Total Enrollment

100 participants

Study Classification

OBSERVATIONAL

Study Start Date

2025-06-12

Study Completion Date

2027-06-30

Brief Summary

This is a non-profit, retrospective observational study involving real-world data (RWD) retrieved from ADPKD-related electronic health records stored at the Mario Negri Institute IRCCS. The RWD will be used to generate simulated and synthetic datasets using AI tools. The RWD and the generated data (GD) will be used to conduct three virtual RCTs whose main outcome is the change in Total Kidney Volume (TKV). Statistical tests will be performed to assess the quality and privacy preservation of the GD compared with the RWD. The GD will also be evaluated in exploratory sample size estimations.

Detailed Description

Randomized clinical trials (RCTs) can be regarded as the least biased source of information for addressing intervention questions. One of the most common problems in clinical trials of rare diseases is the difficulty of finding patients, and therefore of building trials on a sufficiently large population, which is needed to obtain more robust data and fewer methodological distortions. Several strategies are already used to deal with these problems, including extended trial duration, repeated outcome measures, patients' genetic profiles, surrogate endpoints, and multicenter studies. Another approach is to consider trial designs other than the parallel-arm design, such as crossover trials, n-of-1 trials, and adaptive trials.

Simulated and synthetic health data may offer valid new approaches to increasing patient representativeness, especially in the rare diseases field, while reducing costs and time constraints and also addressing the limitations imposed by national and international regulations on privacy and data management. Simulation studies are defined as computer experiments that create data by pseudo-random sampling from known probability distributions, based on the Monte Carlo method. A promising approach now under development is synthetic data, defined as artificially generated data that aim to reproduce the statistical properties of an original dataset, produced by generative large language models (LLMs).

Thus, while simulated data rely on known distributions that must be specified in advance, synthetic data are generated by LLMs that learn these distributions from training data, without the need for predefined distributions, offering a significant advantage in flexibility and applicability.
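The simulation side of this distinction can be illustrated with a minimal Python sketch: distribution parameters are estimated from a small real dataset, and new records are then drawn by pseudo-random sampling from a normal distribution (continuous variable) and a Bernoulli distribution (binary variable). The variable names, the toy values, and the `simulate_patients` helper are hypothetical and are not taken from the study protocol.

```python
import random
import statistics

def simulate_patients(real_tkv, real_female, n, seed=0):
    """Draw n simulated records from distributions fitted to real data.

    real_tkv    : total kidney volumes (mL), assumed here to be normal
    real_female : 0/1 flags, assumed here to be Bernoulli
    """
    rng = random.Random(seed)                # fixed seed for reproducibility
    mu = statistics.mean(real_tkv)           # mean of the assumed normal
    sigma = statistics.stdev(real_tkv)       # sample standard deviation
    p_female = statistics.mean(real_female)  # Bernoulli success probability
    return [
        {
            "tkv_ml": rng.gauss(mu, sigma),
            "female": 1 if rng.random() < p_female else 0,
        }
        for _ in range(n)
    ]

# Toy illustrative values, not study data
real_tkv = [1200.0, 1550.0, 980.0, 2100.0, 1730.0, 1410.0]
real_female = [1, 0, 1, 1, 0, 0]
simulated = simulate_patients(real_tkv, real_female, n=1000)
```

The distributions have to be chosen by the analyst before any data are generated, which is exactly the constraint described above. A normal distribution is used here only for simplicity; a skewed, strictly positive variable such as kidney volume might be better served by a log-normal assumption.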

This study aims to identify the most suitable tool for generating simulated and synthetic data in the rare diseases field, and to compare the fidelity, quality, and privacy preservation of these datasets, which are derived from real-world ADPKD clinical trial data. Furthermore, a virtual clinical trial will be conducted on each of the three datasets (real-world, simulated, and synthetic) to assess how well the generated data replicate real trial outcomes.
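The record does not name the fidelity tests to be used. As one possible illustration, the sketch below computes the two-sample Kolmogorov-Smirnov statistic, a common way to compare the distribution of a single continuous variable between a real and a generated dataset. The `ks_statistic` helper is a hypothetical name, and the choice of this statistic is an assumption rather than the study's stated method.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the two empirical CDFs:
    0.0 means identical empirical distributions, 1.0 means no overlap.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every observation equal to x in both samples,
        # so ties are handled before the CDFs are compared.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d
```

A value close to 0 for a given variable suggests the generated data reproduce its marginal distribution well. If SciPy is available, `scipy.stats.ks_2samp` returns the same statistic together with a p-value.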

Finally, the retrieved and generated data will be used to evaluate new sample size estimates for future clinical trials performed at the Clinical Research Center for Rare Diseases "Aldo e Cele Daccò", Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Ranica (BG), Italy.
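The record does not state how sample sizes will be estimated. As a hedged illustration, the sketch below applies the standard normal-approximation formula for a two-arm comparison of means, n per arm = 2 (z_{1-alpha/2} + z_{1-beta})^2 sigma^2 / delta^2, where the standard deviation `sigma` could in principle be estimated from real or generated data. The `n_per_arm` helper and its default values are assumptions made for this example.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Participants per arm for a two-sided, two-sample comparison of means.

    delta : smallest between-arm difference worth detecting
    sigma : outcome standard deviation, assumed equal in both arms
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)
```

For example, `n_per_arm(delta=0.5, sigma=1.0)` gives 63 participants per arm at 5% two-sided significance and 80% power. Halving the detectable difference roughly quadruples the required sample, which is why variance estimates matter so much in rare disease trials.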

By using generative AI models, such as Generative Adversarial Networks (GANs), this study aims to overcome challenges related to data scarcity and trial design. The results could indicate whether synthetic data can be a useful tool for improving clinical trials in rare diseases by making them more efficient and cost-effective.

Conditions

Autosomal Dominant Polycystic Kidney Disease (ADPKD)

Study Design

Observational Model Type

CASE_CONTROL

Study Time Perspective

RETROSPECTIVE

Study Groups

Real-world data from ADPKD patients

Real-world data from ADPKD-related electronic health records (EHR) stored at the Istituto di Ricerche Farmacologiche Mario Negri IRCCS, primarily based on the ALADIN (NCT00309283) and ALADIN 2 (NCT01377246) studies

No interventions assigned to this group

Simulated data

Data based on RWD from the ADPKD patients and derived from predefined statistical models (e.g., normal distribution for continuous variables, binomial distribution for categorical variables).

No interventions assigned to this group

Synthetic data

Data generated from the RWD of the ADPKD patients using generative large language models (LLMs)

No interventions assigned to this group

Eligibility Criteria

Inclusion Criteria

* Adult (>18 years) men and women with ADPKD according to the Ravine criteria
* Estimated glomerular filtration rate (eGFR) between 15 and 40 mL/min/1.73 m² (CKD stage: G3b-G4) or higher (CKD stage: G1-G3a), as calculated by the four-variable Modification of Diet in Renal Disease (MDRD) study equation
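For readers unfamiliar with the four-variable MDRD equation, a minimal Python sketch follows. It uses the commonly published form with serum creatinine in mg/dL and the coefficient 175 for IDMS-standardized creatinine; the original, non-standardized form uses 186. Which coefficient applies to the source records is not stated here, so the `coefficient` default and the `egfr_mdrd4` function name are assumptions.

```python
def egfr_mdrd4(serum_creatinine_mg_dl, age_years, female, black=False,
               coefficient=175.0):
    """Estimated GFR (mL/min/1.73 m^2) from the four-variable MDRD equation.

    The four variables are serum creatinine, age, sex, and race.
    coefficient: 175 for IDMS-standardized creatinine, 186 for the
    original calibration (assumed default: 175).
    """
    egfr = coefficient * serum_creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742   # sex adjustment factor
    if black:
        egfr *= 1.212   # race adjustment factor in the published equation
    return egfr

def meets_egfr_criterion(egfr):
    """Inclusion check as listed above: eGFR of at least 15 mL/min/1.73 m^2."""
    return egfr >= 15.0
```

For a 50-year-old man with a serum creatinine of 1.0 mg/dL, `egfr_mdrd4(1.0, 50, female=False)` evaluates to roughly 79 mL/min/1.73 m².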

Exclusion Criteria

* Confounding factors that could affect renal function loss independently of kidney growth and treatment allocation (i.e., diabetes mellitus, urinary protein excretion rate >3 g/24 h)
* Abnormal urinalysis suggestive of concomitant, clinically significant glomerular disease, and urinary tract lithiasis or infection
* Patients with major systemic disease
* Patients unable to provide informed consent
* Pregnant, lactating, or potentially childbearing women without adequate contraception

Minimum Eligible Age

18 Years

Eligible Sex

ALL

Accepts Healthy Volunteers

No

Sponsors

Mario Negri Institute for Pharmacological Research

OTHER

Sponsor Role: LEAD

Responsible Party

Responsibility Role: SPONSOR

Principal Investigators

Giuseppe Remuzzi, M.D.

Role: STUDY_DIRECTOR

Istituto Di Ricerche Farmacologiche Mario Negri

Locations

Clinical Research Centre for Rare Diseases Aldo e Cele Daccò

Ranica, BG, Italy

Department of Global Public Health (GPH), Karolinska Institutet

Stockholm, Sweden

Countries

Italy, Sweden

References

Bolignano D, Pisano A. Good-quality research in rare diseases: trials and tribulations. Pediatr Nephrol. 2016 Nov;31(11):2017-23. doi: 10.1007/s00467-016-3323-7. Epub 2016 Jan 27.

Reference Type BACKGROUND
PMID: 26817476

Halpern SD, Karlawish JH, Berlin JA. The continuing unethical conduct of underpowered clinical trials. JAMA. 2002 Jul 17;288(3):358-62. doi: 10.1001/jama.288.3.358.

Reference Type BACKGROUND
PMID: 12117401

Lilford RJ, Thornton JG, Braunholtz D. Clinical trials and rare diseases: a way out of a conundrum. BMJ. 1995 Dec 16;311(7020):1621-5. doi: 10.1136/bmj.311.7020.1621.

Reference Type BACKGROUND
PMID: 8555809

Gagne JJ, Thompson L, O'Keefe K, Kesselheim AS. Innovative research methods for studying treatments for rare diseases: methodological review. BMJ. 2014 Nov 24;349:g6802. doi: 10.1136/bmj.g6802.

Reference Type BACKGROUND
PMID: 25422272

Shurin S, Krischer J, Groft SC. Clinical trials In BMT: ensuring that rare diseases and rarer therapies are well done. Biol Blood Marrow Transplant. 2012 Jan;18(1 Suppl):S8-11. doi: 10.1016/j.bbmt.2011.10.030. No abstract available.

Reference Type BACKGROUND
PMID: 22226117

van der Lee JH, Wesseling J, Tanck MW, Offringa M. Efficient ways exist to obtain the optimal sample size in clinical trials in rare diseases. J Clin Epidemiol. 2008 Apr;61(4):324-30. doi: 10.1016/j.jclinepi.2007.07.008. Epub 2008 Feb 21.

Reference Type BACKGROUND
PMID: 18313556

Stone EM. Challenges in genetic testing for clinical trials of inherited and orphan retinal diseases. Retina. 2005 Dec;25(8 Suppl):S72-S73. doi: 10.1097/00006982-200512001-00034. No abstract available.

Reference Type BACKGROUND
PMID: 16374347

Buckley BM. Clinical trials of orphan medicines. Lancet. 2008 Jun 14;371(9629):2051-5. doi: 10.1016/S0140-6736(08)60876-4. No abstract available.

Reference Type BACKGROUND
PMID: 18555919

Kinder B, McCormack FX. Clinical trials for rare lung diseases: lessons from lymphangioleiomyomatosis. Lymphat Res Biol. 2010 Mar;8(1):71-9. doi: 10.1089/lrb.2009.0027.

Reference Type BACKGROUND
PMID: 20235889

Lagakos SW. Clinical trials and rare diseases. N Engl J Med. 2003 Jun 12;348(24):2455-6. doi: 10.1056/NEJMe030024. No abstract available.

Reference Type BACKGROUND
PMID: 12802033

Berlin JA. N-of-1 clinical trials should be incorporated into clinical practice. J Clin Epidemiol. 2010 Dec;63(12):1283-4. doi: 10.1016/j.jclinepi.2010.05.006. Epub 2010 Aug 30.

Reference Type BACKGROUND
PMID: 20800449

Cerqueira FP, Jesus AMC, Cotrim MD. Adaptive Design: A Review of the Technical, Statistical, and Regulatory Aspects of Implementation in a Clinical Trial. Ther Innov Regul Sci. 2020 Jan;54(1):246-258. doi: 10.1007/s43441-019-00052-y. Epub 2020 Jan 6.

Reference Type BACKGROUND
PMID: 32008232

Yoon J, Drumright LN, van der Schaar M. Anonymization Through Data Synthesis Using Generative Adversarial Networks (ADS-GAN). IEEE J Biomed Health Inform. 2020 Aug;24(8):2378-2388. doi: 10.1109/JBHI.2020.2980262. Epub 2020 Mar 12.

Reference Type BACKGROUND
PMID: 32167919

Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019 May 20;38(11):2074-2102. doi: 10.1002/sim.8086. Epub 2019 Jan 16.

Reference Type BACKGROUND
PMID: 30652356

Metropolis N, Ulam S. The Monte Carlo method. J Am Stat Assoc. 1949 Sep;44(247):335-41. doi: 10.1080/01621459.1949.10483310. No abstract available.

Reference Type BACKGROUND
PMID: 18139350

Chen RJ, Lu MY, Chen TY, Williamson DFK, Mahmood F. Synthetic data in machine learning for medicine and healthcare. Nat Biomed Eng. 2021 Jun;5(6):493-497. doi: 10.1038/s41551-021-00751-8.

Reference Type BACKGROUND
PMID: 34131324

Caroli A, Perico N, Perna A, Antiga L, Brambilla P, Pisani A, Visciano B, Imbriaco M, Messa P, Cerutti R, Dugo M, Cancian L, Buongiorno E, De Pascalis A, Gaspari F, Carrara F, Rubis N, Prandini S, Remuzzi A, Remuzzi G, Ruggenenti P; ALADIN study group. Effect of longacting somatostatin analogue on kidney and cyst growth in autosomal dominant polycystic kidney disease (ALADIN): a randomised, placebo-controlled, multicentre trial. Lancet. 2013 Nov 2;382(9903):1485-95. doi: 10.1016/S0140-6736(13)61407-5. Epub 2013 Aug 21.

Reference Type BACKGROUND
PMID: 23972263

Perico N, Ruggenenti P, Perna A, Caroli A, Trillini M, Sironi S, Pisani A, Riccio E, Imbriaco M, Dugo M, Morana G, Granata A, Figuera M, Gaspari F, Carrara F, Rubis N, Villa A, Gamba S, Prandini S, Cortinovis M, Remuzzi A, Remuzzi G; ALADIN 2 Study Group. Octreotide-LAR in later-stage autosomal dominant polycystic kidney disease (ALADIN 2): A randomized, double-blind, placebo-controlled, multicenter trial. PLoS Med. 2019 Apr 5;16(4):e1002777. doi: 10.1371/journal.pmed.1002777. eCollection 2019 Apr.

Reference Type BACKGROUND
PMID: 30951521

Cornec-Le Gall E, Alam A, Perrone RD. Autosomal dominant polycystic kidney disease. Lancet. 2019 Mar 2;393(10174):919-935. doi: 10.1016/S0140-6736(18)32782-X. Epub 2019 Feb 25.

Reference Type BACKGROUND
PMID: 30819518

Chebib FT, Perrone RD, Chapman AB, Dahl NK, Harris PC, Mrug M, Mustafa RA, Rastogi A, Watnick T, Yu ASL, Torres VE. A Practical Guide for Treatment of Rapidly Progressive ADPKD with Tolvaptan. J Am Soc Nephrol. 2018 Oct;29(10):2458-2470. doi: 10.1681/ASN.2018060590. Epub 2018 Sep 18.

Reference Type BACKGROUND
PMID: 30228150

Ravine D, Gibson RN, Walker RG, Sheffield LJ, Kincaid-Smith P, Danks DM. Evaluation of ultrasonographic diagnostic criteria for autosomal dominant polycystic kidney disease 1. Lancet. 1994 Apr 2;343(8901):824-7. doi: 10.1016/s0140-6736(94)92026-5.

Reference Type BACKGROUND
PMID: 7908078

Levey AS, Stevens LA, Schmid CH, Zhang YL, Castro AF 3rd, Feldman HI, Kusek JW, Eggers P, Van Lente F, Greene T, Coresh J; CKD-EPI (Chronic Kidney Disease Epidemiology Collaboration). A new equation to estimate glomerular filtration rate. Ann Intern Med. 2009 May 5;150(9):604-12. doi: 10.7326/0003-4819-150-9-200905050-00006.

Reference Type BACKGROUND
PMID: 19414839

Levey AS, Bosch JP, Lewis JB, Greene T, Rogers N, Roth D. A more accurate method to estimate glomerular filtration rate from serum creatinine: a new prediction equation. Modification of Diet in Renal Disease Study Group. Ann Intern Med. 1999 Mar 16;130(6):461-70. doi: 10.7326/0003-4819-130-6-199903160-00002.

Reference Type BACKGROUND
PMID: 10075613

Thiesmeier R, Orsini N. Rolling the DICE (Design, Interpret, Compute, Estimate): Interactive Learning of Biostatistics With Simulations. JMIR Med Educ. 2024 Apr 15;10:e52679. doi: 10.2196/52679.

Reference Type BACKGROUND
PMID: 38619866

Pezoulas VC, Zaridis DI, Mylona E, Androutsos C, Apostolidis K, Tachos NS, Fotiadis DI. Synthetic data generation methods in healthcare: A review on open-source tools and methods. Comput Struct Biotechnol J. 2024 Jul 9;23:2892-2910. doi: 10.1016/j.csbj.2024.07.005. eCollection 2024 Dec.

Reference Type BACKGROUND
PMID: 39108677

Zhang Z, Yan C, Mesa DA, Sun J, Malin BA. Ensuring electronic medical record simulation through better training, modeling, and evaluation. J Am Med Inform Assoc. 2020 Jan 1;27(1):99-108. doi: 10.1093/jamia/ocz161.

Reference Type BACKGROUND
PMID: 31592533

Sun C, Dumontier M. Generating unseen diseases patient data using ontology enhanced generative adversarial networks. NPJ Digit Med. 2025 Jan 3;8(1):4. doi: 10.1038/s41746-024-01421-0.

Reference Type BACKGROUND
PMID: 39753917

Other Identifiers

SAILING-ADPKD

Identifier Type: -

Identifier Source: org_study_id
