A FRAMEWORK FOR ENSURING DATA INTEGRITY IN HEALTHCARE DATA MIGRATION

Authors

  • Jagrutiben Padhiyar
  • Akash Narendrakumar Parmar

DOI:

https://doi.org/10.53555/cse.v11i1.2474

Keywords:

Data Migration, Data Integrity, JSON, SQL

Abstract

The digital transformation of healthcare systems has accelerated the adoption of electronic health records (EHRs) and data-driven technologies. As healthcare institutions move from legacy systems to modern platforms, data migration has become a critical process for consolidating, storing, and analyzing patient data. Despite its importance, data migration introduces significant risks to data integrity: missing records, corrupted values, duplicate entries, and schema mismatches can all produce inaccurate or incomplete datasets. In healthcare, such inconsistencies can have severe consequences, including misdiagnosis, delayed treatment, financial errors, and regulatory non-compliance. Maintaining strict data integrity during migration is therefore essential for patient safety, for trustworthy clinical decisions, and for compliance with regulations such as HIPAA and GDPR and with interoperability standards such as HL7/FHIR.
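Of the failure modes listed above, schema mismatches and missing records are the easiest to detect mechanically. The following is a minimal Python sketch, assuming both the source extract and the migrated table can be loaded as pandas DataFrames; the function names `compare_schemas` and `missing_keys` and the `patient_id` key column are hypothetical illustrations, not part of the authors' framework.

```python
import pandas as pd


def compare_schemas(source: pd.DataFrame, target: pd.DataFrame) -> dict:
    """Report columns dropped, added, or retyped between source and target."""
    src_types = source.dtypes.astype(str).to_dict()
    tgt_types = target.dtypes.astype(str).to_dict()
    shared = sorted(set(src_types) & set(tgt_types))
    return {
        "missing_columns": sorted(set(src_types) - set(tgt_types)),
        "extra_columns": sorted(set(tgt_types) - set(src_types)),
        # Columns present on both sides whose pandas dtype changed.
        "type_changes": {
            col: (src_types[col], tgt_types[col])
            for col in shared
            if src_types[col] != tgt_types[col]
        },
    }


def missing_keys(source: pd.DataFrame, target: pd.DataFrame,
                 key: str = "patient_id") -> list:
    """Key values present in the source but absent after migration.

    `patient_id` is a hypothetical column name; adapt it to the real dataset.
    """
    return sorted(set(source[key].tolist()) - set(target[key].tolist()))


if __name__ == "__main__":
    source = pd.DataFrame({"patient_id": ["P001", "P002", "P003"],
                           "age": [34, 51, 67],
                           "diagnosis": ["J45", "E11", "I10"]})
    # Simulated faulty migration: one record lost, one column dropped,
    # and ages converted to text.
    target = pd.DataFrame({"patient_id": ["P001", "P002"],
                           "age": ["34", "51"]})
    print(compare_schemas(source, target))
    print(missing_keys(source, target))
```

On the simulated faulty target, this reports `diagnosis` as a missing column, `age` as retyped, and `P003` as a lost record. Comparing pandas dtype names is a coarse check; a production pipeline would more likely validate against an explicit target schema.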

Healthcare data is highly complex, often containing patient demographics, diagnostic information, laboratory results, and treatment histories. The heterogeneity of formats, ranging from spreadsheets and CSV files to JSON structures and relational SQL databases, compounds the challenge of reliable migration. Traditional migration tools focus primarily on transferring data efficiently and often lack comprehensive mechanisms for validating integrity at every stage. This gap calls for a structured framework that integrates validation techniques to verify that data remains accurate, complete, and consistent after migration.
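Because the same records can arrive as CSV text, JSON documents, or relational rows, a practical first step is to bring every representation into one canonical tabular form before comparing them. The sketch below assumes pandas is available and uses an in-memory SQLite database as a stand-in for a production SQL system; `migrate_csv`, `load_back`, `normalize`, and the `patients` table name are hypothetical.

```python
import io
import sqlite3

import pandas as pd


def migrate_csv(csv_text: str, conn: sqlite3.Connection,
                table: str = "patients") -> tuple:
    """Load a CSV extract, then write it out as a JSON string and a SQL table."""
    source = pd.read_csv(io.StringIO(csv_text))
    json_text = source.to_json(orient="records")
    source.to_sql(table, conn, index=False, if_exists="replace")
    return source, json_text


def load_back(json_text: str, conn: sqlite3.Connection,
              table: str = "patients") -> tuple:
    """Read the migrated JSON and SQL copies back into DataFrames."""
    from_json = pd.read_json(io.StringIO(json_text), orient="records")
    # `table` is a trusted internal name here, not user input.
    from_sql = pd.read_sql(f"SELECT * FROM {table}", conn)
    return from_json, from_sql


def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Canonical form: sorted columns, values as text, rows in sorted order."""
    cols = sorted(df.columns)
    return df[cols].astype(str).sort_values(cols).reset_index(drop=True)


if __name__ == "__main__":
    csv_text = "patient_id,age,diagnosis\nP002,51,E11\nP001,34,J45\n"
    conn = sqlite3.connect(":memory:")
    source, json_text = migrate_csv(csv_text, conn)
    from_json, from_sql = load_back(json_text, conn)
    for name, copy in [("JSON", from_json), ("SQL", from_sql)]:
        print(name, normalize(source).equals(normalize(copy)))
    conn.close()
```

Converting values to text makes the comparison independent of how each reader labels its dtypes, but it stays deliberately strict: an integer that comes back as a float (`34` versus `34.0`) still counts as a difference.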

This study addresses these challenges by proposing a framework for healthcare data migration that emphasizes integrity validation. The methodology simulates the migration of an open healthcare dataset from flat-file formats (CSV/Excel) into JSON and SQL databases. Pre- and post-migration validation is performed with automated methods such as record count comparison, duplicate detection, and checksum verification. In addition, domain-specific data quality rules are applied to verify the accuracy of mandatory attributes such as patient ID, age, and diagnosis. Python libraries, including pandas for data manipulation and hashlib for checksum computation, are used to build a lightweight validation pipeline. The framework aims to provide a reproducible and efficient approach to healthcare data migration, with the ultimate goal of minimizing risk and ensuring data reliability in clinical environments.
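The checks named above can be sketched as a small pandas and hashlib pipeline. This is an illustrative sketch, not the authors' implementation: the column names (`patient_id`, `age`, `diagnosis`), the 0 to 120 age range, the function names, and the choice of SHA-256 over a sorted, text-serialized copy of the table are all assumptions.

```python
import hashlib

import pandas as pd

KEY = "patient_id"                             # hypothetical key column
REQUIRED = ["patient_id", "age", "diagnosis"]  # hypothetical mandatory attributes
AGE_RANGE = (0, 120)                           # assumed plausible age bounds


def table_checksum(df: pd.DataFrame) -> str:
    """SHA-256 over a row-order-independent text serialization of the table."""
    cols = sorted(df.columns)
    canon = df[cols].astype(str).sort_values(cols)
    digest = hashlib.sha256()
    for row in canon.itertuples(index=False, name=None):
        digest.update("\x1f".join(map(str, row)).encode("utf-8"))  # field separator
        digest.update(b"\x1e")                                      # row separator
    return digest.hexdigest()


def rule_violations(df: pd.DataFrame) -> pd.DataFrame:
    """Rows that break simple domain rules on the mandatory attributes."""
    missing = df[REQUIRED].isna().any(axis=1)
    age = pd.to_numeric(df["age"], errors="coerce")
    bad_age = age.isna() | (age < AGE_RANGE[0]) | (age > AGE_RANGE[1])
    blank_dx = df["diagnosis"].astype(str).str.strip().eq("")
    return df[missing | bad_age | blank_dx]


def validate(source: pd.DataFrame, target: pd.DataFrame) -> dict:
    """Compare a source extract with its migrated copy."""
    return {
        "count_match": len(source) == len(target),
        "duplicate_ids": int(target[KEY].duplicated().sum()),
        "checksum_match": table_checksum(source) == table_checksum(target),
        "rule_violations": len(rule_violations(target)),
    }


if __name__ == "__main__":
    source = pd.DataFrame({"patient_id": ["P001", "P002", "P003"],
                           "age": [34, 51, 67],
                           "diagnosis": ["J45", "E11", "I10"]})
    reordered = source.iloc[::-1].reset_index(drop=True)  # same data, new order
    corrupted = source.assign(age=[34, 51, -5])            # invalid age
    duplicated = pd.concat([source, source.iloc[[0]]], ignore_index=True)
    for name, target in [("reordered", reordered), ("corrupted", corrupted),
                         ("duplicated", duplicated)]:
        print(name, validate(source, target))
```

Sorting before hashing makes the checksum insensitive to row order, which migrations rarely preserve, while any changed value, lost row, or extra row alters the digest. Because the checksum covers the text form of each value, a type change such as `34` becoming `34.0` also registers as a mismatch; whether that is desirable depends on the target schema.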

Author Biography

Akash Narendrakumar Parmar

Bachelor's degree in Mechanical Engineering, Rajiv Gandhi Proudyogiki Vishwavidyalaya (R.G.P.V.)

Published

2025-11-04

How to Cite

Jagrutiben Padhiyar, & Akash Narendrakumar Parmar. (2025). A FRAMEWORK FOR ENSURING DATA INTEGRITY IN HEALTHCARE DATA MIGRATION. International Journal For Research In Advanced Computer Science And Engineering, 11(1), 1–10. https://doi.org/10.53555/cse.v11i1.2474