Endocrinology and Metabolism

Original Article

Development of Clinical Data Mart of HMG-CoA Reductase Inhibitor for Varied Clinical Research

Hun-Sung Kim1,2*, Hyunah Kim3*, Yoo Jin Jeong1, Tong Min Kim1, So Jung Yang1, Sun Jung Baik1, Seung-Hwan Lee2, Jae Hyoung Cho2, In Young Choi1, Kun-Ho Yoon1,2

Endocrinology and Metabolism 2017;32(1):90-98.
DOI: https://doi.org/10.3803/EnM.2017.32.1.90
Published online: February 28, 2017

1Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea.

2Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea.

3College of Pharmacy, Sookmyung Women's University, Seoul, Korea.

Corresponding author: Kun-Ho Yoon. Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, 222 Banpo-daero, Seocho-gu, Seoul 06591, Korea. Tel: +82-2-2258-8262, Fax: +82-2-2258-8297, yoonk@catholic.ac.kr

Corresponding author: In Young Choi. Department of Medical Informatics, College of Medicine, The Catholic University of Korea, 222 Banpo-daero, Seocho-gu, Seoul 06591, Korea. Tel: +82-2-2258-8262, Fax: +82-2-2258-8297, iychoi@catholic.ac.kr

*These authors contributed equally to this work.

• Received: November 9, 2016 • Revised: January 2, 2017 • Accepted: January 6, 2017

Copyright © 2017 Korean Endocrine Society

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

INTRODUCTION

Recently, there has been a rapid increase in the use of electronic medical records (EMRs). These computerized database (DB) records eliminate the need to write medical information, such as symptoms, diagnoses, and clinical results, by hand on paper [1]. Concerns exist about using EMRs for clinical research because they lack random sampling, which limits generalizability [2]. However, several recent reports have presented the advantages of EMRs [3,4,5,6]. EMR data are systematically managed and easily accessed, making it possible to collect information about a patient's current medical, past medical, family, and therapeutic histories. EMR data also include interdisciplinary clinical treatments and prescriptions from other departments within the same clinical center. With the increasing use of EMR systems for documenting clinical data, clinical trials have also increasingly drawn on EMR data [7,8,9].

Randomized controlled trials (RCTs), in which medical interventions are conducted in a targeted patient population, are preferred when testing the efficacy and effectiveness of a drug or treatment, or when documenting the progression of a disease. Strict standards are generally applied to participant selection in RCTs to ensure the validity of the results. However, RCTs are expensive and time-intensive; consequently, enrollment is often limited. EMR-based studies are similar in structure to cohort studies, but they allow large amounts of data collected over long periods to be extracted quickly and simply. This advantage will grow as more EMR data accumulate [10,11,12]. EMR-based clinical research therefore requires a standardized clinical data mart (CDM) from which researchers can readily extract the data they need for their diverse research objectives.

In this study, we selected 3-hydroxy-3-methylglutaryl-coenzyme A (HMG-CoA) reductase inhibitors (statins), which are predominantly prescribed for preventing cardiovascular disorders [13]. Statins help prevent cardiovascular disease by lowering low density lipoprotein cholesterol (LDL-C) and triglyceride (TG) levels and increasing high density lipoprotein cholesterol (HDL-C) levels [14,15]. Several statins with differing efficacies are on the market, and numerous clinical trials have compared their effects [14,15]. The primary purpose of this study was to develop a clinical statin data mart that serves various purposes, such as assessing drug efficacy and safety. We aimed to aggregate a large amount of data on patients who were prescribed a statin for the first time so that researchers could conduct their own studies. By establishing a CDM that includes patients' personal data, medical history, medication history, and other statin-related information, we sought to enable rapid and effective access to diverse patient data. We further aimed to open the CDM to authorized researchers and other users so that study results can be shared and enhanced through open technology.

METHODS

Clinical data warehouse

Directly querying the EMR system to extract research data can degrade its performance for routine hospital use. A clinical data warehouse (CDW) offers researchers numerous benefits for quality data collection and decision-making through quick and efficient access to patient information and links to multiple operational data sources. Furthermore, by combining different data sources and validating the consistency of information, a CDW can be used to discover relationships between diseases and drugs and to support drug repositioning. We therefore built the study dataset, including diagnosis and laboratory data, on a CDW system [16]. The CDW system of Seoul St. Mary's Hospital currently comprises 30 tables that hold all clinical, prescription, laboratory, radiology, pathology, and other information for every patient treated at the hospital since 1997. It currently contains data on approximately 2.8 million patients, 47 million medication prescription events, and 150 million laboratory results. From these 2.2 billion records in total, we developed a research DB from a subset of the data selected according to a study protocol. The CDW additionally provides comprehensive views of clinical data for specific purposes [17,18].
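
The idea of building a protocol-specific research table from a subset of the CDW, rather than querying the live EMR, can be sketched with Python's built-in sqlite3 module. The table name prescription, its columns, and the three toy rows are hypothetical; the actual 30-table CDW schema is not described here, and in practice the data mart would usually live in a separate database rather than alongside the source table as in this sketch.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for one CDW table.
cdw = sqlite3.connect(":memory:")
cdw.executescript("""
CREATE TABLE prescription (
    patient_id TEXT,
    drug_name  TEXT,
    drug_class TEXT,
    rx_date    TEXT  -- ISO 8601 text, so string comparison orders dates correctly
);
""")
cdw.executemany(
    "INSERT INTO prescription VALUES (?, ?, ?, ?)",
    [
        ("P001", "atorvastatin 10 mg", "statin", "2012-03-15"),
        ("P002", "metformin 500 mg", "biguanide", "2013-07-01"),
        ("P003", "rosuvastatin 10 mg", "statin", "2005-01-20"),
    ],
)

# Copy only the protocol-relevant subset (statins, 2009-2015) into a
# research table, so study queries no longer touch the source data.
cdw.execute("""
CREATE TABLE mart_statin_rx AS
SELECT patient_id, drug_name, rx_date
FROM prescription
WHERE drug_class = 'statin'
  AND rx_date BETWEEN '2009-01-01' AND '2015-12-31'
""")

rows = cdw.execute("SELECT patient_id FROM mart_statin_rx ORDER BY patient_id").fetchall()
print(rows)  # [('P001',)]
```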

Extraction of the study sample

Data on patients who were prescribed a statin for the first time at Seoul St. Mary's Hospital between January 1, 2009 and December 31, 2015 were extracted from the CDW. Patients were included only if they had no statin prescription for at least six months before the initial statin prescription. The date of the initial statin prescription was defined as visit 0 (index date, baseline) (Fig. 1). Visit 1 (an average of 3 months later) was defined as the next laboratory test followed by renewal of the statin prescription within 45 to 135 days of baseline. Visit 2 was defined as the subsequent visit within 136 to 225 days after baseline (an average of 6 months later), visit 3 as 226 to 315 days after baseline (an average of 9 months later), and visit 4 as 316 to 405 days after baseline (an average of 1 year later). Cases in which the prescription was switched to a different statin or the statin was discontinued during the study period were identified. When a patient visited the hospital or had a blood test more than once within one visit window, the results from the date closest to the 91st, 182nd, 273rd, or 365th day, respectively, were retrieved.
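
The new-user rule and the visit windows above can be expressed as a small sketch. It assumes "at least six months" means 183 days, and it does not model the requirement that visit 1 be followed by a renewed statin prescription or the flagging of statin switches; the function names are hypothetical.

```python
from datetime import date, timedelta

# "At least six months" approximated as 183 days (an assumption).
WASHOUT_DAYS = 183

# Visit windows (days after the index date) and the target day used to
# choose among several results that fall inside the same window.
VISIT_WINDOWS = {
    1: (45, 135, 91),
    2: (136, 225, 182),
    3: (226, 315, 273),
    4: (316, 405, 365),
}


def is_new_user(index_date, earlier_statin_dates):
    """True if no statin was prescribed during the washout period before index_date."""
    washout_start = index_date - timedelta(days=WASHOUT_DAYS)
    return not any(washout_start <= d < index_date for d in earlier_statin_dates)


def assign_visits(index_date, lab_dates):
    """Map visit number -> lab date inside that visit's window closest to its target day."""
    visits = {}
    for visit, (start, end, target) in VISIT_WINDOWS.items():
        in_window = [d for d in lab_dates if start <= (d - index_date).days <= end]
        if in_window:
            visits[visit] = min(in_window, key=lambda d: abs((d - index_date).days - target))
    return visits


index = date(2012, 1, 1)
print(is_new_user(index, [date(2011, 3, 1)]))   # True: before the washout period
print(is_new_user(index, [date(2011, 10, 1)]))  # False: inside the washout period
labs = [date(2012, 3, 1), date(2012, 4, 2), date(2012, 7, 5), date(2013, 3, 1)]
print(assign_visits(index, labs))
# {1: datetime.date(2012, 4, 2), 2: datetime.date(2012, 7, 5)}
```

Day 60 and day 92 both fall in the visit 1 window, so the result closer to day 91 is kept; the last test (day 425) lies outside every window.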

This study covered all statin types prescribed at Seoul St. Mary's Hospital. Following the American College of Cardiology/American Heart Association guidelines [15], we classified statins by intensity, type, and dose. The statin types and dosages were as follows: atorvastatin (10, 20, and 40 mg), fluvastatin (40 and 80 mg), pitavastatin (2 and 4 mg), pravastatin (10, 20, and 40 mg), rosuvastatin (5, 10, and 20 mg), simvastatin (20 and 40 mg), and simvastatin plus ezetimibe (10/10 and 20/10 mg). Fixed-dose combinations of a statin with drugs that have other effects (e.g., atorvastatin plus amlodipine or pravastatin plus fenofibrate) were excluded from the study.
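
The text does not list the intensity assigned to each product, so the mapping below is our reading of the 2013 ACC/AHA statin-intensity table applied to the doses above, not the authors' published classification. Fluvastatin 80 mg is graded moderate on the assumption that it is the extended-release form (or 40 mg twice daily), and because the guideline table does not grade simvastatin plus ezetimibe, the sketch leaves those combinations unclassified.

```python
# Daily-dose intensity categories as we read the 2013 ACC/AHA table;
# verify against the guideline before relying on them.
STATIN_INTENSITY = {
    ("atorvastatin", 10): "moderate",
    ("atorvastatin", 20): "moderate",
    ("atorvastatin", 40): "high",
    ("fluvastatin", 40): "low",
    ("fluvastatin", 80): "moderate",  # assumes 80 mg XL or 40 mg twice daily
    ("pitavastatin", 2): "moderate",
    ("pitavastatin", 4): "moderate",
    ("pravastatin", 10): "low",
    ("pravastatin", 20): "low",
    ("pravastatin", 40): "moderate",
    ("rosuvastatin", 5): "moderate",
    ("rosuvastatin", 10): "moderate",
    ("rosuvastatin", 20): "high",
    ("simvastatin", 20): "moderate",
    ("simvastatin", 40): "moderate",
}


def classify_intensity(drug, dose_mg):
    """Return 'high', 'moderate', 'low', or 'unclassified' for a (drug, daily dose) pair."""
    return STATIN_INTENSITY.get((drug.strip().lower(), dose_mg), "unclassified")


print(classify_intensity("Rosuvastatin", 20))               # high
print(classify_intensity("pravastatin", 10))                # low
print(classify_intensity("simvastatin plus ezetimibe", 20))  # unclassified
```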

The extracted patient data included date of birth, age at the first statin prescription, sex, the department in which statins were first prescribed, the date of the first statin prescription, and the number of prescription days, among other items. In addition to total cholesterol, TG, HDL-C, and LDL-C, which are indicators relevant to hyperlipidemia, the study covered other blood tests, such as blood urea nitrogen, creatinine, aspartate aminotransferase/alanine aminotransferase (AST/ALT), hemoglobin/hematocrit, glycated hemoglobin, alkaline phosphatase, high-sensitivity C-reactive protein, γ-glutamyl transpeptidase, and thyroid function tests (thyroid stimulating hormone, free thyroxine). In modeling the DB, we defined the entities that compose it, each with its own attributes, and the relationships among them (Fig. 2).
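
Fig. 2 is not reproduced here, but a minimal relational sketch of this kind of entity relationship design might look as follows in SQLite. All table and column names are hypothetical and only loosely follow Table 1; each fact table refers back to a patient entity through a foreign key, and visit numbers 0 to 4 stand for baseline through visit 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE patient (
    patient_id     TEXT PRIMARY KEY,  -- pseudonymized study ID, not the hospital number
    birth_date     TEXT,
    sex            TEXT CHECK (sex IN ('M', 'F')),
    first_rx_dept  TEXT,
    first_rx_date  TEXT
);
CREATE TABLE statin_rx (
    patient_id  TEXT NOT NULL REFERENCES patient(patient_id),
    visit       INTEGER CHECK (visit BETWEEN 0 AND 4),
    drug_name   TEXT,
    dose_mg     REAL,
    rx_days     INTEGER
);
CREATE TABLE lab_result (
    patient_id  TEXT NOT NULL REFERENCES patient(patient_id),
    visit       INTEGER CHECK (visit BETWEEN 0 AND 4),
    test_code   TEXT,  -- e.g. 'LDL-C', 'ALT'
    value       REAL,
    test_date   TEXT
);
CREATE TABLE diagnosis (
    patient_id     TEXT NOT NULL REFERENCES patient(patient_id),
    dx_name        TEXT,
    first_dx_date  TEXT
);
""")

conn.execute("INSERT INTO patient VALUES ('S0001', '1950-05-01', 'F', 'Endocrinology', '2012-01-01')")
conn.execute("INSERT INTO lab_result VALUES ('S0001', 0, 'LDL-C', 152.0, '2012-01-01')")

# A lab row for a patient that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO lab_result VALUES ('S9999', 0, 'LDL-C', 140.0, '2012-01-01')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

rows = conn.execute(
    "SELECT p.sex, l.value FROM patient p JOIN lab_result l USING (patient_id) "
    "WHERE l.test_code = 'LDL-C'"
).fetchall()
print(fk_rejected, rows)  # True [('F', 152.0)]
```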

In addition to hyperlipidemia, various diseases, including cardiovascular disorders and diabetes, have recently been reported to be associated with statins, so they were also included. Moreover, to study statin side effects, we recorded whether a patient used fenofibrate, omega-3 fatty acids, propranolol, thyroxine, warfarin, nicotinic acid, or other drugs known to affect hepatotoxicity.

Privacy protection

This was a retrospective cohort study using EMR data held by a single hospital. All extracted data files were encoded during the extraction process to prevent identification of patients. Only one managing researcher was allowed to access the data; observers and analysts received data from which all personally identifying information had been deleted, so they could not identify actual patient numbers. The data used in this study contained no personal information, and there was no risk of physical or psychological harm to the patients. Because the data were encoded and anonymized and the study was retrospective, it did not affect the patients' rights or welfare; therefore, informed consent was not required. This study was approved by the Institutional Review Board of the Catholic University of Korea.
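
The text says the data were encoded but not how. One common approach, shown here only as an assumed illustration, is a keyed hash (HMAC-SHA256): the managing researcher holds the secret key, so analysts receive stable study IDs they cannot reverse. The field names and the study-ID format are hypothetical.

```python
import hashlib
import hmac
import secrets

# Held only by the managing researcher; analysts never receive this key.
SECRET_KEY = secrets.token_bytes(32)

# Hypothetical names for fields that directly identify a patient.
DIRECT_IDENTIFIERS = {"name", "social_security_number", "patient_number"}


def pseudonymize(patient_number, key=SECRET_KEY):
    """Stable study ID: the same patient number and key always give the same ID."""
    digest = hmac.new(key, patient_number.encode("utf-8"), hashlib.sha256).hexdigest()
    return "S" + digest[:16]  # 64 bits of the digest is ample for ~21,000 patients


def deidentify(record, key=SECRET_KEY):
    """Drop direct identifiers and replace the patient number with a study ID."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["study_id"] = pseudonymize(record["patient_number"], key)
    return cleaned


row = {"patient_number": "12345678", "name": "Hong Gil-dong", "sex": "M", "ldl_c": 142}
print(deidentify(row))  # study_id depends on SECRET_KEY
```

Using a keyed hash rather than a plain hash matters because hospital patient numbers come from a small, guessable space that could otherwise be brute-forced.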

RESULTS

Establishing the statin data mart to support clinical studies

We used initial clinical statin data obtained from a CDW that had been established to support clinical studies, and refined them through a data quality management (DQM) process. Because the data were extracted from patients' EMRs, they contained many duplicates and errors. The most frequent problem was the presence of letters or inequality signs (e.g., "<" or ">") in laboratory result fields. We applied pre-processing and DQM operations to these data in succession. As part of the DQM, abnormal data (redundant, out-of-range, or meaningless values, null values, etc.) were identified using clinical judgment as well as statistical methods, and were then re-confirmed by direct chart review. This made it possible to extract patient data in accordance with the objective of each clinical study and to use an optimized DB. In this study, clinical information on patients who were prescribed statins for the first time was extracted from the CDW, a data mart suited to the study objective was established, and the relevant DB information was obtained. Structured or encoded clinical information could be incorporated into the DB automatically, whereas information recorded as unstructured free text, such as patient height and weight, was entered manually by direct chart review. To enhance data reliability, problematic values were reviewed against the original data. A data description table was developed (Fig. 3). Personal identification information (name, social security number, etc.) was not included when extracting the clinical data, and patient numbers were collected anonymously.
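
A minimal sketch of the cleaning step for laboratory fields that contain letters or inequality signs follows. The flag names and the plausibility limits are hypothetical; the point is that suspicious values are flagged for chart review rather than silently dropped, as described above.

```python
import re

# Optional inequality sign followed by a plain number, e.g. "12.3", "<5", ">= 300".
_VALUE = re.compile(r"^\s*(<=|>=|<|>)?\s*(-?\d+(?:\.\d+)?)\s*$")

# Hypothetical plausibility limits; real limits would be set by clinicians.
PLAUSIBLE_RANGE = {"LDL-C": (1.0, 1000.0), "ALT": (0.0, 10000.0)}


def clean_lab_value(test_code, raw):
    """Parse a raw laboratory result into (value, flag).

    flag is None for a clean numeric value; otherwise it names the problem
    ("null", "non-numeric", "censored", "out-of-range") so the record can be
    routed to chart review.
    """
    if raw is None or str(raw).strip() == "":
        return None, "null"
    match = _VALUE.match(str(raw))
    if match is None:
        return None, "non-numeric"
    sign, number = match.groups()
    value = float(number)
    low, high = PLAUSIBLE_RANGE.get(test_code, (float("-inf"), float("inf")))
    if not low <= value <= high:
        return value, "out-of-range"
    # A leading "<" or ">" means the true value lies beyond the reported bound.
    return value, ("censored" if sign else None)


for raw in ["123", "<5", "hemolyzed", "", "2500"]:
    print(repr(raw), clean_lab_value("LDL-C", raw))
```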

Statin data mart composition

The data included patient personal information such as height, weight, age, and sex. Patient laboratory results, including glucose levels, AST/ALT values, and others listed in Table 1, were collected at baseline, visit 1 (an average of 3 months later), visit 2 (an average of 6 months later), visit 3 (an average of 9 months later), and visit 4 (an average of 12 months later) (Fig. 1). The presence of each diagnosis, such as hypertension or diabetes mellitus, was extracted together with the date on which the diagnosis was first made. It was therefore possible to determine whether a disease occurred before or after the first statin prescription and, if after, how soon. Other medications prescribed alongside statins, including those that could interact with statins, were also extracted.
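
Because each diagnosis carries its first-diagnosis date, its timing relative to the index date can be derived directly. In this sketch, a diagnosis first recorded on the index date itself is treated as pre-existing, which is an assumption; the labels and the example dates are hypothetical.

```python
from datetime import date


def diagnosis_timing(index_date, first_dx_date):
    """Classify a diagnosis relative to the first statin prescription.

    Returns (label, days_from_index). A diagnosis first recorded on or before
    the index date is treated as pre-existing (an assumption); later ones are incident.
    """
    if first_dx_date is None:
        return "absent", None
    days = (first_dx_date - index_date).days
    return ("pre-existing" if days <= 0 else "incident"), days


index = date(2012, 1, 1)
diagnoses = {
    "hypertension": date(2010, 5, 1),
    "diabetes mellitus": date(2012, 7, 1),
    "hypothyroidism": None,
}
for name, first_dx in diagnoses.items():
    print(name, diagnosis_timing(index, first_dx))
```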

Baseline characteristics of statin data mart

Data were extracted for a total of 21,368 patients who were prescribed statins for the first time at the hospital over 7 years, from January 2009 to December 2015 (Fig. 4). Of these, 44.2% (9,439/21,368) were male and 55.8% (11,929/21,368) were female. The mean age was 63±12 years. A total of 17 different statin products (by type and dose) were extracted. Atorvastatin 10 mg (21.0%, 4,490/21,368) and rosuvastatin 10 mg (20.4%, 4,364/21,368) were prescribed most often, followed by simvastatin 20 mg (11.7%, 2,493/21,368) and pitavastatin 2 mg (11.1%, 2,364/21,368). Statins were most often first prescribed by the endocrinology department (69.6%, 14,865/21,368), followed by cardiology (13.0%, 2,770/21,368) and neurology (3.9%, 832/21,368).
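
As a quick consistency check, the percentages reported above can be recomputed from the counts given in the text; no new data are introduced.

```python
TOTAL = 21_368

# Counts exactly as reported in the text.
COUNTS = {
    "male": 9_439,
    "female": 11_929,
    "atorvastatin 10 mg": 4_490,
    "rosuvastatin 10 mg": 4_364,
    "simvastatin 20 mg": 2_493,
    "pitavastatin 2 mg": 2_364,
    "endocrinology": 14_865,
    "cardiology": 2_770,
    "neurology": 832,
}


def percent(count, total=TOTAL):
    """Share of the cohort in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)


assert COUNTS["male"] + COUNTS["female"] == TOTAL  # sexes sum to the cohort size
for label, n in COUNTS.items():
    print(f"{label}: {percent(n)}%")
```

Every recomputed value matches the rounded percentage reported in the text.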

DISCUSSION

We focused on initial statin prescriptions for hyperlipidemia and established a clinical DB using data from patients who were prescribed statins for the first time at Seoul St. Mary's Hospital. Because the population for EMR data extraction was defined specifically for statin studies, this CDM includes patients' personal information, diagnosis information, and other data. Clinical researchers can use these data to conduct diverse, large-scale, EMR-based retrospective cohort studies suited to their study purposes.

With a year of laboratory follow-up, researchers can analyze the effects of statins, such as LDL-C lowering, and conduct basic research to inform guidelines for each statin. Because various illnesses and diseases are included, risk models [19] for the occurrence of cardiovascular disorders [20] can be developed. Research on current issues, such as the association between statins and the occurrence of diabetes [21] or cancers [22], can also be conducted quickly and easily. Researchers can assess the economic efficiency of statins [23] and compare diverse adverse drug effects [24]. We also expect that Bayesian networks could be used to analyze the effects of statins and the causes of disease occurrence. Cost effectiveness by patient age, risk type, and risk of side effects can also be analyzed. Statin outcomes that can be evaluated include the rate of LDL-C lowering and the occurrence of cardiovascular disorders [13].
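
One common way to express the LDL-C lowering effect is the percent change from baseline at each follow-up visit, 100 × (follow-up - baseline) / baseline. The sketch assumes values in mg/dL and treats a visit window without a test as missing; the paper does not specify its exact outcome definition, and the patient values are made up.

```python
def ldl_percent_change(baseline, follow_up):
    """Percent change in LDL-C from baseline; negative values mean LDL-C was lowered."""
    if baseline <= 0:
        raise ValueError("baseline LDL-C must be positive")
    return 100.0 * (follow_up - baseline) / baseline


def response_by_visit(baseline, visit_values):
    """Percent change at each follow-up visit that has a result (None = no test in window)."""
    return {
        visit: round(ldl_percent_change(baseline, value), 1)
        for visit, value in visit_values.items()
        if value is not None
    }


# Hypothetical patient: baseline LDL-C 160 mg/dL, no test in the visit-3 window.
print(response_by_visit(160.0, {1: 100.0, 2: 96.0, 3: None, 4: 104.0}))
# {1: -37.5, 2: -40.0, 4: -35.0}
```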

Various studies using CDMs have been conducted [25]. Prescription rates and patterns by department have also been analyzed [26]; in that previous study, cardiologists were the most frequent prescribers of statins [26]. In contrast, our data showed that the initial statin prescription was most commonly provided by an endocrinologist. This may be because endocrinologists prescribe statins to prevent cardiovascular disease. The validity of statin studies using EMR data has been demonstrated [13]. Various studies can thus be conducted, such as those on side effects by statin type and dose, real-world statin prescribing patterns and problems, the development of programs to predict clinical outcomes, questions that cannot be studied with RCTs, and rapid studies of rare side effects.

CDM development has become possible with the dissemination of EMRs, and clinical studies that use a CDM have various advantages [2,27,28]. EMR data can provide information that is not available in traditional paper medical records, including information about each patient's various treatments. Extensive medical data can be extracted easily and quickly from EMRs [2]. The use of EMRs can mitigate the disadvantages of both cohort studies (follow-up costs, long study periods, and maintaining consistency during the study period) and RCTs (loss to follow-up, changes in treatment, long study periods, and expense). Investigators with direct access to the EMR system can quickly test a hypothesis by sampling the variables relevant to it. As more varied data accumulate, studies of cause-and-effect relationships involving rare factors become possible.

A further advantage is the potential to increase the generalizability of results. Because such diverse data must be organized through standardization, we sought to establish the CDM under a systematic plan from the beginning. We also employed a standard protocol so that a multi-center integrated data mart could be developed, allowing researchers at other hospitals to add their statin data to ours. Clinical research using our CDM therefore has many benefits. Most importantly, it can save time and labor compared with conventional clinical studies, because it allows results to be previewed before long-term studies are undertaken, which benefits clinical researchers. A large amount of clinical data can be collected within a short period. Whereas conventional clinical studies often extrapolate results to the entire population from a small sample, EMR analysis can draw on far larger amounts of data, which can partly offset its other limitations.

Previous research on EMR data focused primarily on its convenience and ease of access [2,27,28]. With the recent accumulation of extensive data in EMR systems, however, these data can now be used in clinical studies, for example, to determine the efficacy of currently available medications. Because large quantities of data can be extracted inexpensively in a short period, EMR systems will become more valuable for research in the future. Moreover, information that researchers did not anticipate when planning a study may come to light, so EMR systems may lead to new knowledge and theories.

Despite these strengths, EMR-based clinical research has certain limitations. First, some possible cofactors and confounders, such as adherence to prescribed medication and the severity of underlying diseases, cannot be obtained from the DB. However, an EMR-based study reflects actual practice in statin use, and a standardized study plan defined early can minimize confounding factors that could significantly affect the results. Second, our data mart covers a short period of 12 months at a single center. As the amount and variety of clinical and biochemical data increase, additional biochemical variables may become available that distinguish patients who respond well to a statin from those who do not. Achieving this will require additional EMR data and a longer study period.

As EMRs have been widely implemented in Korean hospitals, we expect a sharp increase in analyses of accumulated EMR data. A CDW can be an important tool for obtaining clinically significant data from large hospitals. Statin prescribing could be standardized by verifying the effects of each statin through various CDM-based studies, and collecting domestic data will contribute to evidence-based prescribing and to clinical and health research. Moreover, these data and the algorithms developed from them could become a foundation for Korean dyslipidemia guidelines. Clearly, CDM research cannot replace RCTs. Nonetheless, because it allows rapid data extraction, CDM research conducted before an RCT can help set the RCT's direction. With advances in technology, we expect the use of EMRs to lead to various types of clinical research.

ACKNOWLEDGMENTS

This research was supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (grant number: HC15C1362).

Article information

CONFLICTS OF INTEREST: No potential conflict of interest relevant to this article was reported.

References

Fig. 1

Design of clinical data mart for clinical trials on statins.

Fig. 2

Entity relationship diagram. ID, identification; LDL-C, low density lipoprotein cholesterol.

Fig. 3

Example of data table specification.

Fig. 4

Areas of focus in the 21,368 patients examined: (A) sex, (B) age, (C) type of statin, and (D) department.

Table 1

Extraction of Data Description Table of Clinical Research Data

Patient information: ID number/birth date/age/sex/height/weight/BMI/systolic/diastolic BP; department of first prescription; date of first prescription/last prescription; no. of days of prescription in every visit

Laboratory information: WBC count/ANC/RBC count/platelet count/Hb/Hct/prothrombin time (INR)/activated PTT/ESR/hs-CRP/glucose/HbA1c/BUN/creatinine; MDRD GFR/total bilirubin; AST/ALT/ALP/γ-GTP/LDH; total cholesterol/triglyceride/HDL-C/LDL-C; sodium/potassium/CPK/PSA/free PSA/TSH/free T4

Drug information: atorvastatin/fluvastatin/pitavastatin/pravastatin/rosuvastatin/simvastatin/simvastatin plus ezetimibe complex; fenofibrate/gemfibrozil/niacin/omega-3 fatty acid; propranolol/thyroxine/warfarin/bisphosphonate/metformin

Diagnosis information: hypertensive diseases/ischemic heart diseases/cerebrovascular diseases/aneurysm dissection; abnormal GTT/hyperglycemia/IFG/IGT/prediabetes/diabetes mellitus; hypothyroidism/osteoporosis/fracture; ARF/ARF d/t RHABDO/nontraumatic RHABDO; renal stone/calculus of kidney; erectile dysfunction/necrotizing myopathy/prostate cancer/cancer

ID, identification; BMI, body mass index; BP, blood pressure; WBC, white blood cell; ANC, absolute neutrophil count; RBC, red blood cell; Hb, hemoglobin; Hct, hematocrit; INR, international normalized ratio; PTT, partial thromboplastin time; ESR, erythrocyte sedimentation rate; hs-CRP, high-sensitivity C-reactive protein; HbA1c, glycated hemoglobin; BUN, blood urea nitrogen; MDRD GFR, modification of diet in renal disease glomerular filtration rate; AST, aspartate aminotransferase; ALT, alanine aminotransferase; ALP, alkaline phosphatase; γ-GTP, γ-glutamyl transpeptidase; LDH, lactate dehydrogenase; HDL-C, high density lipoprotein cholesterol; LDL-C, low density lipoprotein cholesterol; CPK, creatine phosphokinase; PSA, prostate-specific antigen; TSH, thyroid stimulating hormone; free T4, free thyroxine; GTT, glucose tolerance test; IFG, impaired fasting glucose; IGT, impaired glucose tolerance; ARF, acute renal failure; d/t, due to; RHABDO, rhabdomyolysis.
