An artificial intelligence-enabled ECG algorithm for identifying ventricular premature contraction during sinus rhythm

Background Ventricular premature complex (VPC) is a common arrhythmia in clinical practice. VPC can trigger ventricular tachycardia/fibrillation or VPC-induced cardiomyopathy in susceptible patients. Existing screening methods require prolonged monitoring and are limited by cost and by low yield when the frequency of VPC is low. The twelve-lead electrocardiogram (ECG) is low in cost and widely used. We aimed to identify patients with VPC from ECGs recorded during normal sinus rhythm (NSR) using artificial intelligence (AI) and machine learning-based ECG reading. Methods We developed an AI-enabled ECG algorithm using a convolutional neural network (CNN) to detect the ECG signature of VPC present during NSR on standard 12-lead ECGs. A total of 2515 ECG records from 398 patients with VPC were collected. Among them, only ECG records of NSR without VPC (1617 ECG records) were parsed. Results A total of 753 normal ECG records from 387 patients under NSR were used for comparison. Both image and time-series datasets were parsed for CNN model training. The network architectures were optimized to select the best model. Both the single-input image model (InceptionV3, accuracy: 0.895, 95% confidence interval [CI] 0.683–0.937) and the multi-input time-series model (ResNet50V2, accuracy: 0.880, 95% CI 0.646–0.943) yielded satisfactory results for VPC prediction, and both performed better than the single-input time-series model (ResNet50V2, accuracy: 0.840, 95% CI 0.629–0.952). Conclusions An AI-enabled ECG acquired during NSR permits rapid point-of-care identification of individuals with VPC and has the potential to predict VPC episodes automatically, without traditional long-term monitoring.


Introduction
Ventricular premature complex (VPC), also known as ventricular extrasystole, is a commonly encountered arrhythmia worldwide [1]. According to previous studies, the prevalence of VPC on standard 12-lead electrocardiography (ECG) is around 1-4% in the general population [2]. Additionally, increasing age, male gender, atherosclerosis, hypertension, and cardiomyopathy are related to a higher occurrence of VPC [1]. Clinically, asymptomatic VPC has been regarded as benign. However, frequent VPC attacks are associated with cardiomyopathy and irreversible pathogenesis [3]. In patients with structural heart disease in particular, the incidence and complexity of VPC increase, reaching up to 90% in ischemic cardiomyopathy [2]. Hence, VPC may signal an increased risk of sudden death or point to an underlying cardiomyopathy. Consequently, timely prediction of and intervention for VPC attacks might eliminate the arrhythmogenic source and reverse progressive cardiomyopathy.
*Correspondence: cttsai1999@gmail.com
Clinically, the conventional 12-lead electrocardiogram (ECG) has been used for decades to monitor cardiac structure and physiological condition. ECG is non-invasive, easy to use, rapid, low in cost in resource-limited settings, and simple to interpret [4]. Owing to these characteristics, several ECG monitoring systems have been developed to analyze ECG signals [4]. To interpret this enormous amount of data promptly, deep learning has been widely used to read ECG signals; artificial intelligence (AI) techniques can process large volumes of ECG signals without human intervention and provide accurate diagnoses automatically [4].
However, most patients present with intermittent VPC, and occasionally all ECG-related examinations or monitoring fail to capture VPC, so a definite diagnosis cannot be made. A tool is therefore needed to identify patients with VPC from ECGs recorded during sinus rhythm. An AI-enabled ECG algorithm has been shown to identify patients with paroxysmal atrial fibrillation from ECGs recorded during sinus rhythm. In this study, we used a deep-learning neural network to identify, from ECGs recorded during sinus rhythm, the population at high risk of VPC attacks, with the aim of facilitating point-of-care screening and preventing severe cardiovascular events in advance.

Data collection and parsing
The data were collected from patients diagnosed with VPC at the National Taiwan University Hospital, Taipei, Taiwan, from January 2021 to October 2021. Initially, 398 patients were enrolled and 2515 ECG records were checked. Only ECGs recorded during sinus rhythm without a diagnosis of VPC were parsed; in the end, 1617 ECG records were double-checked by two cardiologists and labeled as sinus rhythm from patients with VPC. For the control group, 2090 ECG records from 1053 patients were collected and screened. Finally, 753 normal ECG records from 387 patients were selected and labeled as normal sinus rhythm (NSR). This study was approved by the ethics committee and institutional review board (IRB) on human research of the Medical Research Department of National Taiwan University Hospital, Taipei, Taiwan (IRB NO: 201705122RINC), and informed consent was waived because identifying data on the ECGs were removed before they were sent for analysis.

Dataset preparation
The datasets were divided into a training set, a validation set, and a test set. First, 50 ECG records were chosen randomly for the validation set and another 100 ECG records were selected for the test set. The rest of the data were assigned to the training set. Importantly, the data of any one patient could not belong to more than one dataset; otherwise, the credibility of the final results would be affected.
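The patient-level constraint described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `split_by_patient`, the `(patient_id, record_id)` record format, the seed, and the synthetic IDs are all assumptions. Because whole patients are assigned to a set, the validation and test sets in this sketch may slightly exceed the targets of 50 and 100 records.

```python
import random
from collections import defaultdict


def split_by_patient(records, n_val=50, n_test=100, seed=0):
    """Split (patient_id, record_id) pairs so that no patient spans two sets.

    Patients are shuffled, then assigned whole: first to the validation set
    until it holds at least n_val records, then to the test set until it holds
    at least n_test records, and the remainder to the training set.
    """
    by_patient = defaultdict(list)
    for patient_id, record_id in records:
        by_patient[patient_id].append(record_id)

    patients = sorted(by_patient)          # fixed order before shuffling
    random.Random(seed).shuffle(patients)  # reproducible random order

    splits = {"val": [], "test": [], "train": []}
    for patient_id in patients:
        if len(splits["val"]) < n_val:
            target = "val"
        elif len(splits["test"]) < n_test:
            target = "test"
        else:
            target = "train"
        splits[target].extend((patient_id, r) for r in by_patient[patient_id])
    return splits


# Synthetic example (hypothetical IDs): 200 patients with 1 to 4 records each.
records = [(f"P{p}", f"P{p}-E{e}") for p in range(200) for e in range(1 + p % 4)]
splits = split_by_patient(records)
patient_sets = {name: {pid for pid, _ in recs} for name, recs in splits.items()}
```

Checking that the three entries of `patient_sets` are pairwise disjoint is a quick way to confirm that no patient leaks between the training, validation, and test sets.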

Data type and pre-process
The ECG records were collected as standard 12-lead ECG images, including leads I, II, III, V1-V6, aVR, aVL, aVF, and long lead II (MAC2000 resting ECG System, GE Healthcare). All records were sampled at 500 Hz with a duration of 2.5 s. Before data analysis, the red-grid backgrounds of the ECG images were removed and the images were cropped so that they focused precisely on the ECG signals (Fig. 1).
After that, the ECG images were resized to 512 × 256 × 3 pixels. The two-dimensional ECG images were also converted into one-dimensional time-series data. The input size of the time-series data for the convolutional neural network (CNN) was 1250 × 12, that is, 1250 samples (500 Hz × 2.5 s) for each of the 12 leads (Fig. 2).
Fig. 1 The ECG image processing before input. a The standard 12-lead ECG image. b The red-grid background of the 12-lead ECG image was removed. c The image was cropped to focus on the ECG signals
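The image-to-time-series conversion (steps c to g of the Fig. 2 caption) can be illustrated with a simplified sketch. Several points are assumptions made for illustration and are not taken from the study: the sub-image is assumed to contain a single trace, each column is reduced to the mean row of its signal pixels, the normalization is min-max scaling to [0, 1], and the IIR low-pass filtering step (15 Hz, order 3) is left out. The function name `trace_from_subimage` is hypothetical.

```python
import numpy as np


def trace_from_subimage(sub_image, n_out=1250):
    """Convert a binarized sub-image (signal pixels == 255) to a time series.

    sub_image: 2-D uint8 array of shape (rows, columns), one trace assumed.
    Returns n_out samples scaled to [0, 1].
    """
    n_rows, n_cols = sub_image.shape
    mask = sub_image == 255
    counts = mask.sum(axis=0)
    rows = np.arange(n_rows)[:, None]

    # Mean row index of the signal pixels per column (NaN for empty columns).
    with np.errstate(invalid="ignore", divide="ignore"):
        centre = (rows * mask).sum(axis=0) / counts

    # Fill empty columns by linear interpolation between neighbouring columns.
    cols = np.arange(n_cols)
    valid = counts > 0
    centre = np.interp(cols, cols[valid], centre[valid])

    # Image rows grow downward, so flip to obtain an upward-positive amplitude.
    amplitude = (n_rows - 1) - centre

    # Up-sample from n_cols points to n_out points (e.g. 250 -> 1250).
    x_old = np.linspace(0.0, 1.0, n_cols)
    x_new = np.linspace(0.0, 1.0, n_out)
    upsampled = np.interp(x_new, x_old, amplitude)

    # Min-max normalization (assumed form of the "unified scale").
    span = upsampled.max() - upsampled.min()
    return (upsampled - upsampled.min()) / span if span > 0 else np.zeros(n_out)


# Synthetic sub-image: one sine-shaped trace, 100 rows by 250 columns.
n_rows, n_cols = 100, 250
cols = np.arange(n_cols)
true_rows = np.round(50 - 30 * np.sin(2 * np.pi * cols / n_cols)).astype(int)
sub_image = np.zeros((n_rows, n_cols), dtype=np.uint8)
sub_image[true_rows, cols] = 255
lead = trace_from_subimage(sub_image)
```

Stacking twelve such traces column-wise would give an array of shape 1250 × 12, matching the time-series input size stated above.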

Models process
We set up CNN models according to the dimensionality of the data formats. For the two-dimensional image data, we used five network architectures, VGG16 [5], ResNet50V2 [6], InceptionV3 [7], InceptionResNetV2 [8], and Xception [9], as the ImageNet part of the CNN to find the best image recognition performance (Fig. 3a). After the image features were extracted by the CNN, they were reduced by Global Average Pooling (GAP) [10] and passed to a dense layer. Dropout (drop rate = 0.5) was then added to reduce overfitting (Fig. 3b) [11]. Finally, a dense layer of size two was added as the output layer, representing the two classes (VPC and NSR) (Fig. 3b).
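The layer order of the classification head (GAP, dense layer, dropout, two-unit output) can be sketched in plain NumPy. This is only an illustration of the data flow, not the Keras model used in the study. The feature-map shape, the hidden layer width of 128, the ReLU activation in the hidden layer, the softmax output, and the random weights are all assumptions; the backbone network is replaced by a random array.

```python
import numpy as np


def classification_head(feature_map, w_hidden, w_out, drop_rate=0.5,
                        training=False, rng=None):
    """GAP -> dense (ReLU) -> dropout -> dense(2) with softmax."""
    pooled = feature_map.mean(axis=(0, 1))       # GAP: (H, W, C) -> (C,)
    hidden = np.maximum(pooled @ w_hidden, 0.0)  # dense layer with ReLU (assumed)

    if training:
        # Inverted dropout: zero units with probability drop_rate and rescale
        # the survivors, so that no rescaling is needed at inference time.
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(hidden.shape) >= drop_rate
        hidden = hidden * keep / (1.0 - drop_rate)

    logits = hidden @ w_out                      # output layer of size two
    exp = np.exp(logits - logits.max())          # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
feature_map = rng.random((14, 6, 2048))          # hypothetical backbone output
w_hidden = rng.standard_normal((2048, 128)) * 0.01
w_out = rng.standard_normal((128, 2)) * 0.1
probs = classification_head(feature_map, w_hidden, w_out)  # inference mode
```

With `training=False` the dropout step is skipped, so repeated calls on the same input return the same two class probabilities.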
For the time-series data, we used single-input and multiple-input architectures. First, we replaced the convolutional kernel with a one-dimensional kernel and tried different kernel sizes. The stride was set to three, so the window of the convolutional kernel moved three positions at a time. Each convolutional block consisted of a one-dimensional convolution followed by BatchNormalization [12] and ReLU activation [13]. Max pooling [14] used a pooling size of 5 and a stride of 3. After the features were extracted by the CNN layers, they were reduced by GAP. The output features of the single-input model were connected directly to dropout (dropout rate = 0.5) to avoid overfitting (Fig. 3c).
Fig. 2 The ECG data input format. a The red-grid background of the 12-lead ECG image was removed and the ECG was converted to a gray-scale image. b The pixel intensity was inverted and the signal pixels were set to an intensity of 255. The image was cut vertically into four sub-images according to the "start" and "end" position of each lead. c The sub-images were scanned pixel by pixel, and the positions where the pixel intensity was equal to 255 were recorded. d The closest signal positions were grouped. Each column was split into four values, and the values of all columns were assembled into four lists for each lead. The signals were thereby converted to time-series format. e Each sub-image spanned 250 pixel columns. After pixel-wise scanning, each lead yielded 250 time-series points. Interpolation was used to up-sample the time-series data (500 Hz, 2.5 s). f An IIR low-pass filter was used to filter the noise (cut-off frequency = 15 Hz, order = 3). g The magnitude of each lead was normalized to a unified scale
On the other hand, the multiple-input model merged twelve channels' features together and connected to one dense layer (dense size = 2) to get the output result (Fig. 3c).
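The multiple-input arrangement can be sketched as one convolutional branch per lead whose pooled features are concatenated before the two-unit output layer. The sketch below is a NumPy illustration under several assumptions that are not stated in the text: a single unpadded convolutional block per lead, 8 filters per branch, a per-filter standardization used as a stand-in for batch normalization, a softmax output, and random weights and input.

```python
import numpy as np


def conv1d(x, kernels, stride=3):
    """Unpadded 1-D convolution. x: (n,), kernels: (n_filters, k)."""
    k = kernels.shape[1]
    n_out = (len(x) - k) // stride + 1
    windows = np.stack([x[i * stride: i * stride + k] for i in range(n_out)])
    return windows @ kernels.T                    # (n_out, n_filters)


def max_pool1d(x, size=5, stride=3):
    """Max pooling along the time axis. x: (n, n_filters)."""
    n_out = (x.shape[0] - size) // stride + 1
    return np.stack([x[i * stride: i * stride + size].max(axis=0)
                     for i in range(n_out)])


def lead_branch(x, kernels):
    """One convolutional block followed by global average pooling (GAP)."""
    h = conv1d(x, kernels)
    # Per-filter standardization over time: only a stand-in for batch
    # normalization, which normalizes over a training batch.
    h = (h - h.mean(axis=0)) / (h.std(axis=0) + 1e-5)
    h = np.maximum(h, 0.0)                        # ReLU
    h = max_pool1d(h)                             # pooling size 5, stride 3
    return h.mean(axis=0)                         # GAP -> (n_filters,)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


rng = np.random.default_rng(0)
n_leads, n_samples, n_filters, kernel_size = 12, 1250, 8, 11
ecg = rng.standard_normal((n_samples, n_leads))   # placeholder 1250 x 12 record
kernels = rng.standard_normal((n_leads, n_filters, kernel_size)) * 0.1

# Merge the twelve per-lead feature vectors, then apply the dense layer of size 2.
features = np.concatenate([lead_branch(ecg[:, i], kernels[i])
                           for i in range(n_leads)])
w_out = rng.standard_normal((features.size, 2)) * 0.1
probs = softmax(features @ w_out)
```

Under these assumptions, a 1250-sample lead shrinks to 414 time steps after the stride-3 convolution with kernel size 11 and to 137 after pooling, and the merged feature vector has 12 × 8 = 96 entries.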

Training process
We used Google Colaboratory (Colab) [15] with a high-RAM (random access memory) graphics processing unit (GPU) environment as the training platform. The Colab environment provided Python 3.8 and the TensorFlow package [16] for CNN training. We also used the Keras application programming interface (API), a deep-learning API written in Python, to build the CNN models, and applied transfer learning based on the ImageNet competition. The settings of the APIs and the training parameters are shown in Table 1.

Statistical analysis
Optimal cut-points and measurements of diagnostic performance included accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve (AUC) of the receiver operating characteristic (ROC) curve. All were reported with two-sided 95% confidence intervals. The data were analyzed with IBM SPSS (Version 25 for Windows, Armonk, New York).
(Fig. 4). The AUC of the ROC for this model architecture was 0.941 (Fig. 5).
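The diagnostic performance measures listed in the statistical analysis can be written out explicitly. The sketch below is an illustration in Python, not the SPSS procedure used in the study. The text does not state which interval method was used, so the Wilson score interval is an assumption, as are the function names, the cut-point of 0.5, and the toy labels and scores.

```python
from math import sqrt


def diagnostic_metrics(y_true, y_pred):
    """Confusion-matrix measures for binary labels (1 = VPC, 0 = NSR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def wilson_interval(successes, n, z=1.96):
    """Two-sided Wilson score interval for a proportion (z = 1.96 for 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def roc_auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # cut-point of 0.5 is arbitrary
metrics = diagnostic_metrics(y_true, y_pred)
ci_low, ci_high = wilson_interval(successes=6, n=8)  # 6 of 8 classified correctly
auc = roc_auc(y_true, scores)
```

With small test sets, intervals of this kind are wide, which is worth keeping in mind when comparing accuracies that differ by only a few percentage points.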

Performance of the time-series-input model
For the time-series data, we evaluated different convolution kernel sizes to find the best combination. The best kernel size was 7 for the single-input model and 11 for the multi-input model (Table 2). In the multi-input model, the CNN had to analyze the signals of all twelve leads at the same time, so its complexity was higher than that of the single-input model, which only had to analyze a one-lead signal. The multi-input model also analyzed the leads in parallel. Accordingly, the accuracy of the multi-input model was 4 percentage points higher than that of the single-input model (single-input model: 0.840, 95% CI 0.629–0.952; multi-input model: 0.880, 95% CI 0.646–0.943) (Table 3). The accuracy of the multi-input time-series model was slightly lower than, but very close to, that of the image-input model (0.880 vs. 0.895).

Highlights
In this study, our AI model was able to read ECG signals and detect the presence of VPC during normal sinus rhythm (AUC: 0.941). This performance was comparable with that of a previous study using AI-enabled ECG to identify AF during normal sinus rhythm (AUC: 0.87; 95% CI: 0.86-0.88) [17] and was better than that of other medical screening tools such as the CHADS2 score (AUC: 0.64; 95% CI: 0.56-0.72) and the CHA2DS2-VASc score (AUC: 0.67; 95% CI: 0.60-0.74) for prediction of ischemic strokes [18].

The importance of VPC detection during sinus rhythm
Although VPC seems to be benign, it is associated with an increased rate of cardiovascular events. In the Framingham Heart Study [19], the Multiple Risk Factor Intervention Trial (MRFIT) [20], and the Atherosclerosis Risk in Communities (ARIC) study [21], VPC was shown to be an independent risk factor for mortality in patients without structural heart disease [1]. VPC is also recognized to trigger ventricular tachycardia/fibrillation and to cause sudden cardiac death (SCD) or unexplained syncope in patients without ischemic cardiomyopathy [1]. Additionally, frequent VPCs (defined as > 1 VPC on a 10-s ECG or > 30 VPCs in an hour) are associated with incident heart failure and sudden cardiac death [1]. Moreover, patients with frequent VPCs are at risk of VPC-induced cardiomyopathy even when they are asymptomatic [1]. The ability to identify undetected VPC with an inexpensive, widely available, point-of-care test (an ECG recorded during normal sinus rhythm) has important practical implications, particularly for VPC screening efforts and for the management of patients with unexplained syncope or chest discomfort, especially those with a family history of SCD. This study shows the power of leveraging modern computing technology, large datasets, non-linear models, and automated feature extraction using convolution layers to potentially improve the diagnosis and treatment of a disease that can be life-threatening.
When VPC is found, treatment can be initiated early. Catheter ablation significantly improves the outcome [22]. Several large, prospective, randomized studies have also shown that implantation of an implantable cardioverter defibrillator (ICD) improves survival in patients with life-threatening ventricular arrhythmia [3,23].
Prolonged ambulatory monitoring of patients with unexplained syncope or SCD may identify VPCs, whereas short-term monitoring may under-detect VPC and leave a substantial proportion of patients unprotected from SCD until VPC is detected. However, prolonged monitoring is expensive and can be a burden to patients and clinical practices. Thus, identifying the patients who would benefit most from intensive monitoring would be valuable in patients with aborted SCD. Our data indicate that a simple, inexpensive, non-invasive, 10-s test, the AI-enhanced standard ECG, might permit identification of patients with under-detected VPC. Further investigations will be necessary to confirm the diagnostic performance of AI-enabled ECG in specific populations, such as patients with SCD or with unexplained syncope and chest tightness, and to determine whether AI-enabled ECG could be used to refine the selection of candidates for prolonged ambulatory cardiac rhythm monitoring or to guide treatment in these patients.

The dimensionality of 12-lead ECG data
When CNN analysis is applied to the 12-lead ECG, the one-dimensional approach treats the ECG data as time series. In two-dimensional processing, by contrast, the CNN extracts the features of the whole 12-lead ECG with its kernels. The CNN kernels can be activated by specific wave patterns, which are subsequently recognized by the neural network [24]. Two-dimensional analysis therefore treats the data as an image, which is closer to the way a cardiologist interprets the 12-lead ECG. However, two-dimensional data are far larger in volume and much more complicated than one-dimensional data, and general AI tools therefore could not analyze 12-lead ECGs stored in image format [25]. To overcome the difficulty of analyzing such large and complicated two-dimensional data, we tried several available networks and different architecture combinations to obtain the best accuracy of VPC prediction by the CNN model. The CNN-based model for VPC prediction from two-dimensional data was the important feature of this study. This had not been performed successfully before. After optimization of the input model architecture, our two-dimensional CNN model could identify abnormal ECGs and classify high-risk populations before VPC attacks through automatic learning.
In previous studies, AI-driven algorithms have been applied to the automatic diagnosis of various conditions [26], such as myocardial infarction needing urgent revascularization [24], systolic heart failure [25], subtle potassium changes in high-risk populations [26], and atrial fibrillation [25][26][27]. However, most of these studies were based on single-lead ECG or one-dimensional (time-series) datasets. In our results, the CNN model derived from the 12-lead ECG in two-dimensional format predicted VPC attacks reliably and automatically, and its accuracy was even better than that of the one-dimensional (time-series) model (0.895 vs. 0.880). Our study demonstrated that a CNN model can be implemented to identify VPCs using either one-dimensional or two-dimensional data.

Mechanism by which AI could identify patients with VPC under normal sinus rhythm
The structural changes that underlie VPC, which might include myocyte hypertrophy, fibrosis, and chamber enlargement, are likely to lead to subtle ECG changes, allowing prediction of underlying VPC. This is very similar to using signal-averaged ECG to detect late potentials that cannot be seen by the human eye on a single ECG [28,29]. Furthermore, although seldom reported on ECGs, subtle intraventricular block may correlate with both subtle myocardial fibrosis and the risk of VPC or SCD [30]. Thus, it is possible that wavelets on the ECG smaller than the readily observable waves reflect regional conduction block in these patients. A neural network trained on a large number of ECGs, and with sufficient depth to extract and recall subtle features not routinely appreciated or formally reported by human observers, might be powerful enough to identify such features. Finally, it has been reported that AI-enabled ECG may predict left ventricular function [31], and lower left ventricular ejection fraction has been shown to be a strong predictor of ventricular arrhythmia [32].

Limitations
This is a single-center study. The results of our observational study may justify future randomized clinical trials for this purpose.

Conclusions
In this study, the CNN proved to be a promising tool for comprehensive, human-like interpretation of the ECG. The deep-learning CNN model showed satisfactory performance on high-dimensional datasets for VPC prediction. It has great potential for deployment in the clinical arena, with implications that are still largely unpredictable. However, a key limitation of existing neural networks is explainability: the features that drive the network's output are not readily identified. Identifying these features could be important because they might offer novel findings that could provide new therapeutic targets or give more certainty to clinicians who are otherwise trying to understand what drives the network's interpretation. Finding ways to peer into this so-called black box is an area of active ongoing investigation.