
Learn to make full use of, and mine, different databases

2022/5/9 15:50:26

Original article by 毛毛弟, 一起学科研, 2022-05-08 17:52

“护”说科研

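"Big-data mining" is fashionable right now, but most of it draws on a handful of medical databases. Today we look at a different kind of big-data mining study.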

 

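Professor Wu Bian (吴边) and his team at the Cancer Center of Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, used the ClinicalTrials.gov database for a cross-sectional analysis and published it in a high-impact journal (IF 10+). The statistics are not especially difficult. The hardest part is the data cleaning, since there are tens of thousands of records; judging from the authors' search date, the work took about 2.5 years.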

 

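In fact, sources such as our own National Natural Science Foundation of China database or the Chinese Clinical Trial Registry can serve as raw material too. It just depends on whether you are willing to put in the effort.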

 

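Let's look at the Methods and Results sections of the paper. They are written simply and clearly, with no hard-to-follow regression analysis or modelling, just routine descriptive statistics.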

 

 

 

Creation of the ClinicalTrials.gov dataset

 

We conducted a cross-sectional analysis of oncological trials focusing on older patients registered in ClinicalTrials.gov (https://clinicaltrials.gov/). The ClinicalTrials.gov query and data download were performed on 17 December 2019, two decades after the launch of the registry. This study was considered exempt by the Institutional Review Board of Union Hospital, Tongji Medical College, Huazhong University of Science and Technology as it did not represent human participant research. We searched ClinicalTrials.gov using the keyword ‘Cancer’ as recruitment. In total, 61,119 registered clinical studies were identified. Two oncologists manually and independently reviewed all the trials, and a third one was consulted to resolve any disagreements. Finally, we restricted our analysis to interventional oncological trials registered between 1 January 2000 and 17 December 2019 (n = 49,273) (Figure 1).

Figure 1. Flowchart identifying trials registered on ClinicalTrials.gov from 2000 to 2019.

 

To classify these trials as older-patient-specific, we used the field of ‘minimum age’. Trials using a lower age limit of 65 years or older were considered older-patient-specific and were divided into two temporal subsets (1 January 2000 to 31 December 2009, and 1 January 2010 to 17 December 2019) to investigate the chronological shifts.
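The classification rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `classify_trial` and its two arguments are hypothetical, and it assumes the minimum age has already been parsed into whole years and the registration date into a `date` object.

```python
from datetime import date

# Cut-offs taken from the Methods text quoted above.
AGE_CUTOFF = 65                      # lower age limit that marks a trial as older-patient-specific
STUDY_START = date(2000, 1, 1)       # first registration date included
PERIOD_SPLIT = date(2010, 1, 1)      # start of the second temporal subset
DOWNLOAD_DATE = date(2019, 12, 17)   # date of the query and download


def classify_trial(minimum_age, registered):
    """Return (is_older_specific, period) for one trial, or None if out of range.

    minimum_age: lower age limit in years, or None when the field is empty
                 (hypothetical pre-parsed value).
    registered:  registration date as a datetime.date.
    """
    if not (STUDY_START <= registered <= DOWNLOAD_DATE):
        return None  # outside the analysed registration window
    is_older_specific = minimum_age is not None and minimum_age >= AGE_CUTOFF
    period = "2000-2009" if registered < PERIOD_SPLIT else "2010-2019"
    return is_older_specific, period
```

Applied row by row to an exported registry table, a helper like this yields the two flags needed for every later comparison: older-patient-specific versus other, and first versus second decade.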

 

Study variables

 

The following 18 trial characteristics were assessed: (i) whether the trial was registered before participant enrollment; (ii) presence or absence of a data monitoring committee; (iii) trial phase; (iv) number of enrolled patients; (v) number of study arms; (vi) blinding methods; (vii) allocation; (viii) primary purpose of intervention; (ix) sponsor; (x) number of facilities; (xi) number of regions; (xii) trial site regions; (xiii) recruitment status; (xiv) type of intervention; (xv) sex of participants; (xvi) cancer types that each trial targeted; (xvii) lower age limits and upper age limits; and (xviii) end points. Funding sources were identified as National Institutes of Health (NIH), industry and other based on the submitted information of the field of ‘sponsor’, which comprised lead sponsors and collaborators. A trial was classified as NIH-funded if its lead sponsor or one of its collaborators was from the NIH without any from the industry, and industry-funded if its lead sponsor or one of its collaborators was from the industry without any from the NIH. The remaining were classified as ‘other’, which represented the trials funded by other government or academic institutions.
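The funding rule is easy to misread, so here is a minimal Python sketch of one literal reading of it. The function name `classify_funding` and the label strings are assumptions made for illustration. Note that under this reading a trial with both NIH and industry involvement falls into "Other", because neither "without any from the industry" nor "without any from the NIH" holds; the paper does not say explicitly how such trials were handled.

```python
def classify_funding(sponsor_classes):
    """Classify a trial's funding source from its sponsor labels.

    sponsor_classes: iterable of labels for the lead sponsor and all
    collaborators, e.g. ["NIH", "Other"] (hypothetical label values).
    """
    classes = {label.strip().lower() for label in sponsor_classes}
    has_nih = "nih" in classes
    has_industry = "industry" in classes
    if has_nih and not has_industry:
        return "NIH"
    if has_industry and not has_nih:
        return "Industry"
    # Everything else, including trials with both NIH and industry sponsors
    # under this literal reading of the rule.
    return "Other"
```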

 

Definition of trial end points

 

Two oncologists manually and independently identified all primary and secondary end points using the fields of ‘outcome-title’, ‘outcome-description’ or the detailed information of the submitted trials. Similarly, if there was any disagreement, the third oncologist was consulted. End points were divided into standard and patient-centred ones. The former included overall survival (OS), disease-specific survival (DSS), composite and tumour-centred end points, biological parameters, adverse events (AEs), pharmacokinetics and pharmacodynamics (PK-PD). The latter included completion of treatment, patient-reported outcomes (PROs), functional, cognitive and nutritional status, healthcare utilisation and economical end points. OS was defined as the period from randomisation/initiation to death from all causes, and DSS as the period from randomisation/initiation to death from specific cancer. Composite end points were a combination of different ones in a defined end point (Supplementary Table 1). Tumour-centred end points contained all measures of tumour evolution (including radiological assessment and biological markers evolution). AEs were defined as medical problems that occurred during treatment due to therapies or other factors, and completion as achieving the planned dose intensity of treatment, as well as compliance. Biological parameters included laboratory, genetic and tumour biology outcomes. As defined by the Food and Drug Administration, PROs referred to any report of health condition status that came directly from the patient, without interpretation by a clinician or anyone else [14]. Healthcare utilisation included healthcare use, readmission, length of hospital stay and other resources.

 

Statistical analysis

 

Categorical variables are described as frequencies and percentages, whereas continuous ones as medians and interquartile ranges (IQRs). Missing values were excluded from the analyses unless they could be inferred based on other related data. To compare trial characteristics and the evolution over the two temporal subsets, Pearson's χ² test was used, as well as Fisher's exact test, if indicated. Data regarding cancer incidence and mortality in older cancer patients (age ≥65 years) in 2017 were obtained from the Surveillance, Epidemiology and End Results Program (SEER). The number of older-patient-specific trials by cancer type was compared with the relative incidence and mortality for that cancer type. The correlation between the number of trials for a given type of cancer and the corresponding incidence or mortality was calculated using the Spearman rank correlation statistic. Statistical significance was set at P < 0.05 (two-sided). All analyses were performed using SPSS version 23.0 (IBM Corp).
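For readers who have not used it, the Spearman rank correlation is simply the Pearson correlation computed on ranks. The authors ran their analysis in SPSS; the pure-Python sketch below is an illustrative stand-in, and the helper names `average_ranks`, `pearson` and `spearman_rho` are chosen here, not taken from the paper. Ties receive the average of the ranks they span.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Feeding in the number of trials per cancer type as `x` and the matching incidence (or mortality) share as `y` gives the coefficient; the associated P value would still come from a statistics package.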

 

Results

 

Age eligibilities of older-patient-specific oncological trials

 

Of the 49,273 interventional oncological trials registered between 1 January 2000 and 17 December 2019, 1.0% (n = 490) were specific to older patients (Figure 1). The most frequently used thresholds to define age were 70 years (44.9%) and 65 years (42.9%) (Supplementary Figure 1).

 

Characteristics of older-patient-specific oncological trials

 

Table 1 shows the trial design characteristics of the oncological trials included in this study. Phase 2 trials accounted for the highest proportion of trials specific to older patients as well as other oncological trials (58.7 vs 43.4%, respectively). Older-patient-specific trials were significantly less likely to be phase 1 (8.7 vs 24.5%; P < 0.001) and more likely to be larger scale studies than other oncological trials. The median number of patients per trial was 73.5 (IQR, 38.0–180.0) for trials specific to older patients and 51.0 (IQR, 25.0–128.0) for other oncological trials. Regarding the primary purpose of the study, trials specific to older patients were more likely to be supportive care oriented (10.0 vs 5.8%; P < 0.001) and less likely to be diagnostic (2.4 vs 5.8%; P < 0.001) compared to other oncological trials. Moreover, trials specific to older patients were less likely to be funded by the industry (26.9 vs 37.1%; P < 0.001) or the NIH (14.9 vs 17.6%; P < 0.001), but more likely to be funded by other governmental or academic resources (58.2 vs 45.4%; P < 0.001). Geographical region differences were also evident. Trials specific to older patients were more likely to be conducted in Europe (44.5 vs 28.3%; P < 0.001), and less likely to be conducted in the United States or Canada (43.4 vs 60.2%; P < 0.001) and Asia (16.8 vs 21.5%; P = 0.014). Only 4.5% of the trials specific to older patients were conducted in multiple geographic regions (≥2), which was less common than that of other oncological trials (9.4%).

 

Chronological shifts in older-patient-specific trial characteristics

 

Supplementary Figure 2 shows the number of oncological trials specific to older patients registered on ClinicalTrials.gov per year and the proportion of these trials in all oncological trials per year. This proportion rose from 0.4% in 2000 to a peak of 1.3% in 2013. This was followed by a downward trend which reached the lowest level in 2017 (0.7%).

 

Table 2 shows the characteristics of oncological trials divided by two temporal subsets (1 January 2000 to 31 December 2009, and 1 January 2010 to 17 December 2019). Generally, the proportion of trials specific to older patients remained stable at 1.0% between these two time periods. Treatment-oriented trials decreased significantly from 91.9 to 70.0% (P < 0.001) over the two time periods, whereas supportive care-oriented trials increased from 1.9 to 13.9% (P < 0.001). Trials in the second period between 2010 and 2019 were less likely to include drug (57.0 vs 81.3%; P < 0.001) and biological (2.7 vs 7.5%; P < 0.001) interventions, but behavioural (7.9 vs 3.8%; P < 0.001), radiation (6.4 vs 1.3%; P < 0.001) and device (3.6 vs 0.0%; P < 0.001) interventions were more likely. Over the two time periods, industry-funded trials decreased from 34.4 to 23.3% (P = 0.004) and NIH-funded ones from 18.1 to 13.3% (P = 0.004). Older-patient-specific trials conducted in Asia increased from 10 out of 155 (6.5%) to 56 out of 310 (18.1%) (P = 0.001) between the two time periods.
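To show how little machinery such a comparison needs, the sketch below runs an uncorrected Pearson χ² test on the Asia counts quoted above (10 of 155 versus 56 of 310). It is a hand-rolled illustration rather than the authors' SPSS procedure; the function name `chi_square_2x2` is made up here, and whether the paper applied a continuity correction is not stated, so the result only needs to agree roughly with the reported P = 0.001.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the upper-tail probability of chi-square
    # equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: 2000-2009 and 2010-2019; columns: conducted in Asia, not in Asia.
statistic, p_value = chi_square_2x2(10, 155 - 10, 56, 310 - 56)
print(f"chi-square = {statistic:.2f}, P = {p_value:.4f}")
```

The statistic comes out near 11.4 with a P value just under 0.001, in line with the significant increase the paper reports.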

 

Evolution of end points in older-patient-specific trials

 

In this analysis, trials with missing data regarding ‘phase’ (n = 123) or ‘end points’ (n = 9) in ClinicalTrials.gov were excluded. Thus, a total of 358 trials specific to older patients were included. Figure 2 shows the distribution of end points over the two time periods 2000–2009 and 2010–2019. Supplementary Figure 3 shows the comparison of proportions of end points reported among each phase and Supplementary Table 2 shows the distribution of end points in the two temporal subsets 2000–2009 and 2010–2019 in each phase in detail.

 

In total, in terms of all end points, OS (58.9 vs 43.9%; P = 0.005), composite end points (51.0 vs 40.3%; P = 0.028), tumour-centred end points (79.5 vs 48.2%; P < 0.001) and AEs (60.3 vs 45.8%; P = 0.003) decreased significantly over the two time periods. In contrast, patient-centred end points did not change significantly with time (43.0 vs 47.9%, P < 0.324), such as completion (4.6 vs 5.5%; P = 0.707), PROs (22.5 vs 22.1%; P = 0.093), functional (8.6 vs 9.1%; P = 0.864), cognitive (0.7 vs 2.7%; P = 0.141) and nutritional status (0.7 vs 1.5%; P = 0.434), as well as healthcare utilisation (6.0 vs 7.0%; P = 0.680). PROs accounted for the majority (48.0%) of these patient-centred end points. In terms of primary end points, the use of composite end points, tumour-centred end points and PK-PD decreased over time, of which composite end points (20.5 vs 14.2%; P = 0.001) and tumour-centred end points fell significantly (49.0 vs 27.0%; P < 0.001), while patient-centred end points exhibited an increase over time (11.9 vs 31.2%; P < 0.001), especially the use of PROs that increased considerably from 2.6 to 10.6% (P = 0.003), although their proportions remained lower than those of standardised end points. In addition, the use of clinically meaningful end points in the older patients, such as DSS, PROs and functional status, as a primary end point was uncommon (0.4, 8.1 and 7.3%, respectively).

 

In phase 1 trials, AEs were the most frequently used end points. The majority of phase 2 trials involved at least one OS, composite end point, tumour-centred end point and AEs. Significant increases in OS (57.1 vs 74.2%; P = 0.009) and composite end points (51.6 vs 73.3%; P = 0.020) were seen, with a considerable decrease in tumour-centred end points (92.3 vs 73.3%; P < 0.001) between the two time periods. All patient-centred end points of phase 2 trials, both primary and secondary, exhibited an upward trend (42.9 vs 83.3%; P < 0.001), especially in PROs (24.2 vs 38.3%; P = 0.029) and cognitive status (0.0 vs 5.0%; P = 0.030). In phase 3 trials, OS remained the most frequently reported end point (89.1%), followed by composite end points (75.0%), tumour-centred end points (70.3%), AEs (56.3%) and PROs (34.4%). No differences in the use of all end points or primary end points were observed between the two time periods.

 

Representation and characteristics of older-patient-specific trials among cancer subtypes

 

Supplementary Figure 4 shows the distribution of common cancer types registered in ClinicalTrials.gov. In trials exclusively for the older patients, 40.0% (n = 196) were conducted in patients with haematological malignancies, followed by those with breast (15.3%), lung (9.4%) and colorectal (8.0%) cancers. Haematological malignancies were studied more often in older-patient-specific trials (40.0%) than other oncological trials (14.5%). Figure 3 shows the top 15 cancer types by proportion of trials conducted and their extent of matching with corresponding incidences and mortalities in older cancer patients in 2017, based on SEER Program data. There was no correlation between the number of trials on a given type of cancer and relative incidence or mortality (Spearman rank correlation coefficient, 0.146 [P = 0.604] and −0.445 [P = 0.096], respectively). The representation for leukaemia, at 27.3% of all trials specific to older patients, was much higher than either its incidence or mortality (3.1 and 2.9%, respectively). Lymphoma, which represented 15.3% of all trials specific to older patients, was also disproportionate to its incidence (4.6%) and mortality (4.4%). Meanwhile, melanoma had a low representation, at 0.1% of all trials specific to older patients despite incidence and mortality rates of 5.9% and 3.2%, respectively. Bladder cancer had an incidence and mortality rate of 5.8 and 5.5%, respectively, but it was the focus of only 0.4% of older-patient-specific trials. Supplementary Table 3 shows the detailed characteristics of trials specific to older patients based on common cancer types.
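One simple way to read these mismatches is to divide each cancer's share of trials by its share of incidence, so that values far above 1 mean over-representation and values far below 1 mean under-representation. The paper does not report such a ratio; the short Python sketch below is an added illustration that uses only the four sets of percentages quoted in the paragraph above, with variable names chosen here.

```python
# (trial share %, incidence share %, mortality share %) as quoted in the text.
shares = {
    "leukaemia": (27.3, 3.1, 2.9),
    "lymphoma": (15.3, 4.6, 4.4),
    "melanoma": (0.1, 5.9, 3.2),
    "bladder": (0.4, 5.8, 5.5),
}

# Ratio of trial share to incidence share for each cancer type.
representation = {
    cancer: trial / incidence
    for cancer, (trial, incidence, _mortality) in shares.items()
}

for cancer, ratio in sorted(representation.items(), key=lambda kv: -kv[1]):
    label = "over-represented" if ratio > 1 else "under-represented"
    print(f"{cancer:10s} ratio = {ratio:5.2f} ({label})")
```

On these figures leukaemia receives close to nine times the trial share its incidence would suggest, while melanoma and bladder cancer receive only a small fraction of theirs.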

PS: My academic experience is limited, so there may be errors in places. Prompt corrections and criticism are welcome!

 
