Overview

Dataset statistics

Number of variables: 8
Number of observations: 10756
Missing cells: 8503
Missing cells (%): 9.9%
Total size in memory: 1014.3 KiB
Average record size in memory: 96.6 B
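
The derived figures in this block follow from the raw counts. The sketch below recomputes them in Python. It assumes that the missing-cell percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes; the report does not state either convention, but both reproduce the printed values.

```python
# Values copied from the Dataset statistics block.
n_variables = 8
n_observations = 10756
missing_cells = 8503
total_size_kib = 1014.3

# Assumption: the percentage is taken over all cells (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```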

Variable types

Categorical: 7
Numeric: 1

Alerts

Imbalance: source is highly imbalanced (73.4%)
Missing: gender has 801 (7.4%) missing values
Missing: age_at_diagnosis has 1290 (12.0%) missing values
Missing: tumor_grade has 6412 (59.6%) missing values
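
The 73.4% imbalance figure can be reproduced as one minus the Shannon entropy of the class frequencies, normalised by its maximum, log2 of the number of classes. Whether the report uses exactly this definition is an assumption here, although it matches the printed value for the source counts listed later in the report. The function name `imbalance_score` is illustrative only. The missing-value percentages are simply the missing count over the number of observations.

```python
import math

n_observations = 10756

# Counts per value of `source`, copied from its Common Values table.
source_counts = {"GDC": 9971, "GEO": 668, "In-house generated": 117}


def imbalance_score(counts):
    """One minus Shannon entropy divided by log2(number of classes)."""
    if len(counts) < 2:
        return 0.0  # a single class has no defined imbalance in this sketch
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(counts))


score = imbalance_score(list(source_counts.values()))
print(f"source imbalance: {score:.1%}")

# Missing-value alerts: missing count over the number of observations.
missing_counts = {"gender": 801, "age_at_diagnosis": 1290, "tumor_grade": 6412}
missing_pct = {name: 100 * n / n_observations for name, n in missing_counts.items()}
for name, pct in missing_pct.items():
    print(f"{name}: {pct:.1f}% missing")
```

The three missing counts also add up to the 8503 missing cells reported under Dataset statistics.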

Reproduction

Analysis started: 2025-07-30 12:00:59.142777
Analysis finished: 2025-07-30 12:00:59.222247
Duration: 0.08 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json
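
The reported duration is the difference between the two timestamps, rounded to two decimals. A minimal check using only the Python standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2025-07-30 12:00:59.142777")
finished = datetime.fromisoformat("2025-07-30 12:00:59.222247")

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```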

Variables

category
Categorical

Distinct: 54
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 426.1 KiB

Value | Count
Blood or Bone marrow Acute myeloid leukemia | 753
Uterus Endometrioid adenocarcinoma | 523
Breast Infiltrating duct carcinoma | 486
Lung Adenocarcinoma | 482
Head and Neck Squamous cell carcinoma | 474
Other values (49) | 8038

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Colon Adenocarcinoma
2nd row: Colon Adenocarcinoma
3rd row: Stomach Carcinoma
4th row: Colon Adenocarcinoma
5th row: Brain Glioblastoma

Common Values

Value | Count | Frequency (%)
Blood or Bone marrow Acute myeloid leukemia | 753 | 7.0%
Uterus Endometrioid adenocarcinoma | 523 | 4.9%
Breast Infiltrating duct carcinoma | 486 | 4.5%
Lung Adenocarcinoma | 482 | 4.5%
Head and Neck Squamous cell carcinoma | 474 | 4.4%
Thyroid gland Papillary carcinoma | 473 | 4.4%
Lung Squamous cell carcinoma | 431 | 4.0%
Kidney Renal cell carcinoma | 387 | 3.6%
Lung Healthy | 360 | 3.3%
Colon Adenocarcinoma | 359 | 3.3%
Skin Malignant melanoma | 350 | 3.3%
Prostate gland Adenocarcinoma | 330 | 3.1%
Kidney Healthy | 327 | 3.0%
Cervix uteri Squamous cell carcinoma | 325 | 3.0%
Liver Hepatocellular carcinoma | 321 | 3.0%
Brain Glioblastoma | 317 | 2.9%
Stomach Carcinoma | 311 | 2.9%
Bladder Transitional cell carcinoma | 288 | 2.7%
Pancreas Infiltrating duct carcinoma | 260 | 2.4%
Kidney Clear cell adenocarcinoma | 258 | 2.4%
Kidney Papillary adenocarcinoma | 221 | 2.1%
Blood or Bone marrow Healthy | 216 | 2.0%
Prostate gland Acinar cell carcinoma | 182 | 1.7%
Brain Oligodendroglioma | 167 | 1.6%
Breast Lobular carcinoma | 159 | 1.5%
Brain Astrocytoma | 154 | 1.4%
Adrenal gland Pheochromocytoma | 135 | 1.3%
Kidney Wilms tumor | 131 | 1.2%
Blood or Bone marrow Chronic lymphocytic leukemia | 117 | 1.1%
Ovarian Serous cancer | 99 | 0.9%
Breast Healthy | 92 | 0.9%
Head and Neck Healthy | 88 | 0.8%
Uterus Serous cystadenocarcinoma | 86 | 0.8%
Adrenal gland Neuroblastoma | 75 | 0.7%
Bones Osteosarcoma | 74 | 0.7%
Esophagus Squamous cell carcinoma | 73 | 0.7%
Pancreas Healthy | 73 | 0.7%
Thymus Thymoma | 68 | 0.6%
Esophagus Adenocarcinoma | 67 | 0.6%
Testis Seminoma | 61 | 0.6%
Kidney Malignant rhabdoid tumor | 60 | 0.6%
Adrenal gland Adrenal cortical carcinoma | 58 | 0.5%
Prostate gland Healthy | 58 | 0.5%
Blood or Bone marrow Acute lymphocytic leukemia | 50 | 0.5%
Thyroid gland Healthy | 49 | 0.5%
Liver Healthy | 47 | 0.4%
Blood or Bone marrow Acute myelomonocytic leukemia | 41 | 0.4%
Cervix uteri Adenocarcinoma | 39 | 0.4%
Colon Healthy | 39 | 0.4%
Retroperitoneum Dedifferentiated liposarcoma | 36 | 0.3%
Anterior mediastinum Thymoma | 33 | 0.3%
Pleura Epithelioid mesothelioma | 33 | 0.3%
Retroperitoneum Leiomyosarcoma | 30 | 0.3%
Uterus Healthy | 30 | 0.3%

gender
Categorical

Alert: Missing

Distinct: 2
Distinct (%): < 0.1%
Missing: 801
Missing (%): 7.4%
Memory size: 426.1 KiB

Value | Count
male | 5214
female | 4741

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: female
2nd row: female
3rd row: female
4th row: female
5th row: male

Common Values

Value | Count | Frequency (%)
male | 5214 | 48.5%
female | 4741 | 44.1%
(Missing) | 801 | 7.4%

age_at_diagnosis
Real number (ℝ)

Alert: Missing

Distinct: 6593
Distinct (%): 69.6%
Missing: 1290
Missing (%): 12.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20549.64769
Minimum: 3
Maximum: 32872
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 426.1 KiB

Quantile statistics

Minimum: 3
5th percentile: 4111.25
Q1: 17692
Median: 21936
Q3: 25237.5
95th percentile: 29188.75
Maximum: 32872
Range: 32869
Interquartile range (IQR): 7545.5
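
The magnitudes here (median 21936, maximum 32872) suggest that age_at_diagnosis is recorded in days rather than years. The report does not state the unit, so this is an assumption. Under that assumption, and taking 365.25 days per year as a further assumption, the quantiles convert to years as sketched below; the range and IQR are recomputed from the same values.

```python
DAYS_PER_YEAR = 365.25  # assumption: average length of a year in days

# Quantiles copied from the Quantile statistics block (assumed to be in days).
quantiles_days = {
    "minimum": 3,
    "5th percentile": 4111.25,
    "Q1": 17692,
    "median": 21936,
    "Q3": 25237.5,
    "95th percentile": 29188.75,
    "maximum": 32872,
}

quantiles_years = {name: days / DAYS_PER_YEAR for name, days in quantiles_days.items()}
for name, years in quantiles_years.items():
    print(f"{name}: {years:.1f} years")

# Range and IQR follow directly from the quantiles.
range_days = quantiles_days["maximum"] - quantiles_days["minimum"]
iqr_days = quantiles_days["Q3"] - quantiles_days["Q1"]
print(f"range: {range_days} days, IQR: {iqr_days} days")
```

Under this reading the maximum of 32872 days is just under 90 years. It is also the most frequent single value in the table that follows (31 rows), which would be consistent with ages being capped near 90 years, although the report itself does not say so.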

Descriptive statistics

Standard deviation: 6914.918224
Coefficient of variation (CV): 0.33649814
Kurtosis: 1.144775185
Mean: 20549.64769
Median Absolute Deviation (MAD): 3639.5
Skewness: -1.136332396
Sum: 194522965
Variance: 47816094.05
Monotonicity: Not monotonic
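
Several of these statistics are tied together by simple identities: the coefficient of variation is the standard deviation over the mean, the variance is the squared standard deviation, and the sum is the mean times the number of non-missing rows. A short Python sketch using the printed values (small differences come from rounding in the report):

```python
# Values copied from the report.
mean = 20549.64769
std = 6914.918224
n_observations = 10756
n_missing = 1290

# Only non-missing rows contribute to the statistics.
n_valid = n_observations - n_missing

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the squared standard deviation
total = mean * n_valid   # sum of all non-missing values

print(f"non-missing rows: {n_valid}")
print(f"CV: {cv:.8f}")
print(f"variance: {variance:.2f}")
print(f"sum: {total:.0f}")
```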
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
32872 | 31 | 0.3%
20550 | 9 | 0.1%
24594 | 8 | 0.1%
22718 | 8 | 0.1%
21891 | 7 | 0.1%
32871 | 7 | 0.1%
20286 | 7 | 0.1%
23404 | 7 | 0.1%
22433 | 6 | 0.1%
21519 | 6 | 0.1%
24927 | 6 | 0.1%
22241 | 6 | 0.1%
21949 | 6 | 0.1%
23226 | 6 | 0.1%
18748 | 6 | 0.1%
17349 | 6 | 0.1%
24745 | 6 | 0.1%
22179 | 6 | 0.1%
21144 | 5 | < 0.1%
26002 | 5 | < 0.1%
18885 | 5 | < 0.1%
20625 | 5 | < 0.1%
25123 | 5 | < 0.1%
20392 | 5 | < 0.1%
24731 | 5 | < 0.1%
23275 | 5 | < 0.1%
25788 | 5 | < 0.1%
20747 | 5 | < 0.1%
22407 | 5 | < 0.1%
19314 | 5 | < 0.1%
26559 | 5 | < 0.1%
28865 | 5 | < 0.1%
23653 | 5 | < 0.1%
21020 | 5 | < 0.1%
21655 | 5 | < 0.1%
17465 | 5 | < 0.1%
22957 | 5 | < 0.1%
20425 | 5 | < 0.1%
23634 | 5 | < 0.1%
22057 | 5 | < 0.1%
20369 | 5 | < 0.1%
24990 | 5 | < 0.1%
26555 | 5 | < 0.1%
17760 | 5 | < 0.1%
24887 | 5 | < 0.1%
25462 | 5 | < 0.1%
21514 | 5 | < 0.1%
31016 | 5 | < 0.1%
26781 | 5 | < 0.1%
23458 | 5 | < 0.1%
20507 | 5 | < 0.1%
5594 | 5 | < 0.1%
21263 | 5 | < 0.1%
24631 | 5 | < 0.1%
19855 | 5 | < 0.1%
24122 | 5 | < 0.1%
19939 | 5 | < 0.1%
21232 | 4 | < 0.1%
25020 | 4 | < 0.1%
26252 | 4 | < 0.1%
23353 | 4 | < 0.1%
27320 | 4 | < 0.1%
19500 | 4 | < 0.1%
23923 | 4 | < 0.1%
25223 | 4 | < 0.1%
28108 | 4 | < 0.1%
22997 | 4 | < 0.1%
27636 | 4 | < 0.1%
24055 | 4 | < 0.1%
22205 | 4 | < 0.1%
22199 | 4 | < 0.1%
21488 | 4 | < 0.1%
22220 | 4 | < 0.1%
24837 | 4 | < 0.1%
18932 | 4 | < 0.1%
19479 | 4 | < 0.1%
25138 | 4 | < 0.1%
24663 | 4 | < 0.1%
25991 | 4 | < 0.1%
18843 | 4 | < 0.1%
24187 | 4 | < 0.1%
27341 | 4 | < 0.1%
8551 | 4 | < 0.1%
26600 | 4 | < 0.1%
25495 | 4 | < 0.1%
17215 | 4 | < 0.1%
5082 | 4 | < 0.1%
24019 | 4 | < 0.1%
24328 | 4 | < 0.1%
25659 | 4 | < 0.1%
17930 | 4 | < 0.1%
22285 | 4 | < 0.1%
26582 | 4 | < 0.1%
25502 | 4 | < 0.1%
1757 | 4 | < 0.1%
26490 | 4 | < 0.1%
22238 | 4 | < 0.1%
27971 | 4 | < 0.1%
26084 | 4 | < 0.1%
21429 | 4 | < 0.1%
12822 | 4 | < 0.1%
30462 | 4 | < 0.1%
23892 | 4 | < 0.1%
27104 | 4 | < 0.1%
26268 | 4 | < 0.1%
21501 | 4 | < 0.1%
19161 | 4 | < 0.1%
21464 | 4 | < 0.1%
24640 | 4 | < 0.1%
25091 | 4 | < 0.1%
19213 | 4 | < 0.1%
29350 | 4 | < 0.1%
22934 | 4 | < 0.1%
32848 | 4 | < 0.1%
24382 | 4 | < 0.1%
20974 | 4 | < 0.1%
23927 | 4 | < 0.1%
22558 | 4 | < 0.1%
22056 | 4 | < 0.1%
24775 | 4 | < 0.1%
26008 | 4 | < 0.1%
730 | 4 | < 0.1%
26748 | 4 | < 0.1%
24110 | 4 | < 0.1%
25756 | 4 | < 0.1%
24623 | 4 | < 0.1%
17807 | 4 | < 0.1%
22471 | 4 | < 0.1%
25827 | 4 | < 0.1%
19620 | 4 | < 0.1%
18611 | 4 | < 0.1%
21064 | 4 | < 0.1%
24846 | 4 | < 0.1%
22176 | 4 | < 0.1%
24969 | 4 | < 0.1%
25141 | 4 | < 0.1%
19986 | 4 | < 0.1%
28714 | 4 | < 0.1%
20349 | 4 | < 0.1%
19818 | 4 | < 0.1%
27798 | 4 | < 0.1%
21183 | 4 | < 0.1%
24282 | 4 | < 0.1%
24986 | 4 | < 0.1%
29529 | 4 | < 0.1%
21402 | 4 | < 0.1%
20598 | 4 | < 0.1%
23656 | 4 | < 0.1%
29873 | 4 | < 0.1%
23413 | 4 | < 0.1%
25627 | 4 | < 0.1%
21323 | 4 | < 0.1%
22156 | 4 | < 0.1%
23543 | 4 | < 0.1%
20237 | 4 | < 0.1%
14933 | 4 | < 0.1%
27162 | 4 | < 0.1%
24825 | 4 | < 0.1%
24311 | 4 | < 0.1%
22041 | 4 | < 0.1%
23227 | 4 | < 0.1%
23318 | 4 | < 0.1%
21194 | 4 | < 0.1%
26377 | 3 | < 0.1%
22745 | 3 | < 0.1%
24041 | 3 | < 0.1%
20946 | 3 | < 0.1%
22684 | 3 | < 0.1%
20493 | 3 | < 0.1%
23696 | 3 | < 0.1%
25477 | 3 | < 0.1%
24797 | 3 | < 0.1%
22366 | 3 | < 0.1%
24778 | 3 | < 0.1%
23650 | 3 | < 0.1%
24180 | 3 | < 0.1%
22025 | 3 | < 0.1%
24138 | 3 | < 0.1%
19964 | 3 | < 0.1%
21803 | 3 | < 0.1%
27899 | 3 | < 0.1%
24802 | 3 | < 0.1%
26274 | 3 | < 0.1%
25050 | 3 | < 0.1%
23953 | 3 | < 0.1%
25515 | 3 | < 0.1%
25686 | 3 | < 0.1%
23603 | 3 | < 0.1%
14025 | 3 | < 0.1%
1789 | 3 | < 0.1%
20888 | 3 | < 0.1%
26331 | 3 | < 0.1%
23741 | 3 | < 0.1%
26902 | 3 | < 0.1%
28289 | 3 | < 0.1%
23849 | 3 | < 0.1%
19454 | 3 | < 0.1%
851 | 3 | < 0.1%
3285 | 3 | < 0.1%
4623 | 3 | < 0.1%
24896 | 3 | < 0.1%
20771 | 3 | < 0.1%
20851 | 3 | < 0.1%
23574 | 3 | < 0.1%
27124 | 3 | < 0.1%
20364 | 3 | < 0.1%
20350 | 3 | < 0.1%
24078 | 3 | < 0.1%
25419 | 3 | < 0.1%
31 | 3 | < 0.1%
21557 | 3 | < 0.1%
21749 | 3 | < 0.1%
26846 | 3 | < 0.1%
22411 | 3 | < 0.1%
21632 | 3 | < 0.1%
19390 | 3 | < 0.1%
18822 | 3 | < 0.1%
25891 | 3 | < 0.1%
28060 | 3 | < 0.1%
27281 | 3 | < 0.1%
19192 | 3 | < 0.1%
18659 | 3 | < 0.1%
29330 | 3 | < 0.1%
25227 | 3 | < 0.1%
24898 | 3 | < 0.1%
27523 | 3 | < 0.1%
23043 | 3 | < 0.1%
18903 | 3 | < 0.1%
15121 | 3 | < 0.1%
30028 | 3 | < 0.1%
15137 | 3 | < 0.1%
17831 | 3 | < 0.1%
27152 | 3 | < 0.1%
21871 | 3 | < 0.1%
28359 | 3 | < 0.1%
21374 | 3 | < 0.1%
24085 | 3 | < 0.1%
22432 | 3 | < 0.1%
20363 | 3 | < 0.1%
24988 | 3 | < 0.1%
26820 | 3 | < 0.1%
13138 | 3 | < 0.1%
20993 | 3 | < 0.1%
19591 | 3 | < 0.1%
26602 | 3 | < 0.1%
11650 | 3 | < 0.1%
25181 | 3 | < 0.1%
17168 | 3 | < 0.1%
26574 | 3 | < 0.1%
26724 | 3 | < 0.1%
Other values (6343) | 8442 | 78.5%
(Missing) | 1290 | 12.0%

Value | Count | Frequency (%)
3 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
11 | 1 | < 0.1%
18 | 1 | < 0.1%

Value | Count | Frequency (%)
32872 | 31 | 0.3%
32871 | 7 | 0.1%
32848 | 4 | < 0.1%
32784 | 1 | < 0.1%
32754 | 1 | < 0.1%

Distinct: 25
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 426.1 KiB

Value | Count
Kidney | 1384
Lung | 1273
Blood or Bone marrow | 1177
Breast | 737
Uterus | 639
Other values (20) | 5546

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Colon
2nd row: Colon
3rd row: Stomach
4th row: Colon
5th row: Brain

Common Values

Value | Count | Frequency (%)
Kidney | 1384 | 12.9%
Lung | 1273 | 11.8%
Blood or Bone marrow | 1177 | 10.9%
Breast | 737 | 6.9%
Uterus | 639 | 5.9%
Brain | 638 | 5.9%
Prostate gland | 570 | 5.3%
Head and Neck | 562 | 5.2%
Thyroid gland | 522 | 4.9%
Colon | 398 | 3.7%
Liver | 368 | 3.4%
Cervix uteri | 364 | 3.4%
Skin | 350 | 3.3%
Pancreas | 333 | 3.1%
Stomach | 311 | 2.9%
Bladder | 288 | 2.7%
Adrenal gland | 268 | 2.5%
Esophagus | 140 | 1.3%
Ovarian | 99 | 0.9%
Bones | 74 | 0.7%
Thymus | 68 | 0.6%
Retroperitoneum | 66 | 0.6%
Testis | 61 | 0.6%
Anterior mediastinum | 33 | 0.3%
Pleura | 33 | 0.3%

Distinct: 38
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 426.1 KiB

Value | Count
Adenocarcinoma | 1421
Healthy | 1379
Squamous cell carcinoma | 1303
Acute myeloid leukemia | 753
Infiltrating duct carcinoma | 746
Other values (33) | 5154

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Adenocarcinoma
2nd row: Adenocarcinoma
3rd row: Adenocarcinoma
4th row: Adenocarcinoma
5th row: Glioblastoma

Common Values

Value | Count | Frequency (%)
Adenocarcinoma | 1421 | 13.2%
Healthy | 1379 | 12.8%
Squamous cell carcinoma | 1303 | 12.1%
Acute myeloid leukemia | 753 | 7.0%
Infiltrating duct carcinoma | 746 | 6.9%
Papillary adenocarcinoma | 559 | 5.2%
Endometrioid adenocarcinoma | 523 | 4.9%
Renal cell carcinoma | 387 | 3.6%
Malignant melanoma | 350 | 3.3%
Hepatocellular carcinoma | 321 | 3.0%
Glioblastoma | 317 | 2.9%
Clear cell adenocarcinoma | 258 | 2.4%
Transitional cell carcinoma | 223 | 2.1%
Acinar cell carcinoma | 182 | 1.7%
Oligodendroglioma | 167 | 1.6%
Lobular carcinoma | 159 | 1.5%
Astrocytoma | 154 | 1.4%
Papillary carcinoma | 135 | 1.3%
Pheochromocytoma | 135 | 1.3%
Wilms tumor | 131 | 1.2%
Chronic lymphocytic leukemia | 117 | 1.1%
Thymoma | 101 | 0.9%
Serous cancer | 99 | 0.9%
Serous cystadenocarcinoma | 86 | 0.8%
Neuroblastoma | 75 | 0.7%
Osteosarcoma | 74 | 0.7%
Tubular adenocarcinoma | 66 | 0.6%
Papillary transitional cell carcinoma | 65 | 0.6%
Seminoma | 61 | 0.6%
Malignant rhabdoid tumor | 60 | 0.6%
Adrenal cortical carcinoma | 58 | 0.5%
Carcinoma | 53 | 0.5%
Acute lymphocytic leukemia | 50 | 0.5%
Mucinous adenocarcinoma | 48 | 0.4%
Acute myelomonocytic leukemia | 41 | 0.4%
Dedifferentiated liposarcoma | 36 | 0.3%
Epithelioid mesothelioma | 33 | 0.3%
Leiomyosarcoma | 30 | 0.3%

tumor_grade
Categorical

Alert: Missing

Distinct: 4
Distinct (%): 0.1%
Missing: 6412
Missing (%): 59.6%
Memory size: 426.1 KiB

Value | Count
G2 | 2138
G3 | 1673
G1 | 396
G4 | 137

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: G2
2nd row: G2
3rd row: G2
4th row: G2
5th row: G1

Common Values

Value | Count | Frequency (%)
G2 | 2138 | 19.9%
G3 | 1673 | 15.6%
G1 | 396 | 3.7%
G4 | 137 | 1.3%
(Missing) | 6412 | 59.6%
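
Because most rows have no tumor_grade, the frequencies above (computed over all 10756 observations) differ sharply from the shares among graded samples only. The Python sketch below computes both from the printed counts; the second view is not part of the report and is shown only for comparison.

```python
n_observations = 10756
n_missing = 6412

# Counts copied from the tumor_grade Common Values table.
grade_counts = {"G2": 2138, "G3": 1673, "G1": 396, "G4": 137}

# Rows that actually carry a grade.
n_graded = sum(grade_counts.values())

# Frequency over all observations, as the report prints it.
share_of_all = {g: 100 * c / n_observations for g, c in grade_counts.items()}

# Frequency over graded rows only (not shown in the report).
share_of_graded = {g: 100 * c / n_graded for g, c in grade_counts.items()}

for grade in grade_counts:
    print(f"{grade}: {share_of_all[grade]:.1f}% of all rows, "
          f"{share_of_graded[grade]:.1f}% of graded rows")
```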

platform
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 426.1 KiB

Value | Count
450K | 7678
EPIC | 3043
EPICv2 | 35

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: EPIC
2nd row: EPIC
3rd row: EPIC
4th row: EPIC
5th row: EPIC

Common Values

Value | Count | Frequency (%)
450K | 7678 | 71.4%
EPIC | 3043 | 28.3%
EPICv2 | 35 | 0.3%

source
Categorical

Alert: Imbalance

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 426.1 KiB

Value | Count
GDC | 9971
GEO | 668
In-house generated | 117

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: GDC
2nd row: GDC
3rd row: GDC
4th row: GDC
5th row: GDC

Common Values

Value | Count | Frequency (%)
GDC | 9971 | 92.7%
GEO | 668 | 6.2%
In-house generated | 117 | 1.1%