2.1.4
随着医学技术的进步和医疗资源的丰富,医疗研究在改善患者治疗效果、提升医疗服务质量方面起到了重要作用。研究人员通过分析大量患者的治疗数据,能够评估不同治疗方案的效果,发现潜在的健康问题,并提出针对性的治疗建议。这不仅可以帮助患者获得更好的治疗效果,还能为医疗机构优化资源配置、提升服务水平提供重要依据。
现提供一份医疗研究数据集,训练集样本数据一共5441条记录。请补全2.1.4.ipynb代码,完成下面的数据预处理任务:
1、加载数据集,查看表的数据类型,表结构和显示每一列的空缺值数量;
2、将“就诊日期”和“诊断日期”规范为“yyyy-mm-dd”格式,并将“病人ID”列名改为“患者ID”,显示修改后的表结构;
3、增加“诊断延迟”(诊断日期-就诊日期)和“病程”(当前日期-诊断日期)两列,删除不合理的数据(如负数,年龄为几百岁等);
4、检查数据集中的重复值并删除所有重复值,并记录删除的行数;
5、对数据段[年龄,体重,身高]进行归一化处理;
6、统计不同疾病类型的治疗结果分布,并画出柱状图;
7、分析年龄和疾病严重程度的关系,绘制出散点图;
8、保存处理后的数据,并命名为:2.1.4_cleaned_data.csv,保存到考生文件夹;
9、制定数据清洗和数据标注规范,将答案写到答题卷文件中,答题卷文件命名为“2.1.4.docx”,保存到考生文件夹;
10、将以上代码以及运行结果,以html格式保存并命名为2.1.4.html,保存到考生文件夹,考生文件夹命名为“准考证号+身份证后6位”。
import pandas as pd
# 加载数据集并指定编码为gbk
data = pd.read_csv('medical_data.csv',encoding='gbk')
# 查看数据类型
print(data.dtypes)
# 查看表结构基本信息
print(data.info)
# 显示每一列的空缺值数量
print(data.isnull().sum())
# 规范日期格式
data['就诊日期'] = pd.to_datetime(data['就诊日期'])
data['诊断日期'] = pd.to_datetime(data['诊断日期'])
# 修改列名
data.rename(columns={'病人ID':'患者ID'}, inplace=True)
# 查看修改后的表结构
print(data.head())
from datetime import datetime
# 增加诊断延迟和病程列
data['诊断延迟'] = (data['诊断日期'] - data['就诊日期']) .dt.days
data['病程'] = (datetime(2024, 9, 1) - data['诊断日期']).dt.days
# 删除不合理的数据
data = data[(data['诊断延迟'] >= 0) & (data['年龄'] > 0) & (data['年龄'] < 120)]
# 查看修改后的数据
print(data.describe())
# 删除重复值并记录删除的行数
initial_rows = data.shape[0]
data.drop_duplicates(inplace=True)
deleted_rows = initial_rows - data.shape[0]
print(f'删除的重复行数: {deleted_rows}')
from sklearn.preprocessing import MinMaxScaler
# 对需要归一化的列进行处理
scaler = MinMaxScaler()
columns_to_normalize = ['年龄','体重','身高']
data[columns_to_normalize] = scaler.fit_transform(data[columns_to_normalize])
# 查看归一化后的数据
print(data.head())
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
# 统计治疗结果分布
treatment_outcome_distribution = data.groupby('疾病类型')['治疗结果'].value_counts().unstack()
# 设置中文字体
#font_path = 'C:/Windows/Fonts/simhei.ttf' # 根据你的系统调整字体路径
#my_font = fm.FontProperties(fname=font_path)
# 绘制柱状图
treatment_outcome_distribution.plot(kind='bar', stacked=True)
plt.title('title')
plt.xlabel('xtitle')
plt.ylabel('ylabel')
plt.xticks() # 设置x轴刻度标签的字体
plt.yticks() # 设置y轴刻度标签的字体
plt.legend() # 设置图例字体
plt.show()
# 绘制散点图
plt.scatter(data['年龄'],data['疾病严重程度'])
plt.title('年龄和疾病严重程度的关系')
plt.xlabel('年龄')
plt.ylabel('疾病严重程度')
plt.xticks() # 设置x轴刻度标签的字体
plt.yticks() # 设置y轴刻度标签的字体
plt.legend() # 设置图例字体
plt.show()
# 保存处理后得数据
output_path = '2.1.4_cleaned_data.csv'
data.to_csv(output_path, index=False)
病人ID int64
年龄 int64
性别 object
地区 object
就诊日期 object
疾病类型 object
诊断日期 object
疾病严重程度 object
治疗方案 object
治疗结果 object
药物名称 object
药物分类 object
用药剂量 object
用药时长 object
检查项目 object
检查结果 float64
参考范围 object
体重 float64
身高 float64
吸烟情况 object
饮酒情况 object
遗传病史 object
dtype: object
<bound method DataFrame.info of 病人ID 年龄 性别 地区 就诊日期 疾病类型 ... 参考范围 体重 身高 吸烟情况 饮酒情况 遗传病史
0 20413 45 女 重庆 2023-4-4 感冒 ... 正常 67.0 184.0 是 偶尔 是
1 27375 38 女 深圳 2023-3-12 哮喘 ... 正常 43.0 140.0 戒烟 戒酒 是
2 93406 27 女 武汉 2022-11-19 哮喘 ... 未知 91.0 199.0 否 偶尔 不清楚
3 59255 87 男 成都 2022-10-30 骨折 ... 正常 79.0 165.0 是 戒酒 是
4 52488 24 男 北京 2022-9-22 糖尿病 ... 未知 119.0 158.0 戒烟 是 不清楚
... ... .. .. .. ... ... ... ... ... ... ... ... ...
5435 38703 6 男 深圳 2023-4-29 癌症 ... 偏高 118.0 173.0 偶尔 戒酒 是
5436 10306 37 女 重庆 2023-1-14 骨折 ... 正常 118.0 150.0 戒烟 是 不清楚
5437 46444 57 男 杭州 2022-11-20 哮喘 ... 未知 53.0 186.0 否 偶尔 是
5438 22516 49 女 深圳 2022-12-29 感冒 ... 偏高 73.0 158.0 偶尔 戒酒 不清楚
5439 37760 42 男 广州 2022-7-4 癌症 ... 偏低 117.0 165.0 戒烟 否 不清楚
[5440 rows x 22 columns]>
病人ID 0
年龄 0
性别 0
地区 2
就诊日期 1
疾病类型 0
诊断日期 11
疾病严重程度 1
治疗方案 2
治疗结果 3
药物名称 2
药物分类 2
用药剂量 4
用药时长 2
检查项目 5
检查结果 3
参考范围 1
体重 3
身高 1
吸烟情况 3
饮酒情况 1
遗传病史 0
dtype: int64
患者ID 年龄 性别 地区 就诊日期 疾病类型 ... 参考范围 体重 身高 吸烟情况 饮酒情况 遗传病史
0 20413 45 女 重庆 2023-04-04 感冒 ... 正常 67.0 184.0 是 偶尔 是
1 27375 38 女 深圳 2023-03-12 哮喘 ... 正常 43.0 140.0 戒烟 戒酒 是
2 93406 27 女 武汉 2022-11-19 哮喘 ... 未知 91.0 199.0 否 偶尔 不清楚
3 59255 87 男 成都 2022-10-30 骨折 ... 正常 79.0 165.0 是 戒酒 是
4 52488 24 男 北京 2022-09-22 糖尿病 ... 未知 119.0 158.0 戒烟 是 不清楚
[5 rows x 22 columns]
患者ID 年龄 ... 诊断延迟 病程
count 2707.000000 2707.000000 ... 2707.000000 2707.000000
mean 54772.932397 49.381234 ... 120.326191 562.430735
min 10012.000000 1.000000 ... 0.000000 441.000000
25% 31891.500000 25.000000 ... 48.000000 492.000000
50% 54527.000000 48.000000 ... 106.000000 550.000000
75% 76802.500000 74.000000 ... 183.000000 623.000000
max 99965.000000 100.000000 ... 357.000000 802.000000
std 26021.158215 28.614318 ... 86.129346 84.133081
[8 rows x 9 columns]
删除的重复行数: 2
患者ID 年龄 性别 地区 就诊日期 ... 吸烟情况 饮酒情况 遗传病史 诊断延迟 病程
0 20413 0.444444 女 重庆 2023-04-04 ... 是 偶尔 是 21.0 495.0
3 59255 0.868687 男 成都 2022-10-30 ... 是 戒酒 是 45.0 627.0
5 66021 0.333333 男 深圳 2022-11-03 ... 否 偶尔 否 164.0 504.0
7 21594 0.111111 男 杭州 2022-08-07 ... 否 否 否 285.0 471.0
10 14858 0.767677 女 广州 2022-08-18 ... 否 是 否 102.0 643.0
[5 rows x 24 columns]
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 21742 (\N{CJK UNIFIED IDEOGRAPH-54EE}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 21912 (\N{CJK UNIFIED IDEOGRAPH-5598}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 24515 (\N{CJK UNIFIED IDEOGRAPH-5FC3}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 33039 (\N{CJK UNIFIED IDEOGRAPH-810F}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 30149 (\N{CJK UNIFIED IDEOGRAPH-75C5}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 24863 (\N{CJK UNIFIED IDEOGRAPH-611F}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 20882 (\N{CJK UNIFIED IDEOGRAPH-5192}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 30284 (\N{CJK UNIFIED IDEOGRAPH-764C}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 30151 (\N{CJK UNIFIED IDEOGRAPH-75C7}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 31958 (\N{CJK UNIFIED IDEOGRAPH-7CD6}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 23615 (\N{CJK UNIFIED IDEOGRAPH-5C3F}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 39592 (\N{CJK UNIFIED IDEOGRAPH-9AA8}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 25240 (\N{CJK UNIFIED IDEOGRAPH-6298}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 39640 (\N{CJK UNIFIED IDEOGRAPH-9AD8}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 34880 (\N{CJK UNIFIED IDEOGRAPH-8840}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 21387 (\N{CJK UNIFIED IDEOGRAPH-538B}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 22909 (\N{CJK UNIFIED IDEOGRAPH-597D}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 36716 (\N{CJK UNIFIED IDEOGRAPH-8F6C}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 24247 (\N{CJK UNIFIED IDEOGRAPH-5EB7}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 22797 (\N{CJK UNIFIED IDEOGRAPH-590D}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 24694 (\N{CJK UNIFIED IDEOGRAPH-6076}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 21270 (\N{CJK UNIFIED IDEOGRAPH-5316}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 24773 (\N{CJK UNIFIED IDEOGRAPH-60C5}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 31283 (\N{CJK UNIFIED IDEOGRAPH-7A33}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 23450 (\N{CJK UNIFIED IDEOGRAPH-5B9A}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/var/folders/rq/lzp3p28j4wsfzc34n6x5rdlh0000gn/T/ipykernel_48386/3551012859.py:82: UserWarning: No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
plt.legend() # 设置图例字体
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 19981 (\N{CJK UNIFIED IDEOGRAPH-4E0D}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 20005 (\N{CJK UNIFIED IDEOGRAPH-4E25}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 37325 (\N{CJK UNIFIED IDEOGRAPH-91CD}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 20013 (\N{CJK UNIFIED IDEOGRAPH-4E2D}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 24230 (\N{CJK UNIFIED IDEOGRAPH-5EA6}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 36731 (\N{CJK UNIFIED IDEOGRAPH-8F7B}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 30142 (\N{CJK UNIFIED IDEOGRAPH-75BE}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 31243 (\N{CJK UNIFIED IDEOGRAPH-7A0B}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 24180 (\N{CJK UNIFIED IDEOGRAPH-5E74}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 40836 (\N{CJK UNIFIED IDEOGRAPH-9F84}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 21644 (\N{CJK UNIFIED IDEOGRAPH-548C}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 30340 (\N{CJK UNIFIED IDEOGRAPH-7684}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 20851 (\N{CJK UNIFIED IDEOGRAPH-5173}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 31995 (\N{CJK UNIFIED IDEOGRAPH-7CFB}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
import pandas as pd
# 加载数据集并指定编码为gbk
data = pd.read_csv('medical_data.csv',encoding='gbk')
# 查看数据类型
print(data.dtypes)
# 查看表结构基本信息
print(data.info)
# 显示每一列的空缺值数量
print(data.isnull().sum())
# 规范日期格式
data['就诊日期'] = pd.to_datetime(data['就诊日期'])
data['诊断日期'] = pd.to_datetime(data['诊断日期'])
# 修改列名
data.rename(columns={'病人ID':'患者ID'}, inplace=True)
# 查看修改后的表结构
print(data.head())
from datetime import datetime
# 增加诊断延迟和病程列
data['诊断延迟'] = (data['诊断日期'] - data['就诊日期']) .dt.days
data['病程'] = (datetime(2024, 9, 1) - data['诊断日期']).dt.days
# 删除不合理的数据
data = data[(data['诊断延迟'] >= 0) & (data['年龄'] > 0) & (data['年龄'] < 120)]
病人ID int64
年龄 int64
性别 object
地区 object
就诊日期 object
疾病类型 object
诊断日期 object
疾病严重程度 object
治疗方案 object
治疗结果 object
药物名称 object
药物分类 object
用药剂量 object
用药时长 object
检查项目 object
检查结果 float64
参考范围 object
体重 float64
身高 float64
吸烟情况 object
饮酒情况 object
遗传病史 object
dtype: object
<bound method DataFrame.info of 病人ID 年龄 性别 地区 就诊日期 疾病类型 ... 参考范围 体重 身高 吸烟情况 饮酒情况 遗传病史
0 20413 45 女 重庆 2023-4-4 感冒 ... 正常 67.0 184.0 是 偶尔 是
1 27375 38 女 深圳 2023-3-12 哮喘 ... 正常 43.0 140.0 戒烟 戒酒 是
2 93406 27 女 武汉 2022-11-19 哮喘 ... 未知 91.0 199.0 否 偶尔 不清楚
3 59255 87 男 成都 2022-10-30 骨折 ... 正常 79.0 165.0 是 戒酒 是
4 52488 24 男 北京 2022-9-22 糖尿病 ... 未知 119.0 158.0 戒烟 是 不清楚
... ... .. .. .. ... ... ... ... ... ... ... ... ...
5435 38703 6 男 深圳 2023-4-29 癌症 ... 偏高 118.0 173.0 偶尔 戒酒 是
5436 10306 37 女 重庆 2023-1-14 骨折 ... 正常 118.0 150.0 戒烟 是 不清楚
5437 46444 57 男 杭州 2022-11-20 哮喘 ... 未知 53.0 186.0 否 偶尔 是
5438 22516 49 女 深圳 2022-12-29 感冒 ... 偏高 73.0 158.0 偶尔 戒酒 不清楚
5439 37760 42 男 广州 2022-7-4 癌症 ... 偏低 117.0 165.0 戒烟 否 不清楚
[5440 rows x 22 columns]>
病人ID 0
年龄 0
性别 0
地区 2
就诊日期 1
疾病类型 0
诊断日期 11
疾病严重程度 1
治疗方案 2
治疗结果 3
药物名称 2
药物分类 2
用药剂量 4
用药时长 2
检查项目 5
检查结果 3
参考范围 1
体重 3
身高 1
吸烟情况 3
饮酒情况 1
遗传病史 0
dtype: int64
患者ID 年龄 性别 地区 就诊日期 疾病类型 ... 参考范围 体重 身高 吸烟情况 饮酒情况 遗传病史
0 20413 45 女 重庆 2023-04-04 感冒 ... 正常 67.0 184.0 是 偶尔 是
1 27375 38 女 深圳 2023-03-12 哮喘 ... 正常 43.0 140.0 戒烟 戒酒 是
2 93406 27 女 武汉 2022-11-19 哮喘 ... 未知 91.0 199.0 否 偶尔 不清楚
3 59255 87 男 成都 2022-10-30 骨折 ... 正常 79.0 165.0 是 戒酒 是
4 52488 24 男 北京 2022-09-22 糖尿病 ... 未知 119.0 158.0 戒烟 是 不清楚
[5 rows x 22 columns]
# 查看修改后的数据
print(data.describe())
# 删除重复值并记录删除的行数
initial_rows = data.shape[0]
data.drop_duplicates(inplace=True)
deleted_rows = initial_rows - data.shape[0]
患者ID 年龄 ... 诊断延迟 病程
count 2705.000000 2705.000000 ... 2705.000000 2705.000000
mean 54799.954159 49.384843 ... 120.272089 562.434750
min 10012.000000 1.000000 ... 0.000000 441.000000
25% 32000.000000 25.000000 ... 48.000000 492.000000
50% 54572.000000 48.000000 ... 106.000000 550.000000
75% 76828.000000 74.000000 ... 183.000000 623.000000
max 99965.000000 100.000000 ... 357.000000 802.000000
std 26011.621262 28.610941 ... 86.102238 84.131555
[8 rows x 9 columns]
print(f'删除的重复行数: {deleted_rows}')
from sklearn.preprocessing import MinMaxScaler
# 对需要归一化的列进行处理
scaler = MinMaxScaler()
columns_to_normalize = ['年龄','体重','身高']
data[columns_to_normalize] = scaler.fit_transform(data[columns_to_normalize])
# 查看归一化后的数据
print(data.head())
删除的重复行数: 0
患者ID 年龄 性别 地区 就诊日期 ... 吸烟情况 饮酒情况 遗传病史 诊断延迟 病程
0 20413 0.444444 女 重庆 2023-04-04 ... 是 偶尔 是 21.0 495.0
3 59255 0.868687 男 成都 2022-10-30 ... 是 戒酒 是 45.0 627.0
5 66021 0.333333 男 深圳 2022-11-03 ... 否 偶尔 否 164.0 504.0
7 21594 0.111111 男 杭州 2022-08-07 ... 否 否 否 285.0 471.0
10 14858 0.767677 女 广州 2022-08-18 ... 否 是 否 102.0 643.0
[5 rows x 24 columns]
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
# 统计治疗结果分布
treatment_outcome_distribution = data.groupby('疾病类型')['治疗结果'].value_counts().unstack()
# 设置中文字体
#font_path = 'C:/Windows/Fonts/simhei.ttf' # 根据你的系统调整字体路径
#my_font = fm.FontProperties(fname=font_path)
# 绘制柱状图
treatment_outcome_distribution.plot(kind='bar', stacked=True)
plt.title('title')
plt.xlabel('xtitle')
plt.ylabel('ylabel')
plt.xticks() # 设置x轴刻度标签的字体
plt.yticks() # 设置y轴刻度标签的字体
plt.legend() # 设置图例字体
plt.show()
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 21742 (\N{CJK UNIFIED IDEOGRAPH-54EE}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 21912 (\N{CJK UNIFIED IDEOGRAPH-5598}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 24515 (\N{CJK UNIFIED IDEOGRAPH-5FC3}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 33039 (\N{CJK UNIFIED IDEOGRAPH-810F}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 30149 (\N{CJK UNIFIED IDEOGRAPH-75C5}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 24863 (\N{CJK UNIFIED IDEOGRAPH-611F}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 20882 (\N{CJK UNIFIED IDEOGRAPH-5192}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 30284 (\N{CJK UNIFIED IDEOGRAPH-764C}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 30151 (\N{CJK UNIFIED IDEOGRAPH-75C7}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 31958 (\N{CJK UNIFIED IDEOGRAPH-7CD6}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 23615 (\N{CJK UNIFIED IDEOGRAPH-5C3F}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 39592 (\N{CJK UNIFIED IDEOGRAPH-9AA8}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 25240 (\N{CJK UNIFIED IDEOGRAPH-6298}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 39640 (\N{CJK UNIFIED IDEOGRAPH-9AD8}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 34880 (\N{CJK UNIFIED IDEOGRAPH-8840}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 21387 (\N{CJK UNIFIED IDEOGRAPH-538B}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 22909 (\N{CJK UNIFIED IDEOGRAPH-597D}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 36716 (\N{CJK UNIFIED IDEOGRAPH-8F6C}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 24247 (\N{CJK UNIFIED IDEOGRAPH-5EB7}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 22797 (\N{CJK UNIFIED IDEOGRAPH-590D}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 24694 (\N{CJK UNIFIED IDEOGRAPH-6076}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 21270 (\N{CJK UNIFIED IDEOGRAPH-5316}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 24773 (\N{CJK UNIFIED IDEOGRAPH-60C5}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 31283 (\N{CJK UNIFIED IDEOGRAPH-7A33}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
/opt/anaconda3/lib/python3.12/site-packages/IPython/core/pylabtools.py:170: UserWarning: Glyph 23450 (\N{CJK UNIFIED IDEOGRAPH-5B9A}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)