安装最新版本方法:
pip install git+https://github.com/mwaskom/seaborn.git
导入包和设置背景
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="darkgrid")
查看数据集
tips = sns.load_dataset("tips")
tips.head(3)
.dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; }
|
total_bill |
tip |
sex |
smoker |
day |
time |
size |
0 |
16.99 |
1.01 |
Female |
No |
Sun |
Dinner |
2 |
1 |
10.34 |
1.66 |
Male |
No |
Sun |
Dinner |
3 |
2 |
21.01 |
3.50 |
Male |
No |
Sun |
Dinner |
3 |
一、用散点图关联变量
1、绘制散点图,x为”total_bill”列的数据,y为”tip”列的数据,数据集为”tips”的pandas类型数据
sns.relplot(x="total_bill", y="tip", data=tips);
![这里写图片描述](https://img-blog.csdn.net/20180719191023308?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
2、绘制3维的关系图,第3维的数据”smoker”列的数据用hue=”smoker”表示,如图,蓝色代表取值
“yes”,红色代表取值”no”。
sns.relplot(x="total_bill", y="tip", hue="smoker", data=tips);
![这里写图片描述](https://img-blog.csdn.net/20180719191050847?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
3、绘制3维的关系图,第3维的数据”smoker”列的数据用hue=”smoker”和style=”smoker”表示,
用不同颜色和样式表示该维数据,即蓝色圆点代表”yes”,橙色x代表取值”no”
sns.relplot(x="total_bill", y="tip", hue="smoker", style="smoker",
data=tips);
![这里写图片描述](https://img-blog.csdn.net/20180719191114893?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
4、绘制4维的关系图,用样式表示另外一列数据
sns.relplot(x="total_bill", y="tip", hue="smoker", style="time", data=tips);
![这里写图片描述](https://img-blog.csdn.net/20180719191125793?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
5、如果颜色代表的维度数据不是离散值,而是连续的数字,则根据颜色深浅代表取值
sns.relplot(x="total_bill", y="tip", hue="size", data=tips);
![这里写图片描述](https://img-blog.csdn.net/20180719191136813?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
我们可以自定义颜色
sns.relplot(x="total_bill", y="tip", hue="size", palette="ch:r=-.5,l=.75", data=tips);
![这里写图片描述](https://img-blog.csdn.net/20180719191148388?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
6、除了”颜色”、”样式”可以代表维度外,”大小”也可以;,”大小”类似于”颜色”处理数值数据
sns.relplot(x="total_bill", y="tip", size="size", data=tips);
![这里写图片描述](https://img-blog.csdn.net/20180719191202156?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
数据单元中的值范围被归一化为面积单位的范围,这个范围可以定制
sns.relplot(x="total_bill", y="tip", size="size", sizes=(15, 200), data=tips);
![这里写图片描述](https://img-blog.csdn.net/20180719191220324?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
二、强调线图的连续性
数据集
df = pd.DataFrame(dict(time=np.arange(500),value=np.random.randn(500).cumsum()))
df.head(3)
.dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; }
|
time |
value |
0 |
0 |
-0.652967 |
1 |
1 |
-2.372132 |
2 |
2 |
-2.926434 |
1、time随value的变化图
g = sns.relplot(x="time", y="value", kind="line", data=df)
g.fig.autofmt_xdate()
![这里写图片描述](https://img-blog.csdn.net/20180719191243575?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
默认行为是x在绘图之前按值对数据进行排序。但是,这可以被禁用。
df = pd.DataFrame(np.random.randn(500, 2).cumsum(axis=0), columns=["x", "y"])
sns.relplot(x="x", y="y", sort=False, kind="line", data=df);
![这里写图片描述](https://img-blog.csdn.net/2018071919125593?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
2、聚合和表示不确定性:
更复杂的数据集将对x变量的相同值进行多次测量。seaborn中的默认行为是x通过绘制平均值周围
的平均值和95%置信区间来聚合每个值的多个测量值:
数据集
fmri = sns.load_dataset("fmri")
fmri.head(3)
.dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; }
|
subject |
timepoint |
event |
region |
signal |
0 |
s13 |
18 |
stim |
parietal |
-0.017552 |
1 |
s5 |
14 |
stim |
parietal |
-0.080883 |
2 |
s12 |
18 |
stim |
parietal |
-0.081033 |
sns.relplot(x="timepoint", y="signal", kind="line", data=fmri);
![这里写图片描述](https://img-blog.csdn.net/20180719191310117?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
使用自举来计算置信区间,对于较大的数据集,这可能是时间密集的。因此可以禁用它们:
sns.relplot(x="timepoint", y="signal", ci=None, kind="line", data=fmri);
![这里写图片描述](https://img-blog.csdn.net/20180719191322462?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
对于更大的数据,是通过绘制标准偏差而不是置信区间来表示每个时间点的分布范围
sns.relplot(x="timepoint", y="signal", kind="line", ci="sd", data=fmri);
![这里写图片描述](https://img-blog.csdn.net/20180719191336218?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
要完全关闭聚合,请将estimator参数设置为“ None当数据在每个点有多个观察值时,可能会产生奇怪的效果
sns.relplot(x="timepoint", y="signal", estimator=None, kind="line", data=fmri);
![这里写图片描述](https://img-blog.csdn.net/20180719191346560?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
三、用语义映射绘制数据子集
sns.relplot(x="timepoint", y="signal", hue="event", kind="line", data=fmri);
![这里写图片描述](https://img-blog.csdn.net/20180719191358556?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
hue表示颜色,style表示样式
sns.relplot(x="timepoint", y="signal", hue="region", style="event",
kind="line", data=fmri);
![这里写图片描述](https://img-blog.csdn.net/20180719191409634?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
使用短划线或代替默认的样式
sns.relplot(x="timepoint", y="signal", hue="region", style="event",
dashes=False, markers=True, kind="line", data=fmri);
![这里写图片描述](https://img-blog.csdn.net/20180719191422322?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
用线条的颜色和样式只表示一个数据维度
sns.relplot(x="timepoint", y="signal", hue="event", style="event",
kind="line", data=fmri);
![这里写图片描述](https://img-blog.csdn.net/20180719191434615?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
1、使用重复测量数据(即有多次采样的单位)时,可以单独绘制每个采样单位,
而无需通过语义区分它们。这可以避免使图例混乱。比如对”subject”单位绘图。
sns.relplot(x="timepoint", y="signal", hue="region",
units="subject", estimator=None,
kind="line", data=fmri.query("event == 'stim'"));
![这里写图片描述](https://img-blog.csdn.net/20180719191445243?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
2、默认lineplot()的色彩映射和图例的处理还取决于色调语义是分类还是数字
数据集
dots = sns.load_dataset("dots").query("align == 'dots'")
dots.head(4)
.dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; }
|
align |
choice |
time |
coherence |
firing_rate |
0 |
dots |
T1 |
-80 |
0.0 |
33.189967 |
1 |
dots |
T1 |
-80 |
3.2 |
31.691726 |
2 |
dots |
T1 |
-80 |
6.4 |
34.279840 |
3 |
dots |
T1 |
-80 |
12.8 |
32.631874 |
sns.relplot(x="time", y="firing_rate",
hue="coherence", style="choice",
kind="line", data=dots);
![这里写图片描述](https://img-blog.csdn.net/2018071919150711?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
如果hue变量是数字,很难用线性色标表示可以用水平以对数方式缩放。通过传递列表或字典为每一行提供特定的颜色值:
palette = sns.cubehelix_palette(light=.8, n_colors=6)
sns.relplot(x="time", y="firing_rate",
hue="coherence", style="choice",
palette=palette,
kind="line", data=dots);
![这里写图片描述](https://img-blog.csdn.net/2018071919151941?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
三、用日期数据绘图
df = pd.DataFrame(dict(time=pd.date_range("2017-1-1", periods=500),
value=np.random.randn(500).cumsum()))
g = sns.relplot(x="time", y="value", kind="line", data=df)
g.fig.autofmt_xdate()
![这里写图片描述](https://img-blog.csdn.net/20180719191531606?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
四、显示与facet的多个关系
1、通过col=”time”,每一列代表”time”的取值,有多少个值就有多少列
sns.relplot(x="total_bill", y="tip", hue="smoker",
col="time", data=tips);
![这里写图片描述](https://img-blog.csdn.net/20180719191542335?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
2、 通过col=”region”, row=”event”,每一列代表”time”的取值,每一行代表”event”的取值
sns.relplot(x="timepoint", y="signal", hue="subject",
col="region", row="event", height=3,
kind="line", estimator=None, data=fmri);
![这里写图片描述](https://img-blog.csdn.net/2018071919155515?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
col_wrap=5设置每一列显示多少个图;height=3代表子图的高, aspect=.75,代表高宽比;
linewidth=2.5代表线条粗细
sns.relplot(x="timepoint", y="signal", hue="event", style="event",
col="subject", col_wrap=5,
height=3, aspect=.75, linewidth=2.5,
kind="line", data=fmri.query("region == 'frontal'"));
![这里写图片描述](https://img-blog.csdn.net/20180719191605480?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2hhbzUzMzUxNTY=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)