seaborn-distribution

Visualizing the distribution of a dataset

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
%matplotlib inline

Univariate distributions

The distplot() function draws a histogram and fits a kernel density estimate (KDE) on top of it.

x=np.random.normal(size=100)
sns.distplot(x)

# Remove the KDE
sns.distplot(x,kde=False,rug=True)
<matplotlib.axes._subplots.AxesSubplot at 0x1a21577f60>


# How finely the data is binned: seaborn guesses a default, but it is usually better to set the number of bins ourselves
sns.distplot(x,bins=20,kde=False,rug=True)
<matplotlib.axes._subplots.AxesSubplot at 0x1a215fdbe0>
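
To get a feel for what the bins argument changes, the counts behind a histogram can be inspected directly with NumPy. This is only a rough illustration, assuming a seeded sample of the same size as above; the bin counts 5, 20 and 50 are arbitrary choices for comparison.

```python
import numpy as np

# Seeded sample of the same size as in the post (the seed is an assumption)
rng = np.random.default_rng(0)
x = rng.normal(size=100)

# Compare a coarse, a medium and a fine binning of the same data
for bins in (5, 20, 50):
    counts, edges = np.histogram(x, bins=bins)
    width = edges[1] - edges[0]
    print(f"{bins:>2} bins: width={width:.3f}, tallest bar={counts.max()}")

# The 20-bin case used in the plot above
counts20, edges20 = np.histogram(x, bins=20)
```

Fewer bins give wide, tall bars that hide detail, while many bins give narrow bars that mostly show noise, which is why picking the value by hand is often worthwhile.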


Kernel density estimate (KDE)

The kernel density estimate may be less familiar, but it can be a useful tool for plotting the shape of a distribution. Like the histogram, the KDE plots encode the density of observations on one axis with height along the other axis:

Put simply, it shows how densely the observations are packed along the axis.
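
A KDE can be built by hand to see where the curve comes from: each observation is replaced by a small Gaussian bump, and the bumps are averaged. The following is a minimal sketch under my own assumptions (a seeded sample, and a Scott-style rule of thumb for the bandwidth); seaborn's exact defaults may differ.

```python
import numpy as np

# Seeded sample (assumption, for reproducibility)
rng = np.random.default_rng(0)
x = rng.normal(size=100)

# Rule-of-thumb bandwidth: sample std times n^(-1/5) (an assumed choice)
bandwidth = x.std(ddof=1) * len(x) ** (-1 / 5)

# Evaluation grid, padded so the tails of the outermost bumps are included
grid = np.linspace(x.min() - 3 * bandwidth, x.max() + 3 * bandwidth, 400)

# One Gaussian bump per observation, shape (n_observations, n_grid_points)
z = (grid[None, :] - x[:, None]) / bandwidth
kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))

# The density estimate is the average of all bumps
density = kernels.mean(axis=0)

# Sanity check: a density should integrate to roughly 1
area = density.sum() * (grid[1] - grid[0])
print(f"area under the curve: {area:.3f}")
```

A larger bandwidth smooths the curve more, and a smaller one follows the individual points more closely.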

sns.distplot(x,hist=False,rug=True)
<matplotlib.axes._subplots.AxesSubplot at 0x1a216fada0>


sns.kdeplot(x,shade=True)
<matplotlib.axes._subplots.AxesSubplot at 0x1a2181cd68>


Bivariate distributions

It can also be useful to visualize a bivariate distribution of two variables. The easiest way to do this in seaborn is to just use the jointplot() function, which creates a multi-panel figure that shows both the bivariate (or joint) relationship between two variables along with the univariate (or marginal) distribution of each on separate axes.

mean, cov = [0, 1], [(1, .5), (.5, 1)]
data = np.random.multivariate_normal(mean, cov, 200)
df = pd.DataFrame(data, columns=["x", "y"])
df.head()
          x         y
0  1.620467  2.511505
1 -0.529253  0.247477
2 -1.361914  0.225665
3 -1.188358  0.785273
4  1.158663  0.180673
sns.jointplot(x='x',y='y',data=df)
<seaborn.axisgrid.JointGrid at 0x1a21b3a668>
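
The relationship between the joint panel and the two marginal panels can also be checked numerically. As a rough sketch, assuming a seeded sample drawn with the same mean and covariance as above and an arbitrary 10x10 grid, summing a 2D histogram along one axis recovers the 1D histogram of the other variable.

```python
import numpy as np

# Same mean and covariance as in the post; the seed is an assumption
rng = np.random.default_rng(0)
mean, cov = [0, 1], [(1, .5), (.5, 1)]
x, y = rng.multivariate_normal(mean, cov, 200).T

# Joint counts on a 10x10 grid: rows follow x bins, columns follow y bins
joint, x_edges, y_edges = np.histogram2d(x, y, bins=10)

# Summing out one variable gives the marginal counts of the other
marginal_x = joint.sum(axis=1)
marginal_y = joint.sum(axis=0)

# The marginal of x matches a plain 1D histogram on the same bin edges
x_counts, _ = np.histogram(x, bins=x_edges)
print(np.array_equal(marginal_x, x_counts))

# Sample correlation, which should be in the neighbourhood of the 0.5 set in cov
print(f"sample correlation: {np.corrcoef(x, y)[0, 1]:.2f}")
```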


x, y = np.random.multivariate_normal(mean, cov, 1000).T
# Use the "white" style only for this plot; hexbin plots read best on a white background
with sns.axes_style("white"):
    sns.jointplot(x=x, y=y, kind="hex", color="k")


Kernel density estimation

This works much like the kernel density estimate for a univariate distribution.

sns.jointplot(x='x',y='y',kind='kde',data=df)
<seaborn.axisgrid.JointGrid at 0x1a21c534a8>


Visualizing pairwise relationships in a dataset

I did not fully understand this part...

iris = sns.load_dataset("iris")
sns.pairplot(iris);
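
One way to read the grid that pairplot() draws: with the four numeric iris columns it lays out a 4x4 matrix of panels, where each row fixes the y variable and each column fixes the x variable. The diagonal panels show the distribution of a single column, and the non-numeric species column is left out. The sketch below only enumerates that layout as text, without plotting anything; the layout dictionary and its descriptions are hypothetical names used for illustration.

```python
from itertools import product

# Numeric columns of the iris dataset as loaded by seaborn
numeric_columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Map each (row, column) position of the grid to what is drawn there
indexed = list(enumerate(numeric_columns))
layout = {}
for (row, y_var), (col, x_var) in product(indexed, repeat=2):
    if row == col:
        # Diagonal: univariate distribution of one variable
        layout[(row, col)] = f"distribution of {x_var}"
    else:
        # Off-diagonal: one variable plotted against another
        layout[(row, col)] = f"scatter of {y_var} (y) against {x_var} (x)"

for position in [(0, 0), (0, 1), (1, 0)]:
    print(position, "->", layout[position])
```

PairGrid exposes the same structure explicitly: map_diag() fills the diagonal panels and map_offdiag() fills all the others.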


g = sns.PairGrid(iris)
g.map_diag(sns.kdeplot)
g.map_offdiag(sns.kdeplot, n_levels=6);
