可视化数据集的分布

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
%matplotlib inline

单变量分布

displot()函数将绘制直方图，并拟合核密度函数(KDE)

x=np.random.normal(size=100)
sns.distplot(x)

1 2	#去除kde sns.distplot(x,kde=False,rug=True)

<matplotlib.axes._subplots.AxesSubplot at 0x1a21577f60>

png

1 2	#箱子划分有多细，Seaborn会默认猜测一个，但是更好的应该由我们来指定 sns.distplot(x,bins=20,kde=False,rug=True)

<matplotlib.axes._subplots.AxesSubplot at 0x1a215fdbe0>

png

核密度函数

The kernel density estimate may be less familiar, but it can be a useful tool for plotting the shape of a distribution. Like the histogram, the KDE plots encode the density of observations on one axis with height along the other axis:

简单理解为展示密度

1	sns.distplot(x,hist=False,rug=True)

<matplotlib.axes._subplots.AxesSubplot at 0x1a216fada0>

png

1	sns.kdeplot(x,shade=True)

<matplotlib.axes._subplots.AxesSubplot at 0x1a2181cd68>

png

双变量分布

It can also be useful to visualize a bivariate distribution of two variables. The easiest way to do this in seaborn is to just use the jointplot() function, which creates a multi-panel figure that shows both the bivariate (or joint) relationship between two variables along with the univariate (or marginal) distribution of each on separate axes.

mean, cov = [0, 1], [(1, .5), (.5, 1)]
data = np.random.multivariate_normal(mean, cov, 200)
df = pd.DataFrame(data, columns=["x", "y"])
df.head()

	x	y
0	1.620467	2.511505
1	-0.529253	0.247477
2	-1.361914	0.225665
3	-1.188358	0.785273
4	1.158663	0.180673

1	sns.jointplot(x='x',y='y',data=df)

<seaborn.axisgrid.JointGrid at 0x1a21b3a668>

png

1
2
3

x, y = np.random.multivariate_normal(mean, cov, 1000).T
with sns.axes_style("white"):
    sns.jointplot(x=x, y=y, kind="hex", color="k")

png

Kernel density estimation

这个和单变量分布的核密度函数差不多

1	sns.jointplot(x='x',y='y',kind='kde',data=df)

<seaborn.axisgrid.JointGrid at 0x1a21c534a8>

png

可视化数据集中的成对关系

这个没太看明白、、、

1 2	iris = sns.load_dataset("iris") sns.pairplot(iris);

png

1
2
3

g = sns.PairGrid(iris)
g.map_diag(sns.kdeplot)
g.map_offdiag(sns.kdeplot, n_levels=6);

png