6

I am a Python newbie and want to plot a list of values between -0.2 and 0.2. The list looks like this:

[...-0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01501152092971969,
  -0.01489985147131656,
  -0.015833709930856088,
  -0.015833709930856088,
  -0.015833709930856088,
  -0.015833709930856088,
  -0.015833709930856088...and so on].

In statistics I've learned to group data into classes (bins) to get a useful histogram when the data set is this large.

How can I add classes to my plot in Python?

My code is

plt.hist(data)

and the histogram looks like this: [image]

But it should look like this: [image]

Brian Spiering
Thomas

2 Answers

3

Your histogram is valid, but it has too many bins to be useful.

If you want a number of equally spaced bins, you can simply pass that number through the bins argument of plt.hist, e.g.:

plt.hist(data, bins=10)

If you want your bins to have specific edges, you can pass these as a list to bins:

plt.hist(data, bins=[0, 5, 10, 15, 20, 25, 30, 35, 40, 60, 100])
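The edges in the example above run from 0 to 100, whereas the values in the question lie between -0.2 and 0.2. As a rough sketch of how the two forms of bins behave on data in that range, the snippet below uses np.histogram, which accepts the same bins argument as plt.hist. The sample data and the class width of 0.05 are assumptions chosen for illustration, not the asker's actual values.

```python
import numpy as np

# Hypothetical sample in the question's range; not the asker's actual data.
rng = np.random.default_rng(42)
data = rng.uniform(-0.2, 0.2, size=200)

# An integer gives that many equal-width bins spanning min(data)..max(data).
counts_equal, edges_equal = np.histogram(data, bins=10)

# A sequence gives the edges explicitly: here 8 classes of width 0.05.
class_edges = np.linspace(-0.2, 0.2, 9)
counts_classes, _ = np.histogram(data, bins=class_edges)

# Print each class with the number of values that fall into it.
for left, right, count in zip(class_edges[:-1], class_edges[1:], counts_classes):
    print(f"{left:+.2f} to {right:+.2f}: {count}")
```

Passing class_edges as bins to plt.hist should draw the same classes as bars.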

Finally, you can also specify a method to calculate the bin edges automatically, such as auto (available methods are specified in the documentation of numpy.histogram_bin_edges):

plt.hist(data, bins='auto')
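To see what the automatic methods actually choose, numpy.histogram_bin_edges can be called directly. The following is a minimal sketch: the helper bin_counts and the synthetic data are hypothetical, and the number of bins each rule picks depends on the data.

```python
import numpy as np

# Hypothetical stand-in for the question's data: 1000 values near zero,
# clipped to the range [-0.2, 0.2].
rng = np.random.default_rng(0)
data = np.clip(rng.normal(0.0, 0.05, size=1000), -0.2, 0.2)

def bin_counts(values, methods=("auto", "fd", "sturges", "sqrt")):
    """Return {method: number of bins} as chosen by numpy for each rule."""
    return {
        method: len(np.histogram_bin_edges(values, bins=method)) - 1
        for method in methods
    }

counts = bin_counts(data)
for method, n_bins in counts.items():
    print(f"{method}: {n_bins} bins")
```

Comparing the counts this way can help decide whether 'auto' is a reasonable choice or whether a fixed number of bins reads better.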

Complete code sample

import matplotlib.pyplot as plt
import numpy as np

# fix the random state for reproducibility
np.random.seed(19680801)

# sum of 2 normal distributions
n = 500
data = 10 * np.random.randn(n) + 20 * np.random.randn(n) + 20

# plot histograms with various bins
fig, axs = plt.subplots(1, 3, sharey=True, tight_layout=True, figsize=(9, 3))
axs[0].hist(data, bins=10)
axs[1].hist(data, bins=[0, 5, 10, 15, 20, 25, 30, 35, 40, 60, 100])
axs[2].hist(data, bins='auto')


Xavier
2

If I've understood the question correctly, you have to specify the bin size, as stated here.

You can give a list with the bin boundaries.

plt.hist(data, bins=[0, 10, 20, 30, 40, 50, 100])

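If you just want equal-width bins, you can generate the edges from a chosen binwidth. Python's built-in range only accepts integers, so for float data like yours use np.arange instead (with numpy imported as np):

plt.hist(data, bins=np.arange(min(data), max(data) + binwidth, binwidth))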
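For float values like those in the question, equal-width edges can also be computed from the number of bins needed to cover the data. This is only a sketch: equal_width_edges is a hypothetical helper, and both the sample values and the width of 0.05 are assumptions.

```python
import numpy as np

def equal_width_edges(values, binwidth):
    """Return edges of width `binwidth` that cover every value in `values`."""
    lo, hi = np.min(values), np.max(values)
    # Number of bins needed to reach the maximum (at least one bin).
    n_bins = max(1, int(np.ceil((hi - lo) / binwidth)))
    # Multiply an integer index by the width instead of adding repeatedly,
    # which keeps floating-point rounding error from accumulating.
    return lo + binwidth * np.arange(n_bins + 1)

# Hypothetical values in the question's range; not the asker's actual data.
data = np.array([-0.18, -0.0158, -0.0150, -0.0149, 0.1])
edges = equal_width_edges(data, binwidth=0.05)
counts, _ = np.histogram(data, bins=edges)
print(edges)
print(counts)
```

The resulting edges array can be passed as bins to plt.hist in place of the range(...) expression.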

You can also take a look here and here.

Green Falcon