7. Histograms


 * What is a Histogram?**

A histogram is a bar graph that displays tabulated frequencies as bars. The height of each bar is proportional to the number of times a particular set of values has occurred, and its width is equal to the x-axis interval (the span of that particular set of numbers).

A histogram is made by graphing a set of values, such as the number set {1,2,2,3,3,3,3,4,4,5,6}. This would be the graph:
 * Histogram Example:**

Although this is a simple example, it works because it shows you the frequency of each number in the set, which is the purpose of a histogram. With a graph like this, you can easily identify the median and mode and estimate the mean. For example, the median is easily read off as 3, and so is the mode. The area of the bar for a number (3) is equal to the interval (1) times the number of times it appears (4), giving you a frequency of 4.
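The readings above can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the variable names are chosen for illustration, and the text histogram is a stand-in for the missing graph.

```python
from collections import Counter
from statistics import mean, median, mode

# The example set from the text.
values = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6]

# Frequency of each distinct value: this is the height of each bar.
frequencies = Counter(values)

# Print a text histogram, one row per value, one '#' per occurrence.
for value in sorted(frequencies):
    print(f"{value}: {'#' * frequencies[value]}")

print("mean  :", round(mean(values), 2))
print("median:", median(values))
print("mode  :", mode(values))
```

The tallest row is the one for 3, which matches the mode, and the median lands in the same bar.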

Now, that was just a simple example. In real life, histograms are a little more complicated, because the data usually consist of many different numbers with little or no repetition. Because of this, one would bin the data into practical intervals. To bin the data is just to group the values into the distinct bars of the graph, one bar per interval.
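Binning can be sketched in Python as follows. The helper name `bin_counts`, the starting edge of 500, and the bin width of 25 are all assumptions made for illustration; the scores are the SAT math scores from the second practice problem in this section, where almost every value is distinct.

```python
from collections import Counter

def bin_counts(values, start, width):
    """Count how many values fall in each interval [edge, edge + width)."""
    counts = Counter()
    for v in values:
        # Lower edge of the interval that contains v.
        edge = start + ((v - start) // width) * width
        counts[edge] += 1
    return counts

# SAT math scores from practice problem 2.
scores = [663, 657, 609, 669, 639, 642, 674, 659, 660, 519,
          666, 667, 634, 670, 649, 675, 574, 672, 644, 649]

# Assumed binning: intervals of width 25 starting at 500.
counts = bin_counts(scores, start=500, width=25)

# One row per bin, including the empty ones.
for edge in range(500, 700, 25):
    print(f"{edge} to {edge + 25}: {'#' * counts[edge]}")
```

A different choice of width or starting edge gives different bars for the same data, so the bin width is a judgment call rather than a fixed rule.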


 * Types of Histograms:**

Bell Curve (occurs when the data follow a normal distribution, where most of the values cluster around the average of the data). The previous example, while effective, is just a simple bell curve, with the majority of the frequencies occurring near the median and the frequencies decreasing toward the extremes. Histograms are useful for determining the "typical value," or the average, of a set of data. In roughly symmetrical bell-curve histograms (like the previous example), the typical value, mode, and median are all easy to see. When a bell-curve histogram has most of its data packed tightly around the center of its graph, one can conclude that the standard deviation (a measure of how far the values typically lie from the average) is small relative to the range of the data. However, for other graphs, where there is no symmetry, determining this "typical value" is harder.
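To put a number on the spread, the standard deviation of the first example can be computed directly. The sketch below uses the population standard deviation (`statistics.pstdev`), which is an assumption; the sample version (`statistics.stdev`) would give a slightly larger value.

```python
from statistics import mean, pstdev

# The example set from the first histogram.
values = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6]

center = mean(values)    # the average of the data
spread = pstdev(values)  # population standard deviation

# Values that lie within one standard deviation of the average.
within_one_sd = [v for v in values if abs(v - center) <= spread]

print("mean:", round(center, 2))
print("standard deviation:", round(spread, 2))
print("values within one standard deviation:", len(within_one_sd), "of", len(values))
```

Most of the values sit within one standard deviation of the mean, which is what the clustering around the center of the graph suggests.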

Skewed Histograms. Things are different, however, when the data are not distributed evenly in a bell curve. In a histogram like that, the data are pushed to one side of the graph, and the smaller, declining side is called the tail of the graph. Histograms may be skewed either left or right, and the skew is named for the side the tail is on: in a right-skewed histogram the bulk of the data is concentrated on the left, and in a left-skewed histogram it is concentrated on the right. To determine the separating line, one can use the mean, median, or mode. For example, on the graph to the top right, if we place the separating line at the median or mode, it would be between the second and third columns, while most of the data falls to the right of it.

Skewed histograms are useful for observing a trend in a big set of data. Box plots, however, can also be useful for spotting trends in skewed data, because they are built from the median and quartiles, so a few random extreme outliers will not throw off the picture of the data.


 * Uses of histograms**

Histograms make it easier to estimate the interquartile range at a glance. Take the first example (the bell-curve one). Using your knowledge of interquartile range and histograms, you can pinpoint where your Q1 and Q3 values are. Between these two values, about 50% of your data should lie, with about 25% of the data in each tail.
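As a check on the eyeball estimate, the first and third quartiles of the first example can be computed with Python's `statistics.quantiles`. This is a sketch under one assumption: it uses the function's default "exclusive" method, and other quartile conventions (including the one taught in some textbooks) can give slightly different values.

```python
from statistics import quantiles

# The example set from the first histogram.
values = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(values, n=4)
iqr = q3 - q1

print("Q1:", q1)
print("median:", q2)
print("Q3:", q3)
print("interquartile range:", iqr)
```

Because this small set has many repeated values, the share of data between Q1 and Q3 is only roughly one half; the 50% figure becomes more accurate with larger data sets that have fewer ties.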

Histograms can also be used to estimate a probability distribution. All one must do is calculate the area of the different bins. If all of the intervals are of the same length and all of the bars are of the same height, then the histogram shows that the data have a uniform distribution. But if not, by calculating the area (and thus the frequency) of each bin, one can conclude which value or interval has the greatest probability of occurring. For instance, in the first example, if one were to pick one number at random from the set of data, the most likely result is a 3, with probability 4/11.
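The idea of turning bar areas into probabilities can be sketched in Python for the first example, where every bar has width 1, so the area of a bar is just its count. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# The example set from the first histogram.
values = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6]

counts = Counter(values)
total = len(values)

# Probability of drawing each value = its frequency / total number of values.
probabilities = {v: Fraction(c, total) for v, c in counts.items()}

# The value with the tallest bar is the most likely draw.
most_likely = max(probabilities, key=probabilities.get)

for v in sorted(probabilities):
    print(f"P({v}) = {probabilities[v]}")
print("most likely value:", most_likely)
```

The probabilities add up to 1, and the largest one, 4/11, belongs to the value 3.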


 * Now, for some F-U-N Problems with Histograms!!**

1. The following histogram shows a set of test scores. Given that there are 36 total grades, of which 24 are evenly distributed among the intervals 60 to 70, 70 to 80, 80 to 90, and 90 to 100, and 12 are between 50 and 60, answer the following question. The x-axis is the grade received, and the y-axis is the number of students who received that grade.



Which of the following statements are true?

I. The middle score is 75.
II. If the passing score is 60, most students failed.
III. More students scored between 50 and 60 than 90 and 100.

A. I only
B. II only
C. III only
D. II and III only
E. I, II, and III

Answer: C

Because there are 36 total people, the median would be between the 18th and 19th person. The 18th score is the last one in the second bar (60 to 70) and the 19th is the first one in the third bar (70 to 80), so the median lies near 70. We are not given the individual values, so it is impossible to attest to the exact value of the median, and statement I cannot be confirmed; the answer cannot be A. Most students did not fail, because there are 24 scores distributed from 60 to 100, which is 2/3 of the total data; therefore, most students passed, and statement II is false. C is correct because there are 12 students who scored between 50 and 60 while there are only 6 students who scored between 90 and 100.
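The counting in this problem can be verified with a short Python sketch. The bar heights are derived from the problem statement (12 scores in the first interval and 24 spread evenly over the other four, so 6 each); the helper name `bin_of_rank` is made up for this illustration.

```python
# (interval, number of scores) for each bar, derived from the problem statement.
bins = [((50, 60), 12), ((60, 70), 6), ((70, 80), 6), ((80, 90), 6), ((90, 100), 6)]

total = sum(count for _, count in bins)

def bin_of_rank(rank):
    """Return the interval holding the score with the given 1-based rank."""
    running = 0
    for interval, count in bins:
        running += count
        if rank <= running:
            return interval
    raise ValueError("rank exceeds the number of scores")

# Statement I: the median sits between the 18th and 19th scores.
print("18th score is in", bin_of_rank(18))
print("19th score is in", bin_of_rank(19))

# Statement II: count the scores at or above the passing mark of 60.
passed = sum(count for (low, _), count in bins if low >= 60)
print("passed:", passed, "of", total)

# Statement III: compare the first and last bars.
print("50 to 60:", bins[0][1], "| 90 to 100:", bins[-1][1])
```

The 18th and 19th scores fall in neighboring bars, so the histogram alone cannot pin the median to a single value such as 75.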

2. The following scores are the SAT math scores for an Algebra class of 20 students:

663 657 609 669 639 642 674 659 660 519 666 667 634 670 649 675 574 672 644 649

The distribution of the scores is

A. Symmetric
B. Skewed Left
C. Skewed Right
D. Uniform
E. Bimodal

Answer: B

In order to solve this problem, it would be best to put the scores in numerical order, starting from the lowest. It would be this:

519, 574, 609, 634, 639, 642, 644, 649, 649, 657, 659, 660, 663, 666, 667, 669, 670, 672, 674, 675

By looking at the data, it is quite evident that most of the scores are concentrated on the right side (the high scores), with a tail of a few low scores stretching to the left; therefore, the distribution of the scores is skewed left, B.
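One rough numerical check for skew, offered here as a rule of thumb rather than a formal test, is to compare the mean with the median and to compare how far the extremes lie from the median. In a left-skewed set the low tail usually drags the mean below the median. A minimal Python sketch with the scores from this problem:

```python
from statistics import mean, median

# SAT math scores from the problem, in sorted order.
scores = [519, 574, 609, 634, 639, 642, 644, 649, 649, 657,
          659, 660, 663, 666, 667, 669, 670, 672, 674, 675]

avg = mean(scores)
mid = median(scores)

# Distance from the median to each extreme: the longer side is the tail.
left_tail = mid - min(scores)
right_tail = max(scores) - mid

print("mean:", avg)
print("median:", mid)
print("left tail length:", left_tail)
print("right tail length:", right_tail)
```

The mean comes out below the median and the left tail is much longer than the right one, both of which agree with answer B.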

Now that you know this, you are ready to put your histogram knowledge to the test with these interactive histogram examples:

AWESOMEAMAZINGHISTOGRAMS With this website, you can change how you display a set of Math SAT Scores on an ACTUAL REAL LIFE histogram. By sliding the bar back and forth, you can experiment with different intervals (or bins with which to group together the different scores), and observe how the frequency in each category changes in relation to the interval. Because the set of SAT Math Scores stays the same, only the bin width changes, yet the histogram looks different with each slide of the bar.


 * Bibliography (further reading)**:

http://rchsbowman.files.wordpress.com/2008/09/shape-skewed.jpg
http://quarknet.fnal.gov/toolkits/new/histograms.html
http://www.shodor.com/interactivate/activities/Histogram/
http://74.125.113.132/search?q=cache:xsFr9hwTpewJ:www.swvgs.k12.va.us/FTP%2520DIRECTORY/kilgore/Review%2520graphs.doc+%22consider+the+following+histogram%22&cd=2&hl=en&ct=clnk&gl=us&client=firefox-a
http://davidmlane.com/hyperstat/A11284.html
http://www.itl.nist.gov/div898/handbook/eda/section3/histogr6.htm