sparkplot: creating sparklines in Python with matplotlib
What is sparkplot?
Sparkplot is a Python module that uses the matplotlib plotting
library to create sparklines.
If you're in a hurry, you can download sparkplot from here, then read
below for installation and
usage instructions.
Edward Tufte introduced
sparklines in a sample
chapter of his upcoming book "Beautiful Evidence". In his words,
sparklines are "small,
high-resolution graphics embedded in a context of words, numbers,
images. Sparklines are data-intense, design-simple, word-sized graphics."
Sparkline examples
The following examples of sparkline graphics were created with
sparkplot.
Example 1
Here is the Los Angeles Lakers' road to their NBA title
in 2002. Wins are pictured with blue bars and losses with
red bars. Note how easy it is to see the streaks for wins and losses.
The Lakers' 2004
season was their last with Shaq, when they reached the NBA
finals and lost to Detroit (note the last 3 losses which sealed their
fate in the finals).
Compare those days of glory with their abysmal
2005
performance, with only 2 wins in the last 21 games.
Also note how the width of the last graphic is less than the previous
2, a consequence of the Lakers not making the playoffs this year.
Example 2
The southern oscillation is defined as the barometric pressure
difference between Tahiti and the Darwin Islands at sea level. It is
a predictor of El Nino, which in turn is thought to be a driver of
world-wide weather. Specifically, repeated southern oscillation values
less than -1 typically define an El Nino.
Here is a sparkline for the southern oscillation from 1955 to 1992
(456 sample data points obtained from NIST). The sparkline is
plotted with a horizontal span drawn along the x axis covering data
values between -1 and 0, so that values less than -1 can be more
clearly seen.
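The shaded-span idea can be sketched directly in matplotlib. This is only an illustration under my own assumptions, not sparkplot's code: the data is a made-up sine wave standing in for the NIST series, and the figure size, colors, and output file name are arbitrary choices.

```python
import math

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Made-up stand-in for the southern oscillation series (not the NIST data).
values = [2.0 * math.sin(i / 6.0) for i in range(120)]

fig, ax = plt.subplots(figsize=(4, 0.5))
ax.plot(values, color="gray", linewidth=0.5)

# Shade the horizontal band between -1 and 0 so dips below -1 stand out.
ax.axhspan(-1, 0, facecolor="lightgray", alpha=0.5)

ax.axis("off")  # sparklines carry no axes, ticks, or frame
fig.savefig("oscillation_sketch.png", dpi=100)

# Count how many samples fall below the band.
below = sum(1 for v in values if v < -1)
n_lines = len(ax.lines)
plt.close(fig)
```

Anything that dips under the shaded band is a value below -1, which is the pattern the text above associates with El Nino.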
Example 3
Here is the per capita income in California from 1959
to 2003.
And here is the "real" per capita income (adjusted for inflation) in
California, from 1959
to 2003.
Example 4
Here is the monthly distribution of messages sent to comp.lang.python from
1994 to 2004, plotted per year. Minimum and maximum values are shown
with blue dots and labeled in the graphics.
| Year | Total messages |
| 1994 | 3,018 |
| 1995 | 4,026 |
| 1996 | 8,378 |
| 1997 | 12,910 |
| 1998 | 19,533 |
| 1999 | 24,725 |
| 2000 | 42,961 |
| 2001 | 55,271 |
| 2002 | 56,750 |
| 2003 | 64,548 |
| 2004 | 56,184 |
The number of messages increased almost every year from 1994 to 2004.
The only exception was 2004, when there were fewer messages than in
either 2002 or 2003.
Sparkplot installation and usage
1) Install the Numeric
Python module or the numarray
Python module (required by matplotlib).
2) Install matplotlib.
3) Download sparkplot.
4) Prepare data files: sparkplot simplistically assumes that its input
data file contains just 1 column of numbers.
5) Run sparkplot.py.
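To make step 4 concrete, here is a small sketch of writing a one-column data file. The file name and the game results are made up, and the +1/-1 encoding is the one suggested for bar plots further down. The read-back loop is just my guess at how a single-column reader might work, not a description of sparkplot's parser.

```python
# Hypothetical results for a short run of games: "W" for a win, "L" for a loss.
results = ["W", "W", "L", "W", "L", "L", "W"]

# Encode wins as +1 and losses as -1, one number per line.
lines = ["1" if r == "W" else "-1" for r in results]

with open("example_games.txt", "w") as f:
    f.write("\n".join(lines) + "\n")

# Read the file back as a single column of numbers, skipping blank lines.
with open("example_games.txt") as f:
    data = [float(line) for line in f if line.strip()]
```

The resulting file could then be passed to sparkplot.py with the -i option.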
Here are some command-line examples to get you
going:
Example 1
- given only the input file and no other option, sparkplot.py will
generate a gray sparkline with the first and last data points plotted
in red
Running
sparkplot.py -i CA_real_percapita_income.txt
produces:
The name of the output file is by default input_file_name_with_no_extension.png.
It can be changed with the -o
option.
The plotting of the first and last data points can be disabled with the
--noplot_first and --noplot_last options.
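For readers curious how such a default sparkline could be drawn with matplotlib, here is a minimal sketch. The draw_sparkline function is hypothetical and is not taken from sparkplot's source; the figure size, line width, and sample data are my own assumptions. It also shows one way to derive the default output name from the input file name.

```python
import os

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt


def draw_sparkline(data, input_name, output_name=None):
    """Hypothetical helper: gray line with first and last points in red."""
    if output_name is None:
        # Default: input file name with its extension replaced by .png
        output_name = os.path.splitext(input_name)[0] + ".png"

    fig, ax = plt.subplots(figsize=(2, 0.3))
    ax.plot(range(len(data)), data, color="gray", linewidth=0.8)
    # Mark the first and last data points with small red dots.
    ax.plot([0], [data[0]], "r.", markersize=4)
    ax.plot([len(data) - 1], [data[-1]], "r.", markersize=4)
    ax.axis("off")  # no axes, ticks, or frame
    fig.savefig(output_name, dpi=100)
    plt.close(fig)
    return output_name


# Made-up sample values, not the California income data.
out = draw_sparkline([3, 5, 4, 6, 8, 7, 9], "sample_income.txt")
out_exists = os.path.exists(out)
```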
Example 2
- given the input file and the --label_first
--label_last --format=currency
options, sparkplot.py will generate a gray sparkline with the first and
last data points plotted in red and with the first and last data values
displayed in a currency format
Running
sparkplot.py -i CA_real_percapita_income.txt --label_first --label_last --format=currency
produces:
The currency symbol is $ by default, but it can be changed with the
--currency option.
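As an illustration of what a currency label might look like, here is a small formatting sketch. The format_currency helper and the sample values are hypothetical. I am also assuming whole units with comma-grouped thousands, which may not match sparkplot's exact output.

```python
def format_currency(value, symbol="$"):
    """Hypothetical helper: whole units, thousands grouped with commas."""
    return f"{symbol}{value:,.0f}"


# Made-up first and last values, not the real income figures.
first_label = format_currency(9876.0)
last_label = format_currency(33415.0, symbol="EUR ")
```

The symbol argument plays the role that the --currency option plays on the command line.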
Example 3
- given the input file and the
--plot_min --plot_max --label_min
--label_max --format=comma options,
sparkplot.py will generate a gray sparkline with the first and last
data points plotted in red, with the min. and max. data points plotted
in blue, and with the min. and max. data values displayed in a 'comma'
format (e.g. 23,456,789)
Running
sparkplot.py -i clpy_1997.txt --plot_min --plot_max --label_min --label_max --format=comma
produces:
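Locating the min. and max. data points and building their 'comma' labels takes only a few lines of plain Python. The sketch below uses made-up monthly counts rather than the real clpy_1997.txt data, and it is my own illustration, not sparkplot's code.

```python
# Hypothetical monthly message counts for one year (not the real data).
counts = [850, 910, 1204, 1010, 987, 1120, 1333, 1290, 1105, 998, 1043, 1060]

# Positions of the smallest and largest values (first occurrence on ties).
min_index = min(range(len(counts)), key=counts.__getitem__)
max_index = max(range(len(counts)), key=counts.__getitem__)

# The ',' format specifier groups digits in threes.
min_label = f"{counts[min_index]:,}"
max_label = f"{counts[max_index]:,}"
example = f"{23456789:,}"
```

The two indexes give the x positions where the blue dots would go, and the labels are the text placed next to them.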
Example 4
- given the input file and the --type=bars
option, sparkplot.py will draw
blue bars for the positive data values and red bars for the negative
data values
Running
sparkplot.py -i lakers2005.txt --type=bars
produces:
As a side note, I think bar plots look better when the data file
contains a relatively large number of data points, and the variation of
the data is relatively small. This type of plot works especially well
for sports-related graphics, where wins are represented as +1 and
losses as -1.
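A win/loss bar sparkline of this kind can be sketched in matplotlib as follows. The results list is made up, the sizes are arbitrary, and this is not sparkplot's own implementation. How zero values should be colored is not covered above; the sketch simply treats anything that is not positive as red.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Made-up results: +1 for a win, -1 for a loss.
results = [1, 1, -1, 1, -1, -1, -1, 1, 1, 1]

# Blue bars for positive values, red bars otherwise (assumption for zeros).
colors = ["blue" if v > 0 else "red" for v in results]

fig, ax = plt.subplots(figsize=(2, 0.3))
bars = ax.bar(range(len(results)), results, color=colors, width=0.6)
ax.axis("off")  # no axes, ticks, or frame
fig.savefig("bars_sketch.png", dpi=100)
plt.close(fig)
```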
Example 5
- given the input file and the -t
or --transparency option,
sparkplot.py will generate a transparent background for the PNG image
it produces
Running
sparkplot.py -i CA_real_percapita_income.txt -t
produces a transparent-background image, which is shown here on top of
a table cell with a yellow background:
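In matplotlib itself, one way to get a transparent PNG background is the transparent argument of savefig. The sketch below is my own illustration with made-up data. I do not know whether sparkplot uses this exact call.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

data = [3, 5, 4, 6, 8, 7, 9]  # made-up sample values

fig, ax = plt.subplots(figsize=(2, 0.3))
ax.plot(data, color="gray", linewidth=0.8)
ax.axis("off")  # no axes, ticks, or frame

# transparent=True makes the figure and axes backgrounds see-through.
fig.savefig("transparent_sketch.png", dpi=100, transparent=True)
plt.close(fig)

# Read the PNG back and inspect the alpha channel of a corner pixel.
img = plt.imread("transparent_sketch.png")
channels = img.shape[2]
corner_alpha = float(img[0, 0, 3])
```

A corner alpha of zero means the page or table cell behind the image shows through.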
For other sparkplot options, run sparkplot.py -h
Other sparkline implementations
Concluding thoughts
Kudos to John Hunter, the creator
of matplotlib. I found this
module extremely powerful and versatile. For a nice introduction to
matplotlib, see also John's talk at
PyCon05.
I hope the sparkplot module will prove to be useful when
you need to include sparkline graphics in your Web pages. All the
caveats associated with alpha-level software apply :-) Let me know if
you find it useful. I'm very much a beginner at using matplotlib, and
as I become more acquainted with it I'll add more functionality to
sparkplot.
Grig Gheorghiu
grig _at_ gheorghiu _dot_ net
agiletesting.blogspot.com