
Visualizing information

In many chapters, results are visualized using the graphics capabilities of R. Thus, we will give a very short introduction to the base graphics package, plus a short introduction to the package ggplot2 (Wickham, 2009).

The reader will learn briefly about the graphical system in R, different output formats for the traditional graphics system, the customization and fine-tuning of standard graphics, and the ggplot2 package.

Tip

Other packages such as ggmap, ggvis, lattice, or grid are not touched on here. Interactive graphics are also beyond the scope of this book (Google Charts, rgl, iplots, JavaScript, and R).

The graphics system in R

Many packages include methods to produce plots. Generally, they either use the functionality of the base R package called graphics or the functionality of the package grid.

For example, the package maptools (Bivand and Lewin-Koh, 2015) includes methods for producing maps and uses the capabilities of the graphics package. The package ggplot2 is based on the grid package and uses its functionality for constructing advanced graphics.

Both packages, graphics and grid, include basic plot methods, and both packages use the graphical system of the operating system for plotting, which is hidden from the user. These graphical features from the operating system are attached by the graphical devices (?grDevices) of R, which also include support for colors and fonts.

The graphical output is either shown on screen or saved to a file. A screen device pops up as soon as a function such as plot() is called. Typical screen devices are X11() (X Window System), windows() (Microsoft Windows), and quartz() (OS X). File devices include postscript() (PostScript format), pdf() (PDF format), jpeg() (JPEG), bmp() (bitmap format), svg() (scalable vector graphics), and the cairo-based devices, for example cairo_pdf() and cairo_ps(). These rely on the cairo graphics library, which can generate PDF, PostScript, SVG, or bitmap output (PNG, JPEG, TIFF), as well as X11 output.

In practice, there is almost no difference between plotting on the screen or in a PDF. Note that the current state of a device can be stored and copied to other devices.
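One way to store the state of a device and replay it on another device is recordPlot() together with replayPlot(). The following is a minimal sketch, not the only approach; the file name and the use of the built-in mtcars data are assumptions made for illustration.

```r
# Hypothetical target file in the session's temporary directory.
out_file <- file.path(tempdir(), "replayed-plot.pdf")

pdf(NULL)                            # null PDF device: draws without writing a file
dev.control(displaylist = "enable")  # file devices do not record plots by default
plot(mpg ~ wt, data = mtcars)
snapshot <- recordPlot()             # store the current state of the device
dev.off()

pdf(out_file)                        # open a second device ...
replayPlot(snapshot)                 # ... and redraw the stored plot there
dev.off()

file.exists(out_file)
```

On an interactive screen device the display list is usually enabled already, so the dev.control() call is only needed for file devices.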

The most common graphics devices are X11(), pdf(), postscript(), png(), and jpeg().

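For example, to save a plot as a PDF file:

data(Cars93, package = "MASS")  # Cars93 ships with the MASS package
pdf(file = "myplot.pdf")        # open the PDF file device
plot(Horsepower ~ Weight, data = Cars93)
dev.off()                       # close the device to finish writing the file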

The available graphics devices are machine dependent. You can check the available ones using ?Devices.
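Besides reading ?Devices, the running R build can be asked which optional bitmap formats it supports. The sketch below uses capabilities() and dev.list(); the selection of the four capability names is only an illustrative choice.

```r
# Which optional bitmap devices and backends does this R build support?
caps <- capabilities(c("png", "jpeg", "tiff", "cairo"))
print(caps)

# Devices that are open right now (NULL if none is open).
dev.list()
```

A FALSE entry means that the corresponding device is not available on this machine.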

Note that various function arguments, such as width, height, and quality, can be used to modify the output.

The main question is which output format should be used?

The answer is rather simple: use a screen device such as X11 for displaying the image (this happens automatically in, for example, RStudio), and pdf (or PostScript) for line graphics, which are scalable without losing quality. png (or jpeg) should be used for pixel graphics or graphics with many data points; pixel graphics are not scalable without losing quality. svg has advantages in the browser (it is scalable and responsive, which is important in web design), but it does not include all fonts.

The graphics package

The package graphics is the traditional graphics system. Although other visualization packages can produce publication-quality plots, the graphics package is mainly used for producing figures quickly in order to explore content on the fly. Within this package, different types of functions exist:

  • High-level graphics functions: Open a device and create a plot (for example, plot())
  • Low-level graphics functions: Add output to existing graphics (for example, points())
  • Interactive functions: Allow interaction with graphical output (for example, identify())

Typically, a combination of multiple graphics functions is used to create a plot.

Each graphics device can be seen as an (abstract) sheet of paper. With the graphics package we can thus draw using many pens in many colors, but no eraser is available. Multiple devices can be open simultaneously, but drawing takes place in only one of them, the "active" graphics device.
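How the active device is selected can be sketched with dev.cur(), dev.set(), and dev.off(). The example below assumes two null PDF devices so that it runs without a screen; the variable names are illustrative only.

```r
pdf(NULL); first  <- dev.cur()   # open device 1 (null PDF device, no file written)
pdf(NULL); second <- dev.cur()   # open device 2; the newest device becomes active

dev.set(first)                   # make the first device the active one again
active <- dev.cur()

plot(1:10)                       # high-level call draws on the active device
abline(h = 5, lty = 2)           # low-level call adds to the same device

dev.off(first)                   # close both devices explicitly
dev.off(second)
```

All drawing calls between dev.set(first) and dev.off(first) end up on the first device; the second device stays empty.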

Warm-up example – a high-level plot

We want to plot filled circles where the size depends on the absolute value of y and the color depends on the value of x:

x <- 1:20 / 2 # x ... 0.5, 1.0, 1.5, ..., 10.0
y <- sin(x)
plot(x, y, pch = 16, cex = 10 * abs(y), col = grey(x / 14))

We have already learned that plot() is a high-level plot function, and that we can add something to an existing plot with a low-level plot function. Let's add some text, a curve and a line, as seen in the following figure:

plot(x, y, pch = 16, cex = 10 * abs(y), col = grey(x / 14)) 
text(x, y, 1:20, col="yellow")
curve(sin, -2 * pi, 4 * pi, add = TRUE, col = "red")
abline(h = 0, lty = 2, col = "grey")

The function plot() is a generic function. It is overloaded with methods, and R selects the method depending on the class of the object to be plotted (method dispatch of R). It also shows different outputs depending on the class of the object to be plotted.

This can be seen in the following, where a numeric vector is first plotted, and then a factor, and then a data frame with one variable as a factor, and finally an object of class ts, as shown in the following figure:

par(mfrow=c(2,2))
mpg <- mtcars$mpg
cyl <- factor(mtcars$cyl)
df <- data.frame(x1=cyl, x2=mpg)
tmpg <- ts(mpg)
plot(mpg); plot(cyl); plot(df); plot(tmpg)
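Method dispatch also works for user-defined classes. As a minimal sketch, the following defines a plot method for a hypothetical class temp_log; the class name, its fields, and the numbers are invented purely for illustration.

```r
# Hypothetical class "temp_log": a list with hours and (made-up) temperatures.
temp_log <- structure(
  list(hour = 0:5, celsius = c(11.2, 10.8, 10.5, 11.0, 12.4, 14.1)),
  class = "temp_log"
)

# plot() dispatches to this method for objects of class "temp_log".
plot.temp_log <- function(x, ...) {
  plot(x$hour, x$celsius, type = "b",
       xlab = "Hour", ylab = "Temperature", ...)
  invisible(x)
}

pdf(NULL)                 # null device; drop this line and dev.off() to draw on screen
shown <- plot(temp_log)   # calls plot.temp_log() via method dispatch
dev.off()
```

Because the generic plot() is called, the user never has to know the name of the method.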

To see which plot methods are currently available, one can type methods(plot) into R:

tail(methods(plot)) ## last 6
## [1] "plot.TukeyHSD" "plot.tune" "plot.varclus" "plot.Variogram"
## [5] "plot.xyVector" "plot.zoo"
## number of methods for plot
length(methods(plot))
## [1] 145

The following calls produce (almost) equivalent results. Note that the last call uses the formula interface of R, see ?formula:

plot(x=mtcars$mpg, y=mtcars$hp)
plot(mtcars$mpg, mtcars$hp)
plot(hp ~ mpg, data=mtcars)

Control of graphics parameters

Customizing graphics and changing the default output is almost always necessary, because:

  • High-level plot functions do not always produce the desired final result
  • Functionality for fine-tuning graphics is needed (colors, symbols, fonts, line widths, and so on)
  • Information about the plot regions and the coordinate system is needed to place the output of low-level functions
  • Multiple graphs often have to be placed on one page

Graphical parameters are the key to changing the appearance of graphics, including, for example, colors, fonts, linetypes, and axis definitions.

Each open device has its own independent list of graphics parameters and most parameters can be directly specified in high- or low-level plotting functions.

Important: all graphic parameters can be set via function par, see ?par. The most important function arguments of par are: mfrow for multiple graphics, col for colors, lwd for line widths, cex for the size of symbols, pch for selecting symbols, and lty for different kinds of lines.
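A common pattern is to keep the previous values that par() returns and to restore them afterwards. The sketch below assumes a null PDF device and the built-in mtcars data; the parameter values are arbitrary.

```r
pdf(NULL)                            # null device; parameters belong to this device
before <- par("mfrow")

# par() returns the previous values of the parameters it changes.
old <- par(mfrow = c(1, 2), pch = 16, col = "blue")
during <- par("mfrow")

plot(mtcars$wt, mtcars$mpg)               # uses pch and col from par()
plot(mtcars$hp, mtcars$mpg, col = "red")  # argument overrides par() for this call only

par(old)                             # restore the previous settings
after <- par("mfrow")
dev.off()
```

Restoring with par(old) avoids surprising later plots on the same device with leftover settings.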

In the following examples we will only discuss the control of colors.

Here are some possibilities for specifying colors in R:

  • By default, colors can be addressed by name; see colors() for the available names
  • rgb() mixes red, green, and blue
  • hsv() specifies a color by hue, saturation, and value, which is often even better
  • Predefined sets of palettes with rainbow colors and many others are available; see, for example, ?rainbow
  • The default palette can be queried and set with palette()
  • Further predefined palettes are provided by the RColorBrewer package (Neuwirth, 2014)
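The base R color functions mentioned here can be tried out as follows. This is a small sketch using only functions from grDevices and graphics; the chosen color name and the number of palette entries are arbitrary.

```r
"steelblue" %in% colors()            # named colors can be looked up in colors()

red_rgb <- rgb(1, 0, 0)              # red, green, blue intensities in [0, 1]
red_hsv <- hsv(h = 0, s = 1, v = 1)  # the same color via hue, saturation, value

five <- rainbow(5)                   # five colors from the rainbow palette
palette()                            # current default palette for col = 1, 2, ...

pdf(NULL)                            # null device so the sketch runs without a screen
barplot(rep(1, 5), col = five, axes = FALSE)
dev.off()
```

Both rgb() and hsv() return hexadecimal color strings, which can be passed to any col argument.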

Multiple plots with the graphics package can be arranged via par(mfrow = c(2, 2)). However, a more flexible approach is to use the function layout(); see ?layout for examples.
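Before the larger example, a minimal sketch may help to see how the layout matrix works: each number in the matrix names the figure that occupies that cell, and a repeated number lets one figure span several cells. The matrix and the plots below are illustrative assumptions.

```r
pdf(NULL)                                # null device so the sketch runs without a screen

# Figure 1 spans the whole top row; figures 2 and 3 share the bottom row.
m <- matrix(c(1, 1,
              2, 3), nrow = 2, byrow = TRUE)
n_fig <- layout(m, heights = c(1, 2))    # bottom row twice as high as the top row

plot(mtcars$mpg, type = "l")             # goes into figure 1
hist(mtcars$mpg, main = "")              # goes into figure 2
boxplot(mtcars$mpg)                      # goes into figure 3
dev.off()
```

layout() returns the number of figures it has defined, here three; a 0 in the matrix would leave the corresponding cell empty.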

We will show one example for layout() by constructing a type of plot that is not available as a ready-made function in base R. Note that a slightly modified version of this example was used in the lectures of Friedrich Leisch.

We will first calculate some numbers that are used afterwards to place the graphics:

## min and max on both axes
xmin <- min(mtcars$mpg); xmax <- max(mtcars$mpg)
ymin <- min(mtcars$hp); ymax <- max(mtcars$hp)

## calculate histograms
xhist <- hist(mtcars$mpg, breaks=15, plot=FALSE)
yhist <- hist(mtcars$hp, breaks=15, plot=FALSE)

## maximum count
top <- max(c(xhist$counts, yhist$counts))
xrange <- c(xmin,xmax)
yrange <- c(ymin, ymax)

We now produce the following figure:

## define plot order and size: 1 = scatterplot, 2 = top margin, 3 = right margin
m <- matrix(c(2, 0, 1, 3), 2, 2, byrow = TRUE)
layout(m, widths = c(3, 1), heights = c(1, 3), respect = TRUE)
## first plot -- scatterplot; leave room for the axes at the bottom and left
par(mar = c(3, 3, 1, 1))
plot(mtcars[, c("mpg", "hp")], xlim = xrange, ylim = yrange, xlab = "", ylab = "")
## second plot -- barchart of the x margin, aligned with the scatterplot
par(mar = c(0, 3, 1, 1))
barplot(xhist$counts, axes = FALSE, ylim = c(0, top), space = 0)
## third plot -- barchart of the y margin
par(mar = c(3, 0, 1, 1))
barplot(yhist$counts, axes = FALSE, xlim = c(0, top), space = 0, horiz = TRUE)

The ggplot2 package

Why use ggplot2?

  • Gives a consistent and systematic approach to generating graphics
  • Based on the book Grammar of Graphics by Wilkinson (1999)
  • Very flexible
  • Customizable, allows you to define your own themes (for example, for corporate designs in companies)
  • But slow and not as easy to learn

In ggplot2, the parts of a plot are defined independently. The anatomy of a plot consists of:

  • Data: Must be a data frame (an object of class data.frame)
  • Aesthetic mapping: Describes how variables in the data are mapped to visual properties (aesthetics) of geometric objects; this must be done within the function aes()
  • Assignment: Fixed values are assigned to visual properties; this must be done outside the function aes()
  • Geometric objects (geoms; aesthetics are mapped to geometric objects), for example, geom_point()
  • Statistical transformations, for example, the function stat_boxplot()
  • Scales
  • Coordinate system
  • Position adjustments
  • Faceting, for example, the function facet_wrap()

Aesthetic means "something you can see", for example, colors, fill (color), shape (of points), linetypes, size, and so on. Aesthetic mapping to geometric objects is done using function aes().

To make a scatterplot with Horsepower and MPG.city for the Cars93 data set, you use the following command:

library("ggplot2")
data(Cars93, package = "MASS")  # Cars93 ships with the MASS package
ggplot(Cars93, aes(x = Horsepower, y = MPG.city)) + geom_point(aes(colour = Cylinders))

This command generates the following output:

Here, we mapped Horsepower to the x variable, MPG.city to the y variable, and Cylinders to color (within aes()!). We used geom_point() to tell ggplot2 to produce a scatterplot. Note that a statistical transformation is always defined; in most cases, as in our example, it is the identity (the points are left as they are).

Note that each type of geom accepts only a subset of aesthetics (for example, mapping shape in aes() makes no sense for geom_bar()). We add a geom by using +.
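The difference between mapping a variable inside aes() and assigning a fixed value outside aes() can be sketched as follows. The example assumes that ggplot2 is installed and uses the built-in mtcars data instead of Cars93 so that it is self-contained; the object names are illustrative.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Mapping: colour depends on a variable, so it goes inside aes().
p_mapped <- base + geom_point(aes(colour = factor(cyl)))

# Assignment: one fixed colour for all points, so it goes outside aes().
p_set <- base + geom_point(colour = "red", size = 3)

p_mapped
p_set
```

Only the mapped version produces a legend, because only there does the color carry information about the data.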

We can simply use more than one geom. Here we also add a scatterplot smoother, and we perform an aesthetic mapping of Weight to color (within aes()). A legend is produced automatically, as seen in the following screenshot:

g1 <- ggplot(Cars93, aes(x=Horsepower, y=MPG.city))
g2 <- g1 + geom_point(aes(color=Weight))
g2 + geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

We still have g1 available in our workspace, so we can add something else, for example, text, resulting in the following screenshot:

g1 <- g1 + geom_text(aes(label=substr(Manufacturer,1,3)), size=3.5)
g1 + geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

Each building block of a ggplot2 plot is defined by a list of parameters. To make life easy, sensible default parameter values are set.

All geoms are associated with statistical transformations. For some geoms, for example geom_boxplot(), the data is modified by the transformation. Note that geom_boxplot() need not be called explicitly when the statistical transformation stat_boxplot() is called; ggplot2 automatically knows that it should use geom_boxplot().
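The link between a geom and its statistical transformation can be checked with a small sketch. It assumes that ggplot2 is installed and uses the built-in mtcars data; the object names are illustrative.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = factor(cyl), y = mpg))

via_geom <- base + geom_boxplot()   # geom_boxplot() uses stat_boxplot() by default
via_stat <- base + stat_boxplot()   # stat_boxplot() draws with geom_boxplot() by default

# The transformed data (here the medians per group) agree for both plots.
layer_data(via_geom)$middle
layer_data(via_stat)$middle
```

layer_data() returns the data after the statistical transformation, which shows that the raw values have been replaced by the box plot statistics.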

With scales, the aesthetics for variables, such as color and fill (color), size, shape, linetype, and so on, can be defined by using functions named scale_<aesthetic>_<type>(), for example, scale_colour_gradient().
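As a minimal sketch of a scale, the following changes how a continuous variable is translated into color. It assumes that ggplot2 is installed; the mtcars data and the two gray tones are arbitrary choices for illustration.

```r
library(ggplot2)

p_scale <- ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point() +
  # Map low hp values to light gray and high hp values to black.
  scale_colour_gradient(low = "grey80", high = "black")

p_scale
```

The mapping in aes() stays the same; only the translation from data values to colors is redefined by the scale.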

A "standardized" graphic for each group in the data can be produced with faceting: for one grouping variable use facet_wrap(), for two grouping variables use facet_grid().
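For the case of two grouping variables, a minimal sketch with facet_grid() could look like this. It assumes that ggplot2 is installed and uses the built-in mtcars data, where am and cyl serve as grouping variables.

```r
library(ggplot2)

p_grid <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl)   # rows are defined by am, columns by cyl

p_grid
```

The formula notation rows ~ columns determines the arrangement of the panels.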

We will show one example of the syntax of facet_wrap(), and we use another theme for the following output:

gg <- ggplot(Cars93, aes(x=Horsepower, y=MPG.city))
gg <- gg + geom_point(aes(shape = Origin, colour = Price))
gg <- gg + facet_wrap(~ Cylinders) + theme_bw()
gg

The output is as follows:

Note that themes can be used for corporate design. Two of the themes that come with ggplot2 are theme_gray() (the default) and theme_bw().
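A house style can be sketched as a small function that starts from an existing theme and overrides a few elements. The function name theme_house() and the chosen settings are hypothetical; the example assumes that ggplot2 is installed.

```r
library(ggplot2)

# Hypothetical house theme: theme_bw() with a few elements changed.
theme_house <- function(base_size = 11) {
  theme_bw(base_size = base_size) +
    theme(legend.position = "bottom",
          panel.grid.minor = element_blank(),
          plot.title = element_text(face = "bold"))
}

p_house <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  ggtitle("Fuel consumption by weight") +
  theme_house()

p_house
```

Adding theme_house() to every plot of a report then gives all figures the same appearance.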

For more information on themes, type ?theme_gray into R.
