ggplot2
One student from Philippine Science High School-Cordillera Administrative Region Campus shared a drawing of Super Mario plotted by hand using Cartesian coordinates. The drawing comes with a hand-written note containing the coordinates of the shapes that make up the drawing. I wanted to try plotting the points manually at first, but I could not read some of the coordinates in the hand-written note.[1] After a bit of searching on the web, I found this PDF containing the coordinates for the corresponding body parts (shapes) of Super Mario. The PDF turns out to be a worksheet for a plotting exercise.
Now, the list of coordinates is long, and I am too lazy to plot points manually. So I thought: if only I could parse the coordinates and store them as a tidy data set, I could easily plot Mario using ggplot2.
In order to plot Mario in R using ggplot2, we need to do the following: parse the coordinates into a tidy data set, then use ggplot2 to connect the points according to the shape that they are grouped in.
Tidy data is the foundation of a ggplot2 graphic. To know more about tidy data, read "Tidy Data" by Hadley Wickham, published in the Journal of Statistical Software.
We could actually parse the required information directly from the PDF in R, but I think the old trick of copying and pasting will do. I have copied and pasted the relevant information from the PDF and saved it in the file mario.txt, which you can download here. Again, since I am a lazy person, I did not do any more tidying beyond that. Everything else will be done in R.
We shall now import the text file into R.
library(tidyverse)
## + ggplot2 2.2.1.9000 Date: 2017-10-12
## + tibble 1.3.4 R: 3.4.2
## + tidyr 0.7.1 OS: Manjaro Linux
## + readr 1.1.1 GUI: X11
## + purrr 0.2.3 Locale: en_PH.UTF-8
## + dplyr 0.7.4 TZ: Asia/Manila
## + stringr 1.2.0
## + forcats 0.2.0
## ── Conflicts ────────────────────────────────────────────────────
## * filter(), from dplyr, masks stats::filter()
## * lag(), from dplyr, masks stats::lag()
mario <- "mario.txt"
mario <- readChar(mario, file.info(mario)$size)
str(mario)
## chr "# Shape 01\n(-11,4.5) , (-10.5,5.5) , (-9,4.5) , (-10,3.5)\n# Shape 02\n(-9,4.5) , (-8.5,3.5) , (-8,2) , (-7.5,"| __truncated__
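As an aside on why the result is one long string: readChar() reads a fixed number of characters, and file.info()$size supplies the size of the file in bytes, so the whole file comes back as a single element with the line breaks kept as \n. A minimal sketch, using a made-up two-line stand-in for mario.txt written to a temporary file (the stand-in content is an assumption for illustration only):

```r
# Made-up two-line stand-in for mario.txt, written to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("# Shape 01", "(-11,4.5) , (-10.5,5.5)"), tmp)

# Read the whole file as one string; for plain ASCII text the size in bytes
# equals the number of characters
raw <- readChar(tmp, file.info(tmp)$size)

length(raw)        # a single character string
grepl("\n", raw)   # the line breaks are still embedded in it
```

With readr loaded, read_file() should give the same kind of result without needing the file size.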
At this point, we have very messy data. We can see that each shape grouping is indicated by a # symbol, the word Shape, and a two-digit numeric identifier. Also, every new line is indicated as \n. We shall remove \n from mario first.
mario <- str_replace_all(mario, "\n", "")
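We shall now split mario wherever the pattern # Shape followed by a two-digit number occurs, which removes the pattern and stores the data as a vector of coordinates. Here, [:blank:] is a character class that matches a blank space or a tab, and [:alnum:] is a character class that matches an alphanumeric character. [^[:alnum:]] matches any character that is not alphanumeric, which here is the # symbol.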
mario <- str_split(mario, "[^[:alnum:]][:blank:]Shape[:blank:][:alnum:][:alnum:]", simplify = TRUE)
head(t(mario))
## [,1]
## [1,] ""
## [2,] "(-11,4.5) , (-10.5,5.5) , (-9,4.5) , (-10,3.5)"
## [3,] "(-9,4.5) , (-8.5,3.5) , (-8,2) , (-7.5,1) , (-8.5,.5) , (-9,1) , (-10,2) , (-11,3.5) , (-11.5,5) , (-11,6) , (-10,6.5) , (-9.5,6.5) , (-9,6) , (-8.5,5) , (-8.5,3.5)"
## [4,] "(-8,2) , (-7.5,2.5) , (-7,3.5) , (-6.5,5) , (-7.5,7) , (-8,7) , (-9.5,6.5) , (-9.5,7) , (-11,9.5) , (-11,11.5) , (-10,13) , (-8,15) , (-5.5,17) , (-3.5,18) , (-1,19) , (1,19) , (4,17.5) , (6,15.5) , (7,14) , (7.5,12) , (7.5,10) , (6.5,9.5) , (5,10.5) , (3.5,11) , (-1.5,11) , (-4,10) , (-8,7.5) , (-8,7)"
## [5,] "(-1.5,11) , (-2.5,12) , (-2.5,13.5) , (-2,14.5) , (-1,15) , (0,15.5) , (2,15.5) , (3.5,15) , (4,14) , (4,13) , (3.5,12) , (3.5,11)"
## [6,] "(-.5,11.5) , (-1.5,12) , (0,15) , (1,14) , (1.5,15) , (3.5,12.5) , (2.5,11.5) , (1.5,13.5) , (1,12.5) , (0,13.5) , (-.5,11.5)"
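The split can be tried on a small string first. The sketch below uses a made-up two-shape input (toy is not part of the original data) and writes the character classes inside an extra pair of brackets, as in [[:blank:]], which is the form that base R regular expressions also expect. It is a variation on the pattern above, not a replacement for it.

```r
library(stringr)

# Made-up input with the same layout as mario after the newlines are removed
toy <- "# Shape 01(-11,4.5) , (-10.5,5.5)# Shape 02(-9,4.5) , (-8.5,3.5)"

# Split wherever "# Shape nn" occurs; simplify = TRUE returns a character matrix
pieces <- str_split(
  toy,
  "#[[:blank:]]Shape[[:blank:]][[:digit:]]{2}",
  simplify = TRUE
)

t(pieces)
```

The first piece is an empty string because the input starts with the separator, which is also why the first row of the real result is empty.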
As we can see, the points are now indexed according to the shape they belong to. We shall make use of this fact and convert mario into a data.frame object.
mario <- data.frame(coords = t(mario))
head(mario)
## coords
## 1
## 2 (-11,4.5) , (-10.5,5.5) , (-9,4.5) , (-10,3.5)
## 3 (-9,4.5) , (-8.5,3.5) , (-8,2) , (-7.5,1) , (-8.5,.5) , (-9,1) , (-10,2) , (-11,3.5) , (-11.5,5) , (-11,6) , (-10,6.5) , (-9.5,6.5) , (-9,6) , (-8.5,5) , (-8.5,3.5)
## 4 (-8,2) , (-7.5,2.5) , (-7,3.5) , (-6.5,5) , (-7.5,7) , (-8,7) , (-9.5,6.5) , (-9.5,7) , (-11,9.5) , (-11,11.5) , (-10,13) , (-8,15) , (-5.5,17) , (-3.5,18) , (-1,19) , (1,19) , (4,17.5) , (6,15.5) , (7,14) , (7.5,12) , (7.5,10) , (6.5,9.5) , (5,10.5) , (3.5,11) , (-1.5,11) , (-4,10) , (-8,7.5) , (-8,7)
## 5 (-1.5,11) , (-2.5,12) , (-2.5,13.5) , (-2,14.5) , (-1,15) , (0,15.5) , (2,15.5) , (3.5,15) , (4,14) , (4,13) , (3.5,12) , (3.5,11)
## 6 (-.5,11.5) , (-1.5,12) , (0,15) , (1,14) , (1.5,15) , (3.5,12.5) , (2.5,11.5) , (1.5,13.5) , (1,12.5) , (0,13.5) , (-.5,11.5)
We have created a variable coords in the mario data frame that contains the coordinates. Each row now contains all of the coordinates that correspond to the same shape. Unfortunately, R cannot yet use this for plotting since the data is not yet tidy. Before we do anything else, let us create the shape variable that will contain the shape group identifier for each row.
mario <- mario %>% mutate(shape = rownames(mario))
head(mario,2)
## coords shape
## 1 1
## 2 (-11,4.5) , (-10.5,5.5) , (-9,4.5) , (-10,3.5) 2
nrow(mario)
## [1] 40
We can see that the first row contains no coordinates. We shall fix this soon enough. We shall also split each row at the separator (a blank space, a comma, and another blank space) and store each pair of coordinates in its own row, while retaining the shape ID of each pair.
mario <- mario %>%
  mutate(coords = strsplit(as.character(coords), " , ")) %>%
  unnest(coords)
head(mario, 10)
## shape coords
## 1 2 (-11,4.5)
## 2 2 (-10.5,5.5)
## 3 2 (-9,4.5)
## 4 2 (-10,3.5)
## 5 3 (-9,4.5)
## 6 3 (-8.5,3.5)
## 7 3 (-8,2)
## 8 3 (-7.5,1)
## 9 3 (-8.5,.5)
## 10 3 (-9,1)
nrow(mario)
## [1] 365
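The behavior of the split-then-unnest step can be checked on a few made-up rows. In this sketch, toy is hypothetical data whose first row is empty, like the first row of mario. strsplit() returns a zero-length vector for an empty string, and unnest() drops rows whose list element is empty by default.

```r
library(dplyr)
library(tidyr)

# Made-up data: the first row is empty, like the first row of mario
toy <- tibble(
  shape  = c("1", "2", "3"),
  coords = c("", "(-11,4.5) , (-10.5,5.5)", "(-9,4.5)")
)

toy_long <- toy %>%
  # each cell becomes a character vector of "(x,y)" strings
  mutate(coords = strsplit(coords, " , ", fixed = TRUE)) %>%
  # one row per coordinate pair; the empty row for shape "1" disappears
  unnest(coords)

toy_long
```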
As you can see, the empty first row is now gone: it had no coordinates to unnest, so it was dropped, and the shape ID now starts at 2. We can actually leave it at that. But since I am somewhat obsessive about these details, I subtracted one from each shape ID.
mario <- mario %>% mutate(shape = as.integer(shape) - 1)
head(mario,10)
## shape coords
## 1 1 (-11,4.5)
## 2 1 (-10.5,5.5)
## 3 1 (-9,4.5)
## 4 1 (-10,3.5)
## 5 2 (-9,4.5)
## 6 2 (-8.5,3.5)
## 7 2 (-8,2)
## 8 2 (-7.5,1)
## 9 2 (-8.5,.5)
## 10 2 (-9,1)
nrow(mario)
## [1] 365
We are slowly getting there. But R does not yet recognize coords as a pair of \(x\) and \(y\) coordinates. We need to separate them into different columns.
mario <- separate(mario, coords, sep = ",", into = c("x","y"))
head(mario)
## shape x y
## 1 1 (-11 4.5)
## 2 1 (-10.5 5.5)
## 3 1 (-9 4.5)
## 4 1 (-10 3.5)
## 5 2 (-9 4.5)
## 6 2 (-8.5 3.5)
Some final steps remain. We need to remove the parentheses from the coordinates, and then tell R that \(x\) and \(y\) are numeric and not character.
mario <- mario %>%
  mutate(
    # delete the first non-alphanumeric character, i.e. the opening parenthesis
    x = as.numeric(str_replace(x, "[^[:alnum:]]", "")),
    # delete a trailing non-alphanumeric character, i.e. the closing parenthesis
    y = as.numeric(str_replace(y, "[^[:alnum:]]$", ""))
  )
head(mario)
## shape x y
## 1 1 -11.0 4.5
## 2 1 -10.5 5.5
## 3 1 -9.0 4.5
## 4 1 -10.0 3.5
## 5 2 -9.0 4.5
## 6 2 -8.5 3.5
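As an alternative sketch (not the approach used above), the separate-and-clean steps can be combined by capturing the two numbers with one regular expression through tidyr's extract(). The data in toy is made up for illustration, and the regex assumes every entry looks exactly like "(x,y)" with no spaces.

```r
library(tidyr)

# Made-up coordinate strings in the same "(x,y)" layout
toy <- data.frame(
  shape  = c(1, 1, 2),
  coords = c("(-11,4.5)", "(-10.5,5.5)", "(-8.5,.5)"),
  stringsAsFactors = FALSE
)

# Capture the two numbers between the parentheses; convert = TRUE turns the
# captured text into numeric columns
toy_xy <- tidyr::extract(
  toy, coords,
  into    = c("x", "y"),
  regex   = "\\((-?[0-9.]+),(-?[0-9.]+)\\)",
  convert = TRUE
)

toy_xy
```

The two-step version above is easier to inspect at each stage, while this one leaves less room for a stray parenthesis to slip through.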
Looking at the data in mario, we can now see that each point (a pair of coordinates) occupies exactly one row, and that each column contains only one variable. Our data is now tidy! We have stored the tidied data set in the file mario_tidy.txt. You can also download that here.
We can easily plot the points by using geom_point in ggplot2.
ggplot(mario, aes(x, y)) +
  geom_point()
Hey! I think we can already make out Super Mario from the points alone! But how do we connect the points (dots) sequentially? Luckily for us, geom_path does just that! However, we want to connect only the points that share the same shape identifier. That is easily done with the group aesthetic in geom_path.
ggplot(mario, aes(x, y)) +
  geom_point() +
  geom_path(aes(group = shape))
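To see what the group aesthetic does in isolation, here is a minimal sketch on made-up data (toy holds two small triangles and is not part of the Mario data set). geom_path joins points in the order in which they appear in the data, which is what we want for hand-listed outlines; geom_line would reorder them by x instead.

```r
library(ggplot2)

# Made-up data: two small closed shapes, points listed in drawing order
toy <- data.frame(
  shape = c(1, 1, 1, 1, 2, 2, 2, 2),
  x     = c(0, 1, 0, 0, 2, 3, 3, 2),
  y     = c(0, 0, 1, 0, 0, 0, 1, 0)
)

p <- ggplot(toy, aes(x, y)) +
  geom_point() +
  geom_path(aes(group = shape))   # one path per shape, joined in row order

p
```

Without the group aesthetic, geom_path would draw a single line through all eight points and join the last point of the first shape to the first point of the second.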
We can clean the figure by removing the points.
ggplot(mario, aes(x, y)) +
  geom_path(aes(group = shape))
I prefer a clean background for Super Mario.
ggplot(mario, aes(x, y)) +
  geom_path(aes(group = shape)) +
  theme_bw()
And that is it! Some more work is needed if you want to add colors to this plot. ggplot2's geom_polygon is up to such a task. We only need to specify the color for each identifier. In fact, we also need to define some additional groupings of coordinates for this to work, as you can see in the image below.
ggplot(mario, aes(x, y)) +
  geom_polygon(aes(group = shape, fill = as.factor(shape))) +
  theme_bw() +
  geom_path(aes(group = shape)) +
  theme(
    legend.position = "none"
  )
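Specifying the color for each identifier could be sketched with scale_fill_manual, as below. The data in toy and the palette shape_colors are made up for illustration; the actual colors and the extra groupings needed for Mario are not given here.

```r
library(ggplot2)

# Made-up data: two triangles standing in for two of the shapes
toy <- data.frame(
  shape = c(1, 1, 1, 2, 2, 2),
  x     = c(0, 1, 0, 2, 3, 3),
  y     = c(0, 0, 1, 0, 0, 1)
)

# Hypothetical palette: names are shape IDs, values are the fill colors
shape_colors <- c("1" = "red", "2" = "blue")

p <- ggplot(toy, aes(x, y)) +
  geom_polygon(aes(group = shape, fill = as.factor(shape))) +
  scale_fill_manual(values = shape_colors) +
  theme_bw() +
  theme(legend.position = "none")

p
```

Naming the palette entries by shape ID keeps each color tied to its shape even if the rows are reordered.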
[1] As background, the student's math class will soon have an assessment on the Cartesian coordinate system, and the topic includes plotting points and lines.