Introduction

One student from Philippine Science High School-Cordillera Administrative Region Campus shared a drawing of Super Mario plotted by hand using Cartesian coordinates. The drawing came with a handwritten note listing the coordinates of the shapes that make up the drawing. At first I wanted to plot the points manually, but I could not read some of the coordinates in the note.[1] After a bit of searching on the web, I found this PDF containing the coordinates for the corresponding body parts (shapes) of Super Mario. The PDF turns out to be a plotting worksheet.

The list of coordinates is long, and I am too lazy to plot points manually. So I thought: if I can just parse the coordinates and store them as a tidy data set, I can easily plot Mario using ggplot2.

In order to plot Mario in R using ggplot2, we need to do the following:

  • Make a tidy data set from the coordinates provided in the PDF. This means that every row should contain exactly one pair of \(x\) and \(y\) coordinates, with separate columns for \(x\) and \(y\).
  • Use ggplot2 to connect the points according to the shape that they are grouped in.
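
To make the target concrete, here is a small hand-typed sketch of the first few rows of the finished data set. The values are the first points of shapes 1 and 2 from the worksheet, and the column names shape, x, and y are the ones used later in this post.

```r
# Target layout: one row per point, a shape ID, and separate numeric x and y columns.
# Values are the first points of shapes 1 and 2 as listed in the worksheet.
target <- data.frame(
  shape = c(1L, 1L, 1L, 1L, 2L, 2L),
  x     = c(-11, -10.5, -9, -10, -9, -8.5),
  y     = c(4.5, 5.5, 4.5, 3.5, 4.5, 3.5)
)
str(target)
```

Note that the point (-9, 4.5) appears in both shapes. That is fine: each row is one point of one shape.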

Tidying the data

Tidy data is the foundation of a ggplot2 graphic. To learn more about tidy data, read this paper by Hadley Wickham, published in the Journal of Statistical Software.

We could parse the required information directly from the PDF in R, but I think the old trick of copying and pasting will do. I copied the relevant text from the PDF and saved it in the file mario.txt, which you can download here. Again, since I am lazy, I did no further tidying by hand; everything else is done in R.

We shall now import the text file into R.

library(tidyverse)
## + ggplot2 2.2.1.9000        Date: 2017-10-12
## + tibble  1.3.4                R: 3.4.2
## + tidyr   0.7.1               OS: Manjaro Linux
## + readr   1.1.1              GUI: X11
## + purrr   0.2.3           Locale: en_PH.UTF-8
## + dplyr   0.7.4               TZ: Asia/Manila
## + stringr 1.2.0           
## + forcats 0.2.0
## ── Conflicts ────────────────────────────────────────────────────
## * filter(),  from dplyr, masks stats::filter()
## * lag(),     from dplyr, masks stats::lag()
mario <- "mario.txt"
# read the entire file into a single string; file.info()$size is its length in bytes
mario <- readChar(mario, file.info(mario)$size)
str(mario)
##  chr "# Shape 01\n(-11,4.5) , (-10.5,5.5) , (-9,4.5) , (-10,3.5)\n# Shape 02\n(-9,4.5) , (-8.5,3.5) , (-8,2) , (-7.5,"| __truncated__
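
The call above works because readChar() reads up to the given number of characters, and the file size in bytes covers the whole file when every character is a single byte, as in a plain ASCII file like this one. The sketch below checks this on a small temporary file standing in for mario.txt (a shortened excerpt of the worksheet data) and compares it with the more common readLines() route.

```r
# Write a shortened excerpt of the worksheet data, byte for byte, to a temporary file.
txt <- "# Shape 01\n(-11,4.5) , (-10.5,5.5) , (-9,4.5) , (-10,3.5)\n# Shape 02\n(-9,4.5) , (-8.5,3.5) , (-8,2)\n"
path <- tempfile(fileext = ".txt")
writeBin(charToRaw(txt), path)

# Read the whole file as one string, newlines included.
whole <- readChar(path, file.info(path)$size)

# readLines() returns one element per line and drops the newlines,
# so paste() them back together (plus the final one) to compare.
joined <- paste0(paste(readLines(path), collapse = "\n"), "\n")

identical(whole, txt)
identical(joined, txt)
```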

At this point, the data are very messy. Each shape group starts with a header made of the # symbol, the word Shape, and a two-digit numeric identifier, and every line break appears as \n. We shall remove the \n characters from mario first.

mario <- str_replace_all(mario, "\n", "")

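Next, we split mario at every shape header of the form # Shape NN, which leaves a vector of coordinate strings, one per shape. In the pattern below, [:blank:] matches a space or a tab, [:alnum:] matches an alphanumeric character, and [^[:alnum:]] matches any character that is not alphanumeric (here, the # symbol). Since simplify = TRUE returns a one-row matrix, we transpose it with t() to display it as a column.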

mario <-  str_split(mario, "[^[:alnum:]][:blank:]Shape[:blank:][:alnum:][:alnum:]", simplify = TRUE)
head(t(mario))
##      [,1]                                                                                                                                                                                                                                                                                                             
## [1,] ""                                                                                                                                                                                                                                                                                                               
## [2,] "(-11,4.5) , (-10.5,5.5) , (-9,4.5) , (-10,3.5)"                                                                                                                                                                                                                                                                 
## [3,] "(-9,4.5) , (-8.5,3.5) , (-8,2) , (-7.5,1) , (-8.5,.5) , (-9,1) , (-10,2) , (-11,3.5) , (-11.5,5) , (-11,6) , (-10,6.5) , (-9.5,6.5) , (-9,6) , (-8.5,5) , (-8.5,3.5)"                                                                                                                                           
## [4,] "(-8,2) , (-7.5,2.5) , (-7,3.5) , (-6.5,5) , (-7.5,7) , (-8,7) , (-9.5,6.5) , (-9.5,7) , (-11,9.5) , (-11,11.5) , (-10,13) , (-8,15) , (-5.5,17) , (-3.5,18) , (-1,19) , (1,19) , (4,17.5) , (6,15.5) , (7,14) , (7.5,12) , (7.5,10) , (6.5,9.5) , (5,10.5) , (3.5,11) , (-1.5,11) , (-4,10) , (-8,7.5) , (-8,7)"
## [5,] "(-1.5,11) , (-2.5,12) , (-2.5,13.5) , (-2,14.5) , (-1,15) , (0,15.5) , (2,15.5) , (3.5,15) , (4,14) , (4,13) , (3.5,12) , (3.5,11)"                                                                                                                                                                             
## [6,] "(-.5,11.5) , (-1.5,12) , (0,15) , (1,14) , (1.5,15) , (3.5,12.5) , (2.5,11.5) , (1.5,13.5) , (1,12.5) , (0,13.5) , (-.5,11.5)"
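
For comparison, here is a base-R sketch of the same split using strsplit() and a stricter, more literal pattern. It assumes every header is exactly #, a space, Shape, a space, and two digits. The empty first element appears because the text begins with a header, so there is nothing before the first match.

```r
raw <- "# Shape 01\n(-11,4.5) , (-10.5,5.5)\n# Shape 02\n(-9,4.5) , (-8.5,3.5)\n"

# Drop the line breaks, as str_replace_all(mario, "\n", "") does above.
flat <- gsub("\n", "", raw, fixed = TRUE)

# Split wherever a header such as "# Shape 01" occurs.
pieces <- strsplit(flat, "# Shape [0-9]{2}")[[1]]
pieces
```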

The coordinates are now grouped by the shape they belong to, one shape per element. We shall make use of this and convert mario into a data.frame object.

mario <- data.frame(coords = t(mario))
head(mario)
##                                                                                                                                                                                                                                                                                                            coords
## 1                                                                                                                                                                                                                                                                                                                
## 2                                                                                                                                                                                                                                                                  (-11,4.5) , (-10.5,5.5) , (-9,4.5) , (-10,3.5)
## 3                                                                                                                                            (-9,4.5) , (-8.5,3.5) , (-8,2) , (-7.5,1) , (-8.5,.5) , (-9,1) , (-10,2) , (-11,3.5) , (-11.5,5) , (-11,6) , (-10,6.5) , (-9.5,6.5) , (-9,6) , (-8.5,5) , (-8.5,3.5)
## 4 (-8,2) , (-7.5,2.5) , (-7,3.5) , (-6.5,5) , (-7.5,7) , (-8,7) , (-9.5,6.5) , (-9.5,7) , (-11,9.5) , (-11,11.5) , (-10,13) , (-8,15) , (-5.5,17) , (-3.5,18) , (-1,19) , (1,19) , (4,17.5) , (6,15.5) , (7,14) , (7.5,12) , (7.5,10) , (6.5,9.5) , (5,10.5) , (3.5,11) , (-1.5,11) , (-4,10) , (-8,7.5) , (-8,7)
## 5                                                                                                                                                                              (-1.5,11) , (-2.5,12) , (-2.5,13.5) , (-2,14.5) , (-1,15) , (0,15.5) , (2,15.5) , (3.5,15) , (4,14) , (4,13) , (3.5,12) , (3.5,11)
## 6                                                                                                                                                                                   (-.5,11.5) , (-1.5,12) , (0,15) , (1,14) , (1.5,15) , (3.5,12.5) , (2.5,11.5) , (1.5,13.5) , (1,12.5) , (0,13.5) , (-.5,11.5)

We have created a variable coords in the mario data frame that holds the coordinates. Each row contains all the coordinates belonging to one shape. Unfortunately, ggplot2 cannot use this for plotting yet, since the data is not tidy. Before anything else, let us create a shape variable containing the shape identifier for each row.

mario <- mario %>% mutate(shape = rownames(mario))
head(mario,2)
##                                           coords shape
## 1                                                    1
## 2 (-11,4.5) , (-10.5,5.5) , (-9,4.5) , (-10,3.5)     2
nrow(mario)
## [1] 40

The first row contains no coordinates, because the text begins with a shape header and nothing precedes the first split; we will deal with it shortly. Next, we split each row's coordinates at the separator " , " (a space, a comma, and another space) and store each pair as a new row, keeping the shape ID of each pair.

mario <- mario %>% 
  mutate(coords = strsplit(as.character(coords), " , ")) %>%
  unnest(coords)
head(mario, 10)
##    shape      coords
## 1      2   (-11,4.5)
## 2      2 (-10.5,5.5)
## 3      2    (-9,4.5)
## 4      2   (-10,3.5)
## 5      3    (-9,4.5)
## 6      3  (-8.5,3.5)
## 7      3      (-8,2)
## 8      3    (-7.5,1)
## 9      3   (-8.5,.5)
## 10     3      (-9,1)
nrow(mario)
## [1] 365
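
The row-expansion step can be sketched in base R as well, which also shows why the empty first row vanishes. The coordinate strings below are shortened excerpts standing in for the coords column.

```r
# Coordinate strings per shape, as produced by the split step; the first is empty.
coords <- c("", "(-11,4.5) , (-10.5,5.5)", "(-9,4.5) , (-8.5,3.5) , (-8,2)")

# Split each string at the " , " separator; the empty string yields character(0).
parts <- strsplit(coords, " , ", fixed = TRUE)
lengths(parts)

# Repeat each row index once per point. A zero-length element contributes no
# rows, which is why the empty first row disappears and the IDs start at 2.
long <- data.frame(
  shape  = rep(seq_along(parts), lengths(parts)),
  coords = unlist(parts)
)
long
```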

The empty first row is gone: splitting an empty string yields no coordinates, so unnest() produces no rows for it. As a result, the shape IDs now start at 2. We could leave it at that, but since I am somewhat obsessive about these details, I subtract one from each shape ID.

mario <- mario %>% mutate(shape = as.integer(shape) - 1)
head(mario,10)
##    shape      coords
## 1      1   (-11,4.5)
## 2      1 (-10.5,5.5)
## 3      1    (-9,4.5)
## 4      1   (-10,3.5)
## 5      2    (-9,4.5)
## 6      2  (-8.5,3.5)
## 7      2      (-8,2)
## 8      2    (-7.5,1)
## 9      2   (-8.5,.5)
## 10     2      (-9,1)
nrow(mario)
## [1] 365

We are slowly getting there. But R does not yet recognize coords as a pair of \(x\) and \(y\) coordinates, so we need to separate them into two columns.

mario <- separate(mario, coords, sep = ",", into = c("x","y"))
head(mario)
##   shape      x    y
## 1     1   (-11 4.5)
## 2     1 (-10.5 5.5)
## 3     1    (-9 4.5)
## 4     1   (-10 3.5)
## 5     2    (-9 4.5)
## 6     2  (-8.5 3.5)

A few final steps remain. We need to remove the parentheses from the coordinates and then tell R that \(x\) and \(y\) are numeric rather than character.

mario <- mario %>%
  mutate(
    # delete the first non-alphanumeric character, i.e. the opening "("
    # (it always precedes the minus sign, which is also non-alphanumeric)
    x = as.numeric(str_replace(x,"[^[:alnum:]]", "")),
    # delete a non-alphanumeric character at the very end, i.e. the closing ")"
    y = as.numeric(str_replace(y,"[^[:alnum:]]$", ""))
  )
head(mario)
##   shape     x   y
## 1     1 -11.0 4.5
## 2     1 -10.5 5.5
## 3     1  -9.0 4.5
## 4     1 -10.0 3.5
## 5     2  -9.0 4.5
## 6     2  -8.5 3.5
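
The two patterns above rely on where the parentheses end up after separate(): "(" is stuck to the front of x and ")" to the end of y. Because the minus sign is also non-alphanumeric, the x pattern only works because the opening parenthesis always comes first. Here is a base-R sketch of the same idea, using sub() (which, like str_replace(), replaces only the first match) on a few sample values from the worksheet.

```r
# x and y pieces as left behind by separate(): "(" on x, ")" on y.
x_raw <- c("(-11", "(-8.5", "(-.5")
y_raw <- c("4.5)", ".5)", "11.5)")

# sub() replaces only the first match: the opening parenthesis of x.
x <- as.numeric(sub("[^[:alnum:]]", "", x_raw))
# Anchored at $, the pattern can only remove a character at the very end.
y <- as.numeric(sub("[^[:alnum:]]$", "", y_raw))
x
y
```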

Looking at mario, we can now see that each pair of coordinates occupies exactly one row and that each column contains exactly one variable. Our data is now tidy! I have stored the tidied data set in the file mario_tidy.txt, which you can also download here.
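
As an aside, the separate-then-strip steps could also be collapsed into a single regular expression that pulls signed decimal numbers straight out of each "(x,y)" string. This is a swapped-in technique, not the route taken above, and the pattern assumes every coordinate is a plain decimal such as -11, 4.5, or .5 (no exponents).

```r
pairs <- c("(-11,4.5)", "(-8.5,.5)", "(1,19)")

# Optional minus sign, optional integer part, optional decimal point, then digits.
nums <- regmatches(pairs, gregexpr("-?[0-9]*\\.?[0-9]+", pairs))

# Each list element holds the x and y strings for one point; bind them as rows.
xy <- do.call(rbind, lapply(nums, as.numeric))
colnames(xy) <- c("x", "y")
xy
```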

Plotting the coordinates

Where are the points located?

We can easily plot the points by using geom_point in ggplot2.

ggplot(mario, aes(x, y)) + 
  geom_point()
Plotting dots to connect

Connecting the dots

Hey! I think we can already make out Super Mario from the points alone! But how do we connect the points (dots) in sequence? Luckily, geom_path() does just that: it connects the points in the order in which they appear in the data. However, we only want to connect points that share the same shape identifier, which is easily done with the group aesthetic in geom_path().

ggplot(mario, aes(x, y)) + 
  geom_point() +
  geom_path(aes(group=shape))
Connecting the dots
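
The choice of geom_path() matters. It connects points in the order they appear in the data, whereas ggplot2's geom_line() first sorts them by x. For an outline that doubles back on itself, as most of Mario's shapes do, only the former gives the intended drawing. The sketch below uses a made-up four-point stroke and ggplot_build() to inspect the order in which each layer will draw the points.

```r
library(ggplot2)

# Made-up points listed in drawing order, which is not increasing x.
stroke <- data.frame(x = c(0, 2, 1, 3), y = c(0, 1, 2, 3))

p_path <- ggplot(stroke, aes(x, y)) + geom_path()
p_line <- ggplot(stroke, aes(x, y)) + geom_line()

# ggplot_build() returns the data each layer will actually draw.
path_x <- ggplot_build(p_path)$data[[1]]$x
line_x <- ggplot_build(p_line)$data[[1]]$x
path_x  # row order kept: 0 2 1 3
line_x  # sorted by x:    0 1 2 3
```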

Removing the dots

We can clean the figure by removing the points.

ggplot(mario, aes(x, y)) + 
  geom_path(aes(group=shape))
Cleaning the guides

I prefer a clean background for Super Mario.

ggplot(mario, aes(x, y)) + 
  geom_path(aes(group=shape)) + 
  theme_bw()
Plotting Mario on a white background
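
One optional tweak, not part of the figure above: ggplot2 stretches the panel to fill the plotting area, so one unit of x and one unit of y may be drawn at different lengths and Mario can look squashed. coord_fixed() pins the aspect ratio, and theme_void() drops axes and grid lines entirely if an even cleaner canvas is wanted. The made-up square below stands in for the mario data frame.

```r
library(ggplot2)

# Stand-in for the tidy mario data: one closed outline (a made-up square).
outline <- data.frame(
  shape = 1L,
  x = c(0, 4, 4, 0, 0),
  y = c(0, 0, 4, 4, 0)
)

p <- ggplot(outline, aes(x, y)) +
  geom_path(aes(group = shape)) +
  coord_fixed(ratio = 1) +  # one x unit is drawn as long as one y unit
  theme_void()              # no axes, ticks, labels, or grid lines
p
```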

Some work to be done

And that is it! Adding colors to this plot takes some more work. ggplot2's geom_polygon() is up to the task: we only need to map a fill color to each shape identifier. However, for the fills to come out right, some coordinates would have to be regrouped into additional shapes, as you can see in the image below.

ggplot(mario, aes(x, y)) + 
  geom_polygon(aes(group=shape,fill=as.factor(shape))) +
  theme_bw() +
  geom_path(aes(group=shape)) + 
  theme(
    legend.position = "none"
  )
Technicolor Mario
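
In the figure above, the fill colors come from ggplot2's default discrete palette, one hue per shape ID. To pick the colors yourself, one option is scale_fill_manual() with a named vector whose names are the shape IDs. The sketch below uses two made-up squares and arbitrary colors; for Mario, the vector would need one entry per shape ID, which means first working out which shape is which body part.

```r
library(ggplot2)

# Two made-up closed shapes standing in for mario (shape IDs 1 and 2).
toy <- data.frame(
  shape = rep(1:2, each = 5),
  x = c(0, 2, 2, 0, 0,   3, 5, 5, 3, 3),
  y = c(0, 0, 2, 2, 0,   0, 0, 2, 2, 0)
)

# Hypothetical color choices, keyed by shape ID (the factor levels).
shape_colors <- c("1" = "red", "2" = "royalblue")

p <- ggplot(toy, aes(x, y)) +
  geom_polygon(aes(group = shape, fill = factor(shape))) +
  geom_path(aes(group = shape)) +  # outlines drawn on top of the fills
  scale_fill_manual(values = shape_colors) +
  theme_bw() +
  theme(legend.position = "none")

# Fill colors actually assigned by the polygon layer.
fills <- ggplot_build(p)$data[[1]]$fill
unique(fills)
```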

  1. As background, the student's math class will soon have an assessment on the Cartesian coordinate system, and the topic includes plotting points and lines.