Mapping Waldo: Choices in Visualization

Waldo, also known as “Wally” in some countries, is a fictional character who gained popularity through a series of children’s puzzle books created by British illustrator Martin Handford. In each book, the goal is to find Waldo hidden in a large, detailed illustration crowded with people, objects, and colorful scenes.

Dr. Olson created a dataset marking Waldo’s appearances across these books. It is interesting to explore how those appearances are distributed and, given the pattern in past books, where one should start looking. This post visualizes the data in R and discusses the pros and cons of four different visualization strategies.

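library(MASS)     # kde2d(), used for the contour chart
library(ggplot2)  # used for the per-book chart

waldo <- read.csv("wheres-waldo-locations.csv", header=TRUE)

# One large semi-transparent dot per appearance
with(waldo, plot(X, Y, col=adjustcolor("steelblue", 0.5), pch=19, cex=5))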

This chart uses alpha blending: each semi-transparent point represents one appearance of Waldo, so a darker area means Waldo has appeared there more often.

Pros: All locations are mapped clearly in one place, and the semi-transparent dots emphasize overlap, which conveys frequency.

Cons: The overlapping regions are small, which can give the reader the impression that repeat appearances are rare, and the irregular shapes make the graph look less clean.
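To see why darkness is only a rough guide to frequency, here is a minimal sketch of how stacked semi-transparent dots combine. It assumes standard "over" compositing, where n overlapping dots with the same alpha give a combined opacity of 1 - (1 - alpha)^n; the helper name stacked_opacity is made up for illustration, and the exact rendering depends on the graphics device.

```r
# Combined opacity of n overlapping dots drawn with the same alpha,
# assuming standard "over" compositing: 1 - (1 - alpha)^n.
# stacked_opacity is a hypothetical helper, not part of any package.
stacked_opacity <- function(n, alpha = 0.5) {
  1 - (1 - alpha)^n
}

overlaps <- 1:5
opacity <- stacked_opacity(overlaps)

# Opacity climbs quickly towards 1, so heavy overlaps look much alike
print(data.frame(overlaps, opacity = round(opacity, 3)))
```

With alpha at 0.5 the colour saturates after only a few overlaps, so a lower alpha may separate busy spots better at the cost of fainter single dots.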

library(hexbin)

x <- waldo$X
y <- waldo$Y
bins <- hexbin(x, y, xbins=10)  # per-hexagon counts, kept for inspection

hexbinplot(Y ~ X, data=waldo, xbins=10,
           colramp=function(n) rev(gray.colors(n)),
           aspect=1, main="Hexagonal bin histogram")

This chart divides the map into equal-sized hexagons and counts the number of times Waldo has appeared in each one, with colour encoding the frequency.

Pros: Simple and concise, with eye-catching highlights on the most likely spots.

Cons: As plotted here (aspect=1), the chart is square, which suits this dataset poorly because the underlying map is rectangular. It is also harder to pinpoint exact locations on the map than with the first chart.
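The counts behind the hexagons can also be read straight from the binned object, which helps with pinpointing. The sketch below is only an illustration: it uses random stand-in points on an assumed 13 x 8 page rather than the real Waldo data, and it passes that page ratio to the shape argument of hexbin() so the binning follows a rectangular map.

```r
library(hexbin)

set.seed(1)
# Hypothetical stand-in for the Waldo coordinates on a 13 x 8 page
x <- runif(200, 0, 13)
y <- runif(200, 0, 8)

# shape = height / width of the plotting region, so the hexagon grid
# is laid out for a rectangular map rather than a square one
bins <- hexbin(x, y, xbins = 10, shape = 8 / 13)

# Centres of the non-empty hexagons, in data coordinates
centres <- hcell2xy(bins)

# The three busiest hexagons and how many points fall in each
top <- order(bins@count, decreasing = TRUE)[1:3]
busiest <- data.frame(x = centres$x[top],
                      y = centres$y[top],
                      count = bins@count[top])
print(busiest)
```

On the real data, x and y would be waldo$X and waldo$Y, and the ratio would come from the actual page dimensions.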

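# 2D kernel density estimate on a 100 x 100 grid (kde2d() comes from MASS)
den <- kde2d(x=x, y=y, n=100)
zlim <- range(den$z)

# Raw appearances as dots, with density contours drawn on top
plot(x, y, pch=19, col=adjustcolor("steelblue", 0.5))
contour(den$x, den$y, den$z, col="grey10",
        levels=pretty(zlim, 10), lwd=1, add=TRUE)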

This chart estimates the density of Waldo’s appearances, draws it as a contour map, and overlays the actual appearances as dots.

Pros: Readers can easily find the peaks while the exact locations remain visible.

Cons: A density plot can be less intuitive, and reading a contour map involves a learning curve.
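For readers who would rather not read the contours, the peak of the density estimate can be looked up directly. This is a rough sketch on made-up data (a cluster near (3, 6) plus uniform noise on an assumed 13 x 8 page), not the real Waldo coordinates; it relies on kde2d() returning z as a matrix whose rows follow den$x and whose columns follow den$y.

```r
library(MASS)

set.seed(42)
# Hypothetical stand-in data: a cluster near (3, 6) plus uniform noise
x <- c(rnorm(150, mean = 3, sd = 0.5), runif(50, 0, 13))
y <- c(rnorm(150, mean = 6, sd = 0.5), runif(50, 0, 8))

den <- kde2d(x, y, n = 100)

# Grid cell with the highest estimated density
# (rows of den$z correspond to den$x, columns to den$y)
peak <- which(den$z == max(den$z), arr.ind = TRUE)
peak_x <- den$x[peak[1, "row"]]
peak_y <- den$y[peak[1, "col"]]

cat("Densest spot near x =", round(peak_x, 2), ", y =", round(peak_y, 2), "\n")
```

Run on the real data, the same lookup would suggest a single starting point for the search, though a map with several peaks of similar height deserves a look at the full contour plot.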

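# The columns are named X and Y (upper case), matching the earlier charts;
# factor() gives each book its own discrete colour even if Book is numeric
ggplot(waldo, aes(X, Y)) +
  geom_point(aes(color = factor(Book))) +
  labs(color = "Book")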

This chart plots each appearance on the XY plane and colour-codes the points by book.

Pros: Shows the historical positions in a straightforward way, and the colour coding helps identify trends across books.

Cons: Readers need to eyeball the density.
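One way to back up the eyeballing is a small per-book summary table next to the chart. The sketch below is a suggestion rather than part of the original analysis: it uses a made-up data frame, waldo_demo, with the same Book, X and Y column names the charts rely on, and computes the mean position and the number of appearances for each book in base R.

```r
set.seed(7)
# Hypothetical stand-in with the same column names as the Waldo data
waldo_demo <- data.frame(
  Book = rep(1:3, each = 10),
  X = runif(30, 0, 13),
  Y = runif(30, 0, 8)
)

# Mean X and Y position per book
per_book <- aggregate(cbind(X, Y) ~ Book, data = waldo_demo, FUN = mean)

# Number of appearances per book (both results are ordered by Book)
per_book$n <- as.vector(table(waldo_demo$Book))

print(per_book)
```

A mean position hides multi-modal patterns, so the table works best as a companion to the scatter plot, not a replacement.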

Exploring four ways to visualize the Waldo problem underscores the diversity of approaches available. Each method comes with its own set of benefits and drawbacks. The choice among them ultimately hinges on the specific business problem and the audience. By understanding the strengths and limitations of each visualization method, we can make informed decisions and select the approach that best fits the nature of the problem and the objectives of the analysis.
