Chris Lloyd wrote:
Replying to John. If you plot price against say carats, you will see the truncation very clearly (at about $19,000).
Chris, the real issue is that the values of 'carat' are strongly banded, which produces the dark vertical bands in the plots to which you refer. Most of the points are concentrated in these bands, so one does not see the drop-off in density as the price increases. One needs smoothScatter() or an equivalent to get a more visually meaningful picture. I am attaching the plot from:
with(ggplot2::diamonds, smoothScatter(carat, price))
Alternatively, repeat the ggplot2 plot with a sample small enough that the dark bands resolve into individual points. A further possible issue is that only for higher-valued diamonds is there finesse in setting the carat, which nudges it higher and moves the carat measures for high-dollar diamonds up and away from the dark vertical bands. The second plot is from a sample of 1000 points.
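library(ggplot2)  # provides ggplot() and the diamonds data set
# Random sample of 1000 rows, so that the bands resolve into separate points
samp1K <- diamonds[sample(nrow(diamonds), 1000), ]
ggplot(samp1K, aes(x = carat, y = price)) +
  geom_point(fill = "#F79420", color = "black", shape = 21) +
  # Cut both axes at the 99th percentile of the full data set
  scale_x_continuous(limits = c(0, quantile(diamonds$carat, 0.99))) +
  scale_y_continuous(limits = c(0, quantile(diamonds$price, 0.99))) +
  ggtitle("Diamonds: Price vs. Carat")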
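As a quick check on the banding in 'carat' described above, one can also tabulate the most frequent carat values. This is only a rough sketch, not part of the attached plots; the name caratCounts is purely illustrative, and the choice of the top ten values is arbitrary.

```r
library(ggplot2)  # provides the diamonds data set

# Count how often each distinct carat value occurs, most frequent first
# (caratCounts is an illustrative name, not from the original post)
caratCounts <- sort(table(diamonds$carat), decreasing = TRUE)
head(caratCounts, 10)

# Proportion of all diamonds accounted for by the ten most common carat values
sum(head(caratCounts, 10)) / nrow(diamonds)
```

If a handful of values account for a large share of the rows, that is the banding that shows up as dark vertical stripes in the scatterplot.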