A dataset containing detailed plot descriptions for 3670 British movies released between 1920 and 2017, taken from their Wikipedia pages.

movies

Format

A data frame with 3670 rows and 7 variables. The variables are as follows:

year

Year of film release

title

Film title

director

Director of film

cast

Main cast of actors and actresses in film

genre

Genre of film

url

URL of the film's Wikipedia page

plot

Film's plot from Wikipedia page
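The cast variable holds all names for a film in a single string. In the first row of the data (The Amateur Gentleman, 1920) the names are separated by commas. Assuming that convention holds for other rows, which the source does not state, a minimal sketch for splitting one cast string into individual names could look like this. The object names `cast_string` and `actors` are illustrative only.

```r
# Cast value copied from the first row of `movies`; used here as a stand-in
# so the sketch runs without the dataset being loaded.
cast_string <- "Langhorn Burton, Cecil Humphreys"

# Split on literal commas (assumed separator), then trim surrounding whitespace.
actors <- trimws(strsplit(cast_string, ",", fixed = TRUE)[[1]])

actors
```

To apply the same idea to the whole column, `strsplit(movies$cast, ",", fixed = TRUE)` returns a list with one character vector per film.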

Source

The data are a subset of the Kaggle Wikipedia movie plots dataset: https://www.kaggle.com/jrobischon/wikipedia-movie-plots

Note

The free-text plot variable makes this dataset well suited to text-mining exercises.
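As one possible starting point for text mining, the sketch below tokenises a plot into words and counts them using base R only. It is an illustration under stated assumptions, not a recommended pipeline: the one-row data frame `plots` is a stand-in built from the first row of `movies`, and the tokenisation rule (lower-case, split on anything that is not a letter or apostrophe) is a deliberately simple choice.

```r
# Stand-in for `movies`: title and plot copied from its first row,
# so the sketch runs without the dataset being loaded.
plots <- data.frame(
  title = "The Amateur Gentleman",
  plot = paste(
    "In Regency Britain a young man tries to establish his father's",
    "innocence of an accused crime, by travelling to London disguised",
    "as a gentleman."
  ),
  stringsAsFactors = FALSE
)

# Lower-case the text, then split on runs of characters that are
# neither letters nor apostrophes (a simple, assumed tokenisation rule).
tokens <- strsplit(tolower(plots$plot), "[^a-z']+")
words <- unlist(tokens)
words <- words[nzchar(words)]  # drop any empty strings left by the split

# Count occurrences of each word, most frequent first.
word_counts <- sort(table(words), decreasing = TRUE)
head(word_counts, 5)
```

Replacing `plots$plot` with `movies$plot` applies the same steps to every film. Dedicated text-mining packages offer more careful tokenisation and stop-word handling.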

Examples

summary(movies)
#>       year         title             director             cast
#>  Min.   :1920   Length:3670        Length:3670        Length:3670
#>  1st Qu.:1951   Class :character   Class :character   Class :character
#>  Median :1967   Mode  :character   Mode  :character   Mode  :character
#>  Mean   :1973
#>  3rd Qu.:2001
#>  Max.   :2017
#>     genre               url                plot
#>  Length:3670        Length:3670        Length:3670
#>  Class :character   Class :character   Class :character
#>  Mode  :character   Mode  :character   Mode  :character
#>
#>
#>
movies[1, ]
#>   year                 title      director                             cast
#> 1 1920 The Amateur Gentleman Maurice Elvey Langhorn Burton, Cecil Humphreys
#>   genre                                                             url
#> 1 drama https://en.wikipedia.org/wiki/The_Amateur_Gentleman_(1920_film)
#>                                                                                                                                              plot
#> 1 In Regency Britain a young man tries to establish his father's innocence of an accused crime, by travelling to London disguised as a gentleman.