Predicting Film Box Office Performance Using Wikipedia Edit Data


Authors : Niraj Patel

Volume/Issue : Volume 10 - 2025, Issue 2 - February


Google Scholar : https://tinyurl.com/33ym29nh

Scribd : https://tinyurl.com/4d5uvaau

DOI : https://doi.org/10.5281/zenodo.14987459


Abstract : This study explores the potential of Wikipedia edit data as a predictor of opening box office revenues for films released in the US. After analyzing films from 2007 to 2011, we developed a predictive model based on Wikipedia article edits using gradient boosting trees as the primary algorithm. Our model incorporates features such as the frequency of Wikipedia edits, the size and content of article revisions, and the revenues of similar films. The results demonstrate that Wikipedia activity can serve as a rough indicator of film popularity, though the model’s predictive accuracy is limited. We find that Wikipedia-based features, particularly edit runs and content changes, significantly contribute to the model’s performance, achieving an R² of 0.54 for films released in 2012. This suggests that while Wikipedia data offers valuable insights into social interest, it is best used in conjunction with other predictors for more reliable revenue estimates.
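The abstract names gradient boosting trees as the primary algorithm and R² as the evaluation metric. The Python sketch below shows how those two pieces fit together under stated assumptions; it is not the author's implementation. It fits least-squares gradient boosting with depth-1 regression trees (stumps) and scores predictions with R². The feature names (edit_count, bytes_changed, similar_film_revenue) and the data are hypothetical, synthetic placeholders, not the study's features or dataset, and holding out every fourth film only loosely mimics the study's evaluation on 2012 releases.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Stump:
    """Depth-1 regression tree: a single split on a single feature."""
    feature: int
    threshold: float
    left_value: float   # output when x[feature] <= threshold
    right_value: float  # output when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Pick the split that minimizes squared error against the targets r."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = Stump(j, t, lm, rm), sse
    if best is None:  # no usable split (all feature values identical): fit a constant
        mean = sum(r) / len(r)
        best = Stump(0, float("inf"), mean, mean)
    return best


Model = Tuple[float, float, List[Stump]]  # (initial constant, learning rate, stumps)


def fit_gbm(X, y, n_rounds: int = 50, learning_rate: float = 0.1) -> Model:
    """Least-squares gradient boosting in the style of Friedman (2001)."""
    base = sum(y) / len(y)  # F_0: the constant that minimizes squared error
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply the residual y - F(x).
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # Shrink each new tree's contribution by the learning rate.
        pred = [pi + learning_rate * stump.predict(x) for pi, x in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model: Model, x: Sequence[float]) -> float:
    base, lr, stumps = model
    return base + lr * sum(s.predict(x) for s in stumps)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic toy data (NOT the study's data). Hypothetical features per film:
# [edit_count, bytes_changed, similar_film_revenue]; target: opening revenue.
films = [[float(i), float((i * 7) % 11), float(10 + i % 5)] for i in range(20)]
revenue = [3.0 * i + (i % 3) for i in range(20)]

# Hold out every fourth film as a stand-in for a later release year.
train = [k for k in range(20) if k % 4 != 0]
test = [k for k in range(20) if k % 4 == 0]
model = fit_gbm([films[k] for k in train], [revenue[k] for k in train])
print("held-out R^2:", r_squared([revenue[k] for k in test],
                                 [predict(model, films[k]) for k in test]))
```

A from-scratch version is used only to make the residual-fitting update visible; in practice a library implementation such as scikit-learn's `GradientBoostingRegressor`, with deeper trees and tuned hyperparameters, would be the usual choice.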
