
Reddit /r/WallStreetBets data for August of 2021

  • 25,751 posts
  • 1,001,160 comments

This NLP dataset encompasses a month's worth of post and comment data from the stock-trading powerhouse known as /r/WallStreetBets. The results come with some surface-level analysis done already.

To reproduce, run the following export:


Updated on Oct 4, 2021: Added post and comment scores.

  • Published on Sep 15, 2021
  • Licensed under CC BY 4.0

Reddit /r/NoNewNormal dataset

  • 121,113 posts
  • 2,474,569 comments

This NLP dataset encompasses the complete post and comment history of the banned subreddit /r/NoNewNormal, a hotspot for vaccine hesitancy and related conspiracy theories on the Internet. As usual, the results come with some surface-level analysis already done.

To reproduce, run the following export:


Updated on Oct 5, 2021: Added post and comment scores.

  • Published on Sep 16, 2021
  • Licensed under CC BY 4.0

Reddit cryptocurrency data for August of 2021

  • 250,569 posts
  • 3,756,097 comments

This NLP dataset encompasses a month's worth of posts and comments from select cryptocurrency subreddits. The subreddits included are /r/cryptocurrency, /r/satoshistreetbets, /r/cryptomoonshots, and many others, all indexed for the whole of August 2021.

To reproduce, run the following exports:








  • Published on Sep 27, 2021
  • Licensed under CC BY 4.0

One Million Reddit Questions

  • 1,000,000 posts

This NLP dataset contains a million /r/AskReddit questions, going back in time from the end of September 2021.

To reproduce, run the following export:


  • Published on Oct 11, 2021
  • Licensed under CC BY 4.0

One Million Reddit Confessions

  • 1,000,000 posts

This NLP dataset contains a million confessions from four of Reddit's most popular confession subreddits, going back in time from the end of September 2021.

To reproduce, run the following exports:





  • Published on Oct 12, 2021
  • Licensed under CC BY 4.0