Newsroom

Summarization Dataset

The Data

CORNELL NEWSROOM is a large dataset for training and evaluating summarization systems. It contains 1.3 million articles and summaries written by authors and editors in the newsrooms of 38 major publications. The summaries are obtained from search and social metadata between 1998 and 2017 and use a variety of summarization strategies combining extraction and abstraction.

Grusky et al., 2018    

Getting Started

Use this site to explore the dataset and better understand the task of summarization as used by newsrooms around the Web.

Explore example summaries in the dataset across publications, time, and summarization strategies; analyze overall dataset statistics across these categories; and compare the performance of existing summarization systems evaluated on the unreleased NEWSROOM test data.

The full dataset is available to download online with tools for extracting text and summaries from Archive.org, analyzing summary extractiveness, and evaluating system performance.
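Grusky et al., 2018 characterize how extractive a summary is by greedily matching token sequences shared between the article and its summary, then summarizing those shared fragments as coverage (fraction of summary tokens inside a shared fragment), density (average squared fragment length per summary token), and compression (article length over summary length). The block below is a minimal sketch of that idea under stated assumptions: the function names `extractive_fragments` and `extractiveness` are hypothetical, inputs are assumed to be pre-split token lists, and this is not the API of the released NEWSROOM tools.

```python
from typing import List, Sequence, Tuple


def extractive_fragments(article: Sequence[str], summary: Sequence[str]) -> List[List[str]]:
    """Hypothetical helper: greedily collect token sequences shared by summary and article."""
    fragments: List[List[str]] = []
    i = 0
    while i < len(summary):
        best: List[str] = []  # longest match starting at summary[i]
        j = 0
        while j < len(article):
            if summary[i] == article[j]:
                # Extend the match as far as both sequences keep agreeing.
                i2, j2 = i, j
                while i2 < len(summary) and j2 < len(article) and summary[i2] == article[j2]:
                    i2 += 1
                    j2 += 1
                if len(best) < i2 - i:
                    best = list(summary[i:i2])
                j = j2  # resume scanning the article after this match
            else:
                j += 1
        # Skip past the longest fragment found (or one unmatched token).
        i += max(len(best), 1)
        if best:
            fragments.append(best)
    return fragments


def extractiveness(article: Sequence[str], summary: Sequence[str]) -> Tuple[float, float, float]:
    """Hypothetical helper: (coverage, density, compression); assumes a non-empty summary."""
    fragments = extractive_fragments(article, summary)
    n = len(summary)
    coverage = sum(len(f) for f in fragments) / n
    density = sum(len(f) ** 2 for f in fragments) / n
    compression = len(article) / n
    return coverage, density, compression


if __name__ == "__main__":
    article = "the cat sat on the mat today".split()
    summary = "the cat sat on a mat".split()
    print(extractive_fragments(article, summary))
    print(extractiveness(article, summary))
```

For the toy pair above, the shared fragments are "the cat sat on" and "mat", giving coverage 5/6, density 17/6, and compression 7/6. Squaring fragment lengths in density rewards long copied spans, so a summary copied verbatim scores far higher than one that reuses the same words scattered across the text.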

Submission

NEWSROOM contains an independent, unreleased test dataset for evaluating leaderboard submissions. To submit a system for evaluation, read the submission instructions. To preserve the integrity of the unreleased test data, submissions must be made at least two months apart.

Leaderboard

Leaderboard columns: Date, System, R-1 (ROUGE-1), R-2 (ROUGE-2), R-L (ROUGE-L), HUM (human evaluation)

System types:
ABS   Abstractive Systems
EXT   Extractive Systems
MIX   Mixed-Strategy Systems

By default, the leaderboard ranks systems by unstemmed, untokenized ROUGE-1 F-score, in order to fully reflect the difficulty of the summarization task, account for generated summary length, and measure performance most comparably across systems. Scores under other stemming, tokenization, and ROUGE variants are also available on the leaderboard. HUM is the composite score from the NEWSROOM human evaluation task.
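To make the default ranking metric concrete, the sketch below computes a ROUGE-1 F-score from unigram overlap: overlapping counts are clipped to the smaller count in either text, precision divides by the candidate length, recall by the reference length, and F is their harmonic mean. It is a minimal illustration under stated assumptions, not the scorer used for the leaderboard: "untokenized" is approximated here by plain whitespace splitting, matching is case-sensitive, no stemming is applied, and the function name `rouge1_f` is hypothetical. Official ROUGE implementations may preprocess text differently.

```python
from collections import Counter


def rouge1_f(candidate: str, reference: str) -> float:
    """Hypothetical ROUGE-1 F-score sketch: whitespace split, no stemming, case-sensitive."""
    cand = candidate.split()  # assumption: whitespace split stands in for "untokenized"
    ref = reference.split()
    if not cand or not ref:
        return 0.0
    # Clipped unigram overlap: each word counts at most min(count in cand, count in ref).
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)  # penalizes long generated summaries
    recall = overlap / len(ref)      # penalizes short generated summaries
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge1_f("the cat sat on a mat", "the cat sat on the mat"))
    print(rouge1_f("the mat.", "the mat"))
```

Because precision and recall pull in opposite directions as summary length changes, the F-score accounts for generated summary length. Without tokenization, "mat." and "mat" count as different unigrams, which is one reason untokenized scores tend to be stricter than tokenized ones.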


* Evaluation performed in Grusky et al., 2018.

NEWSROOM was developed as part of the Connected Experiences Lab.
This work is supported by Oath and by a Google Research award.