Is there already a popular open-source script to do that?
The Python library GPT Index (MIT license) can summarize a large document or collection of documents with GPT-3.
From the documentation:
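from gpt_index import GPTTreeIndex, SimpleDirectoryReader  # early gpt_index API; needs OPENAI_API_KEY set
documents = SimpleDirectoryReader("data").load_data()  # load every file in ./data
index = GPTTreeIndex(documents)
response = index.query("<summarization_query>", mode="summarize")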
The default mode for a tree-based query traverses from the top of the graph down to the leaf nodes. For summarization, use mode="summarize" instead.
A summarization query could look like one of the following:
- “What is a summary of this collection of text?”
- “Give me a summary of person X’s experience with the company.”
The documentation includes a notebook with complete examples: https://github.com/jerryjliu/gpt_index/blob/main/examples/paul_graham_essay/TestEssay.ipynb
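Roughly speaking, a tree index summarizes groups of text chunks, then summarizes those summaries, until a single root summary remains. The plain-Python sketch below illustrates only that bottom-up idea; it is not GPT Index's actual implementation. `chunk_text`, `tree_summarize`, the `fanout` parameter and the `first_words` stub are all hypothetical names, and `first_words` merely stands in for a real LLM call.

```python
from typing import Callable, List


def chunk_text(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be >= 1")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tree_summarize(chunks: List[str], summarize: Callable[[str], str], fanout: int = 2) -> str:
    """Bottom-up tree summarization (hypothetical helper).

    Each pass joins up to `fanout` neighbouring nodes, summarizes each group,
    and uses those summaries as the next level, until one node (the root) remains.
    """
    if not chunks:
        raise ValueError("need at least one chunk")
    if fanout < 2:
        raise ValueError("fanout must be >= 2 so each pass shrinks the level")
    level = list(chunks)
    while True:
        level = [summarize("\n".join(level[i:i + fanout])) for i in range(0, len(level), fanout)]
        if len(level) == 1:
            return level[0]


def first_words(text: str, n: int = 3) -> str:
    """Hypothetical stand-in for an LLM call: 'summarizes' by keeping the first n words."""
    return " ".join(text.split()[:n])


if __name__ == "__main__":
    chunks = chunk_text("a b c d e f g h i j k l", max_words=4)
    print(chunks)  # ['a b c d', 'e f g h', 'i j k l']
    print(tree_summarize(chunks, first_words))  # a b c
```

With a real model, `summarize` would send a prompt such as "Summarize the following text:" plus the joined chunks; a larger `fanout` means fewer levels but longer prompts.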
Another Python library is LangChain (MIT license): https://github.com/hwchase17/langchain. From the documentation:
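from langchain.llms import OpenAI  # early LangChain API; needs OPENAI_API_KEY set
from langchain.chains.summarize import load_summarize_chain
llm = OpenAI(temperature=0)
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(docs)  # docs: a list of LangChain Document objects, e.g. chunks of the large text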
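The "map_reduce" chain type summarizes each document separately (map) and then summarizes the combined partial summaries (reduce). Below is a minimal sketch of that pattern in plain Python, under the assumption that the partial summaries fit in a single prompt; it is not LangChain's code, and `map_reduce_summarize` and the `first_words` stub are hypothetical names, with the stub standing in for an LLM call.

```python
from typing import Callable, List


def map_reduce_summarize(docs: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize a list of documents with one map step and one reduce step."""
    if not docs:
        raise ValueError("need at least one document")
    # Map: summarize each document on its own. The calls are independent,
    # so a real implementation could issue them in parallel.
    partial_summaries = [summarize(doc) for doc in docs]
    # Reduce: summarize the concatenated partial summaries into one answer.
    # Assumes the partial summaries together fit in a single prompt.
    return summarize("\n".join(partial_summaries))


def first_words(text: str, n: int = 3) -> str:
    """Hypothetical stand-in for an LLM call: keep only the first n words."""
    return " ".join(text.split()[:n])


if __name__ == "__main__":
    print(map_reduce_summarize(["a b c d e", "f g h i j"], first_words))  # a b c
```

Because the map calls do not depend on each other, this pattern parallelizes well; LangChain's own chain may handle an oversized reduce step differently, which this sketch does not attempt to reproduce.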
FYI, {1,2} are two good papers that evaluate GPT-3 on summarization, though both consider only short texts.
Update 2023-02-23: the next version of GPT may support a context of 32k tokens.

Update 2023-11-15: an interesting leaderboard measuring how often LLMs hallucinate when summarizing relatively short documents: https://github.com/vectara/hallucination-leaderboard

{2} compared human-written and LLM-generated summaries.

References: