How to Write a 10,000-Word Paper, Report, or Book with AI
2024-08-16
There’s a real demand for generating long-form content — whether it’s a 10,000+ word paper, detailed report, or even a book.
Everybody has been trying various ways to use GPT-4o or Claude 3.5 Sonnet for this, but they just don’t deliver.
Today’s ultra-long-context Large Language Models (LLMs) can ingest over 100,000 tokens in a single pass, yet they struggle to produce comparably long responses.
While they can take in all that lengthy context, they can’t generate much more than about 2,000 words in one go.
How do we push these limits? It seems there’s finally a light at the end of the tunnel.
How Do We Test the Limits of the 10,000-Word Challenge?
Researchers from Tsinghua University and Zhipu AI tried pushing big models with prompts like “Write a 10,000-word article on the history of the Roman Empire” — and guess what?
Every single model fell short, maxing out at around 2,000 words. It’s like asking a marathon runner to do an ultramarathon, but they keep hitting the wall at mile 20.
Absolutely frustrating…
The funny thing is, this isn’t just a random hiccup.
When the team dug into user interaction logs from WildChat, it turned out that over 1% of user prompts explicitly ask for outputs well beyond this 2,000-word mark.
That’s a huge segment of customers, waiting to be served!
Why Are Large Language Models Hitting a Wall?
So, what’s going on under the hood?
The deeper the team looked, the more it became clear that the problem isn’t with the model’s architecture per se, but rather the data it’s been trained on — specifically, the Supervised Fine-Tuning (SFT) datasets.
These datasets, which the models rely on for generating output, seem to have a ceiling when it comes to length.
The max output length in these datasets? You guessed it, around 2,000 words.
It’s like trying to become a novelist when all you’ve ever read are short stories.
That explains why the models are hitting a wall.
Building a Ladder: Introducing AgentWrite
But you know what they say — when you hit a wall, it’s time to build a ladder.
That’s where AgentWrite comes in.
It’s an agent-based pipeline the research team has been working on that uses existing LLMs in a clever way to crank out longer, more coherent outputs.
Here’s how it works:
- First, it creates a detailed plan, kind of like an outline, that maps out the structure and target word count for each paragraph.
- Then, it feeds this plan to the model, prompting it to generate content step by step.
- The result is high-quality, cohesive text that can stretch up to 20,000 words.
Pretty sweet, right?
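The plan-then-write idea above can be sketched in a few lines of Python. This is a minimal illustration of the two-stage control flow, not the team’s actual implementation: `call_llm` is a hypothetical stand-in for any chat-completion API, stubbed here with canned responses so the pipeline can run offline.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical LLM call; replace with a real API. The stub returns
    # a canned plan for stage 1 and filler text for stage 2.
    if "ordered list of paragraphs" in prompt:
        return ("Paragraph 1 - Founding of Rome - 400 words\n"
                "Paragraph 2 - The Republic - 600 words\n"
                "Paragraph 3 - The Empire - 500 words")
    return "Lorem ipsum " * 50  # placeholder paragraph text

def agent_write(instruction: str) -> str:
    # Stage 1: ask the model for a paragraph-level plan with word budgets.
    plan_prompt = (
        "Break the writing task below into an ordered list of paragraphs, "
        "one per line, each with a short title and a target word count.\n\n"
        f"Task: {instruction}"
    )
    plan = [line for line in call_llm(plan_prompt).splitlines() if line.strip()]

    # Stage 2: generate each paragraph in order, conditioning on what has
    # already been written so the final text stays coherent.
    written = []
    for step in plan:
        write_prompt = (
            f"Task: {instruction}\n"
            f"Text written so far:\n{''.join(written)[-2000:]}\n\n"
            f"Now write only this section, respecting its word budget:\n{step}"
        )
        written.append(call_llm(write_prompt) + "\n\n")
    return "".join(written)

article = agent_write("Write a 10000-word article on the history of the Roman Empire")
print(len(article.split()))
```

The key design choice is that each section prompt sees a tail of the text generated so far, so every ~2,000-word call stays within the model’s comfortable output range while the concatenated result keeps growing.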
Expanding Capabilities with LongWriter-6k Dataset
But the team didn’t stop there.
Building on the AgentWrite pipeline, they took things a step further and created LongWriter-6k — a dataset designed to stretch the model’s writing muscles.
By training on LongWriter-6k, they managed to unlock the model’s potential to produce well-structured outputs that exceed 10,000 words.
They also developed LongBench-Write, a benchmark packed with diverse writing tasks, from short snippets to epic-length content, to rigorously test the model’s capabilities.
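Scoring a benchmark like this largely comes down to comparing the requested length against what the model actually produced. Here is an illustrative length metric, a deliberate simplification and not the paper’s exact scoring formula:

```python
def length_score(required: int, actual: int) -> float:
    # Illustrative metric: 1.0 when the output hits the requested length,
    # decaying linearly as the length ratio drifts away in either direction.
    ratio = actual / required
    return max(0.0, 1.0 - abs(1.0 - ratio))

print(length_score(10000, 9500))  # near the target, high score
print(length_score(10000, 2000))  # the typical pre-LongWriter ceiling, low score
```

Under any metric of this shape, a model capped at ~2,000 words simply cannot score well on 10,000-word tasks, which is what makes output length a measurable benchmark dimension alongside quality.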
Breaking New Ground in Long-Form Content
The results?
Their 9B model is now outpacing even larger, proprietary models when it comes to long-form content generation. And thanks to the DPO (Direct Preference Optimization) method, the model isn’t just writing longer — it’s writing better.
To sum it up, here’s what the research team has accomplished:
- Cracked the Length Code: They pinpointed the real reason why these long-context models were getting stuck at around 2,000 words — the limitations in their training data.
- Built a New Tool: AgentWrite, their divide-and-conquer approach, is helping models generate ultra-long outputs with coherence and quality.
- Pushed the Boundaries: With the LongWriter-6k dataset, they’ve expanded the models’ output capacity to over 10,000 words without sacrificing quality.
Let’s see how it works in practice!
Writing 10,000+ Word Articles and Books with LongWriter LLMs
The team open-sourced two models: LongWriter-glm4-9b and LongWriter-llama3.1-8b, trained on GLM-4-9B and Meta-Llama-3.1-8B, respectively.
These correspond to the “LongWriter-9B-DPO” and “LongWriter-8B” models in their paper.
The team recommends transformers==4.43.0 to deploy the models. For the LongWriter-glm4-9b model, make sure to install FlashAttention 2 according to the FlashAttention code base.
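Assuming a standard pip setup, the environment can be prepared like this (the `--no-build-isolation` flag follows FlashAttention’s own install instructions):

```shell
# Pin the transformers version the team tested against
pip install "transformers==4.43.0" torch

# FlashAttention 2, required for the GLM-4 based model
pip install flash-attn --no-build-isolation
```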
This is how you can try the model:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("THUDM/LongWriter-glm4-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("THUDM/LongWriter-glm4-9b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
model = model.eval()
query = "Write a 10000-word China travel guide"
response, history = model.chat(tokenizer, query, history=[], max_new_tokens=32768, temperature=0.5)
print(response)
It’s that simple!
You may also deploy your own LongWriter chatbot by running:
CUDA_VISIBLE_DEVICES=0 python trans_web_demo.py
If you are struggling with long-form content generation, let us know in the comments!
Further Resources to Learn More About LongWriter
-> Github
-> HF Repo
-> Paper
-> HF Space
Bonus Content: Building with LLMs
And don’t forget to have a look at some practitioner resources we published recently.
Thank you for stopping by, and being an integral part of our community.
Happy building!