
How to Write a 10,000-Word Paper, Report, or Book with AI

2024-08-16


There’s a real demand for generating long-form content — whether it’s a 10,000+ word paper, detailed report, or even a book.


Everyone has been trying all sorts of ways to get GPT-4o or Claude 3.5 Sonnet to do it, but they just don’t deliver.


(Figure: average word counts for different types of texts.)



Today’s ultra-long-context Large Language Models (LLMs) can handle over 100,000 tokens in a single pass, but we know they struggle to produce equally long responses.


While they can take in all that lengthy context, they can’t generate more than about 2,000 words in one go.


How do we push these limits? It seems there’s finally a light at the end of the tunnel.


Testing the Limits: The 10,000-Word Challenge


Researchers from Tsinghua University and Zhipu AI tried pushing big models with prompts like “Write a 10,000-word article on the history of the Roman Empire” — and guess what?


Every single model fell short, maxing out at around 2,000 words. It’s like asking a marathon runner to do an ultramarathon, but they keep hitting the wall at mile 20.



Absolutely frustrating…


The funny thing is, this isn’t just a random hiccup.


When the team dug into user interaction logs from WildChat, they found that over 1% of user prompts explicitly ask for outputs that go way beyond this 2,000-word mark.


That’s a huge segment of users just waiting to be served!


Why Are Large Language Models Hitting a Wall?


So, what’s going on under the hood?


The deeper the team looked, the more it became clear that the problem isn’t with the model’s architecture per se, but rather the data it’s been trained on — specifically, the Supervised Fine-Tuning (SFT) datasets.


These datasets, which the models rely on for generating output, seem to have a ceiling when it comes to length.


The max output length in these datasets? You guessed it, around 2,000 words.


It’s like trying to become a novelist when all you’ve ever read are short stories.


That explains why the models are hitting a wall.


Building a Ladder: Introducing AgentWrite


But you know what they say — when you hit a wall, it’s time to build a ladder.


That’s where AgentWrite comes in.


It’s an agent-based pipeline the research team has been working on that uses existing LLMs in a clever way to crank out longer, more coherent outputs.



Here’s how it works: AgentWrite splits the job into two stages. First, a planning step asks the LLM to break the writing instruction into an outline of subtasks, each with a one-line summary and a target word count. Then, a writing step calls the LLM once per subtask, feeding it the plan plus everything written so far, and stitches the outputs together into one long, coherent piece.
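Here’s a minimal sketch of that plan-then-write loop in Python. The llm() helper is a stand-in for whatever chat-completion call you prefer; it’s our assumption, not the team’s actual implementation:

def llm(prompt: str) -> str:
    # Stand-in for your chat-completion API of choice (an assumption,
    # not the team's actual code)
    raise NotImplementedError("plug in a chat-completion call here")

def agent_write(instruction: str) -> str:
    # Stage 1: plan. Ask the model for a numbered outline where each
    # entry has a one-line summary and a target word count
    plan = llm(
        "Break this writing task into numbered sections, each with a "
        f"one-line summary and a target word count.\n\nTask: {instruction}"
    )
    # Stage 2: write. Generate one section at a time, conditioning on
    # the plan and everything written so far to keep the piece coherent
    sections = []
    for step in (line for line in plan.splitlines() if line.strip()):
        so_far = "\n\n".join(sections)
        sections.append(llm(
            f"Task: {instruction}\n\nPlan:\n{plan}\n\n"
            f"Written so far:\n{so_far}\n\n"
            f"Write only this section, hitting its word budget:\n{step}"
        ))
    return "\n\n".join(sections)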
Pretty sweet, right?


Expanding Capabilities with LongWriter-6k Dataset


But the team didn’t stop there.


Building on the AgentWrite pipeline, they took things a step further and created LongWriter-6k — a dataset designed to stretch the model’s writing muscles.


By training on LongWriter-6k, they managed to unlock the model’s potential to produce well-structured outputs that exceed 10,000 words.
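Conceptually, each LongWriter-6k entry is just a supervised fine-tuning example whose response happens to be very long. Here’s a sketch of what one record might look like; the field names are illustrative, not the dataset’s exact schema:

# Illustrative only: the field names and content are our assumptions,
# not the exact LongWriter-6k schema
record = {
    "messages": [
        {"role": "user",
         "content": "Write a 5000-word essay on the history of silk production."},
        {"role": "assistant",
         "content": "...a coherent ~5,000-word essay produced by AgentWrite..."},
    ]
}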


They also developed LongBench-Write, a benchmark packed with diverse writing tasks, from short snippets to epic-length content, to rigorously test the model’s capabilities.
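Scoring such tasks means checking not just quality but whether the output actually hits the requested length. As a rough illustration (our own simplification, not the paper’s exact metric), a length-adherence score could look like this:

import math

# Our own simplified length-adherence score, not the paper's exact formula
def length_score(required: int, actual: int) -> float:
    """Return a score in [0, 100]; 100 means the output hit the target length exactly."""
    if required <= 0 or actual <= 0:
        return 0.0
    # Symmetric in ratio space: 2x too long and 2x too short score the same
    return max(0.0, 100.0 * (1.0 - abs(math.log2(actual / required))))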


Breaking New Ground in Long-Form Content


The results?


Their 9B model is now outpacing even larger, proprietary models when it comes to long-form content generation. And thanks to the DPO (Direct Preference Optimization) method, the model isn’t just writing longer — it’s writing better.
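For reference, the core of the standard DPO objective fits in a few lines of PyTorch. The four inputs are the summed log-probabilities of the preferred (“chosen”) and dispreferred (“rejected”) responses under the policy being trained and under a frozen reference model; computing those is your training loop’s job:

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit reward: how much more likely the policy makes a response
    # than the frozen reference model does, scaled by beta
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected rewards
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()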


To sum it up, here’s what the research team has accomplished:

-> AgentWrite, an agent-based pipeline that gets off-the-shelf LLMs to produce long, coherent outputs
-> LongWriter-6k, an SFT dataset whose outputs stretch well past the usual 2,000-word ceiling
-> LongBench-Write, a benchmark covering writing tasks from short snippets to epic-length content
-> A 9B model that, with DPO, outperforms larger proprietary models on long-form generation
Let’s see how it works in practice!




Writing 10,000+ Word Articles and Books with LongWriter LLMs


The team open-sourced two models:


-> LongWriter-glm4-9b


-> LongWriter-llama3.1-8b


The models are fine-tuned from GLM-4-9B and Meta-Llama-3.1-8B, respectively.


In the paper, these correspond to the “LongWriter-9B-DPO” and “LongWriter-8B” models, respectively.


The team recommends transformers==4.43.0 to deploy these models. For the LongWriter-glm4-9b model, make sure to install FlashAttention 2 following the instructions in the FlashAttention code base.
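For example (the flash-attn command follows the FlashAttention README; make sure your CUDA and PyTorch versions meet its requirements):

pip install transformers==4.43.0
pip install flash-attn --no-build-isolation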


This is how you can try the model:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and model; trust_remote_code pulls in the custom
# GLM code, including the chat() helper used below
tokenizer = AutoTokenizer.from_pretrained("THUDM/LongWriter-glm4-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("THUDM/LongWriter-glm4-9b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
model = model.eval()

query = "Write a 10000-word China travel guide"

# max_new_tokens=32768 leaves plenty of room for a 10,000+ word answer
response, history = model.chat(tokenizer, query, history=[], max_new_tokens=32768, temperature=0.5)
print(response)

It’s that simple!
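The Llama-based sibling doesn’t ship the same chat() helper, so here’s a hedged sketch using the standard transformers generation API instead. We’re assuming the tokenizer provides a chat template; check the model card for the exact prompt format if it doesn’t:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("THUDM/LongWriter-llama3.1-8b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("THUDM/LongWriter-llama3.1-8b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
model = model.eval()

query = "Write a 10000-word China travel guide"

# Assumes the tokenizer ships a chat template (see the model card)
input_ids = tokenizer.apply_chat_template([{"role": "user", "content": query}], add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=32768, do_sample=True, temperature=0.5)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))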


You may also deploy your own LongWriter chatbot by running (the script lives in the LongWriter GitHub repo):


CUDA_VISIBLE_DEVICES=0 python trans_web_demo.py


If you are struggling with long-form content generation, let us know in the comments!


Further Resources to Learn More About LongWriter


-> GitHub


-> HF Repo


-> Paper


-> HF Space


Bonus Content : Building with LLMs


And don’t forget to have a look at some practitioner resources that we published recently:


Say Hello to ‘Her’: Real-Time AI Voice Agents with 500ms Latency, Now Open Source

Fine-Tune Meta’s Latest AI Model: Customize Llama 3.1 5x Faster with 80% Less Memory

Fine Tuning FLUX: Personalize AI Image Models on Minimal Data for Custom Look and Feel

Data Management with Drizzle ORM, Supabase and Next.js for Web & Mobile Applications


Thank you for stopping by, and being an integral part of our community.


Happy building!