Wrangling ChatGPT for writing

10 Feb 2024

Depending on where you get your feed, what I'm about to describe might be a sign of certain doom. I'm not sure that's the case. People are saying the internet will soon be awash with LLM-generated content, and that the last ever authentic human thought will be published any day now. Read on to hear about my evening adventure generating a blog post with a few dozen lines of Python. Stay until the end to find out the problems with this approach, and how I might solve them.

def write_a_blog_the_hard_way(prompt: str) -> str:
    """
    Talk to a ChatGPT model, using the prompt to develop a blog outline,
    then deliver each part of the outline iteratively.

    :param prompt: the user's description of the post they want
    :return: the generated post
    """
    total_post = ""
    # make_blog_summary turns the prompt into a five part outline (see below)
    chats = [make_blog_summary(prompt)]
    for i in range(5):
        # Re-send the conversation plus an instruction to write the next
        # section; chat() returns the conversation with the reply appended
        chats = chat(chats + [f"""
        {chats[-1]}
        ---
        Instructions: Write a paragraph for section {i + 1}
        """])
        total_post += chats[-1] + "\n----\n"
    return total_post

Behold: roughly five lines of actual Python logic to write a blog post using ChatGPT. Now, I'm not saying this is going to change the world and destroy journalism and blogging. But it's interesting to see a naive approach to implementing a blog generator on top of the ChatGPT API. Let's dive in and see how it holds together.

Getting output from ChatGPT

I'm using the openai pip module to interact with the ChatGPT API. It's a simple matter of getting an API key, instantiating the client, and getting started; the first page of their documentation covers everything you need. I've created an even simpler abstraction on top of it with my chat function, def chat(msgs: List[str]) -> List[str]:, which takes a list of messages and returns that list with the model's reply appended. This is a simple way to keep the conversation going, and to keep the state of the conversation in memory. The conversation is alternating messages between you and the agent, starting with your first prompt message.
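The post doesn't show the body of chat, but a minimal sketch might look like this, assuming the v1 openai client and gpt-3.5-turbo (both details are my assumptions, not from the original):

from typing import List

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat(msgs: List[str]) -> List[str]:
    # Assumed implementation: alternate user/assistant roles, starting
    # with the user, send the whole conversation, and append the reply
    messages = [
        {"role": "user" if i % 2 == 0 else "assistant", "content": m}
        for i, m in enumerate(msgs)
    ]
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption: any chat model would do
        messages=messages,
    )
    return msgs + [response.choices[0].message.content]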

Setup prompt

In late 2022, I first came across 'prompt engineering' as a term. What it means is to 'engineer' your input into a text model to get the expected output and solve your problem. Now, I'm no expert here, and I'm not even sure we should be calling it prompt engineering. My take is to have the user-provided input mashed together with a few instructions telling the model how to create an outline to plan the blog.

For example, the user might input something like this:

Write me a blog in the first person perspective about the benefits of using RSS feeds in 2024

And in my code I add the following:

Your task is to create a five part outline for a comprehensive blog post.

We simply send this combined prompt to the chat agent and await a response, which should be a set of headings and bullet points for the post.
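The make_blog_summary helper isn't shown in the post either; a minimal sketch, assuming it simply wraps the chat function from earlier, could be:

OUTLINE_INSTRUCTION = (
    "Your task is to create a five part outline for a comprehensive blog post."
)

def make_blog_summary(prompt: str) -> str:
    # Combine the user's prompt with the outline instruction, send it,
    # and return the model's outline (headings and bullet points)
    return chat([f"{prompt}\n\n{OUTLINE_INSTRUCTION}"])[-1]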

The loop

Now that we have an outline for the blog post, let's loop through it and ask the agent for each paragraph of the post. The outline is provided to the LLM, along with the instruction write part {N} of the blog post. Using this technique I have been able to build larger pieces of content from the LLM that approach a reasonable quality of writing.
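Putting it all together, generating a draft is a single call, reusing the example prompt from earlier:

post = write_a_blog_the_hard_way(
    "Write me a blog in the first person perspective about the "
    "benefits of using RSS feeds in 2024"
)
print(post)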

But there are problems...

Is this post you are reading now generated by the above code? No, it's not. I found that the LLM would often repeat itself, and the writing just wasn't up to the standard I would be happy putting out into the world. Hallucination is the other big problem: your generated blog will lie, and it will drift from the plan you set out. A human is definitely needed to boost quality and make something that is actually usable.

I have some untested ideas on how to solve these problems. We are still in the early days of finagling output from LLMs, and resources on how to make longer-form content with them are scarce.

Repetitive sections of the post

More context on the sections of the post that have already been covered might help here (a sketch of that idea follows below). Or maybe a more granular approach to building the content: one idea is to expand the outline a step further before writing anything, from outline to simple sentences, and from simple sentences to paragraphs. This might be a way to get a bit more control over the output, and get rid of the often-repeated paragraphs and sentences.
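As an untested sketch of the first idea, each request could carry the sections already written, with an explicit instruction not to repeat them. Everything here beyond the chat helper (the function name, the instruction wording) is hypothetical:

def write_with_context(outline: str, n_sections: int = 5) -> str:
    # Hypothetical variant of the loop: show the model what has already
    # been written so it can avoid covering the same ground twice
    sections = []
    for i in range(n_sections):
        already_written = "\n\n".join(sections)
        reply = chat([f"""
        {outline}
        ---
        Already written:
        {already_written}
        ---
        Instructions: Write a paragraph for section {i + 1}.
        Do not repeat anything already written.
        """])[-1]
        sections.append(reply)
    return "\n----\n".join(sections)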

Low quality content

I'm just not happy with the quality of the content. The voicing never seems quite right; it's just a bit cringe-worthy! We might be able to improve things with a more refined prompt. Maybe ingest your current content, extract its 'tone' and 'style', and use those to generate a prompt which makes similar content.
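One untested way to attempt that, again leaning on the chat helper from earlier (the function name and instruction text are hypothetical):

from typing import List

def extract_style_prompt(existing_posts: List[str]) -> str:
    # Hypothetical: ask the model to describe the tone and style of
    # existing posts, so the description can be prepended to future prompts
    joined = "\n---\n".join(existing_posts)
    return chat([f"""
    {joined}
    ---
    Instructions: Describe the tone and writing style of the posts above
    in a few sentences, written as instructions for a ghostwriter.
    """])[-1]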

Hallucination

As we all know by now, LLMs are prone to hallucination. Having a human in the loop is the only reliable way to catch it. To a certain degree you might get some solid results from prompt engineering, but I suspect that would only be a minor improvement.

Wrapping up

Was this a fun evening hack? You betcha. Will I be using LLMs in my workflow? Yep! But we are far from a dystopian internet filled with pure LLM garbage. Real humans will keep posting, at least for the short term, and I'm still on my journey of regular publishing and writing.

Thanks for staying tuned!
