HOW LARGE LANGUAGE MODELS CAN HELP TO WRITE ACCESSIBLE LETTERS

January 28, 2025
Joleen van der Zwan

A main asset of Large Language Models (LLMs) is their all-round utility. LLMs can often generate output that is sensible and useful: they can write texts on a wide variety of topics, write programming code and conduct conversations. However, a useful result is not guaranteed for every request, and there are downsides and challenges to consider when deploying an LLM. In our previous blog, we explained the potential use of LLMs in government communication. In this blog we discuss in depth the use of LLMs for making letters of the Netherlands Enterprise Agency (RVO) more accessible to their target audience.

This image was generated using Microsoft Copilot

LLMs open up new possibilities

LLMs have existed since 2017 and follow a longer tradition of plain language models. In essence, these models predict which words tend to follow one another. This predictive capability is based on statistics derived from a collection of texts. It has traditionally been applied to assess whether the words in a text are in line with what could be expected. This was useful for a task like automatic speech recognition: speech was decoded into characters, and the language model helped to transform these characters into a sensible sequence of words.

A difficulty with these traditional language models was that they could only predict the next word from the nearby context, typically the last couple of words. In addition, these models could be optimized for predicting word order in a small number of text domains, but they could not learn from a wide variety of texts. Thus, they failed when applied to texts from a different domain than the one they were trained on.

With the advent of Large Language Models, the application domains have become broader. These models can learn from a larger volume of data and collect contextual information from it. The result is models that are better able to produce fluent sentences and, on top of that, are applicable to a wide variety of text genres and topics. The true breakthrough in accessibility came when ChatGPT was first released. This ‘instruction-tuned LLM’ can answer questions, write pieces of text, and be instructed to write in a certain style. This makes the simplification of letters much more attainable, since reformulating a text in a simpler way can also be seen as a style shift.

Aspects of text simplification using LLMs

In text simplification, there are different aspects of a text that can be addressed and different ways to do so. The first aspect is vocabulary: a text may be difficult due to the use of rare or difficult words. There are two ways to make such a text more readable: these words can be replaced by alternatives, or an elaboration can be added in which an uncommon word is explained. The second aspect is grammar, or syntactic structure. Long sentences can be cognitively demanding, as they require the reader to remember the words in the sentence until the end is reached. Such grammatical aspects can be addressed by breaking down long sentences into multiple smaller ones, or by restructuring a sentence into a simpler form.
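The two aspects above can be made concrete with a toy sketch: lexical substitution via a word list and sentence splitting at a connective. The word mapping and the splitting heuristic below are invented for illustration; a real system would use a curated lexicon or an LLM instead.

```python
# Hypothetical mapping from rare words to common alternatives (illustrative only).
SIMPLER_WORDS = {
    "remuneration": "pay",
    "commence": "start",
    "utilize": "use",
}

def simplify_vocabulary(sentence: str) -> str:
    """Replace rare words with simpler alternatives where known."""
    out = []
    for word in sentence.split():
        stripped = word.strip(".,").lower()
        if stripped in SIMPLER_WORDS:
            replacement = SIMPLER_WORDS[stripped]
            if word[-1] in ".,":  # preserve trailing punctuation
                replacement += word[-1]
            out.append(replacement)
        else:
            out.append(word)
    return " ".join(out)

def split_long_sentence(sentence: str, max_words: int = 12) -> list[str]:
    """Split a long sentence in two at ', and' if it exceeds max_words."""
    if len(sentence.split()) <= max_words or ", and" not in sentence:
        return [sentence]
    first, rest = sentence.split(", and", 1)
    return [first.rstrip() + ".", rest.strip().capitalize()]
```

For example, `simplify_vocabulary("Please utilize the form.")` yields "Please use the form.", and a fifteen-word sentence joined by ", and" is split into two shorter ones.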

Before LLMs were available, these approaches to simplification could be automated with either a rule-based approach or a machine-learning approach. Both were limited by the need for dedicated rules or examples of simplified phrases, making a text simplification model time-consuming to implement. Moreover, such a model would fail when faced with novel topics not catered for in the rules or training data. In this respect, LLMs offer a more adaptive alternative.

Approach using LLMs with Netherlands Enterprise Agency (RVO) letters

We set out to test how LLMs may help to simplify letters written by RVO, with two notable constraints. The first was that we could not make use of the ChatGPT API: Dutch government policy does not allow its use for security reasons. Second, we did not have the computational facilities to experiment with the biggest LLMs available for the Dutch language. Both the memory size and the computational power were insufficient for the biggest models like Meta Llama, Mistral and Geitje. Instead, we experimented with smaller models: Fietje, GPT2, MBart and T5.

Relevant model characteristics

The models have several defining characteristics and ways in which we could implement them. Three of these were important in the current project. The first characteristic is the type of LLM. There are encoder models, useful for interpreting a text; decoder models, useful for generating text; and encoder-decoder (or sequence-to-sequence) models, useful for formulating one text that aligns well with another (for example, translating text A into text B, or formulating a proper response to a question). The task of text simplification is aimed at generating simplified text. Accordingly, we experimented with both decoder models and encoder-decoder models.
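As a rough illustration, this three-way distinction also maps onto the loader classes of the Hugging Face `transformers` library. The mapping below is a sketch (the example models in the comments are the ones named in this post):

```python
# Illustrative mapping from model type to the `transformers` Auto class
# typically used to load it.
MODEL_TYPE_TO_CLASS = {
    "decoder": "AutoModelForCausalLM",           # e.g. GPT2, Fietje
    "encoder-decoder": "AutoModelForSeq2SeqLM",  # e.g. MBart, T5
    "encoder": "AutoModel",                      # e.g. BERT-style interpreters
}
```

Since simplification is a generation task, only the first two entries were relevant to our experiments.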

A second consideration was the size of the model, in terms of the number of parameters. The more parameters a model has, arguably the more contextual information it can utilize when deciding on a sequence of words to generate. At the same time, larger models require more powerful hardware. Within the limits of our computational facilities, we experimented with models of several sizes. 

Third, we could fine-tune a model on the particular task at hand by providing it with typical examples of an original sentence paired with its simplified counterpart. The alternative is to give the model a description of the task (this is called ‘zero-shot learning’) or a description plus a couple of prototypical examples (‘few-shot learning’). Such a description is given to the model as a prompt. This is an example of a prompt that was used in our project:

I want you to replace my complex sentence with a simple sentence. The meaning of the sentence should remain the same, but make it easier. 

Complex: {text}

Simple:
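For few-shot prompting, the prompt above is extended with a couple of worked example pairs before the target sentence. A minimal sketch of how such a prompt could be assembled (the helper function and the example pairs are our own illustration, not the project's actual code):

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], text: str) -> str:
    """Assemble a few-shot simplification prompt from (complex, simple) pairs."""
    instruction = (
        "I want you to replace my complex sentence with a simple sentence. "
        "The meaning of the sentence should remain the same, but make it easier."
    )
    lines = [instruction, ""]
    for complex_sent, simple_sent in examples:
        lines.append(f"Complex: {complex_sent}")
        lines.append(f"Simple: {simple_sent}")
        lines.append("")
    # The prompt ends with the target sentence and an open "Simple:" slot
    # for the model to complete.
    lines.append(f"Complex: {text}")
    lines.append("Simple:")
    return "\n".join(lines)
```

With zero examples this reduces to the zero-shot prompt shown above; with a handful of pairs it becomes the few-shot variant that worked best in our experiments.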

What we learned

We applied these comparisons to 131 letter templates of RVO, measuring readability before and after the automated text simplification. We used LiNT to measure this, which will be described in more detail in the third blog of this series. In addition, we scored the simplifications on several elements: coherence, absence of repetition, grammaticality and whether the original information was preserved. With respect to the type of model, decoder models worked best.

Unfortunately, though as expected, larger models work better than smaller models, with Fietje performing best in our tests. Finally, we found that few-shot prompting with Fietje resulted in better performance than fine-tuning a model on a larger set of training examples. Preserving the original information and avoiding repetition were particular issues for the lesser-performing models: they tended to make up new information or repeat a phrase multiple times. Such derailment is a notable challenge when working with LLMs, and we found that especially the smaller models like GPT2 produced this type of output.

From the experimentation, we learned about the feasibility of the task and which model works best. Yet most of the generated text needs additional editing by a human to align it with the original content of a letter, even when using Fietje with a few-shot prompting approach. There are ways to steer the LLM further, such as the ‘Plug-and-play LLM’ method, in which we provide the LLM with a list of words it should favor and a list of words it should avoid. When generating text, these lists are consulted to alter the output and choose other wordings. However, this is not sufficient to make the model robust.
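The idea of consulting favor/avoid lists during generation can be sketched as a bias applied to the model's next-token scores at each decoding step. The following model-free illustration uses an invented toy vocabulary and invented scores; in practice this logic hooks into the decoding loop of a real LLM.

```python
def bias_scores(scores, vocab, favored, avoided, bonus=2.0, penalty=-100.0):
    """Adjust next-token scores: boost favored words, suppress avoided ones."""
    biased = list(scores)
    for i, token in enumerate(vocab):
        if token in favored:
            biased[i] += bonus
        elif token in avoided:
            biased[i] += penalty
    return biased

# Toy decoding step: pick the highest-scoring token after biasing.
vocab = ["remuneration", "pay", "money", "stipend"]
scores = [3.0, 1.5, 1.0, 2.0]  # raw model scores (invented)
biased = bias_scores(scores, vocab,
                     favored={"pay"},
                     avoided={"remuneration", "stipend"})
best = vocab[biased.index(max(biased))]  # -> "pay"
```

Hugging Face `transformers` exposes comparable controls on its real decoding loop, for instance the `bad_words_ids` and `sequence_bias` arguments to `generate`.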

How Large Language Models can help to write accessible letters

The main conclusion is that robust automated text simplification of the RVO letters requires larger models than the ones we used. For understandable reasons, the government is hesitant to allow the use of an application like ChatGPT. Our study suggests that it would be best to wait until safe usage of larger LLMs with a good track record on the Dutch language is facilitated. The downside is that larger LLMs require more energy, which makes them a less sustainable approach. A solution would be to apply the LLM only to sentences in a letter where there is actually room for simplification. To this end, in the final blog of this series we will discuss what makes a letter difficult to comprehend and how readability can be measured.

Co-author

This blog is co-authored by Florian Kunneman. Florian is Assistant Professor at the Institute for Language Sciences at Utrecht University. His research is aimed at supporting governmental communication by means of language technology. To this end, he studies online discussion fora to gain insight into societal trends, he improves how conversational systems (chatbots) converse with different types of users, and he works on automated text simplification.

About the author

Innovation Consultant | Netherlands
Joleen van der Zwan is an energetic consultant with broad experience in multiple industries and a variety of roles. With her passion for innovation, she provides customers with insights and advice on their strategy related to innovation and new technologies.
