Generative artificial intelligence – algorithms like large language models (LLMs) – has been the biggest technology story of 2023.

Two of the best-known LLM-based tools are OpenAI’s ChatGPT and Google’s Bard. In a matter of months, these and other deep learning algorithms have transformed the way we can process and generate text.

For an organization like the ICRC, which produces hundreds of text-based documents daily, LLMs offer huge potential. But they also carry new risks.

Earlier this year the ICRC’s Innovation Team launched a research project to examine how LLMs might improve the use and accessibility of information.

It was carried out jointly with the ICRC’s Information Governance Unit (IGOV) and the Federal Institute of Technology in Lausanne (EPFL).

“Right now, there are more than two million files referenced in the ICRC’s SharePoint platform, and we keep accumulating more and more emails, reports, minutes of meetings, guidelines etc.,” says ICRC Innovation Officer Blaise Robert, who led the project.

“We are producing knowledge and information, but it’s hard to take it all in because there’s so much of it. The aim of this research was to look into how we can use artificial intelligence to improve the way we retrieve actionable knowledge out of the data we have accumulated – to filter the signal from the noise.”

Complex process

This is challenging because text is what is known as unstructured or semi-structured data – essentially a jumble of words. From a data science perspective, retrieving information from text systematically and consistently is complicated.

In recent years, the field of natural language processing (NLP) – a growing family of computer algorithms designed to process human language – has been tackling this problem.

The ICRC has several ongoing NLP projects of its own – for example, sentiment analysis for reputation monitoring. But the explosion of LLMs has opened the door to new use cases that were previously difficult, such as information retrieval, topic extraction and higher-quality text generation.
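To give a flavour of this kind of NLP task, here is a minimal sentiment-analysis sketch using the open-source Hugging Face transformers library. The default model and the example posts are illustrative assumptions, not the ICRC’s actual reputation-monitoring pipeline.

```python
# Minimal sentiment-analysis sketch (illustrative only; not the ICRC's
# actual reputation-monitoring setup).
from transformers import pipeline

# With no model specified, transformers falls back to a default
# English sentiment model.
sentiment = pipeline("sentiment-analysis")

posts = [
    "Grateful to the teams who restored the water supply so quickly.",
    "Still no news about my missing relative after three months.",
]

for post in posts:
    result = sentiment(post)[0]  # e.g. {'label': 'POSITIVE', 'score': 0.99}
    print(f"{result['label']:>8}  {result['score']:.2f}  {post}")
```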

The main thrust of the LLM research project was the retrieval of information from texts in SharePoint, with a view to automating the production of summaries.

Using open-source models available on the Hugging Face platform, EPFL researcher Alexandre Sallinen and Professor Mary-Anne Hartley (who leads EPFL’s Intelligent Global Health research group) developed a new LLM-powered search engine. Rather than matching exact words, it retrieves sentences with similar meanings (semantic search).

The ICRC currently uses iSearch, which retrieves documents using keyword (lexical) searches.

The aim was to see whether semantic search could improve the ranking of relevant documents and widen the search parameters, as well as produce summaries.
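To make the difference between the two approaches concrete, here is a minimal semantic search sketch using the open-source sentence-transformers library from the Hugging Face ecosystem. The embedding model, documents and query are illustrative assumptions, not the project’s actual configuration.

```python
# Minimal semantic search sketch (illustrative; not the EPFL/ICRC code).
from sentence_transformers import SentenceTransformer, util

# A small, widely used open-source embedding model (an assumed choice).
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Minutes of the coordination meeting with the water authority.",
    "Guidelines on the protection of detainees during visits.",
    "Quarterly report on family reunification caseloads.",
]
query = "rules for seeing people held in prison"

# Embed the documents and the query into the same vector space, then rank
# by cosine similarity: matches are found by meaning rather than shared
# keywords, so the detention guidelines can rank highly despite having
# almost no words in common with the query.
doc_embeddings = model.encode(documents, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_embedding, doc_embeddings)[0]

for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
    print(f"{score:.3f}  {doc}")
```

A purely lexical search for the same query could miss the guidelines entirely, because “prison” never appears in them.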

“While this is tricky in itself, the real challenge was to adapt this algorithm to the computational constraints of the ICRC,” says Hartley.

This is where Sallinen’s years of coding on a low-power laptop came in handy, as he was able to draw on optimization hacks he had developed along the way.

“We were really impressed with the performance for information retrieval and question answering. Our models also managed to accurately detect relevant content from the challenging and sometimes obscure vocabulary of the NGO world,” says Sallinen.

The team experimented with an additional feature to produce summaries of the most relevant documents. But this turned out to be slow and resource-heavy, with the results ranging from “impressive truth to convincing fabrication”.
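A sketch of how such a summarization feature might be wired up locally, using an open-source model through the Hugging Face transformers library, is shown below; the model choice and input text are assumptions for illustration, not the project’s actual setup.

```python
# Sketch of local summarization with an open-source model (illustrative;
# not the project's actual setup or model choice).
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

text = (
    "The delegation met with local authorities to discuss access to "
    "detention facilities and agreed on a schedule of visits for the "
    "coming quarter, along with confidential reporting arrangements."
)

# Generation is the expensive step: on CPU-only hardware, summarizing a
# long document can be slow, consistent with what the team observed.
summary = summarizer(text, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```

The output still needs a human check: nothing in the model prevents it from landing on the “convincing fabrication” end of the spectrum.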

Other possible use cases that were identified include:

  • extracting all interactions with a given country within a delegation
  • accessing past interactions with an interlocutor or organization
  • autoclassification of documents/emails by topic (see the sketch after this list)
  • production of thematic summaries from the archives.
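To make the autoclassification idea concrete, here is a minimal zero-shot topic classification sketch using the Hugging Face transformers library. The model and the topic labels are assumptions for illustration, not an ICRC configuration.

```python
# Minimal zero-shot topic classification sketch (illustrative only).
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

doc = "Email summarizing the logistics plan for next month's relief convoy."
topics = ["logistics", "detention", "health", "family reunification"]

# The model scores each candidate topic against the document, with no
# task-specific training required.
result = classifier(doc, candidate_labels=topics)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{score:.2f}  {label}")
```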

Benefits vs risks

At the same time, the project uncovered several risks and limitations, not least the fact that in-house LLMs are expensive to run in terms of ICT hardware and servers.

They are also prone to generating “hallucinations” – false information that looks convincing.

“The issue of hallucinations in generative AI is not unsolvable, and there is still more interesting work to do,” says Hartley.

EPFL and the ICRC Innovation Team are now designing a new approach to optimize LLMs and enable them to learn within tighter resource constraints.

“In terms of lessons learnt, the use of ChatGPT or other commercial LLMs is impossible for many ICRC use cases due to the sensitivity of much of our information,” says Robert.

“Open-source LLMs exist but still require substantial computing power. We observed that in the production of summaries, for example, opting for smaller models substantially limits performance. These lessons help us to have realistic expectations of what we are capable of doing. But they are not the end of the road, because this is a rapidly evolving field.”

Advanced algorithms

At the ICRC, other ongoing research projects into NLP and LLMs include:

  • sentiment analysis on the perception of the ICRC by processing X (formerly Twitter) posts (with EPFL)
  • monitoring patterns of violent events
  • automating the transcription and labelling of audio files (with the ICRC’s Archive Unit) – see the sketch after this list
  • testing LLM-powered conversational bots to help ICRC staff navigate documents and guidelines.
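As one example of what the audio work might look like, here is a minimal local transcription sketch using the open-source Whisper model; the model size and file name are placeholders, since the Archive Unit’s actual tooling is not named.

```python
# Minimal local transcription sketch with the open-source Whisper model
# (an assumption for illustration).
import whisper

model = whisper.load_model("base")          # small model, laptop-friendly
result = model.transcribe("interview.wav")  # hypothetical audio file
print(result["text"])
```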

The Central Tracing Agency’s Missing Persons Digital Matching search tool, which is being introduced in all ICRC delegations, also uses an advanced AI algorithm to reconnect families.

While cloud-based LLMs, such as ChatGPT, cannot be considered for sensitive/confidential data, the research team noted that they could be used to simplify some ICRC processes. These include editorial support, brainstorming and project management.

“What’s clear is that artificial intelligence capabilities are going to fundamentally change the way we manage our information,” says Véronique Ingold, head of IGOV.

“Such a research project is exciting because it helps us understand the potential and the boundaries of possible use cases. Ultimately, though, how we use artificial intelligence – why, when, and the decisions we make based on artificial intelligence – will remain the responsibility of the people behind the machine.”

If you would like to find out more about NLP and LLM research projects at the ICRC, please contact Blaise Robert at brobert@icrc.org.