
Google's AI model LaMDA could become Assistant 2.0

In May 2021, Google showed off two major AI models at its I/O developer conference: MUM (Multitask Unified Model), trained for search, and the dialogue AI LaMDA (Language Model for Dialogue Applications).

Google CEO Sundar Pichai demonstrated the dialogue AI's capabilities: LaMDA held a conversation with a human about Pluto and paper airplanes – for this, the AI took on the role of the objects and answered from their perspective.


LaMDA can impersonate Pluto to convey information in a conversational manner. | Image: Google

So while MUM is the future of search, LaMDA could retire Google’s current assistant.

Google releases LaMDA paper

Then in September 2021, there was an update to MUM including a roadmap for the gradual introduction of the multimodal model into Google search. Now, in a blog post and paper, Google provides insight into LaMDA’s current state and details the training process.

As already known, LaMDA is based on the Transformer architecture and is specialized for dialog. The goal, Google says, is an AI system that can hold high-quality, safe, and grounded conversations. Google measures quality in three categories: Sensibleness, Specificity, and Interestingness.

Answers should also be verifiable by drawing on external sources. Current language models such as GPT-3 retrieve information directly from their model parameters and are known for answers that sound plausible but may contradict the facts.

LaMDA is also designed to avoid obscenities, violent content, and slurs or hateful stereotypes against certain groups of people. The development of practical safety metrics is still in its infancy, and there is still much progress to be made, Google writes.

LaMDA is (pre-)trained with dialogue

The largest LaMDA model has 137 billion parameters and was trained on the Infiniset dataset. According to Google, Infiniset comprises 2.97 billion documents and 1.12 billion dialogs; in total, LaMDA was trained on 1.56 trillion words. A strong focus on dialog data during language model pre-training improves dialog skills even before subsequent fine-tuning.

After training with Infiniset, the Google team fine-tuned LaMDA with three manually created datasets for increased quality, safety, and groundedness. The first dataset contains 6,400 dialogs with labels for sensible, specific, and interesting answers, and the second contains nearly 8,000 dialogs with labels for safe and unsafe answers.

The third dataset includes 4,000 dialogs in which crowdworkers submit queries to an external source and use the results to adjust LaMDA’s responses, plus another 1,000 dialogs in which LaMDA-generated queries to external sources are evaluated.
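The grounding workflow this data teaches – issue a query to an external tool, then revise the draft answer with the retrieved evidence – can be sketched roughly as follows. The `TOOLSET` dictionary and all helper names are illustrative stand-ins, not Google's actual components:

```python
# Toy stand-in for an external search/knowledge tool.
TOOLSET = {
    "highest mountain": ("Mount Everest is 8,848.86 m tall.", "example.org/everest"),
}

def lookup(query):
    """Return (snippet, source) for a query, or None if nothing matches."""
    for key, hit in TOOLSET.items():
        if key in query.lower():
            return hit
    return None

def ground(draft_answer, query):
    """Revise a draft answer with retrieved evidence, citing the source."""
    hit = lookup(query)
    if hit is None:
        return draft_answer  # nothing to ground against; keep the draft
    snippet, source = hit
    return f"{snippet} (source: {source})"

print(ground("It is very tall.", "How high is the highest mountain?"))
# → Mount Everest is 8,848.86 m tall. (source: example.org/everest)
```

In the real system both the query and the revised answer are generated by the model itself; the sketch only shows the shape of the loop.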


LaMDA is making progress

After training, LaMDA can query external sources to gather information for its answers. For each answer, LaMDA generates multiple candidates, which are then scored by learned classifiers for safety, sensibleness, specificity, and interestingness.

LaMDA filters its own answers before outputting them. | Image: Google
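This generate-then-filter step can be sketched in a few lines. `sample_responses` and the classifier scores below are illustrative stand-ins for the language model and the learned classifiers, not LaMDA's actual components:

```python
def sample_responses(prompt, n=3):
    """Stand-in sampler; a real system would draw n responses from the LM."""
    return [f"response {i} to: {prompt}" for i in range(n)]

def classify(response):
    """Stand-in classifiers: toy heuristic scores in [0, 1]."""
    return {
        "safety": 0.0 if "forbidden" in response else 1.0,
        "sensibleness": 0.8,
        "specificity": 0.6,
        "interestingness": 0.5,
    }

def respond(prompt, safety_threshold=0.5):
    candidates = sample_responses(prompt)
    scored = [(c, classify(c)) for c in candidates]
    # Unsafe candidates are discarded outright...
    safe = [(c, s) for c, s in scored if s["safety"] >= safety_threshold]
    if not safe:
        return "I'd rather not answer that."
    # ...and the rest are ranked by their summed quality scores.
    best, _ = max(safe, key=lambda cs: sum(
        cs[1][k] for k in ("sensibleness", "specificity", "interestingness")))
    return best

print(respond("Tell me about Pluto."))
# → response 0 to: Tell me about Pluto.
```

Discarding unsafe candidates before ranking, rather than folding safety into one combined score, mirrors the hard filter described above.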

As shown in the first demonstration at Google’s developer conference, LaMDA can act as a normal conversation partner or take on the role of objects. In one example, LaMDA speaks as Mount Everest. In dialogue, facts are backed up with sources.


So LaMDA can answer simple factual queries, but more complex reasoning is still out of reach even for Google’s language model, the team says.

The quality of the answers was high on average. However, the model still suffers from subtle quality issues: for example, it may repeatedly promise to answer a user’s question in the future, try to end the conversation prematurely, or make false statements about the user.

Google: "A recipe for LaMDAs."

Further research is also needed to develop robust standards for safety and fairness, Google said. One problem among many is the time-consuming process of creating suitable training data.

For example, the crowdworker population does not reflect the entire user base: the 25-to-34 age group is overrepresented. Still, according to Google, the results show that the safety and groundedness of language models can be improved with larger models and fine-tuning on high-quality data.

Google wants to build on these results: "This is not the final version of LaMDA. Rather, it is a recipe for creating ‘LaMDAs’ and should be seen as a way to eventually create production-ready versions for specific applications."

Developing new ways to improve LaMDA’s safety and groundedness will remain the main focus.
