The next speaker in this session at the SEASON 2025 conference is Desheng Hu, whose project audits Google’s AI Overviews and Featured Snippets, especially in the context of information seeking on pregnancy and childbirth. Many new parents also use Google for such information seeking, and may blindly trust the information they obtain; that information now also includes the recently introduced AI Overviews and/or Featured Snippets.
These elements are often placed at the very top of the results page, and thereby capture a great deal of users’ attention – but they have already been shown to be unreliable and inconsistent in some cases. In critical health contexts, this can have direct and material consequences for information seekers. When and how such information appears, and what quality it represents, may depend on the formulation and sentiment of the questions being asked. Further, when AI Overviews and Featured Snippets both occur, how consistent are they with each other?
This project collected real-world query data by filtering the ORCAS dataset for pregnancy and baby care keywords and using the remaining questions as queries; it distinguished different query types (why, what, how), and distinguished between positive, neutral, and negative query sentiment.
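The presentation did not go into the detail of how this filtering and labelling was implemented; purely as an illustration, a minimal Python sketch of such a query-construction step might look as follows (the keyword list, the assumed ORCAS column layout, and the use of NLTK’s VADER sentiment scorer are illustrative assumptions, not the project’s actual implementation):

```python
# Illustrative sketch of the query-construction step: filter ORCAS questions for
# pregnancy / baby-care keywords and label each query by type and sentiment.
# Keyword list, file layout, and sentiment tooling are assumptions only.
import csv

from nltk.sentiment import SentimentIntensityAnalyzer  # requires nltk.download("vader_lexicon")

KEYWORDS = {"pregnancy", "pregnant", "childbirth", "newborn", "baby", "infant"}  # illustrative
QUERY_TYPES = ("why", "what", "how")


def query_type(question: str) -> str | None:
    """Return 'why', 'what', or 'how' if the question starts with one of them."""
    words = question.lower().split()
    return words[0] if words and words[0] in QUERY_TYPES else None


def sentiment_label(question: str, sia: SentimentIntensityAnalyzer) -> str:
    """Map VADER's compound score onto positive / neutral / negative."""
    score = sia.polarity_scores(question)["compound"]
    return "positive" if score > 0.05 else "negative" if score < -0.05 else "neutral"


def build_query_set(orcas_path: str) -> list[dict]:
    """Filter an ORCAS-style TSV (query text assumed in the second column)."""
    sia = SentimentIntensityAnalyzer()
    queries = []
    with open(orcas_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            question = row[1]
            qtype = query_type(question)
            if qtype and set(question.lower().split()) & KEYWORDS:
                queries.append({
                    "text": question,
                    "type": qtype,
                    "sentiment": sentiment_label(question, sia),
                })
    return queries
```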
It also constructed an auditing pipeline that controlled for location, browser, and personalisation effects, and gathered data on 8-9 April 2025; it parsed AI Overview (AIO) and Featured Snippet (FS) content from the results these queries produced. Evaluation metrics for these results included frequency, consistency between AIOs and FSs (checking for binary, numerical, and other problematic discrepancies), response quality (testing for relevance, the presence of explicit or implicit safeguards, and source category and credibility), and sentiment.
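These metrics were described only at a high level in the talk; as a rough sketch, a binary and numerical consistency check between an AIO and an FS answer could be approximated with heuristics along the following lines (the marker words and matching rules here are invented for illustration and are not the authors’ actual method):

```python
# Rough illustration of an AIO-vs-FS consistency check for binary and numerical
# disagreements; marker words and matching rules are invented for illustration.
import re

YES_MARKERS = {"yes", "safe", "can", "recommended"}
NO_MARKERS = {"no", "not", "avoid", "unsafe", "cannot"}


def binary_stance(answer: str) -> str | None:
    """Very rough yes/no stance detection on a short extracted answer."""
    words = set(answer.lower().split())
    if words & YES_MARKERS and not words & NO_MARKERS:
        return "yes"
    if words & NO_MARKERS and not words & YES_MARKERS:
        return "no"
    return None


def extract_numbers(answer: str) -> set[float]:
    """Pull out any numbers mentioned in an answer (e.g. dosages, week counts)."""
    return {float(n) for n in re.findall(r"\d+(?:\.\d+)?", answer)}


def contradicts(aio_answer: str, fs_answer: str) -> bool:
    """Flag a pair as contradictory if yes/no stances or cited numbers disagree."""
    a, b = binary_stance(aio_answer), binary_stance(fs_answer)
    if a and b and a != b:
        return True
    nums_a, nums_b = extract_numbers(aio_answer), extract_numbers(fs_answer)
    return bool(nums_a and nums_b and nums_a.isdisjoint(nums_b))
```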
AIOs were present in 84% of responses, Featured Snippets in 33%; both co-occurred in 22%. Query formulation mattered in producing these elements. In 33% of all responses, AIOs and FSs contradicted each other; where Google suppressed the visibility of AIOs after generation, this rose to 70%. Safeguards like medical disclaimers were very rare (11%). AIOs cited more health-related Websites than the organic search results did, but FSs contained disproportionately more business and shopping sites. Half of the top 10% of domains were low- or medium-credibility sources. There was no evidence of confirmation bias in the sentiment between queries and answers.
AIOs may thus shape information exposure and affect user decisions; query formulation matters. Contradictions between AIOs and FSs are especially concerning, particularly in how they may affect decision-making. The considerable presence of low- to medium-credibility sources might increase health risks, especially given the lack of safeguards in the generated responses. It will be important to translate this auditing approach to other high-stakes information domains, too.