Google Search Docs Leak
We have all heard stories about Google working some magic within its algorithm, deeming certain sites worthy of gracing the top of the first page of results. Some of what we have heard may contain elements of truth, as documents leaked from within Google's search team now suggest.
I didn't want to jump on the "Google Confirms Leak Is Real" bandwagon, as so many other publications already have, when Google hasn't actually said anything to that effect. Google has not confirmed that these leaked documents are genuine. A confirmation would be a huge deal, and who knows, weeks or even months down the line Google may release a statement about these documents.
“We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information,” Google spokesperson Davis Thompson told The Verge in an email. “We’ve shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation.”
Google confirms the leaked Search documents are real - The Verge
Read through the article and you will find no confirmation from Google, unless you extrapolate one from the quote above. Google merely cautions against making assumptions about Search based on the documents. It's only when you dig into the sources behind the coverage of these leaked documents that some air of authenticity starts to emerge.
A critical next step in the process was verifying the authenticity of the API Content Warehouse documents. So, I reached out to some ex-Googler friends, shared the leaked docs, and asked for their thoughts. Three ex-Googlers wrote back: one said they didn’t feel comfortable looking at or commenting on it.
Reading through that article, the case that all of this could be true gains some credibility. Rand Fishkin, its author, describes in some detail how he reached out to former Google employees: of the three who replied, one declined to comment, while the other two indicated that the documents appeared to be genuine Google documents.
I have reviewed the API reference docs and contextualized them with some other previous Google leaks and the DOJ antitrust testimony. I’m combining that with the extensive patent and whitepaper research done for my upcoming book, The Science of SEO. While there is no detail about Google’s scoring functions in the documentation I’ve reviewed, there is a wealth of information about data stored for content, links, and user interactions. There are also varying degrees of descriptions (ranging from disappointingly sparse to surprisingly revealing) of the features being manipulated and stored.
Mike King takes a very deep dive into these documents and goes into lengthy detail. He notes that the leak represents the current, active architecture of Google Search Content Storage as of March 2024. If you want to get into the nerdy weeds of the Google Search API, I suggest reading his article.
If the leak is indeed real, we might see Google's search algorithm go through some changes in the near future. We already see Google placing SEO-stuffed, AI-generated articles near the top of the results page, just below the sponsored links. I've written before about how smaller independent creators have watched their traffic plummet over the past several months. And now, with Gemini being added to Search, we're already at a point where the great majority of users never visit the sites whose information is being scraped. Why would anyone using Google need to click through to a site for information that is given freely on the results page itself?