When the seasons change, especially as winter approaches, many people pull back the curtains after getting up in the morning to see how others outside are dressed, in order to gauge whether the day will be cold or warm.
This is actually a simple form of "data analysis": inferring an overall trend by observing the samples around you.
The "Street Sweep List" recently launched by Amap essentially follows a similar logic, using data on navigation destinations and travel behaviors to determine whether a shop is worth visiting.
Limitations of Observational Indicators
The problem with this judgment method is obvious.
Winter in Guangdong is a typical chaotic sample: some people wear short sleeves, others wear down jackets, and some even pair windbreakers with shorts or scarves with short sleeves.
In such an environment, even if you see most people wearing short sleeves within a minute, you cannot conclude that "it's very hot today."
From a statistical perspective, this is a combination of "small sample bias" and "high variance noise."
The people outside the window are a small and not necessarily representative sample of the urban population, and each person's clothing choice is influenced by many idiosyncratic factors (commuting methods, physical differences, psychological expectations, etc.).
As a result, the signal you obtain carries strong volatility and randomness.
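A rough sketch of why one minute at the window is a weak signal, assuming (unrealistically) that each passer-by is an independent draw from the city's population and using the normal approximation for a proportion. The function names and the numbers are illustrative, not taken from any real dataset.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of an observed proportion, given true share p and n independent observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1 - p) / n)


def margin_95(p: float, n: int) -> float:
    """Approximate 95% margin of error (normal approximation)."""
    return 1.96 * standard_error(p, n)


if __name__ == "__main__":
    # Hypothetical: half the city is actually in short sleeves today.
    for n in (10, 100, 1000):
        print(n, round(margin_95(0.5, n), 3))
```

With 10 passers-by the margin is roughly ±31 percentage points; with 1,000 it shrinks to about ±3. This covers only sampling noise. If the people passing your window are systematically unlike the rest of the city, a larger sample does not fix that.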
When such judgments go wrong, the attribution of errors also becomes biased.
If you dress too lightly and freeze because you saw people in short sleeves, you might curse them as crazy. Similarly, if a shop recommended by the Street Sweep List turns out to be a disappointment, you might blame Amap for being unreliable.
If it relies solely on algorithmic analysis of navigation data, Amap can hardly provide recommendations that meet users' expectations (or match its own marketing claims). To avoid failure, it must bring in data beyond navigation to cross-validate the results.
Transaction Data Is the Most Indicative Metric
What truly reflects a shop's operational condition is "transaction data."
Holding factors such as floor area and price level constant, shops with "high transaction frequency, high transaction amounts, high repurchase rates, and a wide customer base" are more stable and attractive choices.
These metrics are "high signal strength" data, which better reveal a shop's true commercial quality.
However, the issue is that transaction data is more sensitive than navigation data, as it involves merchant privacy and platform compliance. Even though Amap is a wholly-owned subsidiary of Alibaba, it cannot openly use such data.
Therefore, if a product like the Street Sweep List can reflect real popularity to some extent, it is likely achieved by analyzing "proxy variables highly correlated with transaction behaviors," such as the frequency of navigation to the shop, users' repeat visit rates, and the proportion of long-distance navigation.
At the same time, judging from the types of lists currently on public display, it can be inferred that Amap also draws on non-navigation data that correlates strongly with transactions, such as a user's location at the time of payment, as well as device location collected while navigation is not running.
Although these behavioral indicators are not transaction data themselves, they are often highly correlated with consumption behavior statistically.
This approach is called "correlation inference" or "proxy modeling": when target data is unavailable, indirectly inferring the target value through observable correlated variables.
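A minimal sketch of proxy modeling, assuming a simple linear relationship between one proxy (navigation trips to a shop) and the target (transaction count). The data below is invented for illustration, and nothing here describes how Amap actually builds its list. The idea is to fit the relationship on the few shops where both numbers are known, then estimate the target for shops where only the proxy is visible.

```python
import math
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(xs, ys):
    """Pearson correlation: how strongly the proxy tracks the target."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical shops where both values happen to be known.
    nav_trips = [120, 300, 450, 800, 1000]
    transactions = [100, 210, 330, 560, 720]

    slope, intercept = fit_line(nav_trips, transactions)
    print("correlation:", round(pearson_r(nav_trips, transactions), 3))

    # Estimate the unobserved target for a shop with only proxy data.
    print("estimated transactions:", round(slope * 600 + intercept))
```

The estimate is only as good as the correlation. If check-in visits inflate navigation counts without producing purchases, the fitted line will overstate transactions for those shops.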
Thus, Amap's Street Sweep List is likely constructed through correlation analysis between navigation behavior and consumption trends, rather than by directly referencing transaction amounts.
This is also a typical "weak signal amplification" strategy: using enough indirect indicators to construct judgments close to reality.
The List's Limit: a Common Denominator, Not the Optimal Solution
However, even the most advanced algorithms cannot eliminate the bias brought by "group averages."
Lists like the Street Sweep List essentially only reflect shops that "most people find good." Statistically, this is close to a "majority consensus solution," like the McDonald's near your home.
It can significantly reduce the probability of falling into a trap but cannot guarantee that you will choose the "optimal" option.
Like the outcome of a public vote, such lists usually surface options that "can be accepted by most people or are not disliked by most people," rather than the best option for any one person.
Moreover, navigation data itself carries significant noise. Many "check-in-style" trips do not represent actual satisfaction. Who hasn't been disappointed by an internet-famous shop?
Nevertheless, this methodology is still worth learning from.
When we cannot obtain firsthand data, we can construct a reasonable judgment system through a series of "indirect indicators."
For example:
- To gauge foot traffic in a certain area, you can look at the density of surrounding residential areas, the distribution of urban villages, and the number of subway entrances and exits.
- To determine whether a road section is prone to congestion, you can examine the distribution of nearby schools, office buildings, and the intersections of main roads.
This is essentially the thinking behind "feature engineering": when core variables are missing, approximate the operational laws of the real world by constructing reasonable combinations of proxy variables.
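The foot-traffic example above can be sketched as a tiny scoring function. This is one possible way to combine proxy features, not an established formula: the feature names, weights, and figures are all hypothetical. Each feature is min-max scaled so that units do not matter, then combined as a weighted average.

```python
def min_max_scale(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def proxy_scores(areas, weights):
    """Weighted average of scaled proxy features, one score per area."""
    names = list(weights)
    scaled = {n: min_max_scale([a[n] for a in areas]) for n in names}
    total = sum(weights.values())
    return [
        sum(weights[n] * scaled[n][i] for n in names) / total
        for i in range(len(areas))
    ]


if __name__ == "__main__":
    # Hypothetical areas described only by indirect indicators.
    areas = [
        {"residential_density": 8000, "subway_exits": 2, "urban_villages": 1},
        {"residential_density": 15000, "subway_exits": 6, "urban_villages": 3},
        {"residential_density": 4000, "subway_exits": 0, "urban_villages": 0},
    ]
    # Assumed weights; in practice these would need calibrating against real counts.
    weights = {"residential_density": 0.5, "subway_exits": 0.3, "urban_villages": 0.2}
    for score in proxy_scores(areas, weights):
        print(round(score, 2))
```

The scores only rank areas relative to one another. Turning them into actual foot-traffic estimates would require at least a few areas with measured counts to calibrate against.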
Speaking Nonsense with Reason and Evidence
Data in the real world is never perfect.
The people you see outside the window, the rankings on the Street Sweep List, and the heat maps on navigation apps are all signals with noise.
However, as long as you can understand this data with "logic and probabilistic thinking," you can extract relatively reliable judgments from it.
In statistics, this is called "Bayesian inference": in an uncertain environment, combining limited information with existing experience to gradually revise beliefs and approach the truth.
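A small sketch of a single Bayesian update applied to the window example. All probabilities are made up for illustration. The prior is your belief that today is warm before looking outside, and the two likelihoods say how often you would see mostly short sleeves on a warm day versus a cold one.

```python
def bayes_update(prior: float, p_signal_if_true: float, p_signal_if_false: float) -> float:
    """Posterior probability of a hypothesis after observing one signal (Bayes' rule)."""
    evidence = prior * p_signal_if_true + (1 - prior) * p_signal_if_false
    if evidence == 0:
        raise ValueError("signal has zero probability under both hypotheses")
    return prior * p_signal_if_true / evidence


if __name__ == "__main__":
    # Assumed numbers: in a noisy place like Guangdong, short sleeves show up
    # fairly often even on cold days, so the signal is only mildly informative.
    belief_warm = bayes_update(prior=0.5, p_signal_if_true=0.8, p_signal_if_false=0.4)
    print(round(belief_warm, 3))  # 0.667

    # A second, independent and more reliable signal (e.g. a forecast) revises it further.
    belief_warm = bayes_update(belief_warm, p_signal_if_true=0.9, p_signal_if_false=0.3)
    print(round(belief_warm, 3))  # 0.857
```

The noisier the signal, the closer the two likelihoods are to each other and the less a single observation moves the belief. That is why the window view alone should shift your judgment only a little.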
Therefore, when true data is unavailable, there is nothing wrong with judging the weather by "looking at people outside the window" or assessing commercial areas by using the "Street Sweep List."
The key lies in whether you are aware of its limitations and can identify signals amidst the noise.
After all, compared to being completely in the dark, speaking nonsense with reason and evidence often comes closer to the truth.