This paper considers city-wide air quality estimation when only a limited number of geographically sparse monitoring stations are available. Since air pollution exhibits strong spatio-temporal (S-T) dependence and is considerably influenced by urban dynamics (e.g., meteorology and traffic), the air quality at locations not covered by monitoring stations can be inferred from S-T heterogeneous urban big data. However, estimating air quality from such data poses two challenges. The first challenge stems from data diversity: there are different categories of urban dynamics, and some may be useless or even detrimental to the estimation. To address this, we first propose an S-T extended Granger causality model that analyzes all the causalities among urban dynamics in a consistent manner. Then, by applying a non-causality test, we rule out the urban dynamics that do not “Granger” cause air pollution. The second challenge is the time complexity of processing a massive volume of data. We propose to discover the region of influence (ROI) by selecting the data with the highest causality levels, both spatially and temporally. Results show that we achieve higher accuracy using “part” of the data than using “all” of it. A possible explanation is that restricting the estimation to the most influential data eliminates errors induced by redundant or noisy data. The observations from the causality model and the resulting city-wide air quality map are illustrated and visualized using data from Shenzhen, China.
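To make the two-step idea concrete (rule out non-causal urban dynamics, then keep only the most causal ones), the following is a minimal sketch under stated assumptions. It implements a standard, purely temporal, bivariate Granger non-causality test, not the S-T extended model proposed in this paper: spatial lags and neighboring regions are ignored. The "causality level" is assumed to be Geweke's log-ratio of restricted to unrestricted residual sums of squares, and the test uses the asymptotic likelihood-ratio statistic n·ln(RSS_r/RSS_u), which is approximately chi-square with p degrees of freedom under the null hypothesis of non-causality. All function names, the lag order `p`, `alpha`, `top_k`, and the synthetic "wind", "traffic", and "noise" series are hypothetical.

```python
import math

import numpy as np


def lagged_design(series_list, p):
    """Design matrix with an intercept and lags 1..p of each series (hypothetical helper)."""
    n = len(series_list[0])
    cols = [np.ones(n - p)]
    for s in series_list:
        for lag in range(1, p + 1):
            cols.append(s[p - lag:n - lag])  # value at time t is s[t - lag]
    return np.column_stack(cols)


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def chi2_sf(x, k):
    """Survival function of a chi-square distribution with integer k degrees of freedom."""
    x = max(x, 0.0)
    if k % 2 == 0:
        term = math.exp(-x / 2)
        total = term
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (k - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def granger_level(y, x, p=2):
    """Causality level of x -> y and the p-value of the non-causality test (assumed measure)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[p:]
    rss_restricted = rss(lagged_design([y], p), target)      # past of y only
    rss_unrestricted = rss(lagged_design([y, x], p), target)  # past of y and x
    level = math.log(rss_restricted / rss_unrestricted)
    stat = len(target) * level  # ~ chi2(p) under H0: x does not Granger-cause y
    return level, chi2_sf(stat, p)


def select_causal(y, candidates, p=2, alpha=0.05, top_k=2):
    """Drop candidates that fail to reject non-causality, then keep the top_k by level."""
    kept = []
    for name, x in candidates.items():
        level, p_value = granger_level(y, x, p)
        if p_value < alpha:  # reject non-causality: x "Granger" causes y
            kept.append((level, name))
    kept.sort(reverse=True)
    return [name for _, name in kept[:top_k]]


# Usage on synthetic data: pollution driven by lagged wind and traffic, plus an irrelevant series.
rng = np.random.default_rng(0)
n = 500
wind, traffic, noise, eps = (rng.normal(size=n) for _ in range(4))
pollution = np.zeros(n)
for t in range(1, n):
    pollution[t] = (0.5 * pollution[t - 1] + 0.8 * wind[t - 1]
                    + 0.4 * traffic[t - 1] + 0.5 * eps[t])

selected = select_causal(pollution, {"wind": wind, "traffic": traffic, "noise": noise})
print(selected)
```

Using a single log-ratio as both the test statistic and the ranking score keeps the filtering step (non-causality test) and the selection step (highest causality levels) consistent; a full S-T version would add lagged series from neighboring regions as further candidates.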