Technologies

Big Data Analysis

Big data technology, including data analysis, data mining, and data security, is key for enterprises seeking to enhance their core competitiveness in the information age. Building on voice and natural language processing, Fano Labs provides business customers with multi-domain, in-depth big data analytics solutions that analyze massive data across industries and fully exploit its business value.


Features

  • Speech and text mining with AI
  • Ensure data security with reliable data processing methods
  • Industry-specific knowledge bases for data analysis
  • Professional data generation and labeling tools

Application Scenarios

  • Data Labeling
  • User Portrait Analysis
  • Sales Opportunity Mining
  • Smart Business

Research Papers

  • Deep-AIR: A Hybrid CNN-LSTM Framework for Fine-Grained Air Pollution Forecast

    Q. Zhang, J.C.K. Lam, Victor O.K. Li, and Y. Han, arXiv:2001.11957 [eess.SP], Jan. 2020.

    Poor air quality has become an increasingly critical challenge for many metropolitan cities, with many catastrophic physical and mental consequences for human health and quality of life. However, accurately monitoring and forecasting air quality remains a highly challenging endeavour. Limited by geographically sparse data, traditional statistical models and newly emerging data-driven methods of air quality forecasting have mainly focused on temporal correlations in historical air pollutant datasets. In reality, however, both the distribution and the dispersion of air pollutants are highly location-dependent. In this paper, we propose a novel hybrid deep learning model that combines Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to forecast air quality at high resolution. Our model utilizes the spatial correlation characteristics of our air pollutant datasets to achieve higher forecasting accuracy than existing deep learning models of air pollution forecasting.

  • Deep Multi-Scale Convolutional LSTM Network for Travel Demand and Origin-Destination Predictions

    Kai Fung Chu, Albert Y.S. Lam, and Victor O.K. Li, to appear in IEEE Transactions on Intelligent Transportation Systems, 2019.

    Advancements in sensing and Internet of Things (IoT) technologies generate a huge amount of data. Mobility on demand (MoD) services benefit from the availability of big data in intelligent transportation systems. Given predictions of future travel demand or origin-destination (OD) flows, service providers can pre-allocate unoccupied vehicles to customers' origins of service to reduce waiting time. Traditional approaches to predicting future travel demand and OD flows rely on statistical or machine learning methods. Inspired by deep learning techniques for image and video processing, this paper treats localized travel demands as image pixels and develops a novel deep learning model called the multi-scale convolutional long short-term memory network (MultiConvLSTM). Rather than using the traditional OD matrix, which may lose geographical information, we propose a new data structure, called the OD tensor, to represent OD flows, and introduce a manipulation method, called OD tensor permutation and matricization, to handle the high dimensionality of the OD tensor. MultiConvLSTM considers both temporal and spatial correlations to predict future travel demand and OD flows. Experiments are performed on real-world New York taxi data of around 400 million records. Our results show that MultiConvLSTM achieves the highest accuracy in both one-step and multiple-step predictions and outperforms existing methods for travel demand and OD flow prediction.

  • A five-layer architecture for big data processing and analytics

    J.Y. Zhu, B. Tang, and Victor O.K. Li, International Journal of Big Data Intelligence, Vol. 6, pp. 38-49, Nov. 2019.

    Big data technologies have attracted much attention in recent years. Academia and industry have reached a consensus that the ultimate goal of big data is to transform 'big data' into 'real value'. In this article, we discuss how to achieve this goal and propose a five-layer architecture for big data processing and analytics (BDPA), comprising a collection layer, a storage layer, a processing layer, an analytics layer, and an application layer. The five-layer architecture aims to establish a de facto standard for current BDPA solutions, which collect, manage, process, and analyse vast volumes of both static data and online data streams, and make valuable decisions for all types of industries. The functionalities and challenges of the five layers are illustrated, and the most recent technologies and solutions are discussed accordingly. We conclude with the requirements for future BDPA solutions, which may serve as a foundation for the future big data ecosystem.

  • Public Transport Waiting Time Estimation Using Semi-Supervised Graph Convolutional Networks

    Kai Fung Chu, Albert Y.S. Lam, Becky P.Y. Loo, and Victor O.K. Li, in Proceedings of the 22nd IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC 2019), Auckland, New Zealand, Oct. 2019.

    An effective transportation system is important for supporting various human activities in a modern smart city. The waiting time at stations has a great impact on overall transportation system efficiency and on people's well-being, for example their stress and anxiety. Knowing the waiting time at different locations in advance can help travelers plan their trips. However, waiting time may depend on many factors, such as crowdedness and the collective travel behavior of the travelers involved, and it is generally very expensive to collect all the required data at every location. In this paper, we propose a deep learning approach that determines waiting time levels at public transport stations from proxy data and limited historical waiting time data at some stations. We formulate the public transportation network as a graph and develop a semi-supervised classification model based on Graph Convolutional Networks, which operate directly on graph-structured data with limited labelled data. We conduct experiments on the Mass Transit Railway in Hong Kong with real data, and our proposed approach achieves 89% accuracy in classifying waiting time levels.

  • Synchrophasor Recovery and Prediction: A Graph-Based Deep Learning Approach

    J. J. Q. Yu, D. J. Hill, V. O. K. Li and Y. Hou, in IEEE Internet of Things Journal, vol. 6, no. 5, pp. 7348-7359, Oct. 2019.

    Data integrity of power system states is critical to modern power grid operation and control. Due to communication latency, state measurements are not immediately available at the control center, which slows the responses of time-sensitive applications. In this paper, a new graph-based deep learning approach is proposed to recover and predict the states ahead of time by utilizing the power network topology and existing measurements. A graph-convolutional recurrent adversarial network is devised to process the available information and extract graphical and temporal data correlations. This approach overcomes drawbacks of existing synchrophasor recovery and prediction implementations to improve overall system performance. Additionally, the approach offers an adaptive data processing method to handle power grids of various sizes. Case studies demonstrate the outstanding recovery and prediction accuracy of the proposed approach, and investigations illustrate its robustness against poor communication conditions, measurement noise, and system topology changes.

  • Delay Aware Power System Synchrophasor Recovery and Prediction Framework

    James J.Q. Yu, Albert Y.S. Lam, David J. Hill, Yunhe Hou, and Victor O.K. Li. IEEE Transactions on Smart Grid, 2018.

    This paper presents a novel delay aware synchrophasor recovery and prediction framework to address the problem of missing power system state variables due to the existence of communication latency. This capability is particularly essential for dynamic power system scenarios where fast remedial control actions are required due to system events or faults. While a wide area measurement system can sample high-frequency system states with phasor measurement units, the control center cannot obtain them in real-time due to latency and data loss. In this work, a synchrophasor recovery and prediction framework and its practical implementation are proposed to recover the current system state and predict the future states utilizing existing incomplete synchrophasor data. The framework establishes an iterative prediction scheme, and the proposed implementation adopts recent machine learning advances in data processing. Simulation results indicate the superior accuracy and speed of the proposed framework, and investigations are made to study its sensitivity to various communication delay patterns for pragmatic applications.


  • Intelligent Fault Detection Scheme for Microgrids with Wavelet-based Deep Neural Networks

    James J.Q. Yu, Yunhe Hou, Albert Y.S. Lam, and Victor O.K. Li, to appear in IEEE Transactions on Smart Grid, 2017.

    Fault detection is essential in microgrid control and operation, as it enables the system to perform fast fault isolation and recovery. The adoption of inverter-interfaced distributed generation in microgrids makes traditional fault detection schemes inappropriate because they depend on significant fault currents. In this paper, we devise an intelligent fault detection scheme for microgrids based on the wavelet transform and deep neural networks. The proposed scheme aims to provide fast fault type, phase, and location information for microgrid protection and service recovery. In the scheme, branch current measurements sampled by protective relays are pre-processed by the discrete wavelet transform to extract statistical features. All available data is then input into deep neural networks to derive fault information. Compared with previous work, the proposed scheme provides significantly better fault type classification accuracy. Moreover, it can also detect fault locations, which previous work could not. To evaluate the performance of the proposed scheme, we conduct a comprehensive study on the CERTS microgrid and the IEEE 34-bus system. The simulation results demonstrate the efficacy of the proposed scheme in terms of detection accuracy, computation time, and robustness against measurement uncertainty.


  • Travel Demand Prediction using Deep Multi-Scale Convolutional LSTM Network

    Kai Fung Chu, Albert Y.S. Lam, and Victor O.K. Li. 21st IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC 2018), Maui, HI, Nov. 2018.

    Mobility on Demand transforms the way people travel in the city and facilitates real-time vehicle hiring services. Given the predicted future travel demand, service providers can coordinate their available vehicles so that they are pre-allocated to customers' origins of service in advance to reduce waiting time. Traditional approaches to future travel demand prediction rely on statistical or machine learning methods. Advances in sensor technology generate huge amounts of data, which enable data-driven intelligent transportation systems. In this paper, inspired by deep learning techniques for image and video processing, we propose a new deep learning model, called Multi-Scale Convolutional Long Short-Term Memory (MultiConvLSTM), which treats travel demand as image pixel values. MultiConvLSTM considers both temporal and spatial correlations to predict future travel demand. Experiments are performed on real-world New York taxi data with around 400 million records. We show that MultiConvLSTM outperforms existing methods for travel demand prediction and achieves the highest accuracy in both one-step and multiple-step predictions.


  • Delay Aware Transient Stability Assessment with Synchrophasor Recovery and Prediction Framework

    James J.Q. Yu, David J. Hill, and Albert Y.S. Lam. Neurocomputing, 2018.

    Transient stability assessment is critical for power system operation and control. Existing related research makes a strong assumption that the data transmission time for system variable measurements to arrive at the control center is negligible, which is unrealistic. In this paper, we focus on investigating the impact of data transmission latency on synchrophasor-based transient stability assessment. In particular, we employ a recently proposed methodology named synchrophasor recovery and prediction framework to handle the latency issue and make up missing synchrophasors. Advanced deep learning techniques are adopted to utilize the processed data for assessment. Compared with existing work, our proposed mechanism can make accurate assessments with a significantly faster response speed.


  • Pg-Causality: Identifying Spatiotemporal Causal Pathways for Air Pollutants with Urban Big Data

    Zhu, J.Y., Zhang, C., Zhi, S., Li, V.O.K., Han, J., Zheng, Y., arXiv:1610.07045, 2016.

    Many countries are suffering from severe air pollution. Understanding how different air pollutants accumulate and propagate is critical to making relevant public policies. In this paper, we use urban big data (air quality data and meteorological data) to identify the spatiotemporal (ST) causal pathways for air pollutants. This problem is challenging because: (1) there are numerous noisy and low-pollution periods in the raw air quality data, which may lead to unreliable causality analysis; (2) for large-scale data in the ST space, the computational complexity of constructing a causal structure is very high; and (3) the ST causal pathways are complex due to the interactions of multiple pollutants and the influence of environmental factors. Therefore, we present p-Causality, a novel pattern-aided causality analysis approach that combines the strengths of pattern mining and Bayesian learning to efficiently and faithfully identify the ST causal pathways. First, pattern mining helps suppress the noise by capturing frequent evolving patterns (FEPs) of each monitoring sensor, and greatly reduces the complexity by selecting the pattern-matched sensors as "causers". Then, Bayesian learning carefully encodes the local and ST causal relations with a Gaussian Bayesian network (GBN)-based graphical model, which also integrates environmental influences to minimize biases in the final results. We evaluate our approach with three real-world data sets containing 982 air quality sensors in three regions of China, from 01-Jun-2013 to 19-Dec-2015. Results show that our approach outperforms the traditional causal structure learning methods in time efficiency, inference accuracy and interpretability.


  • Intelligent Time-Adaptive Transient Stability Assessment System

    James J.Q. Yu, David J. Hill, Albert Y.S. Lam, Jiatao Gu, and Victor O.K. Li. IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 1049–1058, Jan. 2018.

    Online identification of post-contingency transient stability is essential in power system control, as it helps the grid operator decide on and coordinate corrective control actions for system failures. Utilizing machine learning methods with synchrophasor measurements for transient stability assessment has received much attention recently with the gradual deployment of wide-area protection and control systems. In this paper, we develop a transient stability assessment system based on the long short-term memory network. By proposing a temporal self-adaptive scheme, our system aims to balance the trade-off between assessment accuracy and response time, both of which may be crucial in real-world scenarios. Compared with previous work, the most significant enhancement is that our system learns from the temporal dependencies of the input data, which contributes to better assessment accuracy. In addition, the model structure of our system is relatively simple, which speeds up training. Case studies on three power systems demonstrate the efficacy of the proposed transient stability assessment system.


  • Delay Aware Intelligent Transient Stability Assessment System

    James J.Q. Yu, Albert Y.S. Lam, David J. Hill, and Victor O.K. Li. IEEE Access, vol. 5, pp. 17230–17239, Dec. 2017.

    Transient stability assessment is a critical tool for power system design and operation. With emerging advanced synchrophasor measurement techniques, machine learning methods are playing an increasingly important role in power system stability assessment. However, most existing research makes a strong assumption that the measurement data transmission delay is negligible. In this paper, we focus on investigating the influence of communication delay on synchrophasor-based transient stability assessment. In particular, we develop a delay aware intelligent system to address this issue. By utilizing an ensemble of multiple long short-term memory networks, the proposed system can make early assessments from incomplete system variable measurements, achieving a much shorter response time. Compared with existing work, our system makes accurate assessments with significantly improved efficiency. We perform numerous case studies to demonstrate the superiority of the proposed intelligent system, which makes accurate assessments in one third less time than state-of-the-art methodologies. Moreover, the simulations indicate that noise in the measurements has a negligible impact on assessment performance, demonstrating the robustness of the proposed system.


  • An Extended Spatio-temporal Granger Causality Model for Air Quality Estimation with Heterogeneous Urban Big Data

    Zhu, J.Y., Sun, C., and Li, V.O.K., IEEE Transactions on Big Data, vol. 3, no. 3, pp. 307-319, Jul. 2017.

    This paper deals with city-wide air quality estimation with limited, geographically sparse air quality monitoring stations. Since air pollution is influenced by urban dynamics (e.g., meteorology and traffic) that are available throughout the city, we can infer the air quality in regions without monitoring stations from such spatio-temporal (ST) heterogeneous urban big data. However, big data-enabled estimation poses three challenges. The first challenge is data diversity, i.e., there are many different categories of urban data, some of which may be useless for the estimation. To overcome this, we extend Granger causality to the ST space to analyze all the causality relations in a consistent manner. The second challenge is the computational complexity of processing the massive volume of data. To overcome this, we introduce the non-causality test to rule out urban dynamics that do not “Granger” cause air pollution, and the region of influence (ROI), which enables us to analyze only the data with the highest causality levels. The third challenge is to adapt our grid-based algorithm to non-grid-based applications. By developing a flexible grid-based estimation algorithm, we can reduce the inaccuracies caused by the grid-based algorithm while maintaining computational efficiency.


  • A Four-Layer Architecture for Online and Historical Big Data Analytics

    Zhu, J. Y., Xu, J., and Li, V.O.K., Proc. IEEE DataCom, Auckland, New Zealand, Aug. 2016.

    Big data processing and analytics technologies have drawn much attention in recent years. However, the recent explosive growth of online data streams brings new challenges to existing technologies. These online data streams tend to be massive, continuously arriving, heterogeneous, time-varying and unbounded. Therefore, an integrated approach is needed to process both big static data and online big data streams. We call this integrated approach online and historical big data analytics (OHBDA). We propose a four-layer architecture for OHBDA, consisting of a storage layer, an online and historical data processing layer, an analytics layer, and a decision-making layer. The functionalities and challenges of the four layers are further discussed. We conclude with a discussion of the requirements for future OHBDA solutions, which may serve as a foundation for future big data analytics research.


  • Granger-Causality-Based Air Quality Estimation with Spatio-Temporal (S-T) Heterogeneous Big Data

    Zhu, Y., Sun, C., and Li, V.O.K., Proc. IEEE INFOCOM Smart City Workshop, Hong Kong, China, April 2015.

    This paper considers city-wide air quality estimation with limited, geographically sparse monitoring stations. Since air pollution is highly spatio-temporal (S-T) dependent and considerably influenced by urban dynamics (e.g., meteorology and traffic), we can infer air quality in areas not covered by monitoring stations using S-T heterogeneous urban big data. However, estimating air quality using S-T heterogeneous big data poses two challenges. The first challenge arises from data diversity, i.e., there are different categories of urban dynamics, some of which may be useless or even detrimental for the estimation. To overcome this, we first propose an S-T extended Granger causality model to analyze all the causalities among urban dynamics in a consistent manner. Then, by implementing a non-causality test, we rule out the urban dynamics that do not “Granger” cause air pollution. The second challenge arises from the time complexity of processing the massive volume of data. We propose to discover the region of influence (ROI) by selecting data with the highest causality levels spatially and temporally. Results show that we achieve higher accuracy using “part” of the data than using “all” of the data. This may be explained by the most influential data eliminating errors induced by redundant or noisy data. The causality model observations and the city-wide air quality map are illustrated and visualized using data from Shenzhen, China.


  • A Gaussian Bayesian Model to Identify Spatio-temporal Causalities for Air Pollution Based on Urban Big Data

    Zhu, J. Y., Zheng, Y., Yi, X., and Li, V.O.K., SmartCity16: The 2nd IEEE INFOCOM Workshop on Smart Cities and Urban Computing, San Francisco, California, USA, April 2016.

    Identifying the causes of air pollution and answering questions such as "where do Beijing's air pollutants come from?" are crucial to informing government decision-making. In this paper, we identify the spatio-temporal (ST) causalities among air pollutants at different locations by mining urban big data. This is challenging for two reasons: 1) since air pollutants can be generated locally or dispersed from the neighborhood, we need to discover the causes in the ST space from many candidate locations efficiently; 2) the cause-and-effect relations between air pollutants are further affected by confounding variables such as meteorology. To tackle these problems, we propose a coupled Gaussian Bayesian model with two components: 1) a Gaussian Bayesian Network (GBN) to represent the cause-and-effect relations among air pollutants, with an entropy-based algorithm to efficiently locate the causes in the ST space; 2) a coupled model that combines cause-and-effect relations with meteorology to better learn the parameters while eliminating the impact of confounding. The proposed model is verified using air quality and meteorological data from 52 cities over the period Jun 1st 2013 to May 1st 2015. Results show that our model outperforms baseline causality learning methods in both time efficiency and prediction accuracy.


  • Spatio-temporal (S-T) similarity model for constructing WIFI-based RSSI fingerprinting map for indoor localization

    Zhu, Y., Zheng, X., Xu, J., and Li, V.O.K., Proc. Fifth International Conference on Indoor Positioning and Indoor Navigation (IPIN 2014), Busan, Korea, Oct 2014.

    WIFI-based received signal strength indicator (RSSI) fingerprinting is widely used for indoor localization due to desirable features such as universal availability, privacy protection, and low deployment cost. The key to RSSI fingerprinting is constructing a trustworthy RSSI map, which contains measurements of received access point (AP) signal strengths at different calibration points. Location can be estimated by matching live RSSIs with the RSSI map. However, a fine-grained map requires much labor and time, which calls for efficient interpolation and approximation methods. Besides, due to environmental changes, the RSSI map requires periodic updates to guarantee localization accuracy. In this paper, we propose a spatio-temporal (S-T) similarity model that uses S-T correlation to construct a fine-grained and up-to-date RSSI map. Five S-T correlation metrics are proposed, i.e., the spatial distance, signal similarity, similarity likelihood, RSSI vector distance, and S-T reliability. The model is evaluated through experiments in our indoor WIFI positioning system test bed. Results show improvements in both interpolation accuracy (up to 7%) and localization accuracy (up to 32%) compared to four commonly used RSSI map construction methods, namely linear interpolation, cubic interpolation, nearest neighbor interpolation, and compressive sensing.

  • Performance Models of Access Latency in Cloud Storage Systems

    Shuai, Q., Li, V.O.K., and Zhu, Y., Proc. Fourth Workshop on Architectures and Systems for Big Data, Minneapolis, MN, US, June 14, 2014.

    Access latency is a key performance metric for cloud storage systems and has a great impact on user experience, but most papers focus on other performance metrics, such as storage overhead and repair cost. Only recently have some models argued that coding can reduce access latency. However, they were developed for special scenarios, which may not reflect reality. To bridge the gap between existing work and practice, we propose in this paper a more practical model to measure access latency. This model can also be used to compare the access latency of different codes used by different companies. To the best of our knowledge, this model is the first to provide a general method for comparing the access latencies of different erasure codes.
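
The public transport waiting time paper above builds its semi-supervised classifier on Graph Convolutional Networks. For readers unfamiliar with them, the sketch below shows the widely used propagation rule of Kipf and Welling, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), where D is the degree matrix of A + I, in plain Python. It is an illustration under assumptions, not the paper's model. The three-station line graph, the feature matrix H and the weight matrix W are hypothetical, and the abstract does not say which normalization the authors use.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric 0/1 adjacency matrix A."""
    n = len(adj)
    # Add self-loops so each station keeps its own features during propagation.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_norm, h, w):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, h), w)]


if __name__ == "__main__":
    # Hypothetical 3-station line A - B - C with two proxy features per station.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    h = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
    w = [[1.0, -0.5], [0.3, 1.0]]
    print(gcn_layer(normalized_adjacency(adj), h, w))
```

In a semi-supervised setting, a loss such as cross-entropy would be computed only on the stations whose waiting time labels are known. The propagation step still mixes in features from unlabelled neighbouring stations.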
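
The wavelet-based microgrid fault detection abstract describes pre-processing branch current samples with a discrete wavelet transform and extracting statistical features before classification. The sketch below shows one way that preprocessing step might look. It assumes a Haar wavelet, three decomposition levels, and mean, standard deviation and energy as the per-band statistics; the abstract specifies none of these, and the current window is hypothetical.

```python
import math
from statistics import mean, pstdev


def haar_dwt(signal):
    """Single-level orthonormal Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def band_stats(coeffs):
    """Mean, population standard deviation and energy of one sub-band."""
    return [mean(coeffs), pstdev(coeffs), sum(c * c for c in coeffs)]


def wavelet_features(window, levels=3):
    """Multi-level Haar decomposition of a current window, summarised per sub-band.

    Returns stats for each detail band (finest first), then the final approximation band.
    """
    if levels < 1 or len(window) % (2 ** levels):
        raise ValueError("window length must be a multiple of 2**levels")
    features = []
    approx = list(window)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        features.extend(band_stats(detail))
    features.extend(band_stats(approx))
    return features


if __name__ == "__main__":
    # Hypothetical 16-sample branch current window (per unit) whose amplitude triples halfway.
    window = [0.0, 0.7, 1.0, 0.7, 0.0, -0.7, -1.0, -0.7,
              0.0, 2.1, 3.0, 2.1, 0.0, -2.1, -3.0, -2.1]
    print(wavelet_features(window))
```

The Haar transform is orthonormal, so the band energies sum to the energy of the input window. This gives a quick sanity check. A feature vector like this would then be fed to the deep neural network classifier, which is not shown.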
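
The time-adaptive transient stability assessment system aims to balance accuracy against response time. One simple way to express such a temporal self-adaptive rule is to query a classifier after each new synchrophasor sample and stop as soon as its confidence clears a threshold. This is sketched here as an assumption, not as the paper's exact scheme. The classifier interface and the 0.95 threshold are hypothetical placeholders for the LSTM described in the abstract.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: maps the post-fault samples observed so far to P(unstable).
StabilityClassifier = Callable[[Sequence[float]], float]


def adaptive_assessment(samples: Sequence[float],
                        classifier: StabilityClassifier,
                        threshold: float = 0.95) -> Tuple[bool, int]:
    """Assess stability as early as the classifier is confident enough.

    Returns (predicted_unstable, number_of_samples_used). If confidence never reaches
    the threshold, the decision is forced once all samples have been seen.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    for t in range(1, len(samples)):
        p_unstable = classifier(samples[:t])
        # Confidence in whichever class is currently more likely.
        if max(p_unstable, 1.0 - p_unstable) >= threshold:
            return p_unstable >= 0.5, t
    p_unstable = classifier(samples)
    return p_unstable >= 0.5, len(samples)


if __name__ == "__main__":
    # Toy stand-in for a trained model: grows more certain of instability as samples arrive.
    def toy_classifier(seen):
        return min(1.0, 0.5 + 0.2 * len(seen))

    print(adaptive_assessment([0.0] * 10, toy_classifier))  # (True, 3)
```

Raising the threshold trades response time for accuracy. The decision then waits for more samples, but is less likely to be revised by later data.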
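
The RSSI fingerprinting abstract notes that location is estimated by matching live RSSIs against the RSSI map. A common way to do this matching is weighted k-nearest neighbours in signal space. The sketch below assumes that technique, Euclidean distance in dBm, and a floor of -100 dBm for access points not heard at a point. None of these choices come from the paper, whose contribution is building the map itself, and the three-point map is hypothetical.

```python
import math
from typing import Dict, Tuple

Fingerprint = Dict[str, float]  # access point id -> RSSI in dBm
Point = Tuple[float, float]     # calibration point coordinates (e.g. metres)

MISSING_DBM = -100.0  # assumed value for an access point not heard at a location


def rssi_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Euclidean distance in signal space over the union of access points."""
    aps = set(a) | set(b)
    return math.sqrt(sum((a.get(ap, MISSING_DBM) - b.get(ap, MISSING_DBM)) ** 2 for ap in aps))


def estimate_location(live: Fingerprint, rssi_map: Dict[Point, Fingerprint], k: int = 3) -> Point:
    """Weighted k-nearest-neighbour match of a live scan against the RSSI map."""
    nearest = sorted(rssi_map.items(), key=lambda item: rssi_distance(live, item[1]))[:k]
    # Inverse-distance weights; the small epsilon avoids division by zero on an exact match.
    weights = [1.0 / (rssi_distance(live, fp) + 1e-6) for _, fp in nearest]
    total = sum(weights)
    x = sum(w * pt[0] for w, (pt, _) in zip(weights, nearest)) / total
    y = sum(w * pt[1] for w, (pt, _) in zip(weights, nearest)) / total
    return x, y


if __name__ == "__main__":
    # Hypothetical three-point map with two access points.
    rssi_map = {
        (0.0, 0.0): {"ap1": -40.0, "ap2": -70.0},
        (5.0, 0.0): {"ap1": -60.0, "ap2": -55.0},
        (0.0, 5.0): {"ap1": -65.0, "ap2": -80.0},
    }
    print(estimate_location({"ap1": -41.0, "ap2": -69.0}, rssi_map))  # close to (0, 0)
```

The matching step shows why map quality matters. A sparse or stale map gives fewer, less representative neighbours to interpolate between, and this is the problem the S-T similarity model addresses.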
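
The cloud storage paper asks why coding can reduce access latency. A simplified model gives some intuition. An object encoded with an (n, k) MDS erasure code can be decoded from any k of its n chunks, so if a read is sent to all n nodes, its latency is the k-th fastest response. This toy model is an assumption, not the paper's model: node latencies are i.i.d. exponential, and chunk size, queueing and decoding cost are ignored. For i.i.d. Exp(rate) latencies, the expected k-th order statistic has the closed form sum over i = 0..k-1 of 1/((n - i) * rate), which the Monte Carlo estimate should match.

```python
import random


def coded_read_latency(node_latencies, k):
    """Read latency when a request goes to all n nodes and decoding needs any k chunks.

    Under this simplified model it is simply the k-th smallest node response time.
    """
    if not 1 <= k <= len(node_latencies):
        raise ValueError("need 1 <= k <= n")
    return sorted(node_latencies)[k - 1]


def expected_latency_exponential(n, k, rate=1.0):
    """E[k-th order statistic] of n i.i.d. Exp(rate) latencies."""
    return sum(1.0 / ((n - i) * rate) for i in range(k))


def simulate(n, k, rate=1.0, trials=50_000, seed=0):
    """Monte Carlo estimate of the mean coded read latency under the same assumptions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        latencies = [rng.expovariate(rate) for _ in range(n)]
        total += coded_read_latency(latencies, k)
    return total / trials


if __name__ == "__main__":
    # (1, 1): single copy; (3, 1): 3-way replication, fastest replica wins;
    # (4, 2) and (6, 4): MDS codes with different redundancy.
    for n, k in [(1, 1), (3, 1), (4, 2), (6, 4)]:
        print(n, k, round(expected_latency_exponential(n, k), 3), round(simulate(n, k), 3))
```

For example, with n = 4 and k = 2 the expected latency is 1/4 + 1/3, or about 0.583, compared with 1.0 for reading a single uncoded copy from one node under the same assumptions. A practical model such as the one proposed in the paper would also need to account for effects this sketch ignores.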