Fano Labs appreciates the significance of artificial intelligence (AI), which can have a positive influence on everyone in our community. It can improve our daily lives, from how we work and play to how we interact with other people.

As a university spin-off, we believe in the importance of research, especially for a technology-based startup. It not only gives us insights and solutions to unsolved problems but also maintains our leading role in the market. We devote ourselves to developing the best solutions for our clients and advancing the knowledge of humans and society. We specialize in speech and natural language processing technologies, but we are also interested in all AI-related research. Together, these technologies will work in concert to make our city self-aware and smarter.

Intelligent Fault Detection Scheme for Microgrids with Wavelet-based Deep Neural Networks
James J.Q. Yu, Yunhe Hou, Albert Y.S. Lam, and Victor O.K. Li, to appear in IEEE Transactions on Smart Grid, 2017.
Fault detection is essential in microgrid control and operation, as it enables the system to perform fast fault isolation and recovery. The adoption of inverter-interfaced distributed generation in microgrids makes traditional fault detection schemes inappropriate due to their dependence on significant fault currents. In this paper, we devise an intelligent fault detection scheme for microgrid based on wavelet transform and deep neural networks. The proposed scheme aims to provide fast fault type, phase, and location information for microgrid protection and service recovery. In the scheme, branch current measurements sampled by protective relays are pre-processed by discrete wavelet transform to extract statistical features. Then all available data is input into deep neural networks to develop fault information. Compared with previous work, the proposed scheme can provide significantly better fault type classification accuracy. Moreover, the scheme can also detect the locations of faults, which are unavailable in previous work. To evaluate the performance of the proposed fault detection scheme, we conduct a comprehensive evaluation study on the CERTS microgrid and IEEE 34-bus system. The simulation results demonstrate the efficacy of the proposed scheme in terms of detection accuracy, computation time, and robustness against measurement uncertainty.
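The pre-processing step described above (a discrete wavelet transform followed by statistical features) can be illustrated with a minimal sketch. It assumes a Haar wavelet and three per-level statistics (mean, standard deviation, energy) purely for brevity; the abstract does not state which wavelet or features the paper uses, and the function names and synthetic signal are hypothetical.

```python
import math
import statistics


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists.
    """
    if len(signal) % 2:
        # Pad odd-length input by repeating the last sample.
        signal = list(signal) + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_features(signal, levels=3):
    """Mean, standard deviation, and energy of the detail coefficients per level."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        features.append({
            "mean": statistics.fmean(detail),
            "std": statistics.pstdev(detail),
            "energy": sum(d * d for d in detail),
        })
    return features


# Hypothetical branch current: a 50 Hz sine sampled at 1600 Hz whose
# amplitude triples halfway through the window.
samples = [
    math.sin(2 * math.pi * 50 * t / 1600) * (1.0 if t < 32 else 3.0)
    for t in range(64)
]
features = wavelet_features(samples)
```

In a scheme like the one described, feature vectors of this kind would be the input to the neural network; the network itself is outside the scope of this sketch.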
Neural Machine Translation with Gumbel-Greedy Decoding
Gu, J., Im, J.D., and Li, V.O.K., arXiv:1706.07518, Jun. 2017.
Previous neural machine translation models used some heuristic search algorithms (e.g., beam search) in order to avoid solving the maximum a posteriori problem over translation sentences at test time. In this paper, we propose the Gumbel-Greedy Decoding which trains a generative network to predict translation under a trained model. We solve such a problem using the Gumbel-Softmax reparameterization, which makes our generative network differentiable and trainable through standard stochastic gradient methods. We empirically demonstrate that our proposed model is effective for generating sequences of discrete words.
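As a rough illustration of the Gumbel-Softmax reparameterization mentioned above, the sketch below draws a relaxed one-hot sample from a vector of logits. It shows only the sampling step, not the paper's decoder or training procedure; the function name and the example logits are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        u = rng.random()
        # Gumbel(0, 1) noise by inverse transform; clamp u away from 0
        # so that the inner logarithm stays finite.
        g = -math.log(-math.log(max(u, 1e-12)))
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
sample = gumbel_softmax([1.0, 2.0, 3.0], temperature=0.5, rng=rng)
```

Lower temperatures push the sample toward a one-hot vector, while higher temperatures flatten it toward uniform. Because the noise enters additively, the output remains a differentiable function of the logits in an automatic-differentiation framework.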
Search Engine Guided Non-Parametric Neural Machine Translation
Gu, J., Wang, Y., Cho, K., and Li, V.O.K., arXiv:1705.07267, May 2017.
In this paper, we extend an attention-based neural machine translation (NMT) model by allowing it to access an entire training set of parallel sentence pairs even after training. The proposed approach consists of two stages. In the first stage--retrieval stage--, an off-the-shelf, black-box search engine is used to retrieve a small subset of sentence pairs from a training set given a source sentence. These pairs are further filtered based on a fuzzy matching score based on edit distance. In the second stage--translation stage--, a novel translation model, called translation memory enhanced NMT (TM-NMT), seamlessly uses both the source sentence and a set of retrieved sentence pairs to perform the translation. Empirical evaluation on three language pairs (En-Fr, En-De, and En-Es) shows that the proposed approach significantly outperforms the baseline approach and the improvement is more significant when more relevant sentence pairs were retrieved.
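The filtering step in the retrieval stage can be sketched as follows. This assumes one common form of fuzzy matching score, namely one minus the token-level edit distance divided by the length of the longer sentence; the abstract does not give the exact formula, and the threshold and function names here are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_match(source, candidate):
    """Score in [0, 1]; 1.0 means the token sequences are identical."""
    src, cand = source.split(), candidate.split()
    longest = max(len(src), len(cand))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(src, cand) / longest


def filter_pairs(source, retrieved, threshold=0.5):
    """Keep retrieved (source, target) pairs scoring at least the threshold, best first."""
    scored = [(fuzzy_match(source, s), s, t) for s, t in retrieved]
    kept = [item for item in scored if item[0] >= threshold]
    return sorted(kept, key=lambda item: item[0], reverse=True)


pairs = filter_pairs(
    "the cat sat",
    [("the dog sat", "le chien"), ("a b c d e f", "x")],
)
```

Here the first retrieved pair differs from the source by one token out of three and is kept, while the second shares no tokens with the source and is dropped.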
An Extended Spatio-temporal Granger Causality Model for Air Quality Estimation with Heterogeneous Urban Big Data
Zhu, J.Y., Sun, C., and Li, V.O.K., IEEE Transactions on Big Data, to appear.
This paper deals with city-wide air quality estimation with limited air quality monitoring stations which are geographically sparse. Since air pollution is influenced by urban dynamics (e.g., meteorology and traffic) which are available throughout the city, we can infer the air quality in regions without monitoring stations based on such spatial-temporal (ST) heterogeneous urban big data. However, big data-enabled estimation poses three challenges. The first challenge is data diversity, i.e., there are many different categories of urban data, some of which may be useless for the estimation. To overcome this, we extend Granger causality to the ST space to analyze all the causality relations in a consistent manner. The second challenge is the computational complexity due to processing the massive volume of data. To overcome this, we introduce the non-causality test to rule out urban dynamics that do not “Granger” cause air pollution, and the region of influence (ROI), which enables us to only analyze data with the highest causality levels. The third challenge is to adapt our grid-based algorithm to non-grid-based applications. By developing a flexible grid-based estimation algorithm, we can decrease the inaccuracies due to the grid-based algorithm while maintaining computational efficiency.
A Teacher-Student Framework for Zero-Resource Neural Machine Translation
Chen, Y., Liu, Y., Cheng, Y., and Li, V.O.K., arXiv:1705.00753, 2017.
While end-to-end neural machine translation (NMT) has made remarkable progress recently, it still suffers from the data scarcity problem for low-resource language pairs and domains. In this paper, we propose a method for zero-resource NMT by assuming that parallel sentences have close probabilities of generating a sentence in a third language. Based on this assumption, our method is able to train a source-to-target NMT model ("student") without parallel corpora available, guided by an existing pivot-to-target NMT model ("teacher") on a source-pivot parallel corpus. Experimental results show that the proposed method significantly improves over a baseline pivot-based model by +3.0 BLEU points across various language pairs.
Trainable Greedy Decoding for the Neural Machine Translation
Gu, J., Cho, K., Li, V.O.K., arXiv:1702.02429, 2017.
Recent research in neural machine translation has largely focused on two aspects: neural network architectures and end-to-end learning algorithms. The problem of decoding, however, has received relatively little attention from the research community. In this paper, we solely focus on the problem of decoding given a trained neural machine translation model. Instead of trying to build a new decoding algorithm for any specific decoding objective, we propose the idea of a trainable decoding algorithm in which we train a decoding algorithm to find a translation that maximizes an arbitrary decoding objective. More specifically, we design an actor that observes and manipulates the hidden state of the neural machine translation decoder and propose to train it using a variant of deterministic policy gradient. We extensively evaluate the proposed algorithm using four language pairs and two decoding objectives and show that we can indeed train a trainable greedy decoder that generates a better translation (in terms of a target decoding objective) with minimal computational overhead.
Incorporating Copying Mechanism in Sequence-to-Sequence Learning
Gu, J., Lu, Z., Li, H., and Li, V.O.K., Proc. Annual Meeting of the Association for Computational Linguistics (ACL), Berlin, Germany, Aug 2016.
We address an important problem in sequence-to-sequence (Seq2Seq) learning referred to as copying, in which certain segments in the input sequence are selectively replicated in the output sequence. A similar phenomenon is observable in human language communication. For example, humans tend to repeat entity names or even long phrases in conversation. The challenge with regard to copying in Seq2Seq is that new machinery is needed to decide when to perform the operation. In this paper, we incorporate copying into neural network-based Seq2Seq learning and propose a new model called CopyNet with encoder-decoder structure. CopyNet can nicely integrate the regular way of word generation in the decoder with the new copying mechanism which can choose sub-sequences in the input sequence and put them at proper places in the output sequence. Our empirical study on both synthetic data sets and real world data sets demonstrates the efficacy of CopyNet. For example, CopyNet can outperform regular RNN-based model with remarkable margins on text summarization tasks.
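The idea of combining regular word generation with copying can be illustrated with a small sketch. It assumes that generate scores (one per vocabulary word) and copy scores (one per source position) are normalized together in a single softmax, and that probabilities for the same word are summed; in a real model those scores come from the network's hidden states, whereas here they are hypothetical hand-picked numbers.

```python
import math


def copy_generate_distribution(vocab_scores, copy_scores, source_tokens):
    """Mix generation and copying into one output distribution.

    vocab_scores: dict mapping each vocabulary word to its generate score.
    copy_scores: list with one copy score per source position.
    source_tokens: the source words, aligned with copy_scores.
    """
    all_scores = list(vocab_scores.values()) + list(copy_scores)
    m = max(all_scores)
    # Shared normalizer over both generate and copy scores.
    z = sum(math.exp(s - m) for s in all_scores)
    probs = {w: math.exp(s - m) / z for w, s in vocab_scores.items()}
    for token, score in zip(source_tokens, copy_scores):
        # A source word may be out of vocabulary; copying still gives it mass.
        probs[token] = probs.get(token, 0.0) + math.exp(score - m) / z
    return probs


probs = copy_generate_distribution(
    vocab_scores={"hello": 0.0, "<unk>": 0.0},
    copy_scores=[0.0, 0.0],
    source_tokens=["hello", "Fano"],
)
```

With equal scores, "hello" receives probability from both modes, and the out-of-vocabulary word "Fano" can still be produced through copying alone.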
p-Causality: Identifying Spatiotemporal Causal Pathways for Air Pollutants with Urban Big Data
Zhu, J.Y., Zhang, C., Zhi, S., Li, V.O.K., Han, J., Zheng, Y., arXiv:1610.07045, 2016.
Many countries are suffering from severe air pollution. Understanding how different air pollutants accumulate and propagate is critical to making relevant public policies. In this paper, we use urban big data (air quality data and meteorological data) to identify the spatiotemporal (ST) causal pathways for air pollutants. This problem is challenging because: (1) there are numerous noisy and low-pollution periods in the raw air quality data, which may lead to unreliable causality analysis, (2) for large-scale data in the ST space, the computational complexity of constructing a causal structure is very high, and (3) the ST causal pathways are complex due to the interactions of multiple pollutants and the influence of environmental factors. Therefore, we present p-Causality, a novel pattern-aided causality analysis approach that combines the strengths of pattern mining and Bayesian learning to efficiently and faithfully identify the ST causal pathways. First, pattern mining helps suppress the noise by capturing frequent evolving patterns (FEPs) of each monitoring sensor, and greatly reduce the complexity by selecting the pattern-matched sensors as "causers". Then, Bayesian learning carefully encodes the local and ST causal relations with a Gaussian Bayesian network (GBN)-based graphical model, which also integrates environmental influences to minimize biases in the final results. We evaluate our approach with three real-world data sets containing 982 air quality sensors, in three regions of China from 01-Jun-2013 to 19-Dec-2015. Results show that our approach outperforms the traditional causal structure learning methods in time efficiency, inference accuracy and interpretability.
Learning to Translate in Real-time with Neural Machine Translation
Gu, J., Neubig, G., Cho, K., and Li, V.O.K., arXiv:1610.00388, 2016.
Translating in real-time, a.k.a. simultaneous translation, outputs translation words before the input sentence ends, which is a challenging problem for conventional machine translation methods. We propose a neural machine translation (NMT) framework for simultaneous translation in which an agent learns to make decisions on when to translate from the interaction with a pre-trained NMT environment. To trade off quality and delay, we extensively explore various targets for delay and design a method for beam-search applicable in the simultaneous MT setting. Experiments against state-of-the-art baselines on two language pairs demonstrate the efficacy of the proposed framework both quantitatively and qualitatively.
A Four-Layer Architecture for Online and Historical Big Data Analytics
Zhu, J. Y., Xu, J., and Li, V.O.K., Proc. IEEE DataCom, Auckland, New Zealand, Aug 2016.
Big data processing and analytics technologies have drawn much attention in recent years. However, the recent explosive growth of online data streams brings new challenges to the existing technologies. These online data streams tend to be massive, continuously arriving, heterogeneous, time-varying and unbounded. Therefore, it is necessary to have an integrated approach to process both big static data and online big data streams. We call this integrated approach online and historical big data analytics (OHBDA). We propose a four-layer architecture of OHBDA, comprising the storage layer, online and historical data processing layer, analytics layer, and decision-making layer. Functionalities and challenges of the four layers are further discussed. We conclude with a discussion of the requirements for the future OHBDA solutions, which may serve as a foundation for future big data analytics research.
A Gaussian Bayesian model to identify spatio-temporal causalities for air pollution based on urban big data
Zhu, J. Y., Zheng, Y., Yi, X., and Li, V.O.K., SmartCity16: The 2nd IEEE INFOCOM Workshop on Smart Cities and Urban Computing, San Francisco, California, USA, April 2016.
Identifying the causalities for air pollutants and answering questions such as "where do Beijing's air pollutants come from?" are crucial to informing government decision-making. In this paper, we identify the spatio-temporal (ST) causalities among air pollutants at different locations by mining the urban big data. This is challenging for two reasons: 1) since air pollutants can be generated locally or dispersed from the neighborhood, we need to discover the causes in the ST space from many candidate locations with time efficiency; 2) the cause-and-effect relations between air pollutants are further affected by confounding variables like meteorology. To tackle these problems, we propose a coupled Gaussian Bayesian model with two components: 1) a Gaussian Bayesian Network (GBN) to represent the cause-and-effect relations among air pollutants, with an entropy-based algorithm to efficiently locate the causes in the ST space; 2) a coupled model that combines cause-and-effect relations with meteorology to better learn the parameters while eliminating the impact of confounding. The proposed model is verified using air quality and meteorological data from 52 cities over the period Jun 1st 2013 to May 1st 2015. Results show the superiority of our model over baseline causality learning methods in both time efficiency and prediction accuracy.
Efficient Learning for Undirected Topic Models
Gu, J. and Li, V.O.K., Proc. ACL-IJCNLP, Beijing, China, July 2015.
Replicated Softmax model, a well-known undirected topic model, is powerful in extracting semantic representations of documents. Traditional learning strategies such as Contrastive Divergence are very inefficient. This paper provides a novel estimator to speed up the learning based on Noise Contrastive Estimate, extended for documents of variant lengths and weighted inputs. Experiments on two benchmarks show that the new estimator achieves great learning efficiency and high accuracy on document retrieval and classification.
Granger-Causality-based air quality estimation with spatio-temporal (S-T) heterogeneous big data
Zhu, Y., Sun, C., and Li, V.O.K., Proc. IEEE INFOCOM Smart City Workshop, Hong Kong, China, April 2015.
This paper considers city-wide air quality estimation with limited available monitoring stations which are geographically sparse. Since air pollution is highly spatio-temporal (S-T) dependent and considerably influenced by urban dynamics (e.g., meteorology and traffic), we can infer the air quality not covered by monitoring stations with S-T heterogeneous urban big data. However, estimating air quality using S-T heterogeneous big data poses two challenges. The first challenge is due to data diversity, i.e., there are different categories of urban dynamics and some may be useless and even detrimental for the estimation. To overcome this, we first propose an S-T extended Granger causality model to analyze all the causalities among urban dynamics in a consistent manner. Then, by implementing a non-causality test, we rule out the urban dynamics that do not “Granger” cause air pollution. The second challenge is due to the time complexity when processing the massive volume of data. We propose to discover the region of influence (ROI) by selecting data with the highest causality levels spatially and temporally. Results show that we achieve higher accuracy using “part” of the data than “all” of the data. This may be explained by the most influential data eliminating errors induced by redundant or noisy data. The causality model observation and the city-wide air quality map are illustrated and visualized using data from Shenzhen, China.
Spatio-temporal (S-T) similarity model for constructing WIFI-based RSSI fingerprinting map for indoor localization
Zhu, Y., Zheng, X., Xu, J., and Li, V.O.K., Proc. Fifth International Conference on Indoor Positioning and Indoor Navigation (IPIN 2014), Busan, Korea, Oct 2014.
WIFI-based received signal strength indicator (RSSI) fingerprinting is widely used for indoor localization due to desirable features such as universal availability, privacy protection, and low deployment cost. The key of RSSI fingerprinting is to construct a trustworthy RSSI map, which contains the measurements of received access point (AP) signal strengths at different calibration points. Location can be estimated by matching live RSSIs with the RSSI map. However, a fine-grained map requires much labor and time. This calls for developing efficient interpolation and approximation methods. Besides, due to environmental changes, the RSSI map requires periodical updates to guarantee localization accuracy. In this paper, we propose a spatio-temporal (S-T) similarity model which uses the S-T correlation to construct a fine-grained and up-to-date RSSI map. Five S-T correlation metrics are proposed, i.e., the spatial distance, signal similarity, similarity likelihood, RSSI vector distance, and the S-T reliability. This model is evaluated based on experiments in our indoor WIFI positioning system test bed. Results show improvements in both the interpolation accuracy (up to 7%) and localization accuracy (up to 32%), compared to four commonly used RSSI map construction methods, namely, linear interpolation, cubic interpolation, nearest neighbor interpolation, and compressive sensing.
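The basic matching step that an RSSI map supports ("location can be estimated by matching live RSSIs with the RSSI map") can be sketched as a nearest-neighbor lookup in signal space. This is a generic baseline, not the S-T similarity model proposed in the paper; the map contents, access point names, and function name are hypothetical, and every fingerprint is assumed to contain a reading for every access point in the live measurement.

```python
import math


def locate(live_rssi, rssi_map, k=1):
    """Estimate position as the mean of the k calibration points
    whose fingerprints are closest to the live reading."""
    def distance(fingerprint):
        # Euclidean distance in signal space over the APs in the live reading.
        return math.sqrt(sum((live_rssi[ap] - fingerprint[ap]) ** 2
                             for ap in live_rssi))

    nearest = sorted(rssi_map, key=lambda point: distance(rssi_map[point]))[:k]
    x = sum(point[0] for point in nearest) / len(nearest)
    y = sum(point[1] for point in nearest) / len(nearest)
    return (x, y)


# Hypothetical RSSI map: calibration point (x, y) in metres -> AP readings in dBm.
rssi_map = {
    (0.0, 0.0): {"ap1": -40, "ap2": -70},
    (5.0, 0.0): {"ap1": -60, "ap2": -55},
    (5.0, 5.0): {"ap1": -75, "ap2": -45},
}

estimate = locate({"ap1": -42, "ap2": -68}, rssi_map)
```

The accuracy of such a lookup depends directly on how dense and how current the map is, which is why map construction and updating matter.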
Performance models of access latency in cloud storage systems
Shuai, Q., Li, V.O.K., and Zhu, Y., Proc. Fourth Workshop on Architectures and Systems for Big Data, Minneapolis, MN, US, June 14, 2014.
Access latency is a key performance metric for cloud storage systems and has great impact on user experience, but most papers focus on other performance metrics such as storage overhead, repair cost and so on. Only recently do some models argue that coding can reduce access latency. However, they are developed for special scenarios, which may not reflect reality. To fill the gaps between existing work and practice, in this paper, we propose a more practical model to measure access latency. This model can also be used to compare access latency of different codes used by different companies. To the best of our knowledge, this model is the first to provide a general method to compare access latencies of different erasure codes.