Title: Model-based Algorithms for Text Clustering
Text clustering is an important technique in data mining and machine learning, and is widely used in applications such as topic detection and tracking, document summarization, and search results clustering. Although many studies have been done on text clustering, there are still many challenging problems to be solved: (1) How to set the number of clusters? Can we learn it from the dataset? (2) How to deal with the high dimensionality of text data? (3) How to deal with the sparsity of short texts? (4) How to obtain good representations of the clusters? (5) How to detect outlier documents? We will introduce several model-based text clustering algorithms that can cope with the above challenges. These algorithms are based on papers we published at SIGKDD'14, ICDE'16, and SIGKDD'16.
Jianhua Yin is currently a doctoral candidate at Tsinghua University in the laboratory of Prof. Jianyong Wang. He received his BS degree from the Department of Computer Science and Technology, Xidian University, in 2012. He visited the data mining research group in computer science at the University of Illinois at Urbana-Champaign, under the supervision of Prof. Jiawei Han, from October 2015 to April 2016. His research interests fall into the fields of data mining and machine learning. He is particularly interested in text mining and probabilistic graphical models.