Syntactic and Semantic Analysis and Visualization of Unstructured English Texts

01/07/2011 1:00 pm
01/07/2011 2:00 pm
Category: Ph.D. Dissertation Proposal
Advisor: Dr. Ying Zhu

People often express complex thoughts in complex natural-language sentences. This complexity can make communication efficient among readers who share the same knowledge base. For a different or new audience, however, such compositions become cumbersome to understand and analyze. Analyzing them with syntactic or semantic measures is challenging, and it is a foundational step in natural language processing.

In this proposal, we propose and explore several new techniques for analyzing and visualizing the syntactic and semantic patterns of unstructured English texts.

The syntactic analysis is done through a proposed visualization technique that categorizes and compares English compositions according to their reading complexity metrics. For the semantic analysis, we use Latent Semantic Analysis (LSA) to uncover hidden patterns in complex compositions. We have applied this technique to comments from a social visualization web site to detect irrelevant ones (e.g., spam). Patterns of collaboration are also studied through statistical analysis.
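
As a rough illustration of what a reading complexity metric looks like, the sketch below computes the Flesch Reading Ease score for two short passages. The abstract does not say which metrics the proposal uses, so the choice of Flesch Reading Ease is an assumption, and the syllable counter is a crude vowel-group heuristic rather than a dictionary lookup. Function names are illustrative only.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic syllable count: runs of vowels, minus a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final 'e' as silent ("make"), but not in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Two made-up passages for comparison (not data from the proposal).
simple = "The cat sat on the mat. It was warm."
dense = (
    "Notwithstanding considerable methodological heterogeneity, "
    "the investigators nevertheless extrapolated generalizable conclusions."
)

print(flesch_reading_ease(simple) > flesch_reading_ease(dense))
```

Scores like these could serve as one axis along which compositions are grouped and compared, though the visualization itself is beyond what a short sketch can show.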

A hybrid technique combining LSA and related methods could offer a solution to the problem of understanding unstructured texts. Word sense disambiguation is used to determine the correct sense of a word in a sentence or composition. This may help make sense of compositions in a collaborative environment and reveal more hidden patterns of collaboration.
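
To make the idea of word sense disambiguation concrete, here is a minimal sketch of a simplified Lesk-style approach, which picks the sense whose dictionary gloss shares the most words with the surrounding sentence. The abstract does not name a disambiguation algorithm, so this choice is an assumption. The two-sense inventory for "bank", the stopword list, and all function names are hypothetical; a real system would draw glosses from a lexicon such as WordNet.

```python
import re

# Hypothetical sense inventory, for illustration only.
SENSES = {
    "bank": {
        "financial_institution": "an institution that accepts deposits of money and makes loans",
        "river_side": "sloping land beside a river or other body of water",
    }
}

# Small hand-picked stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "of", "and", "or", "that", "at", "to", "in", "on", "i", "my"}


def tokens(text: str) -> set:
    """Lowercased content words of a text, as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word: str, sentence: str) -> str:
    """Return the sense of `word` whose gloss overlaps most with the sentence."""
    context = tokens(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & tokens(gloss))
        # Strict comparison: on a tie, the first listed sense is kept.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "I went to the bank to deposit money and ask about loans"))
print(simplified_lesk("bank", "We sat on the bank beside the river and watched the water"))
```

Exact word overlap is brittle ("deposit" does not match "deposits" here), which is one reason such a method might be paired with LSA or other techniques.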

Committee
Dr. Ying Zhu (chair)
Dr. Rajshekhar Sunderraman
Dr. G. Scott Owen
Dr. Gengsheng Qin

Department Conference Room