
Jun 03 2014 - 10:46 AM
Problem Solving in Research
My current research evaluates the topic coherence of discussions and their corresponding videos. On both YouTube and Vialogues, discussions are attached to the posted videos; however, YouTube hosts a flat forum-style discussion, while Vialogues enables moderated, time-stamped discussions. We want to know whether, on average, the same piece of educational video on the two platforms results in different levels of topic coherence. I am now in the process of solving this problem, and I am fascinated by the approaches.

1) Learn and discover, don't reinvent the wheel

In the past week, I have read extensively on topic modeling, text analysis, and topic coherence measures. A topic model is a statistical model that summarizes the topics in a set of documents using machine learning and natural language processing. Although I have used several relevant R packages before (e.g., tm, wordcloud, topicmodels), evaluating topic coherence for YouTube and Vialogues still requires rapidly absorbing new knowledge and learning effectively from experts.
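To make the workflow concrete, here is a minimal sketch of fitting a topic model in R with the tm and topicmodels packages mentioned above. The three example documents are made up for illustration; everything else follows the packages' standard usage.

```r
library(tm)
library(topicmodels)

# Toy stand-ins for discussion posts; the real data are YouTube and
# Vialogues comments.
docs <- c("teachers discuss fractions and number lines",
          "students ask about fractions in the video",
          "the camera zooms in on the whiteboard")

# Build a corpus and clean it: lowercase, strip punctuation and stopwords.
corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# Document-term matrix, the input both to LDA and to coherence scoring.
dtm <- DocumentTermMatrix(corpus)

# Fit a two-topic LDA model and inspect the top five terms per topic.
lda <- LDA(dtm, k = 2, control = list(seed = 1))
terms(lda, 5)
```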

2) Tailor the measurement

Measurement of topic coherence (using PMI, pointwise mutual information) was originally designed to evaluate the performance of topic models, since it is not rare for currently available models to generate "junk topics", i.e., topics that contain only meaningless information. In our Vialogues research, we decided to leverage the UCI score and the UMass score to compare the coherence aspect of discussion quality between YouTube and Vialogues. As discussed above, topic coherence only points us in the right direction; we still need to make the necessary modifications to fit our problem.
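For reference, the UMass score rates a topic's top words by how often they co-occur in the same documents, summing log((D(wi, wj) + 1) / D(wj)) over word pairs, where D counts documents; the UCI score instead computes PMI over sliding windows in an external corpus. Below is a rough sketch of the UMass computation, reusing the hypothetical dtm from the snippet above.

```r
# UMass coherence for one topic's top words, from a binary document-term
# matrix. Assumes at least two top words, each appearing in >= 1 document.
umass_coherence <- function(dtm_binary, top_words) {
  score <- 0
  for (i in 2:length(top_words)) {
    for (j in 1:(i - 1)) {
      wi <- dtm_binary[, top_words[i]]
      wj <- dtm_binary[, top_words[j]]
      d_ij <- sum(wi & wj)  # documents containing both words
      d_j  <- sum(wj)       # documents containing w_j
      score <- score + log((d_ij + 1) / d_j)
    }
  }
  score
}

# Usage with the toy dtm from above; higher (less negative) is more coherent.
m <- as.matrix(dtm) > 0
umass_coherence(m, c("fractions", "students"))
```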

3) Implement, a.k.a. write code

Once we are clear on the details of the PMI algorithm for evaluating discussion coherence, it comes down to the essential part: coding. The dataset provided is already in good shape, but code is still needed to turn the raw data into a clean dataset. With all the materials ready, it is time to code the algorithm. Yesterday, I successfully obtained Vialogues and YouTube discussion coherence scores for one video entry. Prototyping on a pilot trial like this is a critical stage of coding: it confirms that the methodology designer's logic has actually been programmed correctly. Goal for this week: scale the code to all 723 documents and solve this problem. Many articles talk about "making sense of data"; the problem-solving process described above is exactly one of many examples of making sense of real-world data.
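A sketch of what that scaling step might look like, assuming a hypothetical named list `discussions` mapping each video id to the character vector of its comments, and reusing umass_coherence from the previous snippet:

```r
# Score one video's discussion: clean the comments, build a binary
# document-term matrix, take the five most document-frequent terms, and
# compute their UMass coherence. Assumes >= 5 distinct terms per discussion.
coherence_for_text <- function(comments) {
  corpus <- VCorpus(VectorSource(comments))
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  m <- as.matrix(DocumentTermMatrix(corpus)) > 0
  top <- names(sort(colSums(m), decreasing = TRUE))[1:5]
  umass_coherence(m, top)
}

# One score per video entry, ready for a platform-level comparison
# (e.g., YouTube vs. Vialogues means across the 723 documents).
scores <- vapply(discussions, coherence_for_text, numeric(1))
```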

Posted in: Work Progress, Research | By: Yang Yang