2) Tailor the measurement
Topic coherence, measured with pointwise mutual information (PMI), was originally designed to evaluate topic models, because even current models often generate “junk topics”, that is, topics that contain only meaningless information. In our Vialogues research, we decided to use the UCI score and the UMass score to compare the coherence aspect of discussion quality on YouTube and Vialogues. As discussed above, topic coherence only points us in the right direction; we still need to adapt the measure to solve our own problem.
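To make the two scores concrete, here is a minimal sketch of how they can be computed, not the exact code used in this project. It assumes each comment is treated as one "document" and that co-occurrence is counted at the document level (the UCI score is usually described with sliding-window counts from an external corpus, so this is a simplification). The function names `uci_score` and `umass_score`, the smoothing value `eps`, and the toy comments are all made up for illustration, and both scores are averaged over word pairs rather than summed.

```python
import math
from itertools import combinations


def _doc_freq(doc_sets, *terms):
    """Number of documents that contain every one of the given terms."""
    return sum(all(t in s for t in terms) for s in doc_sets)


def uci_score(docs, words, eps=1e-12):
    """Average PMI over all word pairs (simplified UCI-style score).

    Assumption: probabilities are document frequencies divided by the
    number of documents, not sliding-window counts.
    """
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)
    # Drop words that never occur, to avoid dividing by zero.
    words = [w for w in words if _doc_freq(doc_sets, w) > 0]
    pairs = list(combinations(words, 2))
    if not pairs:
        raise ValueError("need at least two words that occur in docs")
    total = 0.0
    for a, b in pairs:
        p_a = _doc_freq(doc_sets, a) / n
        p_b = _doc_freq(doc_sets, b) / n
        p_ab = _doc_freq(doc_sets, a, b) / n
        total += math.log((p_ab + eps) / (p_a * p_b))
    return total / len(pairs)


def umass_score(docs, words):
    """Average of log((D(wi, wj) + 1) / D(wj)) over ordered word pairs.

    Words are ordered by descending document frequency, so wj is always
    the more frequent word of the pair.
    """
    doc_sets = [set(d) for d in docs]
    freq = {w: _doc_freq(doc_sets, w) for w in words}
    ordered = sorted((w for w in words if freq[w] > 0), key=lambda w: -freq[w])
    pairs = list(combinations(ordered, 2))
    if not pairs:
        raise ValueError("need at least two words that occur in docs")
    total = 0.0
    for w_j, w_i in pairs:  # w_j comes first, so it is the more frequent word
        total += math.log((_doc_freq(doc_sets, w_i, w_j) + 1) / freq[w_j])
    return total / len(pairs)


if __name__ == "__main__":
    # Toy comments, already tokenized (hypothetical data).
    comments = [
        ["cat", "dog", "pet"],
        ["cat", "dog"],
        ["cat", "stock"],
        ["stock", "market"],
    ]
    print(uci_score(comments, ["cat", "dog"]))
    print(umass_score(comments, ["cat", "dog"]))
```

Words that tend to appear in the same comments score higher on both measures than words that never appear together, which is the property we rely on when comparing discussions.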
3) Implement, aka, Write Code
Once the details of the PMI algorithm for evaluating discussion coherence are clear, we come to the essential part: coding. The dataset provided is already in good shape, but code is still needed to format the raw data into a clean dataset. With all the materials ready, it is time to code the algorithm. Yesterday, I successfully obtained Vialogues and YouTube discussion coherence scores for one video entry. Prototyping, or running pilot trials, is a critical stage of coding: it means the designed logic and methodology have actually been programmed. Goal for this week: scale the code to all 723 documents and solve this problem. Many articles talk about “making sense of data”. The problem-solving process described above is exactly one of the many examples of making sense of real-world data.