Open Source Social Networking Tools

Submitted by Ankit Ranka on Sun, 01/31/2010 - 1:14am.

I have been looking at open-source social networking tools. Some of the interesting options are Elgg, Mahara, XOOPS with YOGURT, Dolphin and Pinax. I was amazed to see how easy it is to develop our own social networking environment using these tools.
The following are some of the features of Elgg and Mahara -
Elgg -
1. Features
2. OpenID
3. 760 plugins available, covering a wide range of functionality.
4. GNU General Public License, version 2.
5. Relatively new project.

Mahara -

1. User-centred environment with a permissions framework that makes it easy to manage different views of an e-portfolio. Mahara also features a weblog, a resume builder and a social networking system that connects users and creates online learner communities.

 

ABBYY OCR

Submitted by Ankit Ranka on Mon, 12/21/2009 - 11:19am.

I have been experimenting with different OCR tools in order to convert scanned PDF documents into text for indexing and data mining.

I started with OCRopus, a project supported by Google. But it was too slow and did not work well on the PDF documents. Then I downloaded a trial version of ABBYY's FineReader, which actually worked pretty well. The best part was that it even retained the document structure, styling and fonts. Also, multiple pages were detected and separated.

 

Lie Detection?

Submitted by Ankit Ranka on Tue, 12/01/2009 - 4:38pm.

Recently, I came across this survey - Charlatanry in forensic speech science: A problem to be taken seriously. It explains how current lie detection systems are nowhere near able to detect lies, yet their market keeps growing. Here is a related video (worth watching) by ABC News, in which they interviewed the owner of one of these lie detection systems, CVSA: an "unstable" program of nearly 800 lines of code written in Visual Basic that is in use by many police departments -

 

Penn Discourse Tree Bank

Submitted by Ankit Ranka on Wed, 11/11/2009 - 4:59pm.

I just came from a talk on "annotating discourse meaning". For those who don't know what discourse analysis is, it is the analysis of language 'beyond the sentence'. The aim of the talk was to present the Penn Discourse Tree Bank and show how its authors 'annotate' discourse relations (connections) between sentences.
There are two ways a discourse relation can be triggered -
1. Lexically
2. By adjacency

In the Penn Discourse Tree Bank -
- Each relation is annotated independently, and dependencies across relations are not annotated.
- There are various argument labels and a linear order:
- Any number of clauses can be selected as arguments.
- Only as many clauses as are minimally required are included.
- Supplements to arguments - extra material.

Implicit connectives -
1. Due to adjacency
Example - Some have raised their cash positions to record levels. [BECAUSE] High cash positions help buffer a fund when the market falls.
In the above example, no connective appears in the text itself; BECAUSE is the implicit connective inserted between the two sentences.

2. Across paragraphs
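One way to picture such an annotation is as a record holding the two arguments and the connective, with a flag saying whether the connective was written in the text or inserted by the annotator. The sketch below is hypothetical: the class name, field names and bracket notation are illustrative only and do not follow the actual PDTB file format.

```python
from dataclasses import dataclass

# Hypothetical record for one annotated discourse relation. The field
# names are illustrative and do not follow the actual PDTB file format.
@dataclass
class DiscourseRelation:
    relation_type: str  # "Explicit" (connective in the text) or "Implicit"
    connective: str     # connective as written, or as inserted by the annotator
    arg1: str           # first argument span
    arg2: str           # second argument span

    def render(self) -> str:
        # Show an inserted (implicit) connective in brackets so it is
        # clearly distinguished from words that appear in the text.
        if self.relation_type == "Implicit":
            conn = f"[{self.connective}]"
        else:
            conn = self.connective
        return f"{self.arg1} {conn} {self.arg2}"


example = DiscourseRelation(
    relation_type="Implicit",
    connective="because",
    arg1="Some have raised their cash positions to record levels.",
    arg2="High cash positions help buffer a fund when the market falls.",
)
print(example.render())
```

Keeping the connective as a separate field, rather than splicing it into the text, preserves the distinction between what the writer said and what the annotator inferred.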

 

The story of Human Rights

Submitted by Ankit Ranka on Thu, 10/29/2009 - 9:31pm.

 

A Reading Tutor that Listens

Submitted by Ankit Ranka on Mon, 10/12/2009 - 2:30pm.

Recently, I came across Project LISTEN (Literacy Innovation that Speech Technology ENables) as part of my research for the "Topics in speech processing" class. As stated on the project page - "It's an automated Reading Tutor that displays stories on a computer screen, and listens to children read aloud. The Reading Tutor intervenes when the reader makes mistakes, gets stuck, clicks for help, or is likely to encounter difficulty." This system has already been used by hundreds of children, generating a lot of data for "educational data mining".

 

Google tech talk @ Columbia

Submitted by Ankit Ranka on Fri, 10/02/2009 - 2:27pm.

The following are my notes from the Google tech talk that I recently attended -

The talk was mainly about AdSense and Google search.

The following is a schematic view of how AdSense works -

[Diagram: schematic view of how AdSense works]

In the above diagram, the A-site represents the advertiser's site, the P-site represents the publisher's site, and $ indicates where money is involved.
There are three types of cookies that this architecture leaves on a user's computer -
- website cookies
- P-site cookie
- A-site cookie

The P-site is responsible for frequency capping & reporting.

The A-site is responsible for yield management, storyboarding and frequency capping.
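Frequency capping, which both sites handle, limits how many times the same user sees the same ad. A minimal sketch, assuming the user is identified by a cookie ID and impressions are counted in memory; the class name, the cap of 3 and the IDs are hypothetical and not taken from the talk. A real ad server would persist the counts and reset them over a time window.

```python
from collections import defaultdict


class FrequencyCapper:
    """Hypothetical per-user, per-ad impression cap keyed by a cookie ID."""

    def __init__(self, max_impressions: int):
        self.max_impressions = max_impressions
        # (cookie_id, ad_id) -> number of impressions already shown
        self.counts = defaultdict(int)

    def can_show(self, cookie_id: str, ad_id: str) -> bool:
        return self.counts[(cookie_id, ad_id)] < self.max_impressions

    def record_impression(self, cookie_id: str, ad_id: str) -> None:
        self.counts[(cookie_id, ad_id)] += 1


capper = FrequencyCapper(max_impressions=3)
shown = 0
for _ in range(5):  # five page views by the same user
    if capper.can_show("user-123", "ad-42"):
        capper.record_impression("user-123", "ad-42")
        shown += 1
print(shown)  # 3
```

Keying on the (cookie, ad) pair is what ties capping to the cookies listed above: without a cookie, the server cannot tell that two requests came from the same user.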

Distributed budget problem -
- payment is determined by an auction run at the time the ad is shown.
- ads must be shown without overrunning the advertiser's budget.
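To make the two constraints above concrete, here is a minimal single-process sketch. It assumes a single ad slot, a second-price rule (the winner pays the runner-up's bid) and prices in integer cents; the notes only say that payment is set by an auction, so the auction rule, names and numbers are all assumptions. The genuinely distributed part of the problem, keeping many ad servers from jointly overspending one budget, is not modelled here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Advertiser:
    name: str
    bid: int     # maximum price per impression, in cents
    budget: int  # remaining budget, in cents


def run_auction(advertisers: List[Advertiser],
                reserve: int = 1) -> Optional[Tuple[str, int]]:
    """Single-slot second-price auction that never overruns a budget."""
    # Only advertisers who can still afford their own bid take part. The
    # second-price charge never exceeds the winner's bid, so the winner's
    # remaining budget cannot go negative.
    eligible = [a for a in advertisers if a.bid >= reserve and a.budget >= a.bid]
    if not eligible:
        return None
    eligible.sort(key=lambda a: a.bid, reverse=True)
    winner = eligible[0]
    # Second-price rule: pay the runner-up's bid, or the reserve if unopposed.
    price = eligible[1].bid if len(eligible) > 1 else reserve
    winner.budget -= price
    return winner.name, price


ads = [Advertiser("A", 50, 100), Advertiser("B", 40, 500), Advertiser("C", 30, 500)]
results = [run_auction(ads) for _ in range(3)]
print(results)  # [('A', 40), ('A', 40), ('B', 30)]
```

After two wins, A has 20 cents left, less than its 50-cent bid, so it drops out of the third auction and B wins instead. Excluding bidders up front is one simple way to respect budgets; throttling or pacing bids over time are common alternatives.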

 

Social Navigation Support in a Course Recommendation System

Submitted by Ankit Ranka on Tue, 09/22/2009 - 1:11pm.

While working on the details of the Course Recommendation System, I found this paper - "Social Navigation Support in a Course Recommendation System". The most interesting thing about this paper is how it motivates users to provide feedback: the authors follow the "do-it-for-yourself" approach.

The main theme of this approach is to encourage users' participation by turning their feedback into an activity that is important and meaningful to them. To implement this concept, they used a Career Scope indicator, which shows progress towards career goals as one fills out the evaluations. In my view, feedback is one of the most important things for getting good results from a recommendation system. Also, users will be more interested in giving feedback if they are getting something out of it.

 

Games with a purpose

Submitted by Ankit Ranka on Tue, 07/14/2009 - 12:09pm.

Last semester, while taking a Search Engine Technology class, I came across this website - www.gwap.com. The website defines itself as "Games With A Purpose (GWAP)". The games are very basic, but they help computers learn about the human decision process. The following is an example from the About page of the website -

1. You and a partner see the same image and are asked to type in a tag for it. When you agree on a tag, you move on and are awarded points. After just a minute of play, you've agreed on six or seven tags.

2. We record those six or seven tags and associate them with the images.

3. Now a search engine will have a better idea of what's in those images.
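The three steps above can be sketched as a small piece of bookkeeping: keep only the tags both partners typed, count them per image, and let a search rank images by how often a tag was agreed on. This is a minimal sketch, assuming each player's guesses arrive as a list of strings; the function names, the case-insensitive matching and the ranking rule are hypothetical, not GWAP's actual implementation.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# image_id -> how many game sessions agreed on each tag
labels: Dict[str, Counter] = {}


def record_session(image_id: str, guesses_a: Iterable[str],
                   guesses_b: Iterable[str]) -> Set[str]:
    """Keep only tags that both partners typed (case-insensitive)."""
    a = {g.strip().lower() for g in guesses_a}
    b = {g.strip().lower() for g in guesses_b}
    agreed = a & b
    labels.setdefault(image_id, Counter()).update(agreed)
    return agreed


def search(tag: str) -> List[str]:
    """Images carrying `tag`, most frequently agreed-on first."""
    tag = tag.lower()
    hits = [img for img, counts in labels.items() if counts[tag] > 0]
    return sorted(hits, key=lambda img: labels[img][tag], reverse=True)


record_session("img1", ["Dog", "grass", "ball"], ["dog", "park", "Ball"])
record_session("img1", ["dog", "frisbee"], ["dog", "tree"])
record_session("img2", ["dog", "sofa"], ["cat", "sofa"])
print(search("dog"))  # ['img1']
```

Requiring agreement is what makes the data usable: a tag typed by only one player ("grass", "park") is discarded, since two strangers independently choosing the same word is good evidence that it describes the image.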

I found this interesting because it's very difficult to gather human-labeled data using forms and surveys. But if you make a game out of it, data collection becomes fun and people are willing to spend time on it.

Here is an interesting promotional video for GWAP -

 

Course Recommendation System

Submitted by Ankit Ranka on Thu, 07/09/2009 - 12:35pm.

Hi All,

I guess every one of us has been through the pain of the course selection process. I always wished someone could suggest the right combination of courses for me to take. Of course, seniors are there to advise us, but they only know about the courses they took. There are websites like this one that collect students' feedback on the various courses and the professors who taught them. But these resources only provide information; they do not recommend course and professor combinations based on a person's history and interests. I would like to know what people in the lab think about such a tool. The following is a rough description of the tool -

Course Selection:
This expert system would maintain a knowledge base of different courses, their level of difficulty, specialization tracks, and required combinations (like prerequisites, mandatory courses, and so on). The knowledge base would be built from surveys, experience and standard information available at Columbia. This system would be a helpful tool for incoming students, suggesting an ideal choice of courses for a term. The student would input his preferences to the system based on the following criteria:
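A first cut at such a knowledge base could be a simple rule-based filter over courses, their difficulty, tracks and prerequisites. The sketch below is hypothetical: the course codes, the 1 to 5 difficulty scale and the "easiest first" ranking are made up for illustration, and the proposed expert system would encode far richer rules (mandatory courses, student interests, professor feedback).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Course:
    code: str
    difficulty: int  # hypothetical scale: 1 (easy) to 5 (hard)
    tracks: Set[str]  # specialization tracks the course counts toward
    prereqs: Set[str] = field(default_factory=set)


def suggest(catalog: Dict[str, Course], taken: Set[str], track: str,
            max_courses: int = 3) -> List[str]:
    # Rule: not yet taken, all prerequisites satisfied, and on the student's track.
    eligible = [c for c in catalog.values()
                if c.code not in taken and c.prereqs <= taken and track in c.tracks]
    # Hypothetical ranking: easier courses first, then by code for a stable order.
    eligible.sort(key=lambda c: (c.difficulty, c.code))
    return [c.code for c in eligible[:max_courses]]


# Made-up catalog for illustration only.
catalog = {c.code: c for c in [
    Course("CS101", 1, {"core"}),
    Course("ML201", 3, {"ml"}, {"CS101"}),
    Course("NLP301", 4, {"ml", "nlp"}, {"ML201"}),
    Course("SP302", 4, {"ml", "speech"}, {"ML201"}),
    Course("DB201", 2, {"systems"}, {"CS101"}),
]}
print(suggest(catalog, taken={"CS101"}, track="ml"))           # ['ML201']
print(suggest(catalog, taken={"CS101", "ML201"}, track="ml"))  # ['NLP301', 'SP302']
```

Prerequisites are checked as a subset test (`c.prereqs <= taken`), so a course only becomes eligible once every prerequisite is in the student's history.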
