DataGotham - Empire State of Data

Submitted by Sharon Hsiao on Thu, 09/13/2012 - 11:31am.

I've previously talked a bit about predictive analytics; the post is here.
The amount of data available nowadays is changing our daily lives, from individuals to business organizations to politics. From mining and interpreting data to making decisions with it and managing it, what's beyond the hype?

DataGotham is a two-day event (today and tomorrow) organized by New York City's data community that will bring together professionals from finance, fashion, education and startups to the Fortune 500.

It features a number of big names and companies, including Etsy, Foursquare, Tumblr, LivingSocial, NYU Stern, Knewton, Kickstarter, etc. The data scientists from these companies and institutions are going to share their experiences and strategies for coping with Big Data.

A complete speakers list can be found here.

The live stream begins on Friday, September 14th at 8:45 AM EDT. But it's right downtown, so maybe we can just go check it out and ask questions :)



Sharon Hsiao Says:
Wed, 09/19/2012 - 9:16am

all the talks are now available here


Kate Meersschaert Says:
Wed, 09/19/2012 - 1:12pm

Thanks so much Sharon! Unfortunately I can't find talks from the individuals I successfully pitched for NLT! ;)


Hui Soo Chae Says:
Sun, 09/16/2012 - 11:37pm

Kate, do you think any of the featured speakers might be good for Profiles?

I also wonder if we should have produced a NL Sector and Seen in NY about this.


Kate Meersschaert Says:
Tue, 09/18/2012 - 12:07pm

Hi Hui Soo, we are on the same wavelength! See my initial comment at the beginning of the thread below... I think some of the speakers might be great profiles and will research this further. I will also research re: Sector.


Sharon Hsiao Says:
Fri, 09/14/2012 - 5:30pm

SeatGeek

I find it interesting that Steve said the data is extremely noisy.
People always come in with something already in mind (which event to go to and how to decide), which is something the company doesn't know.
However the algorithm runs and whatever it generates, people still buy.
They are learning how to bridge the big gap between clicks and actual sales.

(assumption: get it out there sooner and make users stay; there are always sales, it's only a matter of how small or big?)

* The better informed users are about the data they are presented, the more willing they are to provide data back.
Make small data stand tall.
Create a painless experience.


Sharon Hsiao Says:
Fri, 09/14/2012 - 5:03pm

Finally....Knewton

The amount of data is scaling up: test scores, logins, etc.

Visualize groups of students learning with concept maps (here we go).

Personalized learning: try to make it time-effective, and enable greater flexibility and accessibility.

zzzzz......


Sharon Hsiao Says:
Fri, 09/14/2012 - 3:15pm

LivingSocial:
Bryce talks about the cold start problem of recommender systems.

They map new items to old items where the items share the same features.
ML again, to classify items (the deals).
Because they are not Netflix: you view the movie but aren't willing to provide a review.
LivingSocial can't know whether readers marked the mail as spam or are on vacation. (no persistent items)
*Very important to tune the feature extractors.
They use backprop and bring it back to regression.
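
To make the "map new items to old items" idea concrete, here is a minimal sketch. It is not LivingSocial's method (the talk mentioned backprop and regression); it swaps in a much simpler nearest-neighbor lookup by cosine similarity. The deal names, the three features, and the scores are all made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cold_start_score(new_features, old_items):
    """Borrow a score for a new deal from the most similar old deal.

    old_items maps a deal name to (feature_vector, observed_score).
    Returns (name_of_nearest_old_deal, borrowed_score).
    """
    nearest = max(
        old_items,
        key=lambda name: cosine(new_features, old_items[name][0]),
    )
    return nearest, old_items[nearest][1]


# Hypothetical features: [is_restaurant, is_spa, is_outdoor]
old_deals = {
    "sushi dinner": ([1, 0, 0], 0.12),
    "massage": ([0, 1, 0], 0.07),
    "kayak tour": ([0, 0, 1], 0.04),
}

# A new deal with no purchase history yet (say, a rooftop brunch).
nearest, score = cold_start_score([1, 0, 0.5], old_deals)
print(nearest, score)
```

This also hints at why tuning the feature extractors matters: the borrowed score is only as good as the features used to decide which old deal is "similar".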


Sharon Hsiao Says:
Fri, 09/14/2012 - 2:40pm

Jeremy: (CTO of Collective)

- Visualization helps you build intuition to understand the data,
& find errors, find outliers, sell the science.

- A time series of visualizations might reveal a different story (such as causal inferences).

They use Flume, Hadoop, R, Tableau.


Sharon Hsiao Says:
Fri, 09/14/2012 - 2:12pm

DataKind

A non-profit organization that uses data in the service of humanity.
Sign up as a data scientist or provide data.

(maybe ResearchBroker shares some of the same characteristics here)


Sharon Hsiao Says:
Fri, 09/14/2012 - 12:17pm

Deborah Berebichez (MSCI)

With all this high-frequency data, do we really learn more about our health, markets, retail?

"Big data" needs to be complemented by "Big Judgement" (Harvard Business Review)

Deciphering the Markets with Data:
The basic idea is to use the standard deviation of something to measure risk.
-A simple time series model will be very computationally expensive.
-CAPM model: use a resembling sample (big) + a residual error term (correlation)
-Barra's multi-factor model: each industry + a style factor, by performing a regression
More and more models.

Risk management basically helps customers create portfolios, using all the "factors" to decipher insight for each customer.
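
The two simplest ideas in these notes (standard deviation as a risk measure, and a CAPM-style regression with a residual term) can be sketched in a few lines. This is only an illustration under stated assumptions: the return series are made up, the risk-free rate is ignored (treat the numbers as excess returns), and it is not MSCI's or Barra's actual model.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def volatility(returns):
    """Sample standard deviation of returns, used here as the risk measure."""
    m = mean(returns)
    return math.sqrt(sum((r - m) ** 2 for r in returns) / (len(returns) - 1))


def capm_fit(asset, market):
    """Least-squares fit of asset = alpha + beta * market + residual."""
    mean_a, mean_m = mean(asset), mean(market)
    cov = sum((a - mean_a) * (m - mean_m) for a, m in zip(asset, market))
    var = sum((m - mean_m) ** 2 for m in market)
    beta = cov / var
    alpha = mean_a - beta * mean_m
    residuals = [a - (alpha + beta * m) for a, m in zip(asset, market)]
    return alpha, beta, residuals


# Made-up period returns; the asset moves exactly twice as much as the market.
market = [0.01, -0.02, 0.03, 0.00, 0.02]
asset = [0.02, -0.04, 0.06, 0.00, 0.04]

alpha, beta, residuals = capm_fit(asset, market)
print("market risk:", volatility(market))
print("asset risk: ", volatility(asset))
print("beta:", beta)
```

A multi-factor model extends the same regression from one explanatory series (the market) to several (industry and style factors), which is why the notes describe it as "performing a regression".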


Sharon Hsiao Says:
Fri, 09/14/2012 - 11:53am

Panel:
- Adam Laiacano (tumblr)
- Fred Benenson (Kickstarter)
- Roberto Medri (Etsy)

*Good tip from Etsy: the Help page is the most-visited page of the whole site.
*All three panelists use R. Tumblr has a Hive setup. Most of their work is batch processing, not real-time analytics.
*Roberto's definition of a data scientist is someone who is able to join acquired data and engineering structures.

*How do you get business people to trust your analyses? And how do you get engineers to trust your algorithms?
-Make the explanation transparent.
-Communication.
-To business people: do not be afraid of explaining things in simpler terms. Use visualizations etc.
(computer scientists are taking over the world?!)

*Advice for students who want to be data scientists?
-Do Things, Make Things.
-Pick an idea and run with it.
-Also, don't pass up programming and database classes. Those are essential skill sets.

Looks like data scientists work closely with the CEO & product teams, as the communicator.
Mainly, blurring the lines between technical and non-technical people. And that's important.


Sharon Hsiao Says:
Fri, 09/14/2012 - 10:38am

Baratunde Thurston (Cultivated Wit)

They use a lot of social media, like tweets, Foursquare check-ins, etc., to crowdsource wisdom and incorporate the material into jokes, comedy, and shows.

Cute fun fact:
Tuesday is the least racist day of the week, according to people's shopping patterns.

(Crowdsourcing hot topics via social media doesn't seem like a new concept. I'm curious: is there a tool or framework for teachers to do that and easily create educational content?)


Sharon Hsiao Says:
Fri, 09/14/2012 - 10:07am

Blake Shaw (Foursquare)

Blake is a Data Scientist at Foursquare. He's a CU alum :)
He shows several "time" and "place" patterns based on the check-in data.
Pretty neat.
He talks about Livehoods too. See more detail here.

- In NYC, temperature and ice-cream consumption are strongly correlated. (amazing!)

He gave a lot of examples of friendships and connections among people derived from location-driven data. He then posed a great question: everything is social, so when you open a coffee shop now, you should be thinking about how this place connects people and how it is socially exposed to people (not how much traffic the location gets, or what kind of beans you carry).
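
For anyone wondering what "strongly correlated" means in practice, here is a minimal sketch of the Pearson correlation coefficient. The temperature and check-in numbers are invented for illustration; they are not Foursquare's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up daily highs (degrees F) and ice-cream shop check-ins for six days.
daily_high_f = [40, 55, 63, 71, 80, 88]
ice_cream_checkins = [12, 20, 31, 45, 70, 95]

r = pearson(daily_high_f, ice_cream_checkins)
print(round(r, 3))
```

A value near 1 means the two series rise together, near -1 means one falls as the other rises, and near 0 means no linear relationship. It says nothing by itself about which one causes the other.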


Megha Agarwala Says:
Fri, 09/14/2012 - 10:24am

What we can learn about NYC from millions of check-ins on Foursquare

Data classification by time of the day: each place in NYC is busy at a different time

We can get important information on Neighborhoods

E.g., in SoHo we have more check-ins at offices and clothing stores

In the East Village, people check in more at bars

We can also get an answer to: which neighborhoods are similar?

How does the city behave when certain external conditions are applied?

Ice-cream is related to temp increase (A very strong correlation)

Checkins are correlated with weather: more art store checkins in winter

Shows data viz of: What happens when a new coffee shop opens in the East Village?

What are the best places? Figure that out through different parameters:

Popularity
Rating
Sentiment
Loyalty - did people go back to those places?
Expertise - identify experts from this data: who influences a lot of people?

Talks about "Explore" in Foursquare, which is a built-in recommendation tool based on check-in data. Recommendations are delivered on the basis of:

location
time of day
personal check in history
friend recommendations etc


Megha Agarwala Says:
Fri, 09/14/2012 - 9:46am

Harlan Harris:

VCs are throwing money at big data companies.

Companies are confused about the expectations and skill sets to look for in a data scientist.

Some skills of people who self-identified as data scientists: Bayesian stats, product dev, data manipulation, data viz, maths


Sharon Hsiao Says:
Fri, 09/14/2012 - 9:27am

1st up:
Harlan Harris (Data Community, DC)

He talks about the "growing pains" of big data.
Could apply the old-school way of conducting a survey study, with 3 goals:
- define sub-groups
- highlight the variety
- can we use the data to improve what we do now

Skills: programming, statistics, math, business, ML/big data.
Use a "T-shaped skill set" (one skill in depth, and the others in breadth) to measure the variety of data scientists.


Kate Meersschaert Says:
Fri, 09/14/2012 - 8:59am

Sharon, Great find! I wonder if this would make a good NL Sector piece?