This post was originally published by me on The Next Web
Humans are walking data centers and our interactions and behaviors, no matter how minuscule, are building a near infinite pool of aggregate data over the course of our lifetime.
This data can be used to improve the experience of our lives through analysis of our consumption, interactions and behaviors; in business today, data has become a competitive advantage and necessary component of product development.
Data science is considered one of the sexiest fields to pursue (InformationWeek’s words, not mine), and data scientists are now helping companies of all shapes and sizes make sense of these massive data sets to better inform business outcomes.
The field has become much more mainstream in recent years and forecasts are just as bullish on its growth.
General Assembly told TNW that the number of data science students enrolling at GA will more than double from 2013 to 2014, and the number of completed applications to the program has already tripled in 2014 vs. 2013.
In a 2011 report, McKinsey Global Institute estimated that by 2018 there will be 4 million big data related positions in the US.
And companies aren’t just actively recruiting; they’re opening up their pocketbooks. According to data from Glassdoor, the median salary for data scientists in the United States is $115,000.
So why data scientists, and why now?
I spoke to a number of data professionals across multiple industries and found a few trends of my own in the reasons their profession is in such high demand.
Accessibility to data is greater than ever before
“We are now curating data at a much faster speed than we can curate the people who know how to work with it,” says Edward Podojil, Data Scientist at online home cleaning service Homejoy.
The rise of online communities, e-commerce, mobile and the overall digitization of society has created new data sets that go beyond just number crunching.
“Anyone who has a website instantly is going to have something like Apache web servers, with something as simple as a logging feature,” Matthew Ruttley, Data Scientist at Mozilla, tells TNW.
“It’s very easy for even a new software engineer to log a certain action on the site, such as if someone filled out a form or not.”
Companies look at the different interactions by different users within the site, from how they came in, to the activity taken on the site, to how they completed a particular activity. With the data, teams can look at the optimal characteristics of users who have satisfied their goals.
Aside from pure transactional data, we’ve evolved in our digital lives such that we’re publishing more contextual information online than ever before.
The tweets we post, the reviews we leave on Yelp, the blog posts we write, the searches we make… they’re all adding another layer to the data puzzle that really hasn’t been seen before, revealing new relationships among groups of people and their behaviors.
Data is now crucial to product development
Data’s influence on product development is easily seen in nearly every company today. Optimization based on this influx of data helps drive actionable business decisions in real time, whereas before, data scarcity forced iteration based on gut instinct or on unreliable and outdated information.
Still, for all the science involved, applying data requires human intelligence to craft appropriate strategies that take advantage of it.
“Maybe you can’t write an algorithm to automatically increase conversions on your site,” says Ruttley. “But you could possibly write an algorithm that makes it easy for a human to understand what’s happening on the site, then the human can make the change in their behavior.”
That change can come in a number of forms depending on what the data is telling you. Companies might try changing messaging in ad copy, serving up different content within an app based on a user’s selections or geographic region, or changing the interface to inspire more action.
“One time we tried changing the color of a button in the app and it didn’t change the experience much,” John Sandall, Data Scientist at YPlan, tells us.
“Then we changed the copy on one of the buttons and that drove up conversions 15 percent.”
With so many data points, it’s important to be able to experiment and continually test until you find the optimal mix that satisfies specific business KPIs. Data scientists help look at data across a number of different sources to help find that optimal mix.
We live in a balancing act of privacy
Today’s free-flowing exchange of ideas, goods and services is punctuated by one overarching relationship that’s constantly at the center of debate: the right to data.
As mentioned before, nearly every interaction is measured, giving these companies more access to map personas of the people using their technology. For the most part, this is a good thing – it helps the user experience improve and makes the things we love even better.
But a big part of that is also making these companies sustainable in the marketplace for the future, and that means experimenting and monetizing.
Earlier this year Facebook came under fire for just that. A team of Facebook data scientists published a research study conducted by altering 689,003 users’ feeds over a week to test their emotional reaction.
Titled “Experimental evidence of massive-scale emotional contagion through social networks,” the report came under serious scrutiny and caused such a stir that even US Senator Mark R. Warner (D-VA) asked privacy and consumer protection experts at the Federal Trade Commission to look into the Facebook study.
But who is to say what uses of data are good or bad?
Facebook is a seemingly easier target for negative attention because it’s already polarizing and so much larger than most other social networks. When things happen to Facebook, it wakes up a larger audience to the ramifications of what they’re giving up in order to interact online.
But other companies have been praised for using data in ways seen not as invasive but as interesting and valuable content marketing.
Take for example OkCupid, a company that has regularly and unabashedly used its user data to draw insights about the psychology of human interaction and attraction.
In a blog post entitled “We Experiment On Human Beings!” OkCupid President Christian Rudder expressed, “if you use the Internet, you’re the subject of hundreds of experiments at any given time, on every site. That’s how websites work.” So why not use this groundswell of information to your advantage?
Rudder wrote an entire book based on his years of study of OkCupid users in an attempt to use personal data for good instead of “selling us stuff we don’t need.”
So for this new wave of data scientists, just having access to the data sets isn’t enough; it’s knowing how to make informed decisions that respect the user along the way and don’t disrupt their trust in the companies they interact with.
Of the five data scientists I spoke with, none seemed to express much concern about threatening user privacy when it came to their data collecting methods. The common attitude was that they absolutely respected users, whom they see as much more than just numbers.
A more complex role than you think
Perhaps one of the main reasons that data scientists are in such high demand is that they require an incredibly diverse skill set, which can sometimes be hard to find in people.
“The data scientist is someone who has fantastic communication and empathy ability, as well as a lot of mathematical skill, and then the engineering skill they need to do the math that they want to,” says Hilary Mason, Founder at Fast Forward Labs.
The data scientist spends a lot of time working on statistical analysis, developing machine learning algorithms, hacking and mining for data, but also managing and communicating with multiple different teams within an organization.
For departments across a business to achieve their goals together, there must be a sense of shared ownership of the data informing their decisions. To support this organization-wide goal, data scientists often sit among various groups to facilitate discussion about how best to solve the problems they’re facing individually as departments and collectively as a company.
“I find myself working across multiple teams almost as a consultant,” says Podojil. “Data Scientists help extenuate the communication across different teams, as well as helping each team figure out what it is they need to be looking for to drive their goals.”
Much of the data scientist’s role is not just to provide the data or answers to questions, but to educate these departments on how to ask the right questions. Perhaps that’s why the role is becoming so crucial and outlook so bright for the field.
“There’s a habit of data science thinking in which, when faced with difficult decisions, you ask ‘Could data help me answer this question?’ or simply ‘What data are available to help me here?’” says Chris Wiggins, Chief Data Scientist with the New York Times.
“That habit of thinking, and the willingness to bring data into decisions,” says Wiggins, “…is the most important change in demand at corporations.”