Survey Shows Data Scientists Spend Most of Their Time Cleaning Data - DATAVERSITY
by Angela Guess
Gil Press reports in Forbes, “A new survey of data scientists found that they spend most of their time massaging rather than mining or modeling data. Still, most are happy with having the sexiest job of the 21st century. The survey of about 80 data scientists was conducted for the second year in a row by CrowdFlower, provider of a ‘data enrichment’ platform for data scientists. Here are the highlights: Data preparation accounts for about 80% of the work of data scientists. Data scientists spend 60% of their time on cleaning and organizing data. Collecting data sets comes second at 19% of their time, meaning data scientists spend around 80% of their time on preparing and managing data for analysis. 76% of data scientists view data preparation as the least enjoyable part of their work. 57% of data scientists regard cleaning and organizing data as the least enjoyable part of their work and 19% say this about collecting data sets.”
Press continues, “These findings are yet another confirmation of a very widely known and lamented fact of the data scientist’s work experience. In 2009, data scientist Mike Driscoll popularized the term ‘data munging,’ describing the ‘painful process of cleaning, parsing, and proofing one’s data’ as one of the three sexy skills of data geeks. In 2013, Josh Wills (then director of Data Science at Cloudera, now Director of Data Engineering at Slack) told Technology Review ‘I’m a data janitor. That’s the sexiest job of the 21st century. It’s very flattering, but it’s also a little baffling.’ And Big Data Borat tweeted that ‘Data Science is 99% preparation, 1% misinterpretation’.”
Photo credit: Forbes