The first question a blog like this needs to answer is, “What is data science?” This matters because there are many competing definitions, and the community hasn’t coalesced around one yet. The definition I’ll be using is that data science is an overarching discipline that draws on elements from several fields and focuses on integrating them from a systematic perspective. The major disciplines a data scientist uses are:
- Analytics and Statistics – Mathematical techniques focused on quantifying performance and predicting future trends. This field has become quite broad in recent years, expanding significantly from its roots in linear regression to include text, video, and audio analytics. One of the more interesting trends in modern data science is the application of statistical and analytical techniques to cases where data on all or nearly all of the population being studied is available, rather than only small samples. Many consider this the threshold where “big data” begins.
- Business Intelligence – The easiest way to describe business intelligence is that it’s the next layer on top of analytics and statistics. Once the data has been analyzed and trends are found, they need to be turned into useful information. Once that information has been developed, it needs to be presented in both textual and graphical forms so that it can inform and educate the consumers of the data.
- Data Visualization – The field of study associated with presenting information in a visual medium, and increasingly in interactive media. Good data visualizations simplify and focus the information being presented and tell a story. Great data visualizations enable the consumers of the information to make new links between information they already have and to understand higher-dimensional data that would not normally be accessible without specialized training.
- Data Modeling – Data scientists are often called upon to develop logical data models describing real-world scenarios and events. While the techniques used for this often overlap with the relational models built for traditional database development, the needs of analytics systems require special tools, such as Kimball’s Bus Matrix.
- Computer Science – Almost every data scientist will be called on to do some programming. Beyond this programming, entire fields are being developed in computer science that overlap and integrate with data science. With the help of machine learning, stream processing, and multi-threaded programming, data scientists are able to do more and more every month.
With these tools, data scientists are making an impact in almost every area of human society, from medicine to marketing to law enforcement. It’s an amazing field, and I’m excited to talk about it and its intersections with K-12 education here.