2015 GEOINT Foreword: Riding the Data Science Wave
I had the pleasure of attending the 2015 GEOINT Symposium in Washington, D.C. last week. This was the fifth GEOINT Symposium and the fourth GEOINT Foreword I have had the opportunity to attend. I’ve always appreciated the mission of the GEOINT Foreword agenda: focusing on science and technology from tradecraft and practitioner perspectives. I’ve been impressed with the growing participation from academia at this event, and this year was no different, with some renowned geographers in the crowd and on panels. GEOINT Foreword topics have always followed major trends in the discipline, and this year’s hot topic was data science.
Three panels focused explicitly on data science, with panelists from academia, industry, and government. The session on “How to Train Your Data Scientists” provided a good mix of industry practitioner and academic perspectives, but left something to be desired regarding the unique challenges those of us in the geospatial intelligence community face in incorporating data science techniques into our current analytical workflows. A member of the audience asked the panel how current trends in data science apply specifically to spatial data, and the short answer was that data science techniques are well suited for spatiotemporal data.
I agree that data science techniques have been used successfully on spatiotemporal data in other fields, but I would add that as we incorporate these techniques into our analyses, we have to remain mindful of some of the pitfalls of using spatial data: namely, the violation of independence between observations (spatial autocorrelation), the modifiable areal unit problem (MAUP), and the ecological inference fallacy. As geographers have done in the past with methods such as spatial regression and geographically weighted regression (GWR), we must adapt these mainly aspatial techniques while being mindful of what makes our data unique.
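To make the independence pitfall concrete, here is a minimal sketch of global Moran's I, a standard measure of spatial autocorrelation. The toy values, the six-cell chain layout, and the helper names (morans_i, chain_weights) are my own assumptions for illustration and do not come from any of the panels.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a dict {(i, j): w_ij}."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(weights.values())  # sum of all spatial weights
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


def chain_weights(n):
    """Hypothetical layout: n cells in a row, each adjacent to its neighbors."""
    weights = {}
    for i in range(n - 1):
        weights[(i, i + 1)] = 1.0
        weights[(i + 1, i)] = 1.0
    return weights


weights = chain_weights(6)
clustered = [1, 1, 1, 5, 5, 5]    # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]  # neighbors are always dissimilar

print(morans_i(clustered, weights))    # positive: spatial clustering
print(morans_i(alternating, weights))  # negative: spatial dispersion
```

A value well away from zero suggests that neighboring observations are not independent, which is exactly the assumption many aspatial techniques rely on.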
I found the session on “Data Science Acquisition Models” to be particularly informative, thanks to the superbly moderated discussion and quality input from the panelists. Will Cukierski, whose company, Kaggle, hosts data science competitions on the web, provided some excellent insight into how competitions and gamification can improve the quality and quantity of analytical models and products. Will’s input during the session exemplified how perspectives from outside our domain can help shape our path forward.
Saurin Shah, who moderated the session and whose company published the well-received Field Guide to Data Science, pointed out the need for increased use of inductive reasoning in our field. I agree with Saurin’s comments. As our data sets grow with the increasing use of streaming sensor data and the further incorporation of open source data, the need to use inductive reasoning to derive hypotheses and questions from the data will grow as well. Unsupervised learning techniques such as principal component analysis (PCA) and hierarchical clustering should become an essential part of the geospatial intelligence analyst’s tool belt as soon as possible.
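As a rough illustration of what that tool belt might look like, the sketch below runs PCA and then hierarchical clustering on a small synthetic table. The data are randomly generated stand-ins (20 hypothetical locations with 4 attributes each), and the choices of NumPy, SciPy, Ward linkage, and two clusters are my own assumptions rather than anything recommended by the panel.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-in data (an assumption, not real observations):
# 20 "locations" x 4 attributes, drawn from two well-separated groups.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(10, 4))
group_b = rng.normal(loc=5.0, scale=0.5, size=(10, 4))
features = np.vstack([group_a, group_b])

# PCA via SVD of the mean-centered matrix.
centered = features - features.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)
scores = centered @ components[:2].T  # project onto the first two components

# Hierarchical (agglomerative) clustering with Ward linkage on the PCA
# scores, then cut the tree into two clusters.
tree = linkage(scores, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")

print("explained variance ratio:", np.round(explained, 3))
print("cluster labels:", labels)
```

Neither step needs labeled training data, which is what makes this kind of workflow a natural fit for generating hypotheses from the data rather than testing ones we already hold. With real spatial data, the autocorrelation and MAUP caveats above still apply to whatever structure the clusters appear to reveal.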
GEOINT Foreword 2015 continued the tradition of complementing the main GEOINT Symposium with a day focused on innovation and academic advancement. The focus on data science is a promising start toward adopting the tools and techniques the term describes, and I look forward to seeing these techniques mature within our discipline.