Description
Perspectives on Data Science for Software Engineering presents the best practices of seasoned data miners in software engineering. The idea for this book arose at the 2014 conference at Dagstuhl, an invitation-only gathering of leading computer scientists who meet to identify and discuss cutting-edge informatics topics. At that conference, the question of how to transfer the knowledge of seasoned software engineers and data scientists to newcomers in the field dominated many discussions. While many books cover the basics of data mining and software engineering, they present only the fundamentals and lack the perspective that comes from real-world experience. This book offers unique insights from the community's leaders, gathered to share hard-won lessons from the trenches.

Ideas are presented in digestible chapters designed to be applicable across many domains. Topics covered include data collection, data sharing, data mining, and how to use these techniques in successful software projects. Newcomers to software engineering data science will learn the tips and tricks of the trade, while more experienced data scientists will benefit from war stories that show which traps to avoid.

- Presents the wisdom of community experts, derived from a summit on software analytics
- Provides contributed chapters that share discrete ideas and techniques from the trenches
- Covers top areas of concern, including mining security and social data, data visualization, and cloud-based data
- Presented in clear chapters designed to be applicable across many domains
Table of Contents
Introduction
- Perspectives on data science for software engineering
- Software analytics and its application in practice
- Seven principles of inductive software engineering: What we do is different
- The need for data analysis patterns (in software engineering)
- From software data to software theory: The path less traveled
- Why theory matters

Success Stories/Applications
- Mining apps for anomalies
- Embrace dynamic artifacts
- Mobile app store analytics
- The naturalness of software
- Advances in release readiness
- How to tame your online services
- Measuring individual productivity
- Stack traces reveal attack surfaces
- Visual analytics for software engineering data
- Gameplay data plays nicer when divided into cohorts
- A success story in applying data science in practice
- There's never enough time to do all the testing you want
- The perils of energy mining: Measure a bunch, compare just once
- Identifying fault-prone files in large industrial software systems
- A tailored suit: The big opportunity in personalizing issue tracking
- What counts is decisions, not numbers—Toward an analytics design sheet
- A large ecosystem study to understand the effect of programming languages on code quality
- Code reviews are not for finding defects—Even established tools need occasional evaluation

Techniques
- Interviews
- Look for state transitions in temporal data
- Card-sorting: From text to themes
- Tools! Tools! We need tools!
- Evidence-based software engineering
- Which machine learning method do you need?
- Structure your unstructured data first!: The case of summarizing unstructured data with tag clouds
- Parse that data! Practical tips for preparing your raw data for analysis
- Natural language processing is no free lunch
- Aggregating empirical evidence for more trustworthy decisions
- If it is software engineering, it is (probably) a Bayesian factor
- Becoming Goldilocks: Privacy and data sharing in "just right" conditions
- The wisdom of the crowds in predictive modeling for software engineering
- Combining quantitative and qualitative methods (when mining software data)
- A process for surviving survey design and sailing through survey deployment

Wisdom
- Log it all?
- Why provenance matters
- Open from the beginning
- Reducing time to insight
- Five steps for success: How to deploy data science in your organizations
- How the release process impacts your software analytics
- Security cannot be measured
- Gotchas from mining bug reports
- Make visualization part of your analysis process
- Don't forget the developers! (and be careful with your assumptions)
- Limitations and context of research
- Actionable metrics are better metrics
- Replicated results are more trustworthy
- Diversity in software engineering research
- Once is not enough: Why we need replication
- Mere numbers aren't enough: A plea for visualization
- Don't embarrass yourself: Beware of bias in your data
- Operational data are missing, incorrect, and decontextualized
- Data science revolution in process improvement and assessment?
- Correlation is not causation (or, when not to scream "Eureka!")
- Software analytics for small software companies: More questions than answers
- Software analytics under the lamp post (or what Star Trek teaches us about the importance of asking the right questions)
- What can go wrong in software engineering experiments?
- One size does not fit all
- While models are good, simple explanations are better
- The white-shirt effect: Learning from failed expectations
- Simpler questions can lead to better insights
- Continuously experiment to assess values early on
- Lies, damned lies, and analytics: Why big data needs thick data
- The world is your test suite