Tap into the power of data science with this comprehensive resource for non-technical professionals
Data Science: The Executive Summary - A Technical Book for Non-Technical Professionals is a comprehensive resource for people in non-engineering roles who want to fully understand data science and analytics concepts. Accomplished data scientist and author Field Cady describes both the "business side" of data science, including what problems it solves and how it fits into an organization, and the technical side, including analytical techniques and key technologies.
Data Science: The Executive Summary covers topics like:
* Assessing whether your organization needs data scientists, and what to look for when hiring them
* When Big Data is the best approach to use for a project, and when it actually ties analysts' hands
* Cutting-edge Artificial Intelligence, as well as classical approaches that work better for many problems
* How many techniques rely on dubious mathematical idealizations, and when you can work around them
Perfect for executives who make critical decisions based on data science and analytics, as well as managers who hire and assess the work of data scientists, Data Science: The Executive Summary also belongs on the bookshelves of salespeople and marketers who need to explain what a data analytics product does. Finally, data scientists themselves will improve their technical work with insights into the goals and constraints of the business situation.
Data Science: The Executive Summary - A Technical Book for Non-Technical Professionals
Table of contents
1 Introduction 1
1.1 Why Managers Need to Know About Data Science 1
1.2 The New Age of Data Literacy 2
1.3 Data-Driven Development 3
1.4 How to Use this Book 4
2 The Business Side of Data Science 7
2.1 What Is Data Science? 7
2.1.1 What Data Scientists Do 7
2.1.2 History of Data Science 9
2.1.3 Data Science Roadmap 12
2.1.4 Demystifying the Terms: Data Science, Machine Learning, Statistics, and Business Intelligence 13
2.1.4.1 Machine Learning 13
2.1.4.2 Statistics 14
2.1.4.3 Business Intelligence 15
2.1.5 What Data Scientists Don't (Necessarily) Do 15
2.1.5.1 Working Without Data 16
2.1.5.2 Working with Data that Can't Be Interpreted 17
2.1.5.3 Replacing Subject Matter Experts 17
2.1.5.4 Designing Mathematical Algorithms 18
2.2 Data Science in an Organization 19
2.2.1 Types of Value Added 19
2.2.1.1 Business Insights 19
2.2.1.2 Intelligent Products 19
2.2.1.3 Building Analytics Frameworks 20
2.2.1.4 Offline Batch Analytics 21
2.2.2 One-Person Shops and Data Science Teams 21
2.2.3 Related Job Roles 22
2.2.3.1 Data Engineer 22
2.2.3.2 Data Analyst 22
2.2.3.3 Software Engineer 23
2.3 Hiring Data Scientists 25
2.3.1 Do I Even Need Data Science? 26
2.3.2 The Simplest Option: Citizen Data Scientists 27
2.3.3 The Harder Option: Dedicated Data Scientists 28
2.3.4 Programming, Algorithmic Thinking, and Code Quality 28
2.3.5 Hiring Checklist 31
2.3.6 Data Science Salaries 32
2.3.7 Bad Hires and Red Flags 32
2.3.8 Advice with Data Science Consultants 34
2.4 Management Failure Cases 36
2.4.1 Using Them as Devs 36
2.4.2 Inadequate Data 36
2.4.3 Using Them as Graph Monkeys 37
2.4.4 Nebulous Questions 37
2.4.5 Laundry Lists of Questions Without Prioritization 38
3 Working with Modern Data 41
3.1 Unstructured Data and Passive Collection 41
3.2 Data Types and Sources 42
3.3 Data Formats 43
3.3.1 CSV Files 43
3.3.2 JSON Files 44
3.3.3 XML and HTML 46
3.4 Databases 47
3.4.1 Relational Databases and Document Stores 48
3.4.2 Database Operations 49
3.5 Data Analytics Software Architectures 50
3.5.1 Shared Storage 51
3.5.2 Shared Relational Database 52
3.5.3 Document Store+Analytics RDB 52
3.5.4 Storage+Parallel Processing 53
4 Telling the Story, Summarizing Data 55
4.1 Choosing What to Measure 56
4.2 Outliers, Visualizations, and the Limits of Summary Statistics: A Picture Is Worth a Thousand Numbers 58
4.3 Experiments, Correlation, and Causality 60
4.4 Summarizing One Number 62
4.5 Key Properties to Assess: Central Tendency, Spread, and Heavy Tails 63
4.5.1 Measuring Central Tendency 63
4.5.1.1 Mean 63
4.5.1.2 Median 64
4.5.1.3 Mode 65
4.5.2 Measuring Spread 65
4.5.2.1 Standard Deviation 65
4.5.2.2 Percentiles 66
4.5.3 Advanced Material: Managing Heavy Tails 67
4.6 Summarizing Two Numbers: Correlations and Scatterplots 68
4.6.1 Correlations 68
4.6.1.1 Pearson Correlation 71
4.6.1.2 Ordinal Correlations 71
4.6.2 Mutual Information 72
4.7 Advanced Material: Fitting a Line or Curve 72
4.7.1 Effects of Outliers 75
4.7.2 Optimization and Choosing Cost Functions 76
4.8 Statistics: How to Not Fool Yourself 77
4.8.1 The Central Concept: The p-Value 78
4.8.2 Reality Check: Picking a Null Hypothesis and Modeling Assumptions 80
4.8.3 Advanced Material: Parameter Estimation and Confidence Intervals 81