Machine Learning: Hands-On for Developers and Technical Professionals

Authors
Publisher John Wiley & Sons Inc
Year 2020
Pages 432
Version paperback
Readership level Professional and scholarly
Language English
ISBN 9781119642145
Categories Mathematics
$48.89 (with VAT)
217.35 PLN / €46.60 / £40.45
Delivery to United States
Product to order
Delivery 3-4 weeks

Book description

Dig deep into the data with a hands-on guide to machine learning with updated examples and more!
Machine Learning: Hands-On for Developers and Technical Professionals provides hands-on instruction and fully coded working examples for the most common machine learning techniques used by developers and technical professionals. The book breaks down each ML variant, explaining how it works and how it is used within particular industries, so readers can incorporate the presented techniques into their own work as they follow along. A core tenet of machine learning is a strong focus on data preparation. A full exploration of the various types of learning algorithms shows how the proper tools can help any developer extract information and insights from existing data. The book includes a full complement of Instructor's Materials to facilitate classroom use, making it useful for students and as a professional reference.


At its core, machine learning is a mathematical, algorithm-based technology that forms the basis of historical data mining and modern big data science. Scientific analysis of big data requires a working knowledge of machine learning, which forms predictions based on known properties learned from training data. Machine Learning is an accessible, comprehensive guide for the non-mathematician, providing clear guidance that allows readers to:

Learn the tools of machine learning, including Hadoop, Mahout, and Weka
Understand decision trees, Bayesian networks, and artificial neural networks
Implement association rule, real-time, and batch learning
Develop a strategic plan for safe, effective, and efficient machine learning

By learning to construct a system that can learn from data, readers can increase their utility across industries. Machine learning sits at the core of deep-dive data analysis and visualization, which is increasingly in demand as companies discover the gold mine hiding in their existing data. For the tech professional involved in data science, Machine Learning: Hands-On for Developers and Technical Professionals provides the skills and techniques required to dig deeper.

Table of contents

Introduction xxvii

Chapter 1 What is Machine Learning? 1
History of Machine Learning 1
Alan Turing 1
Arthur Samuel 2
Tom M. Mitchell 2
Summary Definition 3
Algorithm Types for Machine Learning 3
Supervised Learning 3
Unsupervised Learning 4
The Human Touch 4
Uses for Machine Learning 4
Software 4
Stock Trading 5
Robotics 6
Medicine and Healthcare 6
Advertising 7
Retail and E-commerce 7
Gaming Analytics 9
The Internet of Things 10
Languages for Machine Learning 10
Python 10
R 11
Matlab 11
Scala 11
Ruby 11
Software Used in This Book 11
Checking the Java Version 12
Weka Toolkit 12
DeepLearning4J 13
Kafka 13
Spark and Hadoop 13
Text Editors and IDEs 13
Data Repositories 14
UC Irvine Machine Learning Repository 14
Kaggle 14
Summary 14

Chapter 2 Planning for Machine Learning 15
The Machine Learning Cycle 15
It All Starts with a Question 16
I Don't Have Data! 16
Starting Local 17
Transfer Learning 17
Competitions 17
One Solution Fits All? 18
Defining the Process 18
Planning 18
Developing 19
Testing 19
Reporting 19
Refining 19
Production 20
Avoiding Bias 20
Building a Data Team 20
Mathematics and Statistics 20
Programming 21
Graphic Design 21
Domain Knowledge 21
Data Processing 22
Using Your Computer 22
A Cluster of Machines 22
Cloud-Based Services 22
Data Storage 23
Physical Discs 23
Cloud-Based Storage 23
Data Privacy 23
Cultural Norms 24
Generational Expectations 24
The Anonymity of User Data 25
Don't Cross the "Creepy Line" 25
Data Quality and Cleaning 26
Presence Checks 26
Type Checks 27
Length Checks 27
Range Checks 28
Format Checks 28
The Britney Dilemma 28
What's in a Country Name? 31
Dates and Times 33
Final Thoughts on Data Cleaning 33
Thinking About Input Data 34
Raw Text 34
Comma-Separated Variables 34
JSON 35
YAML 37
XML 37
Spreadsheets 38
Databases 39
Thinking About Output Data 39
Don't Be Afraid to Experiment 40
Summary 40

Chapter 3 Data Acquisition Techniques 43
Scraping Data 43
Copy and Paste 44
Google Sheets 46
Using an API 47
Acquiring Weather Data 48
Migrating Data 50
Installing Embulk 51
Using the Quick Run 51
Installing Plugins 52
Migrating Files to Database 53
Bulk Converting CSV to JSON 55
Summary 56

Chapter 4 Statistics, Linear Regression, and Randomness 57
Working with a Basic Dataset 57
Loading and Converting the Dataset 58
Introducing Basic Statistics 59
Minimum and Maximum Values 60
Sum 61
Mean 62
Arithmetic Mean 62
Harmonic Mean 62
Geometric Mean 63
The Relationship Between the Three Averages 63
Mode 65
Median 66
Range 67
Interquartile Ranges 67
Variance 68
Standard Deviation 69
Using Simple Linear Regression 70
Using Your Spreadsheet 70
Writing a Program 73
Embracing Randomness 75
Finding Pi with Random Numbers 76
Using Monte Carlo Pi in Clojure 77
Summary 80

Chapter 5 Working with Decision Trees 81
The Basics of Decision Trees 81
Uses for Decision Trees 81
Advantages of Decision Trees 82
Limitations of Decision Trees 82
Different Algorithm Types 82
How Decision Trees Work 84
Decision Trees in Weka 88
The Requirement 88
Training Data 89
Using Weka to Create a Decision Tree 90
Creating Java Code from the Classification 94
Testing the Classifier Code 99
Thinking About Future Iterations 101
Summary 101

Chapter 6 Clustering 103
What is Clustering? 103
Where is Clustering Used? 104
The Internet 104
Business and Retail 104
Law Enforcement 105
Computing 105
Clustering Models 105
How the K-Means Works 106
Calculating the Number of Clusters in a Dataset 108
K-Means Clustering with Weka 110
Preparing the Data 110
The Workbench Method 111
The Command-Line Method 116
Converting CSV File to ARFF 116
The Coded Method 120
Summary 128

Chapter 7 Association Rules Learning 129
Where is Association Rules Learning Used? 129
Web Usage Mining 130
Beer and Diapers 130
How Association Rules Learning Works 131
Support 133
Confidence 133
Lift 134
Conviction 134
Defining the Process 134
Algorithms 135
Apriori 135
FP-Growth 136
Mining the Baskets: A Walk-Through 136
The Raw Basket Data 136
Using the Weka Application 137
Inspecting the Results 141
Summary 142

Chapter 8 Support Vector Machines 143
What is a Support Vector Machine? 143
Where are Support Vector Machines Used? 144
The Basic Classification Principles 144
Binary and Multiclass Classification 144
Linear Classifiers 146
Confidence 147
Maximizing and Minimizing to Find the Line 147
How Support Vector Machines Approach Classification 148
Using Linear Classification 148
Using Non-Linear Classification 150
Using Support Vector Machines in Weka 151
Installing LibSVM 151
A Classification Walk-Through 152
Implementing LibSVM with Java 158
Summary 164

Chapter 9 Artificial Neural Networks 165
What is a Neural Network? 165
Artificial Neural Network Uses 166
High-Frequency Trading 166
Credit Applications 167
Data Center Management 167
Robotics 167
Medical Monitoring 168
Trusting the Black Box 168
Breaking Down the Artificial Neural Network 169
Perceptrons 169
Activation Functions 170
Multilayer Perceptrons 171
Back Propagation 173
Data Preparation for Artificial Neural Networks 174
Artificial Neural Networks with Weka 175
Generating a Dataset 175
Loading the Data into Weka 177
Configuring the Multilayer Perceptron 178
Training the Network 180
Altering the Network 182
Increasing the Test Data Size 183
Implementing a Neural Network in Java 183
Creating the Project 183
Writing the Code 185
Converting from CSV to Arff 188
Running the Neural Network 188
Developing Neural Networks with DeepLearning4J 189
Modifying the Data 189
Viewing Maven Dependencies 190
Handling the Training Data 191
Normalizing Data 191
Building the Model 192
Evaluating the Model 193
Saving the Model 193
Building and Executing the Program 194
Summary 195

Chapter 10 Machine Learning with Text Documents 197
Preparing Text for Analysis 198
Apache Tika 198
Cleaning the Text Data 203
Stopwords 205
Stemming 206
N-grams 206
TF/IDF 207
Loading the Documents 207
Calculating the Term Frequency 208
Calculating the Inverse Document Frequency 208
Computing the TF/IDF Score 209
Reviewing the Final Code Listing 209
Word2Vec 211
Loading the Raw Text Data 212
Tokenizing the Strings 212
Creating the Model 212
Evaluating the Model 213
Reviewing the Final Code 214
Basic Sentiment Analysis 216
Loading Positive and Negative Words 216
Loading Sentences 217
Calculating the Sentiment Score 217
Reviewing the Final Code 218
Performing a Test Run 220
Further Development 220
Summary 221

Chapter 11 Machine Learning with Images 223
What is an Image? 223
Introducing Color Depth 224
Images in Machine Learning 225
Basic Classification with Neural Networks 226
Basic Settings 226
Loading the MNIST Images 226
Model Configuration 227
Model Training 228
Model Evaluation 228
Convolutional Neural Networks 228
How CNNs Work 228
CNN Demonstration 231
Downloading the Image Data 231
Basic Setup 232
Handling the Training and Test Data 233
Image Preparation 233
CNN Model Configuration 234
Model Training 236
Model Evaluation 236
Saving the Model 237
Transfer Learning 237
Summary 238

Chapter 12 Machine Learning Streaming with Kafka 239
What You Will Learn in This Chapter 239
From Machine Learning to Machine Learning Engineer 240
From Batch Processing to Streaming Data Processing 241
What is Kafka? 241
How Does It Work? 241
Fault Tolerance 243
Further Reading 243
Installing Kafka 243
Kafka as a Single-Node Cluster 244
Kafka as a Multinode Cluster 245
Topics Management 247
Creating Topics 248
Finding Out Information About Existing Topics 248
Deleting Topics 249
Sending Messages from the Command Line 249
Receiving Messages from the Command Line 250
Kafka Tool UI 250
Writing Your Own Producers and Consumers 251
Producers in Java 251
Consumers in Java 255
Building and Running the Applications 258
The Streaming API 260
Building a Streaming Machine Learning System 262
Planning the System 263
Continuous Training 265
Determining Which Models to Use for Predictions 266
Determining Which Algorithms to Use 268
Simple Linear Regression 271
Neural Network 274
Kafka Topics 281
Creating the Topics 281
Kafka Connect 283
Why Persist the Event Data? 283
The REST API Microservice 285
Processing Commands and Events 287
Finding Kafka Brokers 288
A Command or an Event? 289
Making Predictions 293
Prediction Streaming API 293
Prediction Functions 296
Predicting Linear Regression 298
Predicting the Neural Network Model 299
Running the Project 301
Run MySQL 301
Run Zookeeper 301
Run Kafka 301
Create the Topics 301
Run Kafka Connect 301
Model Builds 302
Run Events Streaming Application 302
Run Prediction Streaming Application 302
Start the API 302
Send JSON Training Data 302
Train a Model 302
Make a Prediction 303
Summary 303

Chapter 13 Apache Spark 305
Spark: A Hadoop Replacement? 305
Java, Scala, or Python? 306
Downloading and Installing Spark 306
A Quick Intro to Spark 306
Starting the Shell 307
Data Sources 307
Testing Spark 308
Spark Monitor 309
Comparing Hadoop MapReduce to Spark 310
Writing Stand-Alone Programs with Spark 313
Spark Programs in Java 313
Spark Program Summary 318
Spark SQL 318
Basic Concepts 318
Wrapping Up SparkSQL 323
Spark Streaming 323
Basic Concepts 323
Creating Your First Spark Stream 324
Spark Streams from Kafka 326
MLib: The Machine Learning Library 327
Dependencies 328
Decision Trees 328
Clustering 330
Association Rules with FP-Growth 332
Summary 335

Chapter 14 Machine Learning with R 337
Installing R 337
macOS 337
Windows 338
Linux 338
Your First Run 338
Installing R-Studio 339
The R Basics 340
Variables and Vectors 340
Matrices 341
Lists 342
Data Frames 343
Installing Packages 344
Loading in Data 345
Plotting Data 347
Simple Statistics 350
Simple Linear Regression 350
Creating the Data 351
The Initial Graph 351
Regression with the Linear Model 351
Making a Prediction 352
Basic Sentiment Analysis 353
Using Functions to Load in Word Lists 353
Writing a Function to Score Sentiment 354
Testing the Function 354
Apriori Association Rules 355
Installing the arules Package 355
Gathering the Training Data 356
Importing the Transaction Data 356
Running the Apriori Algorithm 357
Inspecting the Results 358
Accessing R from Java 358
Installing the rJava Package 358
Creating Your First Java Code in R 359
Calling R from Java Programs 359
Setting Up an Eclipse Project 360
Creating the Java/R Class 361
Running the Example 361
Extending Your R Implementations 363
Connecting to Social Media with R 364
Summary 366

Appendix A Kafka Quick Start 367
Installing Kafka 367
Starting Zookeeper 367
Starting Kafka 368
Creating Topics 368
Listing Topics 369
Describing a Topic 369
Deleting Topics 369
Running a Console Producer 370
Running a Console Consumer 370

Appendix B The Twitter API Developer Application Configuration 371

Appendix C Useful Unix Commands 375
Using Sample Data 375
Showing the Contents: cat, more, and less 376
Example Command 376
Expected Output 376
Filtering Content: grep 377
Example Command for Finding Text 377
Example Output 377
Sorting Data: sort 378
Example Command for Basic Sorting 378
Example Output 378
Finding Unique Occurrences: uniq 380
Showing the Top of a File: head 381
Counting Words: wc 381
Locating Anything: find 382
Combining Commands and Redirecting Output 383
Picking a Text Editor 383
Colon Frenzy: Vi and Vim 383
Nano 384
Emacs 384

Appendix D Further Reading 385
Machine Learning 385
Statistics 386
Big Data and Data Science 386
Visualization 387
Making Decisions 387
Datasets 388
Blogs 388
Useful Websites 389
The Tools of the Trade 389

Index 391