Social Media Data Mining and Analytics

Authors	Gabor Szabo, Gungor Polatkan, P. Oscar Boykin
Publisher	Wiley & Sons
Publication date
Pages	352
Format	paperback
Language	English
ISBN	9781118824856
Categories	Databases

Book description

Harness the power of social media to predict customer behavior and improve sales

Social media is the biggest source of Big Data. Because of this, 90% of Fortune 500 companies are investing in Big Data initiatives that will help them predict consumer behavior to produce better sales results. Social Media Data Mining and Analytics shows analysts how to use sophisticated techniques to mine social media data, obtaining the information they need to generate amazing results for their businesses.

Social Media Data Mining and Analytics isn't just another book on the business case for social media. Rather, this book provides hands-on examples for applying state-of-the-art tools and technologies to mine social media. Examples include Twitter, Wikipedia, Stack Exchange, LiveJournal, movie reviews, and other rich data sources. In it, you will learn:

* The four key characteristics of online services: users, social networks, actions, and content
* The full data discovery lifecycle: data extraction, storage, analysis, and visualization
* How to work with code and extract data to create solutions
* How to use Big Data to make accurate customer predictions
* How to personalize the social media experience using machine learning

Using the techniques the authors detail will provide organizations the competitive advantage they need to harness the rich data available from social media platforms.

Table of contents

Introduction

Chapter 1 Users: The Who of Social Media
    Measuring Variations in User Behavior in Wikipedia
    The Diversity of User Activities
    The Origin of the User Activity Distribution
    The Consequences of the Power Law
    The Long Tail in Human Activities
    Long Tails Everywhere: The 80/20 Rule (p/q Rule)
    Online Behavior on Twitter
    Retrieving Tweets for Users
    Logarithmic Binning
    User Activities on Twitter
    Summary

Chapter 2 Networks: The How of Social Media
    Types and Properties of Social Networks
    When Users Create the Connections: Explicit Networks
    Directed Versus Undirected Graphs
    Node and Edge Properties
    Weighted Graphs
    Creating Graphs from Activities: Implicit Networks
    Visualizing Networks
    Degrees: The Winner Takes All
    Counting the Number of Connections
    The Long Tail in User Connections
    Beyond the Idealized Network Model
    Capturing Correlations: Triangles, Clustering, and Assortativity
    Local Triangles and Clustering
    Assortativity
    Summary

Chapter 3 Temporal Processes: The When of Social Media
    What Traditional Models Tell You About Events in Time
    When Events Happen Uniformly in Time
    Inter-Event Times
    Comparing to a Memoryless Process
    Autocorrelations
    Deviations from Memorylessness
    Periodicities in Time in User Activities
    Bursty Activities of Individuals
    Correlations and Bursts
    Reservoir Sampling
    Forecasting Metrics in Time
    Finding Trends
    Finding Seasonality
    Forecasting Time Series with ARIMA
    The Autoregressive Part ("AR")
    The Moving Average Part ("MA")
    The Full ARIMA(p, d, q) Model
    Summary

Chapter 4 Content: The What of Social Media
    Defining Content: Focus on Text and Unstructured Data
    Creating Features from Text: The Basics of Natural Language Processing
    The Basic Statistics of Term Occurrences in Text
    Using Content Features to Identify Topics
    The Popularity of Topics
    How Diverse Are Individual Users' Interests?
    Extracting Low-Dimensional Information from High-Dimensional Text
    Topic Modeling
    Unsupervised Topic Modeling
    Supervised Topic Modeling
    Relational Topic Modeling
    Summary

Chapter 5 Processing Large Datasets
    MapReduce: Structuring Parallel and Sequential Operations
    Counting Words
    Skew: The Curse of the Last Reducer
    Multi-Stage MapReduce Flows
    Fan-Out
    Merging Data Streams
    Joining Two Data Sources
    Joining Against Small Datasets
    Models of Large-Scale MapReduce
    Patterns in MapReduce Programming
    Static MapReduce Jobs
    Iterative MapReduce Jobs
    PageRank for Ranking in Graphs
    K-means Clustering
    Incremental MapReduce Jobs
    Temporal MapReduce Jobs
    Rollups and Data Cubing
    Expanding Rollup Jobs
    Challenges with Processing Long-Tailed Social Media Data
    Sampling and Approximations: Getting Results with Less Computation
    HyperLogLog
    HyperLogLog Example
    HyperLogLog on the Stack Exchange Dataset
    Performance of HLL on Large Datasets
    Bloom Filters
    A Bloom Filter Example
    Bloom Filter as Pre-Computed Membership Knowledge
    Bloom Filters on Large Social Datasets
    Count-Min Sketch
    Count-Min Sketch: Heavy Hitters Example
    Count-Min Sketch: Top Percentage Example
    Aggregating Approximate Data Structures
    Summary of Approximations
    Executing on a Hadoop Cluster (Amazon EC2)
    Installing a CDH Cluster on Amazon EC2
    Providing IAM Access to Collaborators
    Adding On-Demand Cluster Capabilities
    Summary

Chapter 6 Learn, Map, and Recommend
    Social Media Services Online
    Search Engines
    Content Engagement
    Interactions with the Real World
    Interactions with People
    Problem Formulation
    Learning and Mapping
    Matrix Factorization
    Learning, Training
    Under- and Overfitting
    Regularizing in Matrix Factorization
    Non-Negative Matrix Factorization and Sparsity
    Demonstration on Movie Ratings
    Interpreting the Learned Stereotypes
    Exploratory Analysis
    Prediction and Recommendation
    Evaluation
    Overview of Methodologies
    Nearest Neighbor-Based Approaches
    Approaches Based on Supervised Learning
    Predicting Movie Ratings with Logistic Regression
    Common Issues with Features
    Domain-Specific Applications
    Summary

Chapter 7 Conclusions
    The Surprising Stability of Human Interaction Patterns
    Averages, Standard Deviations, and Sampling
    Removing Outliers

Index
