Hadoop : the definitive guide
(eBook)

Author
White, Tom
Published
Sebastopol, CA : O'Reilly Media, 2015.
Format
eBook
Edition
4th edition.
ISBN
1491901632, 9781491901632, 9781491901687, 1491901683
Physical Desc
1 online resource (1 volume) : illustrations

More Details

Language
English
UPC
9781491901687

Notes

General Note
"4th Edition Revised & Updated"--Cover.
General Note
Includes index.
Description
Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you'll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You'll learn about recent changes to Hadoop, and explore new case studies on Hadoop's role in healthcare systems and genomics data processing.
Learn fundamental components such as MapReduce, HDFS, and YARN.
Explore MapReduce in depth, including steps for developing applications with it.
Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN.
Learn two data formats: Avro for data serialization and Parquet for nested data.
Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer).
Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop.
Learn the HBase distributed database and the ZooKeeper distributed configuration service.
Local note
O'Reilly, O'Reilly Online Learning: Academic/Public Library Edition

Citations

APA Citation, 7th Edition (style guide)

White, T. (2015). Hadoop: The definitive guide (4th ed.). O'Reilly Media.

Chicago / Turabian - Author Date Citation, 17th Edition (style guide)

White, Tom. 2015. Hadoop: The Definitive Guide. O'Reilly Media.

Chicago / Turabian - Humanities (Notes and Bibliography) Citation, 17th Edition (style guide)

White, Tom. Hadoop: The Definitive Guide. O'Reilly Media, 2015.

MLA Citation, 9th Edition (style guide)

White, Tom. Hadoop: The Definitive Guide. 4th ed., O'Reilly Media, 2015.

Note! Citations contain only title, author, edition, publisher, and year published. Citations should be used as a guideline and should be double-checked for accuracy. Citation formats are based on standards as of August 2021.

Staff View

Grouped Work ID
70037ac4-68d3-87d1-6eb7-09fa1d9e1e2b-eng

Grouping Information

Grouped Work ID: 70037ac4-68d3-87d1-6eb7-09fa1d9e1e2b-eng
Full title: hadoop the definitive guide
Author: white tom
Grouping Category: book
Last Update: 2024-04-16 12:23:35 PM
Last Indexed: 2024-04-20 03:33:39 AM

Book Cover Information

Image Source: syndetics
First Loaded: Jun 17, 2022
Last Used: Apr 20, 2024

Marc Record

First Detected: Nov 09, 2022 03:46:10 PM
Last File Modification Time: Apr 16, 2024 12:32:57 PM

MARC Record

LEADER 13840cam a2200733 i 4500
001 ocn907477295
003 OCoLC
005 20240405112445.0
006 m     o  d        
007 cr unu||||||||
008 150416s2015    caua    o     001 0 eng d
010 |a  2015473822
019 |a 948565689|a 1008958279|a 1066536525|a 1103272971|a 1105789519|a 1112559413|a 1112991507|a 1129366779|a 1153017917|a 1159656044|a 1228510559
020 |z 9781491901632
020 |a 1491901632|q (paperback)
020 |a 9781491901632
020 |a 9781491901687
020 |a 1491901683
0248 |a 9781491901687
0291 |a DEBBG|b BV042682532
0291 |a GBVCP|b 835869334
035 |a (OCoLC)907477295|z (OCoLC)948565689|z (OCoLC)1008958279|z (OCoLC)1066536525|z (OCoLC)1103272971|z (OCoLC)1105789519|z (OCoLC)1112559413|z (OCoLC)1112991507|z (OCoLC)1129366779|z (OCoLC)1153017917|z (OCoLC)1159656044|z (OCoLC)1228510559
037 |a CL0500000578|b Safari Books Online
040 |a UMI|b eng|e rda|e pn|c UMI|d CUS|d OCLCO|d DEBBG|d OCLCF|d CEF|d UAB|d ERF|d UHL|d CNCEN|d VT2|d C6I|d WYU|d OCLCO|d UKBTH|d OCLCO|d CZL|d TOH|d OCLCQ|d OCLCO|d OCLCQ|d OCLCO|d OCLCL|d OCLCQ
049 |a TKLA
050 4|a QA76.9.D5
08204|a 005.74|2 23
1001 |a White, Tom|q (Tom E.),|e author.|1 https://id.oclc.org/worldcat/entity/E39PCjBywKbFqPpxtxkGBJXb8d|0 http://id.loc.gov/authorities/names/nb2011023278
24510|a Hadoop :|b the definitive guide /|c Tom White.
250 |a 4th edition.
264 1|a Sebastopol, CA :|b O'Reilly Media,|c 2015.
300 |a 1 online resource (1 volume) :|b illustrations
336 |a text|b txt|2 rdacontent
337 |a computer|b c|2 rdamedia
338 |a online resource|b cr|2 rdacarrier
347 |a text file
500 |a "4th Edition Revised & Updated"--Cover.
500 |a Includes index.
5050 |a Cover -- Copyright -- Table of Contents -- Foreword -- Preface -- Administrative Notes -- What's New in the Fourth Edition? -- What's New in the Third Edition? -- What's New in the Second Edition? -- Conventions Used in This Book -- Using Code Examples -- Safari® Books Online -- How to Contact Us -- Acknowledgments -- Part I. Hadoop Fundamentals -- Chapter 1. Meet Hadoop -- Data! -- Data Storage and Analysis -- Querying All Your Data -- Beyond Batch -- Comparison with Other Systems -- Relational Database Management Systems -- Grid Computing -- Volunteer Computing -- A Brief History of Apache Hadoop -- What's in This Book? -- Chapter 2. MapReduce -- A Weather Dataset -- Data Format -- Analyzing the Data with Unix Tools -- Analyzing the Data with Hadoop -- Map and Reduce -- Java MapReduce -- Scaling Out -- Data Flow -- Combiner Functions -- Running a Distributed MapReduce Job -- Hadoop Streaming -- Ruby -- Python -- Chapter 3. The Hadoop Distributed Filesystem -- The Design of HDFS -- HDFS Concepts -- Blocks -- Namenodes and Datanodes -- Block Caching -- HDFS Federation -- HDFS High Availability -- The Command-Line Interface -- Basic Filesystem Operations -- Hadoop Filesystems -- Interfaces -- The Java Interface -- Reading Data from a Hadoop URL -- Reading Data Using the FileSystem API -- Writing Data -- Directories -- Querying the Filesystem -- Deleting Data -- Data Flow -- Anatomy of a File Read -- Anatomy of a File Write -- Coherency Model -- Parallel Copying with distcp -- Keeping an HDFS Cluster Balanced -- Chapter 4. YARN -- Anatomy of a YARN Application Run -- Resource Requests -- Application Lifespan -- Building YARN Applications -- YARN Compared to MapReduce 1 -- Scheduling in YARN -- Scheduler Options -- Capacity Scheduler Configuration -- Fair Scheduler Configuration -- Delay Scheduling -- Dominant Resource Fairness -- Further Reading.
5058 |a Chapter 5. Hadoop I/O -- Data Integrity -- Data Integrity in HDFS -- LocalFileSystem -- ChecksumFileSystem -- Compression -- Codecs -- Compression and Input Splits -- Using Compression in MapReduce -- Serialization -- The Writable Interface -- Writable Classes -- Implementing a Custom Writable -- Serialization Frameworks -- File-Based Data Structures -- SequenceFile -- MapFile -- Other File Formats and Column-Oriented Formats -- Part II. MapReduce -- Chapter 6. Developing a MapReduce Application -- The Configuration API -- Combining Resources -- Variable Expansion -- Setting Up the Development Environment -- Managing Configuration -- GenericOptionsParser, Tool, and ToolRunner -- Writing a Unit Test with MRUnit -- Mapper -- Reducer -- Running Locally on Test Data -- Running a Job in a Local Job Runner -- Testing the Driver -- Running on a Cluster -- Packaging a Job -- Launching a Job -- The MapReduce Web UI -- Retrieving the Results -- Debugging a Job -- Hadoop Logs -- Remote Debugging -- Tuning a Job -- Profiling Tasks -- MapReduce Workflows -- Decomposing a Problem into MapReduce Jobs -- JobControl -- Apache Oozie -- Chapter 7. How MapReduce Works -- Anatomy of a MapReduce Job Run -- Job Submission -- Job Initialization -- Task Assignment -- Task Execution -- Progress and Status Updates -- Job Completion -- Failures -- Task Failure -- Application Master Failure -- Node Manager Failure -- Resource Manager Failure -- Shuffle and Sort -- The Map Side -- The Reduce Side -- Configuration Tuning -- Task Execution -- The Task Execution Environment -- Speculative Execution -- Output Committers -- Chapter 8. MapReduce Types and Formats -- MapReduce Types -- The Default MapReduce Job -- Input Formats -- Input Splits and Records -- Text Input -- Binary Input -- Multiple Inputs -- Database Input (and Output) -- Output Formats -- Text Output -- Binary Output.
5058 |a Multiple Outputs -- Lazy Output -- Database Output -- Chapter 9. MapReduce Features -- Counters -- Built-in Counters -- User-Defined Java Counters -- User-Defined Streaming Counters -- Sorting -- Preparation -- Partial Sort -- Total Sort -- Secondary Sort -- Joins -- Map-Side Joins -- Reduce-Side Joins -- Side Data Distribution -- Using the Job Configuration -- Distributed Cache -- MapReduce Library Classes -- Part III. Hadoop Operations -- Chapter 10. Setting Up a Hadoop Cluster -- Cluster Specification -- Cluster Sizing -- Network Topology -- Cluster Setup and Installation -- Installing Java -- Creating Unix User Accounts -- Installing Hadoop -- Configuring SSH -- Configuring Hadoop -- Formatting the HDFS Filesystem -- Starting and Stopping the Daemons -- Creating User Directories -- Hadoop Configuration -- Configuration Management -- Environment Settings -- Important Hadoop Daemon Properties -- Hadoop Daemon Addresses and Ports -- Other Hadoop Properties -- Security -- Kerberos and Hadoop -- Delegation Tokens -- Other Security Enhancements -- Benchmarking a Hadoop Cluster -- Hadoop Benchmarks -- User Jobs -- Chapter 11. Administering Hadoop -- HDFS -- Persistent Data Structures -- Safe Mode -- Audit Logging -- Tools -- Monitoring -- Logging -- Metrics and JMX -- Maintenance -- Routine Administration Procedures -- Commissioning and Decommissioning Nodes -- Upgrades -- Part IV. Related Projects -- Chapter 12. Avro -- Avro Data Types and Schemas -- In-Memory Serialization and Deserialization -- The Specific API -- Avro Datafiles -- Interoperability -- Python API -- Avro Tools -- Schema Resolution -- Sort Order -- Avro MapReduce -- Sorting Using Avro MapReduce -- Avro in Other Languages -- Chapter 13. Parquet -- Data Model -- Nested Encoding -- Parquet File Format -- Parquet Configuration -- Writing and Reading Parquet Files.
5058 |a Avro, Protocol Buffers, and Thrift -- Parquet MapReduce -- Chapter 14. Flume -- Installing Flume -- An Example -- Transactions and Reliability -- Batching -- The HDFS Sink -- Partitioning and Interceptors -- File Formats -- Fan Out -- Delivery Guarantees -- Replicating and Multiplexing Selectors -- Distribution: Agent Tiers -- Delivery Guarantees -- Sink Groups -- Integrating Flume with Applications -- Component Catalog -- Further Reading -- Chapter 15. Sqoop -- Getting Sqoop -- Sqoop Connectors -- A Sample Import -- Text and Binary File Formats -- Generated Code -- Additional Serialization Systems -- Imports: A Deeper Look -- Controlling the Import -- Imports and Consistency -- Incremental Imports -- Direct-Mode Imports -- Working with Imported Data -- Imported Data and Hive -- Importing Large Objects -- Performing an Export -- Exports: A Deeper Look -- Exports and Transactionality -- Exports and SequenceFiles -- Further Reading -- Chapter 16. Pig -- Installing and Running Pig -- Execution Types -- Running Pig Programs -- Grunt -- Pig Latin Editors -- An Example -- Generating Examples -- Comparison with Databases -- Pig Latin -- Structure -- Statements -- Expressions -- Types -- Schemas -- Functions -- Macros -- User-Defined Functions -- A Filter UDF -- An Eval UDF -- A Load UDF -- Data Processing Operators -- Loading and Storing Data -- Filtering Data -- Grouping and Joining Data -- Sorting Data -- Combining and Splitting Data -- Pig in Practice -- Parallelism -- Anonymous Relations -- Parameter Substitution -- Further Reading -- Chapter 17. Hive -- Installing Hive -- The Hive Shell -- An Example -- Running Hive -- Configuring Hive -- Hive Services -- The Metastore -- Comparison with Traditional Databases -- Schema on Read Versus Schema on Write -- Updates, Transactions, and Indexes -- SQL-on-Hadoop Alternatives -- HiveQL -- Data Types.
5058 |a Operators and Functions -- Tables -- Managed Tables and External Tables -- Partitions and Buckets -- Storage Formats -- Importing Data -- Altering Tables -- Dropping Tables -- Querying Data -- Sorting and Aggregating -- MapReduce Scripts -- Joins -- Subqueries -- Views -- User-Defined Functions -- Writing a UDF -- Writing a UDAF -- Further Reading -- Chapter 18. Crunch -- An Example -- The Core Crunch API -- Primitive Operations -- Types -- Sources and Targets -- Functions -- Materialization -- Pipeline Execution -- Running a Pipeline -- Stopping a Pipeline -- Inspecting a Crunch Plan -- Iterative Algorithms -- Checkpointing a Pipeline -- Crunch Libraries -- Further Reading -- Chapter 19. Spark -- Installing Spark -- An Example -- Spark Applications, Jobs, Stages, and Tasks -- A Scala Standalone Application -- A Java Example -- A Python Example -- Resilient Distributed Datasets -- Creation -- Transformations and Actions -- Persistence -- Serialization -- Shared Variables -- Broadcast Variables -- Accumulators -- Anatomy of a Spark Job Run -- Job Submission -- DAG Construction -- Task Scheduling -- Task Execution -- Executors and Cluster Managers -- Spark on YARN -- Further Reading -- Chapter 20. HBase -- HBasics -- Backdrop -- Concepts -- Whirlwind Tour of the Data Model -- Implementation -- Installation -- Test Drive -- Clients -- Java -- MapReduce -- REST and Thrift -- Building an Online Query Application -- Schema Design -- Loading Data -- Online Queries -- HBase Versus RDBMS -- Successful Service -- HBase -- Praxis -- HDFS -- UI -- Metrics -- Counters -- Further Reading -- Chapter 21. ZooKeeper -- Installing and Running ZooKeeper -- An Example -- Group Membership in ZooKeeper -- Creating the Group -- Joining a Group -- Listing Members in a Group -- Deleting a Group -- The ZooKeeper Service -- Data Model -- Operations -- Implementation.
520 |a Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you'll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You'll learn about recent changes to Hadoop, and explore new case studies on Hadoop's role in healthcare systems and genomics data processing. Learn fundamental components such as MapReduce, HDFS, and YARN. Explore MapReduce in depth, including steps for developing applications with it. Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN. Learn two data formats: Avro for data serialization and Parquet for nested data. Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer). Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop. Learn the HBase distributed database and the ZooKeeper distributed configuration service.
542 |f Copyright © 2015 Tom White
542 |f Copyright © O'Reilly Media, Inc.
588 |a Description based on print version record.
590 |a O'Reilly|b O'Reilly Online Learning: Academic/Public Library Edition
63000|a Apache Hadoop.|0 http://id.loc.gov/authorities/names/n2013024279
63007|a Apache Hadoop|2 fast
650 0|a Electronic data processing|x Distributed processing.|0 http://id.loc.gov/authorities/subjects/sh85042293
650 0|a File organization (Computer science)|0 http://id.loc.gov/authorities/subjects/sh85048195
650 0|a Cloud computing.|0 http://id.loc.gov/authorities/subjects/sh2008004883
650 6|a Traitement réparti.
650 6|a Fichiers (Informatique)|x Organisation.
650 6|a Infonuagique.
650 7|a Cloud computing|2 fast
650 7|a Electronic data processing|x Distributed processing|2 fast
650 7|a File organization (Computer science)|2 fast
650 7|a Hadoop|2 gnd
758 |i has work:|a Hadoop (Text)|1 https://id.oclc.org/worldcat/entity/E39PCGkCfWr9dXxm9KMgTVDkj3|4 https://id.oclc.org/worldcat/ontology/hasWork
77608|i Print version:|a White, Tom (Tom E.).|t Hadoop.|b 4th edition|z 9781491901717|w (OCoLC)905696072
85640|u https://ezproxy.knoxlib.org/login?url=https://learning.oreilly.com/library/view/~/9781491901687/?ar
994 |a 92|b TKL