Hadoop : the definitive guide
(eBook)

Name: Hadoop : the definitive guide /
Availability: OnlineOnly
Author: White, Tom

Author

White, Tom

Published

Sebastopol, CA : O'Reilly Media, 2015.

Format

eBook

Edition

4th edition.

ISBN

1491901632, 9781491901632, 9781491901687, 1491901683

Physical Desc

1 online resource (1 volume) : illustrations

Status

Copies

O'Reilly (ebooks & videos)

Subjects

LC Subjects

Apache Hadoop.
Cloud computing.
Electronic data processing -- Distributed processing.
File organization (Computer science)

OCLC Fast Subjects

Apache Hadoop
Cloud computing
Electronic data processing -- Distributed processing
File organization (Computer science)

Other Subjects

Fichiers (Informatique) -- Organisation.
Hadoop
Infonuagique.
Traitement réparti.

More Details

Language

English

UPC

9781491901687

Notes

General Note

"4th Edition Revised & Updated"--Cover.

General Note

Includes index.

Description

Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you'll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You'll learn about recent changes to Hadoop, and explore new case studies on Hadoop's role in healthcare systems and genomics data processing.

Learn fundamental components such as MapReduce, HDFS, and YARN.
Explore MapReduce in depth, including steps for developing applications with it.
Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN.
Learn two data formats: Avro for data serialization and Parquet for nested data.
Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer).
Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop.
Learn the HBase distributed database and the ZooKeeper distributed configuration service.

Local note

O'Reilly, O'Reilly Online Learning: Academic/Public Library Edition

Citations

APA Citation, 7th Edition (style guide)

White, T. (2015). Hadoop: The definitive guide (4th ed.). O'Reilly Media.

Chicago / Turabian - Author Date Citation, 17th Edition (style guide)

White, Tom. 2015. Hadoop: The Definitive Guide. 4th ed. O'Reilly Media.

Chicago / Turabian - Humanities (Notes and Bibliography) Citation, 17th Edition (style guide)

White, Tom. Hadoop: The Definitive Guide. 4th ed. O'Reilly Media, 2015.

MLA Citation, 9th Edition (style guide)

White, Tom. Hadoop: The Definitive Guide. 4th ed., O'Reilly Media, 2015.

Note! Citations contain only title, author, edition, publisher, and year published. Citations should be used as a guideline and should be double checked for accuracy. Citation formats are based on standards as of August 2021.

Staff View

Grouping Information

Grouped Work ID	70037ac4-68d3-87d1-6eb7-09fa1d9e1e2b-eng
Full title	hadoop the definitive guide
Author	white tom
Grouping Category	book
Last Update	2024-04-16 12:23:35PM
Last Indexed	2024-04-20 03:33:39AM

Book Cover Information

Image Source	syndetics
First Loaded	Jun 17, 2022
Last Used	Apr 20, 2024

Marc Record

First Detected	Nov 09, 2022 03:46:10 PM
Last File Modification Time	Apr 16, 2024 12:32:57 PM

MARC Record

LEADER	13840cam a2200733 i 4500
001	ocn907477295
003	OCoLC
005	20240405112445.0
006	m o d
007	cr unu\|\|\|\|\|\|\|\|
008	150416s2015 caua o 001 0 eng d
010			\|a 2015473822
019			\|a 948565689\|a 1008958279\|a 1066536525\|a 1103272971\|a 1105789519\|a 1112559413\|a 1112991507\|a 1129366779\|a 1153017917\|a 1159656044\|a 1228510559
020			\|z 9781491901632
020			\|a 1491901632\|q (paperback)
020			\|a 9781491901632
020			\|a 9781491901687
020			\|a 1491901683
024	8		\|a 9781491901687
029	1		\|a DEBBG\|b BV042682532
029	1		\|a GBVCP\|b 835869334
035			\|a (OCoLC)907477295\|z (OCoLC)948565689\|z (OCoLC)1008958279\|z (OCoLC)1066536525\|z (OCoLC)1103272971\|z (OCoLC)1105789519\|z (OCoLC)1112559413\|z (OCoLC)1112991507\|z (OCoLC)1129366779\|z (OCoLC)1153017917\|z (OCoLC)1159656044\|z (OCoLC)1228510559
037			\|a CL0500000578\|b Safari Books Online
040			\|a UMI\|b eng\|e rda\|e pn\|c UMI\|d CUS\|d OCLCO\|d DEBBG\|d OCLCF\|d CEF\|d UAB\|d ERF\|d UHL\|d CNCEN\|d VT2\|d C6I\|d WYU\|d OCLCO\|d UKBTH\|d OCLCO\|d CZL\|d TOH\|d OCLCQ\|d OCLCO\|d OCLCQ\|d OCLCO\|d OCLCL\|d OCLCQ
049			\|a TKLA
050		4	\|a QA76.9.D5
082	0	4	\|a 005.74\|2 23
100	1		\|a White, Tom\|q (Tom E.),\|e author.\|1 https://id.oclc.org/worldcat/entity/E39PCjBywKbFqPpxtxkGBJXb8d\|0 http://id.loc.gov/authorities/names/nb2011023278
245	1	0	\|a Hadoop :\|b the definitive guide /\|c Tom White.
250			\|a 4th edition.
264		1	\|a Sebastopol, CA :\|b O'Reilly Media,\|c 2015.
300			\|a 1 online resource (1 volume) :\|b illustrations
336			\|a text\|b txt\|2 rdacontent
337			\|a computer\|b c\|2 rdamedia
338			\|a online resource\|b cr\|2 rdacarrier
347			\|a text file
500			\|a "4th Edition Revised & Updated"--Cover.
500			\|a Includes index.
505	0		\|a Cover -- Copyright -- Table of Contents -- Foreword -- Preface -- Administrative Notes -- What's New in the Fourth Edition? -- What's New in the Third Edition? -- What's New in the Second Edition? -- Conventions Used in This Book -- Using Code Examples -- Safari® Books Online -- How to Contact Us -- Acknowledgments -- Part I. Hadoop Fundamentals -- Chapter 1. Meet Hadoop -- Data! -- Data Storage and Analysis -- Querying All Your Data -- Beyond Batch -- Comparison with Other Systems -- Relational Database Management Systems -- Grid Computing -- Volunteer Computing -- A Brief History of Apache Hadoop -- What's in This Book? -- Chapter 2. MapReduce -- A Weather Dataset -- Data Format -- Analyzing the Data with Unix Tools -- Analyzing the Data with Hadoop -- Map and Reduce -- Java MapReduce -- Scaling Out -- Data Flow -- Combiner Functions -- Running a Distributed MapReduce Job -- Hadoop Streaming -- Ruby -- Python -- Chapter 3. The Hadoop Distributed Filesystem -- The Design of HDFS -- HDFS Concepts -- Blocks -- Namenodes and Datanodes -- Block Caching -- HDFS Federation -- HDFS High Availability -- The Command-Line Interface -- Basic Filesystem Operations -- Hadoop Filesystems -- Interfaces -- The Java Interface -- Reading Data from a Hadoop URL -- Reading Data Using the FileSystem API -- Writing Data -- Directories -- Querying the Filesystem -- Deleting Data -- Data Flow -- Anatomy of a File Read -- Anatomy of a File Write -- Coherency Model -- Parallel Copying with distcp -- Keeping an HDFS Cluster Balanced -- Chapter 4. YARN -- Anatomy of a YARN Application Run -- Resource Requests -- Application Lifespan -- Building YARN Applications -- YARN Compared to MapReduce 1 -- Scheduling in YARN -- Scheduler Options -- Capacity Scheduler Configuration -- Fair Scheduler Configuration -- Delay Scheduling -- Dominant Resource Fairness -- Further Reading.
505	8		\|a Chapter 5. Hadoop I/O -- Data Integrity -- Data Integrity in HDFS -- LocalFileSystem -- ChecksumFileSystem -- Compression -- Codecs -- Compression and Input Splits -- Using Compression in MapReduce -- Serialization -- The Writable Interface -- Writable Classes -- Implementing a Custom Writable -- Serialization Frameworks -- File-Based Data Structures -- SequenceFile -- MapFile -- Other File Formats and Column-Oriented Formats -- Part II. MapReduce -- Chapter 6. Developing a MapReduce Application -- The Configuration API -- Combining Resources -- Variable Expansion -- Setting Up the Development Environment -- Managing Configuration -- GenericOptionsParser, Tool, and ToolRunner -- Writing a Unit Test with MRUnit -- Mapper -- Reducer -- Running Locally on Test Data -- Running a Job in a Local Job Runner -- Testing the Driver -- Running on a Cluster -- Packaging a Job -- Launching a Job -- The MapReduce Web UI -- Retrieving the Results -- Debugging a Job -- Hadoop Logs -- Remote Debugging -- Tuning a Job -- Profiling Tasks -- MapReduce Workflows -- Decomposing a Problem into MapReduce Jobs -- JobControl -- Apache Oozie -- Chapter 7. How MapReduce Works -- Anatomy of a MapReduce Job Run -- Job Submission -- Job Initialization -- Task Assignment -- Task Execution -- Progress and Status Updates -- Job Completion -- Failures -- Task Failure -- Application Master Failure -- Node Manager Failure -- Resource Manager Failure -- Shuffle and Sort -- The Map Side -- The Reduce Side -- Configuration Tuning -- Task Execution -- The Task Execution Environment -- Speculative Execution -- Output Committers -- Chapter 8. MapReduce Types and Formats -- MapReduce Types -- The Default MapReduce Job -- Input Formats -- Input Splits and Records -- Text Input -- Binary Input -- Multiple Inputs -- Database Input (and Output) -- Output Formats -- Text Output -- Binary Output.
505	8		\|a Multiple Outputs -- Lazy Output -- Database Output -- Chapter 9. MapReduce Features -- Counters -- Built-in Counters -- User-Defined Java Counters -- User-Defined Streaming Counters -- Sorting -- Preparation -- Partial Sort -- Total Sort -- Secondary Sort -- Joins -- Map-Side Joins -- Reduce-Side Joins -- Side Data Distribution -- Using the Job Configuration -- Distributed Cache -- MapReduce Library Classes -- Part III. Hadoop Operations -- Chapter 10. Setting Up a Hadoop Cluster -- Cluster Specification -- Cluster Sizing -- Network Topology -- Cluster Setup and Installation -- Installing Java -- Creating Unix User Accounts -- Installing Hadoop -- Configuring SSH -- Configuring Hadoop -- Formatting the HDFS Filesystem -- Starting and Stopping the Daemons -- Creating User Directories -- Hadoop Configuration -- Configuration Management -- Environment Settings -- Important Hadoop Daemon Properties -- Hadoop Daemon Addresses and Ports -- Other Hadoop Properties -- Security -- Kerberos and Hadoop -- Delegation Tokens -- Other Security Enhancements -- Benchmarking a Hadoop Cluster -- Hadoop Benchmarks -- User Jobs -- Chapter 11. Administering Hadoop -- HDFS -- Persistent Data Structures -- Safe Mode -- Audit Logging -- Tools -- Monitoring -- Logging -- Metrics and JMX -- Maintenance -- Routine Administration Procedures -- Commissioning and Decommissioning Nodes -- Upgrades -- Part IV. Related Projects -- Chapter 12. Avro -- Avro Data Types and Schemas -- In-Memory Serialization and Deserialization -- The Specific API -- Avro Datafiles -- Interoperability -- Python API -- Avro Tools -- Schema Resolution -- Sort Order -- Avro MapReduce -- Sorting Using Avro MapReduce -- Avro in Other Languages -- Chapter 13. Parquet -- Data Model -- Nested Encoding -- Parquet File Format -- Parquet Configuration -- Writing and Reading Parquet Files.
505	8		\|a Avro, Protocol Buffers, and Thrift -- Parquet MapReduce -- Chapter 14. Flume -- Installing Flume -- An Example -- Transactions and Reliability -- Batching -- The HDFS Sink -- Partitioning and Interceptors -- File Formats -- Fan Out -- Delivery Guarantees -- Replicating and Multiplexing Selectors -- Distribution: Agent Tiers -- Delivery Guarantees -- Sink Groups -- Integrating Flume with Applications -- Component Catalog -- Further Reading -- Chapter 15. Sqoop -- Getting Sqoop -- Sqoop Connectors -- A Sample Import -- Text and Binary File Formats -- Generated Code -- Additional Serialization Systems -- Imports: A Deeper Look -- Controlling the Import -- Imports and Consistency -- Incremental Imports -- Direct-Mode Imports -- Working with Imported Data -- Imported Data and Hive -- Importing Large Objects -- Performing an Export -- Exports: A Deeper Look -- Exports and Transactionality -- Exports and SequenceFiles -- Further Reading -- Chapter 16. Pig -- Installing and Running Pig -- Execution Types -- Running Pig Programs -- Grunt -- Pig Latin Editors -- An Example -- Generating Examples -- Comparison with Databases -- Pig Latin -- Structure -- Statements -- Expressions -- Types -- Schemas -- Functions -- Macros -- User-Defined Functions -- A Filter UDF -- An Eval UDF -- A Load UDF -- Data Processing Operators -- Loading and Storing Data -- Filtering Data -- Grouping and Joining Data -- Sorting Data -- Combining and Splitting Data -- Pig in Practice -- Parallelism -- Anonymous Relations -- Parameter Substitution -- Further Reading -- Chapter 17. Hive -- Installing Hive -- The Hive Shell -- An Example -- Running Hive -- Configuring Hive -- Hive Services -- The Metastore -- Comparison with Traditional Databases -- Schema on Read Versus Schema on Write -- Updates, Transactions, and Indexes -- SQL-on-Hadoop Alternatives -- HiveQL -- Data Types.
505	8		\|a Operators and Functions -- Tables -- Managed Tables and External Tables -- Partitions and Buckets -- Storage Formats -- Importing Data -- Altering Tables -- Dropping Tables -- Querying Data -- Sorting and Aggregating -- MapReduce Scripts -- Joins -- Subqueries -- Views -- User-Defined Functions -- Writing a UDF -- Writing a UDAF -- Further Reading -- Chapter 18. Crunch -- An Example -- The Core Crunch API -- Primitive Operations -- Types -- Sources and Targets -- Functions -- Materialization -- Pipeline Execution -- Running a Pipeline -- Stopping a Pipeline -- Inspecting a Crunch Plan -- Iterative Algorithms -- Checkpointing a Pipeline -- Crunch Libraries -- Further Reading -- Chapter 19. Spark -- Installing Spark -- An Example -- Spark Applications, Jobs, Stages, and Tasks -- A Scala Standalone Application -- A Java Example -- A Python Example -- Resilient Distributed Datasets -- Creation -- Transformations and Actions -- Persistence -- Serialization -- Shared Variables -- Broadcast Variables -- Accumulators -- Anatomy of a Spark Job Run -- Job Submission -- DAG Construction -- Task Scheduling -- Task Execution -- Executors and Cluster Managers -- Spark on YARN -- Further Reading -- Chapter 20. HBase -- HBasics -- Backdrop -- Concepts -- Whirlwind Tour of the Data Model -- Implementation -- Installation -- Test Drive -- Clients -- Java -- MapReduce -- REST and Thrift -- Building an Online Query Application -- Schema Design -- Loading Data -- Online Queries -- HBase Versus RDBMS -- Successful Service -- HBase -- Praxis -- HDFS -- UI -- Metrics -- Counters -- Further Reading -- Chapter 21. ZooKeeper -- Installing and Running ZooKeeper -- An Example -- Group Membership in ZooKeeper -- Creating the Group -- Joining a Group -- Listing Members in a Group -- Deleting a Group -- The ZooKeeper Service -- Data Model -- Operations -- Implementation.
520			\|a Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you'll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You'll learn about recent changes to Hadoop, and explore new case studies on Hadoop's role in healthcare systems and genomics data processing. Learn fundamental components such as MapReduce, HDFS, and YARN Explore MapReduce in depth, including steps for developing applications with it Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN Learn two data formats: Avro for data serialization and Parquet for nested data Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer) Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop Learn the HBase distributed database and the ZooKeeper distributed configuration service.
542			\|f Copyright © 2015 Tom White
542			\|f Copyright © O'Reilly Media, Inc.
588			\|a Description based on print version record.
590			\|a O'Reilly\|b O'Reilly Online Learning: Academic/Public Library Edition
630	0	0	\|a Apache Hadoop.\|0 http://id.loc.gov/authorities/names/n2013024279
630	0	7	\|a Apache Hadoop\|2 fast
650		0	\|a Electronic data processing\|x Distributed processing.\|0 http://id.loc.gov/authorities/subjects/sh85042293
650		0	\|a File organization (Computer science)\|0 http://id.loc.gov/authorities/subjects/sh85048195
650		0	\|a Cloud computing.\|0 http://id.loc.gov/authorities/subjects/sh2008004883
650		6	\|a Traitement réparti.
650		6	\|a Fichiers (Informatique)\|x Organisation.
650		6	\|a Infonuagique.
650		7	\|a Cloud computing\|2 fast
650		7	\|a Electronic data processing\|x Distributed processing\|2 fast
650		7	\|a File organization (Computer science)\|2 fast
650		7	\|a Hadoop\|2 gnd
758			\|i has work:\|a Hadoop (Text)\|1 https://id.oclc.org/worldcat/entity/E39PCGkCfWr9dXxm9KMgTVDkj3\|4 https://id.oclc.org/worldcat/ontology/hasWork
776	0	8	\|i Print version:\|a White, Tom (Tom E.).\|t Hadoop.\|b 4th edition\|z 9781491901717\|w (OCoLC)905696072
856	4	0	\|u https://ezproxy.knoxlib.org/login?url=https://learning.oreilly.com/library/view/~/9781491901687/?ar
994			\|a 92\|b TKL
