Principles of Data Wrangling: Practical Techniques for Data Preparation (Paperback)

$44.19
Usually Ships in 1-5 Days

Description


A key task that any aspiring data-driven organization needs to learn is data wrangling, the process of converting raw data into something truly useful. This practical guide provides business analysts with an overview of various data wrangling techniques and tools, and puts the practice of data wrangling into context by asking, "What are you trying to do and why?"

Wrangling data consumes roughly 50–80% of an analyst's time before any kind of analysis is possible. Written by key executives at Trifacta, this book walks you through the wrangling process by exploring several factors (time, granularity, scope, and structure) that you need to consider as you begin to work with data. You'll gain a shared vocabulary and a comprehensive understanding of data wrangling, with an emphasis on the recent agile analytic processes used by many of today's data-driven organizations.

Appreciate the importance, and the satisfaction, of wrangling data the right way:

  • Understand what kind of data is available
  • Choose which data to use and at what level of detail
  • Meaningfully combine multiple sources of data
  • Decide how to distill the results to a size and shape that can drive downstream analysis

About the Author


Tye Rattenbury is Trifacta's lead data scientist. He holds a Ph.D. in Computer Science from UC Berkeley. Prior to Trifacta, he was a Data Scientist at Facebook and the Director of Data Science Strategy at R/GA.

Joe Hellerstein is Trifacta's Chief Strategy Officer and a Professor of Computer Science at Berkeley. His career in research and industry has focused on data-centric systems and the way they drive computing. In 2010, Fortune Magazine included him in its list of the 50 smartest people in technology, and MIT Technology Review included his Bloom language for cloud computing on its TR10 list of the 10 technologies "most likely to change our world."

Jeffrey Heer is Trifacta's Chief Experience Officer and a Professor of Computer Science at the University of Washington, where he directs the Interactive Data Lab. Jeffrey's passion is the design of novel user interfaces for exploring, managing, and communicating data. The data visualization tools developed by his lab (D3.js, Protovis, Prefuse) are used by thousands of data enthusiasts around the world. In 2009, Jeffrey was named to MIT Technology Review's list of "Top Innovators under 35."

Sean Kandel is Trifacta's Chief Technical Officer. He completed his Ph.D. at Stanford University, where his research focused on user interfaces for database systems. At Stanford, Sean led the development of new tools for data transformation and discovery, such as Data Wrangler. He previously worked as a data analyst at Citadel Investment Group.

Connor Carreras is Trifacta's Manager for Customer Success, Americas, where she helps customers use cutting-edge data wrangling techniques in support of their big data initiatives. Connor draws on her prior experience in the data integration space to help customers understand how to adopt self-service data preparation as part of an analytics process. She holds a B.A. from Princeton University.

Product Details
ISBN: 9781491938928
ISBN-10: 1491938927
Publisher: O'Reilly Media
Publication Date: August 8th, 2017
Pages: 92
Language: English