Providing key information on how to work with research data, Introduction to Data Technologies presents ideas and techniques for performing critical, behind-the-scenes tasks that take up so much time and effort yet typically receive little attention in formal education. With a focus on computational tools, the book shows readers how to improve their awareness of what tasks can be achieved and describes the correct approach to performing these tasks.
Practical examples demonstrate the most important points
The author first discusses how to write computer code using HTML as a concrete example. He then covers a variety of data storage topics, including different file formats, XML, and the structure and design issues of relational databases. After illustrating how to extract data from a relational database using SQL, the book presents tools and techniques for searching, sorting, tabulating, and manipulating data. It also introduces some very basic programming concepts as well as the R language for statistical computing. Each of these topics has supporting chapters that offer reference material on HTML, CSS, XML, DTD, SQL, R, and regular expressions.
One-stop shop of introductory computing information
Written by a member of the R Development Core Team, this resource shows readers how to apply data technologies to tasks within a research setting. Collecting material otherwise scattered across many books and the web, it explores how to publish information via the web, how to access information stored in different formats, and how to write small programs to automate simple, repetitive tasks.
Introduction
Case Study: Point Nemo
Writing Computer Code
Case Study: Point Nemo (continued)
Syntax
Semantics
Writing Code
Checking Code
Running Code
The DRY Principle
HTML Reference
HTML Syntax
HTML Semantics
CSS Reference
CSS Syntax
CSS Semantics
Linking CSS to HTML
CSS Tips
Data Storage
Case Study: YBC 7289
Plain Text Formats
Binary Formats
Spreadsheets
XML
Databases
XML Reference
XML Syntax
Document Type Definitions
Data Queries
Case Study: The Data Expo (continued)
Querying Databases
Querying XML
SQL Reference
SQL Syntax
SQL Queries
Other SQL Commands
Data Processing
Case Study: The Population Clock
The R Environment
The R Language
Data Types and Data Structures
Subsetting
More on Data Structures
Data Import/Export
Data Manipulation
Text Processing
Data Display
Programming
Other Software
R Reference
R Syntax
Data Types and Data Structures
Functions
Getting Help
Packages
Searching for Functions
Regular Expressions Reference
Literals
Metacharacters
Conclusion
Attributions
Bibliography
Index
Further Reading appears at the end of each chapter.
Biography
Paul Murrell is a Senior Lecturer in the Department of Statistics at the University of Auckland, New Zealand. Author of the bestselling R Graphics (2006), he is also part of the development team for the R and Omegahat statistical computing projects. Dr. Murrell’s research interests include computational and graphical statistics.
Paul Murrell, best known for his R Graphics book, has delivered a second masterpiece for people who have the difficult task to clean and prepare raw data for further use in common statistical software packages. … provides the perfect basis for a course on data literacy … Moreover, the book also is an excellent basis for advanced M.S. and Ph.D. students as well as practitioners in academia and industry who are confronted with the task to clean and preprocess their own or their colleagues’ data.
—Jürgen Symanzik, Technometrics, May 2011
Introduction to Data Technologies introduces various computer-related topics, including markup languages, statistical computing languages, coding, storage, and querying, in a systematic manner. … the book may serve as an introduction to readers with general interest who plan to supplement their knowledge in specific computer-related topics, in addition to R programming.
—Journal of the American Statistical Association, Vol. 105, No. 492, December 2010
This is a very gentle book. It enables students and statisticians, particularly those just entering the profession, to begin to familiarize themselves with important concepts and tools from the world of databases … it is encouraging that such topics are finding their way into statistics courses at all. … I found the style of the book very engaging … . It has the Paul Murrell light touch, first evident to me in his eminently readable and comprehensive book on R graphics. Like that one, the present book has interesting, occasionally slightly unusual examples and an easy and elegant writing style. The book does not hesitate to offer plain, direct advice in contexts in which other authors might simply let readers exercise their personal preferences. For students, particularly, I think this is a good thing. …
—Bill Venables, CSIRO, Australian & New Zealand Journal of Statistics, 2010