Text Processing with Ruby: Extract Value from the Data That Surrounds You, 1st Edition, by Rob Miller
This is a fun, readable, and very useful book. I’d recommend it to anyone who
needs to deal with text—which is probably everyone.
➤ Paul Battley
Developer, maintainer of text gem
I’d recommend this book to anyone who wants to get started with text processing.
Ruby has powerful tools and libraries for the whole ETL workflow, and this book
describes everything you need to get started and succeed in learning.
➤ Hajba Gábor László
Developer
A lot of people get into Ruby via Rails. This book is really well suited to anyone
who knows Rails, but wants to know more Ruby.
➤ Drew Neil
Director, Studio Nelstrom, and author of Practical Vim
Rob Miller
Acknowledgments . . . . . . . . . . . ix
Introduction . . . . . . . . . . . . . xi
3. Shell One-Liners . . . . . . . . . . . 29
Arguments to the Ruby Interpreter 30
Prepending and Appending Code 35
Example: Parsing Log Files 37
Wrapping Up 39
5. Delimited Data . . . . . . . . . . . . 51
Parsing a TSV 52
Delimited Data and the Command Line 56
The CSV Format 58
Wrapping Up 62
6. Scraping HTML . . . . . . . . . . . . 63
The Right Tool for the Job: Nokogiri 63
Searching the Document 64
Working with Elements 72
Exploring a Page 77
Example: Reading a League Table 80
Wrapping Up 88
7. Encodings . . . . . . . . . . . . . 89
A Brief Introduction to Character Encodings 90
Ruby’s Support for Character Encodings 92
Detecting Encodings 98
Wrapping Up 99
Part IV — Appendices
A1. A Shell Primer . . . . . . . . . . . . 229
Running Commands 229
Controlling Output 230
Exit Statuses and Flow Control 232
Many thanks to Alessandro Bahgat, Paul Battley, Jacob Chae, Peter Cooper,
Iris Faraway, Kevin Gisi, Derek Graham, James Edward Gray II, Avdi Grimm,
Hajba Gábor László, Jeremy Hinegardner, Kerri Miller, and Drew Neil for their
helpful technical review comments, questions, and suggestions—all of which
shaped this book for the better.
Thanks to Rob Griffiths, Mark Rogerson, Samuel Ryzycki, David Webb, Lewis
Wilkinson, Alex Windett, and Mike Wright for ensuring there was no chance
I got too big for my football boots.
Unlike binary formats, text has the pleasing quality of being readable by
humans as well as computers, making it easy to debug and requiring no
distinction between output that’s for human consumption and output that’s
to be used as the input for another step in a process.
The second concern is with actually processing the text once we’ve got it into
the program. This usually means either extracting data from within the text,
parsing it into a Ruby data structure, or transforming it into another format.
The most important subject in this second stage is, without a doubt, regular
expressions. We’ll look at regular expression syntax, how Ruby uses regular
expressions in particular, and, importantly, when not to use them and instead
reach for solutions such as parsers.
We’ll also look at the subject of natural language processing in this part of
the book, and how we can use tools from computational linguistics to make
our programs smarter and to process data that we otherwise couldn’t.
The final step is outputting the transformed text or the extracted data some-
where—to a file, to a network service, or just to the screen. Part of this process
is concerned with the actual writing process, and part of it is concerned with
the form of the written data. We’ll look at both of these aspects in the third
part of the book.
Together, these three steps are often described as “extract, transform, and
load” (ETL). It’s a term especially popular with the “big data” folks. Many text
processing tasks, even ones that seem on the surface to be very different from
one another, fall into this pattern of three steps, so I’ve tried to mirror that
structure in the book.
In general, we’re going to explore why Ruby is an excellent tool to reach for
when working with text. I also hope to persuade you that you might reach
for Ruby sooner than you think—not necessarily just for more complex tasks,
but also for quick one-liners.
Most of all, I hope this book offers you some useful techniques that help you
in your day-to-day programming tasks. Where possible, I’ve erred toward the
practical rather than the theoretical: if it does anything, I’d like this book to
point you in the direction of practical solutions to real-world problems. If your
day job is anything like mine, you probably find yourself trawling through
text files, CSVs, and command-line output more often than you might like.
Helping to make that process quick and—dare I say it?—fun would be fantas-
tic.
While the book starts with material likely to be familiar to anyone who’s
written a command-line application in Ruby, there’s still something here for
the more advanced user. Even people who’ve worked with Ruby a lot aren’t
necessarily aware of the material covered in Chapter 3, Shell One-Liners, on
page 29, for example, and I see far too many developers reaching for regular
expressions to parse HTML rather than using the techniques outlined in
Chapter 6, Scraping HTML, on page 63.
Even experienced developers might not have written parsers before (covered
in Chapter 10, Writing Parsers, on page 127), or dabbled in natural language
processing (as we do in Chapter 11, Natural Language Processing, on page
155)—so hopefully those subjects will be interesting regardless of your level of
experience.
I’ve tried to include in each of the chapters material of interest even to more
advanced Rubyists, so there aren’t any chapters that are obvious candidates
to skip if you’re at that end of the skill spectrum.
If you’re not familiar with how to use the command line, there’s a beginner’s
tutorial in Appendix 1, A Shell Primer, on page 229, and a guide to various
commands in Appendix 2, Useful Shell Commands, on page 235. These
appendixes will give you more than enough command-line knowledge to follow
all of the examples in the book.
Online Resources
The page for this book on the Pragmatic Bookshelf website3 contains a discus-
sion forum, where you can post any comments or questions you might have
about the book and make suggestions for any changes or expansions you’d
like to see in future editions. If you discover any errors in the book, you can
submit them there, too.
Rob Miller
August 2015
2. https://www.cygwin.com/
3. https://pragprog.com/book/rmtpruby/text-processing-with-ruby
The first part of our text processing journey is concerned with getting text into our
program. This text might reside in files, might be entered by the user, or might come
from other processes; wherever it comes from, we’ll learn how to read it.
We’ll also look at taking structure from the text that we read, learning how to parse
CSV files and even scrape information from web pages.
Throughout the course of this chapter, we’ll look at how we can use Ruby to
reach text that resides in files. We’ll look at the basics you might expect, with
some methods to straightforwardly read files in one go. We’ll then look at a
technique that will allow us to read even the biggest files in a memory-efficient
way, by treating files as streams, and look at how this can give us random
access into even the largest files. Let’s take a look.
Opening a File
Before we can do something with a file, we need to open it. This signals our
intent to read from or write to the file, allowing Ruby to do the low-level work
that makes that intention actually happen on the filesystem. Once it's done those
things, Ruby gives us a File object that we can use to manipulate the file.
Once we have this File object, we can do all sorts of things with it: read from
the file, write to it, inspect its permissions, find out its path on the filesystem,
check when it was last modified, and much more.
To open a file in Ruby, we use the open method of the File class, telling it the
path to the file we’d like to open. We pass a block to the open method, in which
we can do whatever we like with the file. Here’s an example:
File.open("file.txt") do |file|
  # ...
end
Because we passed a block to open, Ruby will automatically close the file for
us after the block finishes, freeing us from doing that cleanup work ourselves.
The argument that open passes to our block, which in this example I’ve called
file, is a File object that points to the file we’ve requested access to (in this case,
file.txt). Unless we tell Ruby otherwise, it will open files in read-only mode, so
we can’t write to them accidentally—a safe default.
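To open a file for writing instead, we can pass a mode as a second argument. A minimal sketch, with an illustrative filename: "w" opens the file writable, creating it if necessary and truncating any existing contents.

```ruby
# "w" switches from the read-only default to write mode; the file is
# created (or emptied) and we can write to it inside the block.
File.open("output.txt", "w") do |file|
  file.write("some text\n")
end
```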
Kernel#open
In the real world, it’s common to see people using the global open method rather than
explicitly using File.open:
open("file.txt") do |file|
  # ...
end
As well as being shorter, which is always nice, this convenient method is actually a
wrapper for a number of different types of IO objects, not just files. You can use it to
open URLs, other processes, and more. We’ll cover some more uses of open later; for
now, use either File.open or regular open as you prefer.
There’s nothing in our block yet, so this code isn’t very useful; it doesn’t
actually do anything with the file once it’s opened. Let’s take a look at how
we can read content from the file.
We can achieve this by using the read method on our File object:
File.open("file.txt") do |file|
  contents = file.read
end
The read method returns for us a string containing the file’s contents, no
matter how large they might be.
Alternatively, if all we’re doing is reading the file and we have no further use
for the File object once we’ve done so, Ruby offers us a shortcut. There’s a read
method on the File class itself, and if we pass it the name of a file, then it will
open the file, read it, and close it for us, returning the contents:
contents = File.read("file.txt")
Whichever method we use, the result is that we have the entire contents of
the file stored in a string. This is useful if we want to blindly pass those con-
tents over to something else for processing—to a Markdown parser, for
example, or to insert it into a database, or to parse it as JSON. These are all
very common things to want to do, so read is a widely used method.
For example, if our file contained some JSON data, we could parse it using
Ruby’s built-in JSON library:
require "json"
json = File.read("file.json")
data = JSON.parse(json)
Line-by-line Processing
Lots of plain-text formats—log files, for instance—use the lines of a file as a
way of structuring the content within them. In files like this, each line repre-
sents a distinct item or record. It’s about the simplest way to separate data,
but this kind of structure is more than enough for many use cases, so it’s
something you’ll run into frequently when processing text.
One example of this sort of log file that you might have encountered before
is from the popular web server Apache. For each request made to it, Apache
will log some information: things like the IP address the request came from,
the date and time that the request was made, the URL that was requested,
and so on. The end result looks like this:
127.0.0.1 - - [10/Oct/2014:13:55:36] "GET / HTTP/1.1" 200 561
127.0.0.1 - - [10/Oct/2014:13:55:36] "GET /images/logo.png HTTP/1.1" 200 23260
192.168.0.42 - - [10/Oct/2014:14:10:21] "GET / HTTP/1.1" 200 561
192.168.0.91 - - [10/Oct/2014:14:20:51] "GET /person.jpg HTTP/1.1" 200 46780
192.168.0.42 - - [10/Oct/2014:14:20:54] "GET /about.html HTTP/1.1" 200 483
Let’s imagine we wanted to process this log file so that we could see all the
requests made by a certain IP address. Because each line in the file represents
one request, we need some way to loop over the lines in the file and check
whether each one matches our conditions—that is, whether the IP address
at the start of the line is the one we’re interested in.
One way to do this would be to use the readlines method on our File object. This
method reads the file in its entirety, breaking the content up into individual
lines and returning an array:
File.open("access_log") do |log_file|
  requests = log_file.readlines
end
At this point, we’ve got an array—requests—that contains every line in the file.
The next step is to loop over those lines and only output the ones that match
our conditions:
File.open("access_log") do |log_file|
  requests = log_file.readlines

  requests.each do |request|
    if request.start_with?("127.0.0.1 ")
      puts request
    end
  end
end
Using each, we loop over each request. We then ask the request if it starts
with 127.0.0.1, and if the response is true, we output it. Lines that don’t start
with 127.0.0.1 will simply be ignored.
While this solution works, it has a problem. Because it reads the whole file
at once, it consumes an amount of memory at least equal to the size of the
file. This will hold up okay for small files, but as our log file grows, so will the
memory consumed by our script.
If you think about it, though, we don’t actually need to have the whole file in
memory to solve our problem. We’re only ever dealing with one line of the file
at any given moment, so we only really need to have that particular line in
memory. For some problems it’s necessary to read the whole file at once, but
this isn’t one of them. Let’s look at how we can rework this example so that
we only read one line at a time.
The solution is to treat the file as a stream. Instead of reading from the
beginning of the file to the end in one go, and keeping all of that information
in memory, we read only a small amount at a time. We might read the first
line, for example, then discard it and move onto the second, then discard that
and move onto the third, and so on until we reach the end of the file. Or we
might instead read it character by character, or word by word. The important
thing is that at no point do we have the full file in memory: we only ever store
the little bit that we’re processing.
The File object yielded to our block has a method called each_line. This method
accepts a block and will step through the file one line at a time, executing
that block once for each line.
File.open("access_log") do |log_file|
  log_file.each_line do |request|
    if request.start_with?("127.0.0.1 ")
      puts request
    end
  end
end
That’s it. The each_line method allows us to step through each line in the file
without ever having more than one line of the file in memory at a time. This
method will consume the same amount of memory no matter how large the
file is, unlike our first solution.
Just like with File.read, Ruby offers us a shortcut that doesn’t require us to
open the file ourselves: File.foreach. Using it trims the previous example down
a little:
File.foreach("access_log") do |request|
  if request.start_with?("127.0.0.1 ")
    puts request
  end
end
Enumerable Streams
The each_line method of the File class is aliased to each. This might not seem
particularly remarkable, but it’s actually tremendously useful. This is because
Ruby has a module called Enumerable that defines methods like map, find_all,
count, reduce, and many more. The purpose of Enumerable is to make it easy to
search within, add to, delete from, iterate over, and otherwise manipulate
collections. (You’ve probably used methods like these when working with
arrays, for example.)
Well, a file is a collection too. By default, Ruby considers the elements of that
collection to be the lines within the file, so because File includes the Enumerable
module, we can use all of its methods on those lines. This can make many
processing operations simple and expressive, and because many of Enumerable’s
methods don’t require us to consume the whole file—they’re lazy, in other
words—we often retain the performance benefits of streaming, too.
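To make that laziness concrete, here's a small sketch using sample data modeled on the chapter's log (the filename is illustrative). Enumerable's find stops iterating as soon as a line satisfies the block, so the rest of the file is never read.

```ruby
# Write a small sample log to search (hypothetical path).
File.write("sample_log", <<~LOG)
  127.0.0.1 - - [10/Oct/2014:13:55:36] "GET / HTTP/1.1" 200 561
  192.168.0.42 - - [10/Oct/2014:14:10:21] "GET / HTTP/1.1" 200 561
LOG

match = File.open("sample_log") do |file|
  # find comes from Enumerable: it returns the first line for which
  # the block is true and stops reading the file at that point.
  file.find { |line| line.start_with?("192.") }
end
# match is now the second line of the log, newline included
```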
To explore what this means, we can revisit our log example. Let’s imagine you
wanted to group all of the requests made by each IP address, and within that
group them by the URL requested. In other words, you want to end up with
a data structure that looks something like this:
{
  "127.0.0.1" => [
    "/",
    "/images/logo.png"
  ],
  "192.168.0.42" => [
    "/",
    "/about.html"
  ],
  "192.168.0.91" => [
    "/person.jpg"
  ]
}
Here’s a script that uses the methods offered by Enumerable to achieve this:
requests-by-ip.rb
requests =
  File.open("data/access_log") do |file|
    file
      .map { |line| { ip: line.split[0], url: line.split[5] } }
      .group_by { |request| request[:ip] }
      .each { |ip, requests| requests.map! { |r| r[:url] } }
  end
We open the file just like we did previously. But instead of using each_line to
iterate over the lines of the file, we use map. This loops over the lines of the
file, building up an array as it does so by taking the return value of the block
we pass to it. Here our block is using split to separate the lines on whitespace.
The first of these whitespace-separated fields contains the IP, and the sixth
contains the URL that was requested, so the block returns a hash. The result
of our map operation is therefore an array of hashes that contain only the
information about the request that we’re interested in—the IP address and
the URL.
Next, we use the group_by method. This transforms our array of hashes into a
single hash. It does so by checking the return value of the block that we pass
to it; all the elements of the array that return the same value will be grouped
together. In this case, our block returns the IP part of the request, which
means that all of the requests made by the same IP address will be grouped
together.
The data structure after the group_by operation looks something like this:
{
  "127.0.0.1" => [
    {:ip=>"127.0.0.1", :url=>"/"},
    {:ip=>"127.0.0.1", :url=>"/images/logo.png"}
  ],
  "192.168.0.42" => [
    {:ip=>"192.168.0.42", :url=>"/"},
    {:ip=>"192.168.0.42", :url=>"/about.html"}
  ],
  "192.168.0.91" => [
    {:ip=>"192.168.0.91", :url=>"/person.jpg"}
  ]
}
This is almost what we were after. The problem is that we have both the IP
address and the URL of the request, rather than just the URL. So the next
step in our chain uses each to loop over these IP address and request combi-
nations. It then uses map! to replace the array of hashes with just the URL
portion, leaving us with an array of strings.
By default, each behaves exactly like each_line, looping over the lines in the file.
But it also accepts an argument allowing you to change the character on
which it will split, from a newline to anything else you might like.
Let’s imagine we had a file with only a single line in it, but that contained
many different records separated by commas:
this is field one,this is field two,this is field three
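Passing "," to each lets us step through those comma-separated records one at a time. A minimal sketch, with an illustrative filename:

```ruby
# A single line containing several comma-separated records.
File.write("fields.txt", "this is field one,this is field two,this is field three")

File.open("fields.txt") do |file|
  # With "," as its argument, each treats commas, not newlines, as the
  # record separator; chomp(",") trims the separator from each record.
  file.each(",") do |field|
    puts field.chomp(",")
  end
end
```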
Again, this method has all of the benefits of other streaming examples; we
only ever have a single character in memory at one time, so we can process
even the largest of files.
For example, we could rewrite the previous example, where we quite verbosely
initialized our n variable and incremented it manually, by using Enumerable’s
count method:
character-count.rb
n =
  File.open("file.txt") do |file|
    file.each_char.count { |char| char == "b" }
  end
The count method accepts a block and will return the number of values for
which the block returned true. This is exactly what our previous code was
doing, but this way is a little shorter and a little neater, and reveals our
intentions more clearly.
This might seem like an academic distinction, but it has an important benefit: it
means that other types of IO in Ruby have those same methods, too. Files, streams,
sockets, Unix pipelines—all of these things are fundamentally similar, and it’s in IO
that these similarities are gathered into one abstraction. In the words of Ruby’s own
documentation, IO is “the basis for all input and output in Ruby.” By learning to read
from files, then, you’ll learn both principles and concrete methods that will translate
to all the other places from which you might want to acquire text.
If you know how to write output to the screen, then—using puts—you already know
how to write to a file: by calling puts on the file object. Our screen and a file are both
IO objects—of two different kinds—so the way we interact with them is the same.
This similarity will be very useful throughout our text processing journey.
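As a minimal sketch of that symmetry (the filename is illustrative), writing to a file looks just like printing to the screen:

```ruby
File.open("greeting.txt", "w") do |file|
  # puts on a File object behaves exactly like puts on the screen:
  # it writes the string followed by a newline.
  file.puts "Hello, file!"
end
```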
This might seem an inflexible and impractical way of doing things. After all,
how can we know how many bytes from the start of the file we’ll find the
record we’re looking for?
Let’s imagine we wanted to dig into this data. We might want to find out what
the warmest week was, or plot the results on a graph, or just show what the
temperature of a particular region was last week. To do any of these things,
we need to parse the data and get it into our script.
First, a quick explanation of the data. The first column contains the date of
the week in which the measurements were taken. The other four columns
represent different regions of the ocean. For each of them we have two num-
bers: the first representing the recorded temperature, and the second repre-
senting the departure from the expected temperature that this recording
represents (the “sea surface temperature anomaly”). In the first row, then,
the first region recorded a temperature in the week of 3 January 1990 of 23.4
degrees, which is an anomaly of -0.4 degrees.
The pleasing visual quality that this data has—the fact that all the columns
in the table line up neatly—will help us in this task. If we were to count the
characters across each line, we’d see that each field started at exactly the
same place in each row. The first column, containing the date of the week in
question, is always ten characters long. The next number is nine characters
long, always, and the following number is always four characters, regardless
of whether it has a negative sign. This nine/four pattern repeats three more
times for the other three regions.
In trying to get this data into our script, let’s look at how to read the first row
of data.
Previously, we used read in its basic form, without any arguments, which read
the entire file into memory. But if we pass an integer as the first argument,
read will read only that many bytes forward from the current position in the
file.
So, from the start of the file, we could read in each field in the first row as
follows:
noaa-first-row-simple.rb
File.open("data/wksst8110.for") do |file|
  puts file.read(10)

  4.times do
    puts file.read(9)
    puts file.read(4)
  end
end
# >> 03JAN1990
# >> 23.4
# >> -0.4
# >> 25.1
# >> -0.3
# >> 26.6
# >> 0.0
# >> 28.6
# >> 0.3
We first read ten bytes, to get the date of the week. Then we read nine bytes
followed by four bytes to extract the numbers, doing this four times so that
we extract all four regions.
From here, it’s not much work to have our script continue through the rest
of the file, slurping up all of the data within and converting it into a Ruby
data structure—in this case, a hash:
noaa-all-rows.rb
File.open("data/wksst8110.for") do |file|
  weeks = []
  until file.eof?
    week = { date: file.read(10).strip, temps: {} }
    [:nino12, :nino3, :nino34, :nino4].each do |region|
      week[:temps][region] = { temp: file.read(9).to_f, change: file.read(4).to_f }
    end
    file.read(1)
    weeks << week
  end
  weeks
  # => [{:date=>"03JAN1990",
  #      :temps=>
  #       {:nino12=>{:temp=>23.4, :change=>-0.4},
  #        :nino3=>{:temp=>25.1, :change=>-0.3},
  #        :nino34=>{:temp=>26.6, :change=>0.0},
  #        :nino4=>{:temp=>28.6, :change=>0.3}}},
  #     {:date=>"10JAN1990",
  #      :temps=>
  #       {:nino12=>{:temp=>23.4, :change=>-0.8},
  #        :nino3=>{:temp=>25.2, :change=>-0.3},
  #        :nino34=>{:temp=>26.6, :change=>0.1},
  #        :nino4=>{:temp=>28.6, :change=>0.3}}},
  #     {:date=>"17JAN1990",
  #      :temps=>
  #       {:nino12=>{:temp=>24.2, :change=>-0.3},
  #        :nino3=>{:temp=>25.3, :change=>-0.3},
  #        :nino34=>{:temp=>26.5, :change=>-0.1},
  #        :nino4=>{:temp=>28.6, :change=>0.3}}},
  #     ...snip...
end
The logic is fundamentally the same as when reading the first row. To loop
over all the rows in the file, there are two main changes: first, we loop until we
hit the end of the file by checking file.eof?; it will return true when the end of
the file is reached and therefore end our loop. The other addition is the call
to file.read(1) at the end of the row; this will consume the newline character at
the end of each line. We’re also using strip to strip the whitespace from the
week name, and to_f to convert the temperature numbers to floats.
This method works and is fast. But by only using read to consume a fixed
number of bytes, we haven’t seen the most important advantage of treating
the file in this way: the fact that it offers us random access to the records
within the file.
Because each of the columns within the data has a fixed width, all of the
rows have a fixed width, too. Adding up the columns, including the
newline at the end, gives us 10 + 4 * (9 + 4) + 1 = 63 characters, so we know that
each of our records is 63 bytes long.
If we used seek to skip 63 bytes into the file, then our first call to read would
begin reading from the second record:
noaa-skip-first-row.rb
File.open("data/wksst8110.for") do |file|
  file.seek(63)
  file.read(10)
  # => " 10JAN1990"
end
As we can see, our first call to read returns for us the date of the second week
in the file, not the first. Using this method, we can now skip to arbitrary
records—the first, the tenth, the thousandth, whatever we like—and read
their data.
The most important part of this is that seeking happens in constant time.
That means that it takes the same amount of time no matter how large the
file is and no matter how far into the file we want to seek. We’ve finally
uncovered the amazing benefit of fixed-width files like this—that we gain the
ability to access records within them at random, so it’s no slower to find the
303rd record than it is to find the third—or even the 300,003rd.
In the final version of our script, then, we can write a get_week method that
will retrieve a record for us given an index for that record (1 for the first, 2 for
the second, and so on):
noaa-seek.rb
def get_week(file, week)
  file.seek((week - 1) * 63)
  week = { date: file.read(10).strip, temps: {} }
  [:nino12, :nino3, :nino34, :nino4].each do |region|
    week[:temps][region] = { temp: file.read(9).to_f, change: file.read(4).to_f }
  end
  week
end
File.open("data/wksst8110.for") do |file|
  get_week(file, 3)
  # => {:date=>"17JAN1990",
  #     :temps=>
  #      {:nino12=>{:temp=>24.2, :change=>-0.3},
  #       :nino3=>{:temp=>25.3, :change=>-0.3},
  #       :nino34=>{:temp=>26.5, :change=>-0.1},
  #       :nino4=>{:temp=>28.6, :change=>0.3}}}

  get_week(file, 303)
  # => {:date=>"18OCT1995",
  #     :temps=>
  #      {:nino12=>{:temp=>20.0, :change=>-0.8},
  #       :nino3=>{:temp=>24.1, :change=>-0.9},
  #       :nino34=>{:temp=>25.8, :change=>-0.9},
  #       :nino4=>{:temp=>28.2, :change=>-0.5}}}

  get_week(file, 1303)
  # => {:date=>"17DEC2014",
  #     :temps=>
  #      {:nino12=>{:temp=>22.9, :change=>0.1},
  #       :nino3=>{:temp=>26.0, :change=>0.8},
  #       :nino34=>{:temp=>27.4, :change=>0.8},
  #       :nino4=>{:temp=>29.4, :change=>1.0}}}
end
Here we use the get_week method to fetch the third, 303rd, and 1,303rd records.
With this method we can treat the data within the file almost as though it
was a data structure within our script—like an array—even though we haven’t
had to read any of it in. This allows us to randomly access data within even
the largest of files in a very fast and efficient way.
One important caveat is that read and seek operate at the level of bytes, not
characters. You’ll learn more about the difference between the two in Chapter
7, Encodings, on page 89, but it’s worth noting that if you’re using a multibyte
character encoding, like UTF-8, then using seek carelessly might leave you in
the middle of a multibyte character and might mean that you get some gib-
berish when you try to read data.
You should therefore use these methods only when you know that you’re
dealing solely with single-byte characters or when you know that the location
you’re seeking to will never be in the middle of a character—as in our temper-
ature data example, where we’re seeking to the boundaries between records.
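A small sketch of the hazard, with an illustrative filename: "é" occupies two bytes in UTF-8, so seeking one byte into the file lands between them.

```ruby
File.write("utf8.txt", "é is multibyte")  # "é" is the two bytes 0xC3 0xA9

fragment = File.open("utf8.txt") do |file|
  file.seek(1)  # lands between the two bytes of "é"
  # read with a length returns raw bytes (ASCII-8BIT), so we tag them
  # as UTF-8 to check whether they form valid characters.
  file.read(3).force_encoding("UTF-8")
end

fragment.valid_encoding?  # false: we started reading mid-character
```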
Despite this limitation of seek, hopefully you can see the benefit of using a
fixed-width file like this. We can retrieve any value, no matter how big the file
is, without reading any unnecessary data; we have what’s called random
access to the data within. To retrieve the tenth record, we just need to seek
567 bytes from the start of the file; to retrieve the 703rd, we just need to seek
44,226 bytes from the start; and so on. The wonderful thing is that no matter
how large our file gets, this operation will always take the exact same amount
of time—even if we’ve got hundreds of megabytes of data. That’s why it’s
sometimes worth putting up with the limitations of such a format: it’s both
very simple and very fast.
Wrapping Up
That’s about it for reading files. We looked at how to open a file and what we
can do with the resulting File object. We covered reading files in one go and
processing them like streams, and why you’d prefer one or the other. We
explored how we can use the methods offered by Enumerable to transform and
manipulate the content of files. We looked at line-by-line processing and
reading arbitrary numbers of bytes, and how we can seek to arbitrary locations
in the file to replicate some of the functionality of a database.
With these techniques, we’ve gained an impressive arsenal for reading text
from files large and small. Next, we’ll take our newfound knowledge of streams
and apply it to another source of text: standard input.
This source of input is called standard input, and it’s one of the foundations
of text processing. Along with its output equivalents standard output and
standard error, it enables different programs to communicate with one
another in a way that doesn’t rely on complex protocols or unintelligible
binary data, but instead on straightforward, human-readable text.
Learning how to process standard input will allow you to write flexible and
powerful utilities for processing text, primarily by enabling you to write pro-
grams that form part of text processing pipelines. These chains of programs,
linked together so that each one’s output flows into the input of the next, are
incredibly powerful. Mastering them will allow you to make the most both of
your own utilities and of those that already exist, giving you the most text
processing ability for the least amount of typing possible.
Let’s take a look at how we can write scripts that process text from standard
input, slotting into pipeline chains and giving us flexibility, reusability, and
power.
Here we ask standard input—$stdin—for a line of input using the gets method,
using chomp to remove the trailing newline. This gives us a string, which we
store in name.
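A minimal version of such a prompt might look like this (a sketch; it guards against gets returning nil when input has ended):

```ruby
print "What's your name? "
name = ($stdin.gets || "").chomp  # gets returns nil at end of input
puts "Hello, #{name}!"
```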
This simplistic use of standard input isn’t particularly useful, let’s face it.
But it’s actually only half of the story. Standard input isn’t just used to read
from the keyboard interactively; it can also read from input that’s been redi-
rected—or piped—to your script from another process.
The ultimate goal here is to be able to use your scripts in pipeline chains.
These are chains of programs strung together so that the output from the
first is fed into the input of the second, the output of the second becomes the
input of the third, and so on. Here’s an example:
$ ps ax | grep ruby | cut -d' ' -f1
That example used preexisting commands to do its work. But we can write
our own programs that slot into such workflows. Imagine that we frequently
wanted to convert sections of text to uppercase. We know how to convert text
to uppercase in Ruby, so we could write a script that works like this:
$ echo "hello world" | ruby to-uppercase.rb
HELLO WORLD
In other words, we could write a program that converts any text it receives
on standard input to uppercase, then outputs that converted text. It won’t
know where the text is coming from (for example, the echo command we saw
previously versus the hostname command)—it accepts anything you pass to it.
This gives you great flexibility in how you use the script, opening up ways of
using it that you might not have foreseen when writing it.
This flexibility is what makes such scripts useful. Your goal, or at least a
pleasant side effect of processing text in this way, is to build up a library of
such scripts so that, if you encounter the same problem again, you can just
slot the script you wrote last time into the new pipeline chain and be on your
way. The to-uppercase.rb script is a good example of this: you might need to write
it from scratch the first time you encounter the problem of converting input
to uppercase, but after that it can be used again and again in completely
different situations.
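A minimal version of to-uppercase.rb consistent with this description might read as follows (a sketch, not necessarily the book’s exact listing):

```ruby
#!/usr/bin/env ruby

# to-uppercase.rb: read standard input line by line, writing an
# uppercased copy of each line to standard output.
$stdin.each_line do |line|
  puts line.upcase
end
```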
Saving this script as to-uppercase.rb, we’ve got everything we need. We can run
it like this:
$ echo "hello world" | ruby to-uppercase.rb
HELLO WORLD
$ hostname | ruby to-uppercase.rb
ROB.LOCAL
$ whoami | ruby to-uppercase.rb
ROB
We now have a script that reads from standard input, modifies what it receives,
and outputs it to standard output. It’s general purpose. It doesn’t know or
care where its input comes from, but it processes the input happily regardless.
Countless examples of this type of tool already exist, distributed with Unix-
like operating systems: grep, for example, which outputs only lines that match
a given pattern, or sort, which outputs an alphabetically sorted version of its
input. The scripts you write yourself will be right at home with these standard
Unix utilities as part of your text processing pipelines.
It’s also annoying that in the previous examples we had to type ruby
to-uppercase.rb. Other commands are short and snappy—cut, grep—but we had to
type what feels like a lot of superfluous information.
For our next example, we’re going to write a script that extracts URLs from
the input passed to it, outputting any that it finds and ignoring the rest of
the input. So, if we passed it the following text:
Alice's website is at http://www.example.com
While Jane's website is at https://example.net and contains a useful blog.
This script will be called urls, and once we’ve written it we’ll be able to use it
in any pipeline we like. Because it will treat its input as a stream, we’ll be
able to use it on whatever input we like, no matter how large it is. So we’ll be
able to extract the URLs from a text file:
$ cat file.txt | urls
The Shebang
Up until now we’ve only run our Ruby scripts by telling the Ruby interpreter
the name of the file to execute. But when we’re using ordinary Unix commands,
such as grep or uniq, we just specify them as commands in their own right.
Ideally, we want to be able to do the same with our URL extractor. It would
be annoying if we had to type ruby urls.rb or something similar each time we
wanted to use it, especially if we’re going to be using it a lot.
But if we just called our script urls, how would our shell know that it was a
Ruby script and know to pass its contents to Ruby to execute? The answer
is, because we tell it to, and we tell it using a special line at the top of our
script called the shebang. In this case, we’d use:
#!/usr/bin/env ruby
The special part is the #!—it’s this that gives the line its name (“hash” +
“bang”). Since the Ruby interpreter might be in different places on different
people’s computers, we use a command called env to tell the shell to use ruby,
wherever ruby might be.
The presence of this shebang allows us to save our script as a file called urls
and run it directly, rather than as ruby urls. The final step in this process is to
allow the file to be executed. We can do this with the chmod command:
$ chmod +x urls
That’s it. We can now call ./urls from within the directory our urls file resides
in, and it will execute our script as Ruby code.
If we wanted to be able to call our version from anywhere, not just from the
directory in which it’s saved, we could put it into a directory that’s within our
PATH—/usr/local/bin, for example. Many people create a directory under their
home directory—typically called bin—and put that into their path, so that they
have a place to keep all of their own scripts without losing them among the
ones that came with their system or that they’ve installed from elsewhere.
Putting the script in a directory that’s in your PATH will make it feel just like
any other text processing command and make it really easy to use wherever
you are. If you think you’ll use a particular script regularly, then don’t hesitate
to put it there. The only thing you need to do is to make sure the name of the
script doesn’t clash with an existing command that you still want to be able
to use—otherwise, you’ll run your script when you type the command, rather
than whatever command originally had that name. So don’t call it ls or mv!
Just like the File objects we saw in the previous chapter, $stdin has an each_line
method that allows us to iterate over the lines in our input:
$stdin.each_line do |line|
  # ...
end
Processing each line as it arrives, rather than waiting to read all of the
input first, means that we can pass our output along to the next stage in the process as
and when we process it. If our script is the last stage in the pipeline, that
means the user sees output more quickly; and if we’re earlier in the pipeline,
then it means the next part of the pipeline can be doing its processing while
we’re working on our next chunk.
The Logic
Unlike our to-uppercase.rb example, we’re not actually interested in printing the
line of output, even in a modified form. Instead we want to extract any URLs
we find in it and then output those. To do that, we’ll use a regular expression.
We’ll be covering these in depth in Chapter 8, Regular Expressions Basics,
on page 103, so don’t worry too much about them now:
urls
#!/usr/bin/env ruby

$stdin.each_line do |line|
  urls = line.scan(%r{https?://\S+})
  urls.each do |url|
    puts url
  end
end
Here we use String’s scan method to extract everything that looks like a URL.
Then, we loop over them—after all, there might be multiple URLs in a single
line—and output each one of them.
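As a quick sanity check of that pattern (a sketch, not a listing from the book), we can run it against the sample text from earlier:

```ruby
text = <<~TEXT
  Alice's website is at http://www.example.com
  While Jane's website is at https://example.net and contains a useful blog.
TEXT

text.each_line do |line|
  # scan returns every non-overlapping match of the pattern in the line
  line.scan(%r{https?://\S+}).each { |url| puts url }
end
# Prints:
#   http://www.example.com
#   https://example.net
```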
Of course, we’re not limited to having our script be the final stage in the
pipeline. We could use it as an intermediary step—for example, to fetch a web
page, extract the URLs from it, and then download each of those URLs:
$ curl http://example.com | urls | xargs wget
Hopefully you can imagine many scenarios where having such a script and
other tools like it would come in handy. Before long, if you’re anything like
me, you’ll have built up quite the collection of them, each in true Unix fashion
built to do one thing—but to do it well.
You might assume that each command in a pipeline runs to completion before
the next one starts, with its output stored up and handed over in one go.
In reality, though, that’s not the case. All of the programs in the pipeline
chain run simultaneously, and data flows between them bit by bit—just like water
through a real pipe. While the second process is working with the first chunk
of information, the first process is generating another chunk; by the time the
first chunk is through to the third or fourth process in the pipeline, the first
process may be onto the third, tenth, or hundredth chunk.
The amazing thing about this concurrency is that the processes themselves
need know nothing about it. It’s all taken care of by the operating system and
the shell, leaving the individual process to worry only about fetching input
and producing output.
We can prove this concurrency by typing the following into our command
line:
$ sleep 5 | echo "hello, world"
hello, world
If the tasks were executed in series, we’d see nothing for five seconds, and
only then would hello, world appear on our screen. But instead, because the
echo command starts at the same time as sleep, we see the output immediately.
When we request more data from standard input—when calling $stdin.gets, for
example—Ruby will do one of two things. If it has input available in its buffer,
it will pass it on immediately. If it doesn’t, though, it will block, waiting until
the process before it in the pipeline has generated enough output.
This can be frustrating when the input you’re receiving is in many small
chunks, especially if those small chunks are slow to generate. One example
is the find command, which searches the filesystem for files matching given
conditions. It might generate hundreds of filenames per second, or it might
generate one per minute, depending on how many files you’re searching
through and how many of them match your conditions.
If we pipe the result of a find into this script, it will be a long time before the
script actually receives any input, and because this buffering happens at the
output stage, not the input stage, there’s nothing we can do about it. Our
supposedly concurrent pipeline sometimes doesn’t behave concurrently at
all.
While we have no control over the behavior of other programs, if we’re writing
programs ourselves that generate slow output like find does, then we can
remove this buffering by telling our standard output stream to behave
synchronously. To illustrate the change, here’s a script that uses the default
behavior and therefore has its output buffered:
stdout-async.rb
100.times do
  "hello world".each_char do |c|
    print c
    sleep 0.1
  end
  print "\n"
end
If we run this script and pipe its output into another program, such as cat,
then we’ll see the problem: nothing happens for a very, very long time. Because
we’re outputting a character only every 0.1 seconds, the script takes nearly
two minutes to finish, and buffering means we see none of its output until the
buffer is flushed. Making the output stream synchronous fixes this:
$stdout.sync = true

100.times do
  "hello world".each_char do |c|
    print c
    sleep 0.1
  end
  print "\n"
end
Here we set $stdout.sync to true, telling our standard output stream not to buffer
but instead to flush constantly. If we pipe the output of this script into cat,
we’ll see a character appear every 0.1 seconds. Although the script will take
the same amount of time in total to execute, the next program in the pipeline
will have the chance to work with the output immediately, potentially speeding
up the overall time the pipeline takes.
Wrapping Up
We’ve now looked at how to use standard input to obtain input from users’
keyboards, how to redirect the output of other programs into our own, and
how powerful text processing pipelines can be. We saw the value of small
tools that perform a single task and how they can be composed together in
different ways to perform complex text processing tasks. We learned how to
write scripts that can be directly executed and that can process standard
input as a stream and so can work with large quantities of input.
Shell One-Liners
We’ve looked at processing text in Ruby scripts, but there exists a stage of
text processing in which writing full-blown scripts isn’t the correct approach.
It might be because the problem you’re trying to solve is temporary, where
you don’t want the solution hanging around. It might be that the problem is
particularly lightweight or simple, unworthy of being committed to a file. Or
it might be that you’re in the early stages of formulating a solution and are
just trying to explore things for now.
Such processing pipelines will inevitably make use of standard Unix utilities,
such as cat, grep, cut, and so on. In fact, those utilities might actually be
sufficient—tasks like these are, after all, what they’re designed for. But it’s common
to encounter problems that get just a little too complex for them, or that for
some reason aren’t well suited to the way they work. At times like these, it
would be nice if we could introduce Ruby into this workflow, allowing us to
perform the more complex parts of the processing in a language that’s familiar
to us.
It turns out that Ruby comes with a whole host of features that make it a
cinch to integrate it into such workflows. First, we need to discover how we
can use it to execute code from the command line. Then we can explore
different ways to process input within pipelines and some tricks for avoiding
lengthy boilerplate—something that’s very important when we’re writing
scripts as many times as we run them!
This will execute the code found in foo.rb, but otherwise it won’t do anything
too special. If you’ve ever written Ruby on the command line, you’ll definitely
have started Ruby in this way.
What you might not know is that by passing options to the ruby command,
you can alter the behavior of the interpreter. There are three key options that
will make life much easier when writing one-liners in the shell. The first is
essential, freeing you from having to store code in files; the second and third
allow you to skip a lot of boilerplate code when working with input. Let’s take
a look at each of them in turn.
When it comes to using Ruby in the shell, this is hugely limiting. We don’t
want to have to store code in files; we want to be able to compose it on the
command line as we go.
By using the -e flag when invoking Ruby, we can execute code that we pass
in directly on the command line—removing the need to commit our script to
a file on disk. (It might be helpful to remember -e as standing for evaluate,
because Ruby evaluates the code contained within this option.)
The universal “hello world” example, then, would be as follows:
$ ruby -e 'puts "Hello world"'
Hello world
Any code that we could write in a script file can be passed on the command
line in this way. We could, though it wouldn’t be much fun, define classes
and methods, require libraries, and generally write a full-blown script, but
in all likelihood we’ll limit our code to relatively short snippets that just do a
couple of things. Indeed, this desire to keep things short will lead us to
make choices that favor terseness even over readability, which isn’t usually
the choice we make when writing scripts.
This is the first step toward being able to use Ruby in an ad hoc pipeline: it
frees us from having to write our scripts to the filesystem. The second step
is to be able to read from input. After all, if we want our script to be able to
behave as part of a pipeline, as we saw in the previous chapter, then it needs
to be able to read from standard input.
The obvious solution might be to read from STDIN in the code that we pass in
to Ruby, looping over it line by line as we did in the previous chapter:
$ printf "foo\nbar\n" | ruby -e 'STDIN.each { |line| puts line.upcase }'
FOO
BAR
But this is a bit clunky. Considering how often we’ll want to process input
line by line, it would be much nicer if we didn’t have to write this tedious
boilerplate every time. Luckily, we don’t. Ruby offers a shortcut for just this
use case.
The -n flag is that shortcut: it wraps the code we pass in an implicit loop
that reads standard input line by line. This means that the code we pass in
the -e argument is executed once for each line in our input. The content of
the line is stored in the $_ variable. This is one of Ruby’s many global
variables, sometimes referred to as cryptic globals, and it always points to
the last line that was read by gets.
$ printf "foo\nbar\n" | ruby -ne 'print'
foo
bar
This implicit behavior is particularly useful for filtering down the input to
only those lines that match a certain condition—only those that start with f,
for example:
$ printf "foo\nbar\n" | ruby -ne 'print if $_.start_with? "f"'
foo
This kind of conditional output can be made even more terse with another
shortcut. As well as print, regular expressions also operate implicitly on $_.
We’ll be covering regular expressions in depth in Chapter 8, Regular
Expressions Basics, on page 103, but if in the previous example we changed our
start_with? call to use a regular expression instead, it would read:
This one-liner is brief almost to the point of being magical; the subjects of
both the print statement and the if are completely implicit. But one-liners like
this are optimized more for typing speed than for clarity, and so tricks like
this—which have a subtlety that might be frowned upon in more permanent
scripts—are a boon.
There are also shortcut methods for manipulating input. If we invoke Ruby
with either the -n or -p flag, Ruby creates two global methods for us: sub and
gsub. These act just like their ordinary string counterparts, but they operate
on $_ implicitly.
This means we can perform search and replace operations on our lines of
input in a really simple way. For example, to replace all instances of COBOL
with Ruby:
$ echo 'COBOL is the best!' | ruby -ne 'print gsub("COBOL", "Ruby")'
Ruby is the best!
We didn’t need to call $_.gsub, as you might expect, since the gsub method
operates on $_ automatically. This is a really handy shortcut.
Sulphurous Acid.
Water.
Black Varnish.
PRACTICAL DETAILS
OF THE
POSITIVE
OR
AMBROTYPE PROCESS.
CHAPTER IV.
Manipulations.
The glass is held between the thumb and forefinger of the left
hand by the corner 1, Fig. A., 3 and 4 towards and nearest the body,
and as nearly level as possible. I find this the best position to hold
the glass; as, in the case of the larger ones, they can be rested on
the end of the little finger, which should be placed as near the edge
as possible. Then, from the collodion vial, pour on the collodion,
commencing a little beyond the centre and towards 1, continuing
pouring in the same place until the collodion nearly reaches the
thumb—the glass slightly inclined that way; then let the glass incline
towards 4, and continue to pour towards 2.
As soon as enough has been put on to liberally flow the glass,
rapidly and steadily raise corner 1, and hold it directly over 3, where
the excess will flow off into the mouth of the vial, which should be
placed there to receive it. In case of a speck of dust falling at the
time of coating, it can often be prevented from injuring the surface
by changing the direction of the flowing collodion, so as to stop it in
some place where it will not be seen when the picture is finished.
Now, with the thumb and finger of the right hand, I wipe off any
drops or lines of collodion that may be found upon the outer edge or
side of the glass, being careful not to disturb that connected with
the face.
When the coating has become sufficiently dry, so that when I put
my finger against it, it does not break the film, but only leaves a
print, I put it into the silvering bath [see Fig. p. 34]. I generally try
corners 2 and 3. The time, from the first commencement of pouring
on collodion to its being put into the bath, should not exceed about
half a minute, at a temperature of 60°. The finger test is the best I
have found. The glass is to be rested on a dipper [see Fig. p. 34],
and placed steadily and firmly into the nitrate of silver bath—this in a
dark room. It should not be allowed to rest for an instant as it is
entering the solution, or it would cause a line. The time for the glass
to remain in the bath depends upon the age and amount of silver
the bath contains; for a new solution, from two to three minutes will
be sufficient to give the proper action. If it be old, three to five
minutes will be better. When it is properly coated, it can be raised up
and taken by the corner, and allowed to drain for a few seconds, and
then should be placed in the tablet, and is ready for the camera. The
time of exposure will depend upon the amount of light present. If
the bath is newly mixed, and the collodion recently iodized, it should
produce a sufficiently strong impression by an exposure of about
one-third of the time required for a daguerreotype. If the collodion
has been iodized some time, and the bath is old, about one-half of
the time necessary to produce a daguerreian image will be required.
The plate should in no case be allowed to become dry from the
time it is taken from the bath up to the time of pouring on the
developer. At a temperature of about 70°, I have had the glass out
of the bath ten minutes without drying. After exposure, the glass
should be taken again into the dark room, and removed from the
tablet and held over a sink, pail, or basin and the developing solution
poured on it as follows: hold the glass between the thumb and
finger of the left hand, by the opposite end corner from that in
coating with collodion, i. e., 2, and let 3 and 4 be from you.
Commence pouring on the developing solution at the end by the thumb, and
let it flow quickly and
evenly over the entire surface, the first flooding washing off any
excess of nitrate of silver there may be about the edges or corners
of the glass (if this silver is not washed off, it flows over the edges
and on the surface of the impression, producing white wavy clouds
of scum), and then hold the glass as nearly level as possible, it
having upon its surface a thin covering of solution (care should be
observed not to pour the developing solution on the plate in one
place, as it would remove all the nitrate of silver and prevent the
development of the image, leaving only a dark or black spot where it
is poured on). Put down the bottle containing the developing
solution, and take up a quart pitcher previously filled with water, and
as soon as the outline of the image can be plainly seen by the weak
or subdued light of an oil or fluid lamp or candle, pour the water
over copiously and rapidly. Continue this until all the iron solution
has been removed. If this is not done, the plate will be covered with
blue scum on the application of the washing solution. Then the glass
can be taken into a light room, and the iodide of silver coating
washed off with the cyanide solution, and then rinsed with clear
pure water, and stood in a position to drain and dry. I place a little
blotting paper under them: it aids in absorbing the water, and
facilitates the operation.
Place the face of the glass against the wall, in order to prevent
dust from falling upon it. I have often dried the coating by holding or
standing the glass adjacent to a stove. A steady heat is advisable, as
it leaves the surface in a more perfect state, and free from any
scum. After the coating is perfectly dry, it is ready for the preserving
process. It should be warmed evenly, and when about milk warm,
"Humphrey's Collodion Gilding" is poured on the image in precisely
the same manner as the collodion. In a few seconds the coating
sets, and after three-quarters of a minute, if it has not become dry,
the blaze of a spirit lamp may be applied to the back and it will
immediately become perfectly transparent, and nearly as hard as the
glass itself: the effect is fully equal, if not superior, to that of chloride
of gold in gilding the daguerreotype image. The surface becomes
brilliant and permanent. The back of the glass can now be wiped
and cleaned with paper or cloth, and gently warmed, and then with
a common small brush one coat of black varnish can be applied. This
brush should be drawn from side to side across the glass, and on the
side opposite to that which has received the image.
This is in order not to make streaks in the coating of varnish, but
to have uniform lines across the entire length or breadth of the
glass. If the varnish is of the proper consistency, it will flow into a
smooth, even coating. After this first coating is dry, apply a second
in the same manner, only in an opposite direction, so as to cross the
lines of the first, uniting at right angles; when this last coating is
very nearly dry, a piece of paper, glazed black on one side, and cut
to the proper size, can be put next the varnish; it gives it a clean
finish, at the same time that it aids towards a dense blackening.
I sometimes apply the black varnish by flowing, in the same
manner as in putting on the collodion.
This picture is to be colored and put up in the same manner as
the daguerreotype image, with a mat and glass. The last glass may
be dispensed with by first using the collodion gilding, and then upon
its surface apply the black varnish, as before. In this case the image
is seen through the same glass it is on, and without being reversed:
in this case the mat goes on the outside of the glass.
When the image is seen through the glass upon which it is taken,
it cannot be colored with very great success, as it cannot be seen
through the reduced silver forming it. This forms a more or less
opaque surface; but in point of economy the single glass is
preferable. Yet I would not recommend such economy, for I consider
that a good impression ought to be well put up, and the welfare of
the art fully substantiates that consideration.
Many ways have been devised for putting up pictures. I have
produced pleasing effects upon colored glasses: for instance, a
picture on a light purple glass has a very pleasing effect; also in
some other colors. I have also used patent leather for backing the
image.
I have produced curious and interesting results by placing a piece
of white paper, or coloring white the back of the whites of the
image, and then blackening over or around this. By this means the
whites are preserved very clear.
Positives for Pins, Lockets, etc.—I employ mica for floating the
collodion on, as it can be as easily cut and fitted as the metallic plate
in the daguerreotype; and positives taken upon fine, clear,
transparent mica, are fully equal to those taken upon glass, and yet
they are ambrotypes.
Mica is an article familiar to every one, as being used in stoves,
gratings, etc.
The method of using it, is to take the impression on a thick piece,
and then split it off, which can readily be done in the most perfect,
thin, transparent plates; it is equally as thin as tissue paper, and can
be cut as easily. The thickness of the piece upon which the
impression is taken is of no moment, since it can be reduced at
pleasure and is more easily handled while thick.
PRACTICAL DETAILS
OF THE
NEGATIVE PROCESS.
CHAPTER V.
Negative Process.
The method for preparing this has been given in page 41. It is
prepared in the same manner for both positives and negatives.
Plain Collodion.
Re-developing Solution.
This solution is for the purpose of giving increased intensity to
the negative, but as its use in the hands of beginners is attended
with some difficulty, I would not recommend the operator to try it
until he has had considerable experience in the developing process,
or he will undoubtedly spoil his proofs. Its use requires promptness
of action and quick observation.
The following is the formula for its preparation:
Water 4 ounces.
Protosulphate of iron 400 grains.
Put this into a bottle, and when the crystals are dissolved, it is
ready for use. It should be kept filtered, and can be used only once.
Now in another bottle put
Water 4 ounces.
Nitrate of silver 48 grains.
Remarks.—The impression is to be well washed after the
developing solution has been poured off, and then the re-developing
solution (that portion containing the protosulphate of iron) can be
poured on—the plate being held perfectly level: the surface is
completely covered; the water containing the nitrate of silver should
then be poured rapidly on, to mix with the iron, when the surface of
the impression will instantly commence to blacken; and if the action
be allowed to continue for a lengthened period, say one minute, the
impression will be ruined.
It is a matter worthy of notice, that there is no perceptible action
when the iron solution is poured over the glass; but the action is
very energetic the instant the nitrate of silver solution comes in
contact with the iron salt and the silver.
As soon as any change can be observed, after the re-developer
has been poured over the plate, it should be quickly and copiously
washed off with clean water, and then it is ready for the fixing
process.
I would dissuade novices in the art from practising with the re-
developing solution, until they have first thoroughly mastered the
entire process of taking negatives. The developing solution is the
only one used by operators generally, and will, with proper care,
produce satisfactory results.
Water 8 ounces.
Hyposulphite of soda 4 ounces.
This is done with the same material, and in the same manner, as
that given for positives—page 134.
Remarks.—The glass negatives, when not wanted for use, should
be carefully put aside in a box, and kept free from dust and
dampness: by so doing, it is believed that they will remain good for
any length of time.
Nitrate of Silver Bath.
This solution differs only from the positive bath, by omitting the
nitric acid: in all other respects it is precisely the same, and is
prepared by the same formula, as given at page 64.
This is called the neutral bath, and is best adapted to the
negative process. The nitrate of silver employed in its preparation
should be perfectly free from excess of nitric acid, otherwise the
whole solution will be slightly acid.
If it should not be convenient to obtain nitrate of silver without
this objection, the acid may be neutralized by putting into the
solution a small quantity of common washing soda—say 1 grain to
each 100 grains of nitrate of silver—previously dissolved in about
half an ounce of water. This may be put in at the same time that the
iodide of potassium is, and it would save one filtration.
In twenty samples of nitrate of silver that I have tried the above
quantity of soda has been found sufficient; if, however, the white
precipitate first formed is re-dissolved on shaking the mixture, free
nitric acid is present, and more of the soda may be added.
This bath will improve by age, and be less liable to fog after
having been in constant use for one or two weeks.
Operators who have the means, and design following the art
professionally, will find it to their advantage to make from two to
three times the quantity of solution they require for immediate use:
by this means they will be enabled to replenish their stock, which
may be used up or otherwise lost.
PRACTICAL DETAILS
OF THE
PRINTING PROCESS.
CHAPTER VI.
Salting Paper.
Water 1 quart.
Muriate of ammonia 65 grains.
Silvering Paper.
Water 2¼ ounces.
Nitrate of silver 75 grains.
Dissolve (in a 4-ounce vial) the nitrate of silver in the water, and
then pour one-fourth of the solution into an ounce graduate or any
convenient vessel: this keep for farther use in preventing the
presence of an excess of ammonia. Now, into the bottle containing
the three-fourths put about 4 drops of aqua-ammonia; shake well
and a brown precipitate will be given. Continue adding the ammonia,
drop by drop, and shake after each addition, until the brown
precipitate is re-dissolved and the solution is clear; then pour back
into the bottle the one-fourth taken out at first: this will leave the
solution slightly turbid, and when so, there is no excess of ammonia
which would be objectionable. It may now be filtered through
filtering paper, and it (the clear liquid) is ready for use. This should
be kept in the dark, as it decomposes rapidly when exposed to light.
The method of silvering the paper with ammonio-nitrate of silver,
is as follows: take a tuft of clean cotton, roll it into a ball-shape, then
wet it by holding it against the mouth of the bottle containing the
ammonio-nitrate, and when well wet, apply it to the paper (which
should be placed flat on a clean board) by gently rubbing it over the
surface, care being taken not to roughen it.
If the solution has not been filtered for some time, it would be
advisable to pour a little on the centre of the paper, and then
distribute it over the surface by means of the cotton, which is held in
the fingers: by this last method any sediment which may be in the
bottom of the bottle is prevented from getting upon the paper, and
causing spots.
I have used a brush for the purpose of distributing the solution,
by which plan there is less liability of getting it on the fingers and
staining them. Care must be taken to cover the entire surface of the
paper, or there will be light streaks, occasioned by the absence of
the silvering solution.
This want of silver will appear on the paper in light parts, as seen
in the accompanying cut:
Fig. 36.
After the paper has been perfectly coated, or washed with the
silvering solution, it should be placed in a perpendicular position to
dry. I usually tack the paper on a board of the requisite size, and
then stand it on one edge until it has drained and dried. As soon as
dry, it is ready for use. This paper will not keep more than twelve
hours, therefore the operator should silver in the morning the
quantity required for the day. It is imperatively necessary that the
silvered paper be kept in the dark. It is extremely sensitive to light,
and a very brief exposure of the prepared sheet would render it unfit
for use.
The several kinds of apparatus used for holding the negative and
the sensitive paper together, have already been given on page 36,
Figs. 31, 32, 33. The paper having been salted and silvered, as just
described, should be placed on the pad of the printing frame or
glasses, with its sensitive surface up, and then the negative placed
directly upon and in contact with it; then it is to be fastened
together, when it will be ready for exposure to the direct rays of the
sun. From 10 to 40 seconds will be found enough to give a
sufficiently intense print.
The paper first changes to a slate color, and then to a brown or
copper color; when of a dark slate color is about the proper time to
take it out and immerse in the toning bath.
Mounting of Positives.