Web App Development
and Real-Time Web
Analytics with Python
Develop and Integrate Machine Learning
Algorithms into Web Apps
—
Tshepo Chris Nokeri
Table of Contents

Scatter Plot .................................................. 32
Density Plot .................................................. 34
Bar Chart ..................................................... 36
Pie Chart ..................................................... 38
Sunburst ...................................................... 38
Choropleth Map ................................................ 41
Heatmap ....................................................... 42
3D Charting ................................................... 43
Indicators .................................................... 44
Conclusion .................................................... 45
Meta Tag ...................................................... 75
Practical Example ............................................. 75
Viewing Web Page Source ....................................... 78
Conclusion .................................................... 78
Button ........................................................ 94
Table ......................................................... 95
Conclusion .................................................... 97
Chapter 11: Integrating a Machine Learning Algorithm into a Web App ... 189
An Introduction to Linear Regression .......................... 189
An Introduction to sklearn .................................... 190
Preprocessing ................................................. 191
Index ......................................................... 223
About the Author
Tshepo Chris Nokeri harnesses advanced analytics and
artificial intelligence to foster innovation and optimize
business performance. He delivers complex solutions to
companies in the mining, petroleum, and manufacturing
industries. He received a bachelor’s degree in information
management. He graduated with honours in business
science from the University of the Witwatersrand,
Johannesburg, on a Tata Prestigious Scholarship and a
Wits Postgraduate Merit Award. He was unanimously awarded the Oxford University
Press Prize. Tshepo has authored three books: Data Science Revealed (Apress, 2021),
Implementing Machine Learning in Finance (Apress, 2021), and Econometrics and Data
Science (Apress, 2022).
About the Technical Reviewer
Brij Kishore Pandey works as a software engineer, architect,
and strategist at ADP. He has a wide interest in software
development using cutting-edge tools/technologies in
cloud computing, data engineering, data science, artificial
intelligence, and machine learning. He has 12 years of
experience working with global corporate leaders, including
JP Morgan Chase, American Express, 3M Company, Alaska
Airlines, Cigna Healthcare, and ADP.
Acknowledgments
Writing a single-authored book is demanding, but I received firm support and active
encouragement from my family and dear friends. Many heartfelt thanks to the Apress
team for their backing throughout the writing and editing process. And my humble
thanks to all of you for reading this; I earnestly hope you find it helpful.
CHAPTER 1

Tabulating Data and Constructing Static 2D and 3D Charts

import pandas as pd
df = pd.read_csv(r"filepath\.csv")

© Tshepo Chris Nokeri 2022
T. C. Nokeri, Web App Development and Real-Time Web Analytics with Python,
https://doi.org/10.1007/978-1-4842-7783-6_1

df = pd.read_excel(r"filepath\.xlsx")
Notice that the only difference between Listings 1-1 and 1-2 is the file
extension (.csv for Listing 1-1 and .xlsx for Listing 1-2).
When working with sequential data, you may want to use a datetime
column as the index. To do so, specify the column to parse with
parse_dates and set the index with index_col, passing the column
name or number (see Listing 1-3).
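As a minimal, self-contained sketch of that pattern (the date column name and sample values here are illustrative, not the book's data), an in-memory CSV can stand in for a file on disk:

```python
import io
import pandas as pd

# An in-memory CSV stands in for a file path.
csv_data = io.StringIO(
    "date,value\n"
    "2022-01-01,10\n"
    "2022-01-02,12\n"
)

# parse_dates converts the column to datetimes;
# index_col makes that column the index.
df = pd.read_csv(csv_data, parse_dates=["date"], index_col="date")
print(df.index.dtype)  # datetime64[ns]
```

With a datetime index in place, time-based selection such as df.loc["2022-01-01"] works directly.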
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, String, MetaData

engine = sqlalchemy.create_engine(
    sqlalchemy.engine.url.URL(
        drivername="postgresql",
        username="user",          # placeholder credentials; substitute your own
        password="password",
        host="localhost",
        port=5432,
        database="database"))    # on SQLAlchemy 1.4+, use URL.create(...)
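Once an engine exists, it plugs directly into pandas. A self-contained sketch of the round trip, using an in-memory SQLite engine as a stand-in for a PostgreSQL connection (the observations table and its columns are illustrative, not from the book):

```python
import pandas as pd
import sqlalchemy

# In-memory SQLite stands in for the PostgreSQL engine;
# only the connection URL would differ.
engine = sqlalchemy.create_engine("sqlite://")

# Write a small illustrative table, then read it back with pandas.
pd.DataFrame({"id": [1, 2], "price": [10.5, 12.0]}).to_sql(
    "observations", engine, index=False)
df = pd.read_sql("SELECT * FROM observations", engine)
print(df.shape)  # (2, 2)
```

The same read_sql call works unchanged against a PostgreSQL engine; only the connection URL differs.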
English. The speed was limited not by the memory or the computer
itself but by the input, which had to be prepared on tape by a typist.
Subsequently a scanning system capable of 2,400 words a minute
upped the speed considerably.
Impressive as the translator was, its impact was dulled after a
short time when it was found that a second “translation” was required
of the resulting pidgin English, particularly when the content was
highly technical. As a result, work is being done on more
sophisticated translation techniques. Making use of predictive
analysis, and “lexical buffers” which store all the words in a sentence
for syntactical analysis before final printout, scientists have improved
the translation a great deal. In effect, the computer studies the
structure of the sentence, determining whether modifiers belong with
subject or object, and checking for the most probable grammatical
form of each word as indicated by other words in the sentence.
The advanced nature of this method of translation requires the
help of linguistics experts. Among these is Dr. Sydney Lamb of the
University of California at Berkeley who is developing a computer
program for analysis of the structure of any language. One early
result of this study was the realization that not enough is actually
known of language structure and that we must backtrack and build a
foundation before proceeding with computer translation techniques.
Dr. Lamb’s procedure is to feed English text into the computer and
let it search for situations in which a certain word tends to be
preceded or followed by other words or groups of words. The
machine then tries to produce the grammatical structure, not
necessarily correctly. The researcher must help the machine by
giving it millions of words to analyze contextually.
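The procedure described here, tallying which words tend to precede or follow one another, is in modern terms collocation or n-gram counting. A minimal sketch of the idea (the sample sentence is illustrative):

```python
from collections import Counter

def bigram_counts(text: str) -> Counter:
    """Count adjacent word pairs: the raw material for the kind of
    contextual structure analysis described above."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

counts = bigram_counts("the cat sat on the mat and the cat slept")
print(counts[("the", "cat")])  # 2
```

Run over millions of words, such counts reveal which pairings are habitual and which are accidental, which is essentially what the researcher is helping the machine discover.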
What the computer is doing in hours is reproducing the evolution
of language and grammar that not only took place over thousands of
years, but is subject to emotion, faulty logic, and other inaccuracies
as well. Also working on the translation problem are the National
Bureau of Standards, the Army’s Office of Research and
Development, and others. The Army expects to have a computer
analysis in 1962 that will handle 95 per cent of the sentences likely
to be encountered in translating Russian into English, and to
examine foreign technical literature at least as far as the abstract
stage.
Difficult as the task seems, workers in the field are optimistic and
feel that it will be feasible to translate all languages, even the
Oriental, which seem to present the greatest syntactical barriers. An
indication of success is the announcement by Machine Translations
Inc. of a new technique making possible contextual translation at the
rate of 60,000 words an hour, a rate challenging the ability of even
someone coached in speed-reading! The remaining problem, that of
doing the actual reading and evaluation after translation, has been
brought up. This considerable task too may be solved by the
computer. The machines have already displayed a limited ability to
perform the task of abstracting, thus eliminating at the outset much
material not relevant to the task at hand. Another bonus the
computer may give us is the ideal international and technical
language for composing reports and papers in the first place. A
logical question that comes up in the discussion of printed language
translation is that of another kind of translation, from verbal input to
print, or vice versa. And finally from verbal Russian to verbal English.
The speed limitation here, of course, is human ability to accept a
verbal input or to deliver an output. Within this framework, however,
the computer is ready to demonstrate its great capability.
A recent article in Scientific American asks in its first sentence if a
computer can think. The answer to this old chestnut, the authors say,
is certainly yes. They then proceed to show that having passed this
test the computer must now learn to perceive, if it is to be considered
a truly intelligent machine. A computer that can read for itself, rather
than requiring human help, would seem to be perceptive and thus
qualify as intelligent.
Even early computers such as adding machines printed out their
answers. All the designers have to do is reverse this process so that
printed human language is also the machine’s input. One of the first
successful implementations of a printed input was the use of
magnetic ink characters in the Magnetic Ink Character Recognition
(MICR) system developed by General Electric. This technique called
for the printing of information on checks with special magnetic inks.
Processed through high-speed “readers,” the ink characters cause
electrical currents the computer can interpret and translate into
binary digits.
Close on the heels of the magnetic ink readers came those that
use the principle of optical scanning, analogous to the method man
uses in reading. This breakthrough came in 1961, and was effected
by several different firms, such as Farrington Electronics, National
Cash Register, Philco, and others, including firms in Canada and
England. We read a page of printed or written material with such
ease that we do not realize the complex way our brains perform this
miracle, and the optical scanner that “reads” for the computer
requires a fantastically advanced technology.
As the material to be read comes into the field of the scanner, it is
illuminated so that its image is distinct enough for the optical system
to pick up and project onto a disc spinning at 10,000 revolutions per
minute. In the disc are tiny slits which pass a certain amount of the
reflected light onto a fixed plate containing more slits. Light which
succeeds in getting through this second series of slits activates a
photoelectric cell which converts the light into proportionate electrical
impulses. Because the scanned material is moving linearly and the
rotating disc is moving transversely to this motion, the character is
scanned in two directions for recognition. Operating with great
precision and speed, the scanner reads at the rate of 240 characters
a second.
National Cash Register claims a potential reading rate for its
scanner of 11,000 characters per second, a value not reached in
practice only because of the difficulty of mechanically handling
documents at this speed. Used in post-office mail sorting, billing, and
other similar reading operations, optical scanners generally show a
perfect score for accuracy. Badly printed characters are rejected, to
be deciphered by a human supervisor.
It is the optical scanner that increased the speed of the Russian-
English translating computer from 40 to 2,400 words per minute. In
post-office work, the Farrington scanner sorts mail at better than
9,000 pieces an hour, rejecting all handwritten addresses. Since
most mail—85 per cent, the Post Office Department estimates—is
typed or printed, the electronic sorter relieves human sorters of most
of their task. Mail is automatically routed to proper bins or chutes as
fast as it is read.
The electronic readers have not been without their problems. A
drug firm in England had so much difficulty with one that it returned it
to the manufacturer. We have mentioned the one that was confused
by Christmas seals it took for foreign postage stamps. And as yet it
is difficult for most machines to read anything but printed material.
An attempt to develop a machine with a more general reading
ability, one which recognizes not only material in which exact criteria
are met, but even rough approximations, uses the gestalt or all-at-
once pattern principle. Using a dilating circular scanning method, the
“line drawing pattern recognizer” may make it possible to read
characters of varying sizes, handwritten material, and material not
necessarily oriented in a certain direction. A developmental model
recognizes geometric figures regardless of size or rotation and can
count the number of objects in its scope. Such experimental work
incidentally yields much information on just how the eye and brain
perform the deceptively simple tasks of recognition. The year 1970 was
once thought a target date for machine recognition of handwritten
material, but researchers at Bell Telephone Laboratories have
already announced such a device, which reads cursive human writing
with an accuracy of 90 per cent.
The computer, a backward child, learned to write long before it
could read and does so at rates incomprehensible to those of us who
type at the blinding speed of 50 to 60 words a minute. A character-
generator called VIDIAC comes close to keeping up with the brain of
a high-speed digital computer and has a potential speed of 250,000
characters, or about 50,000 words, per second. It does this,
incidentally, by means of good old binary, 1-0 technique. To add to its
virtuosity, it has a repertoire of some 300 characters. Researchers
elsewhere are working on the problems to be met in a machine for
reading and printing out 1,000,000 characters per second!
None of us can talk or listen at much over 250 words a minute,
even though we may convince ourselves we read several thousand
words in that period of time. A simple test of ability to hear is to play
a record or tape at double speed or faster. Our brains just won’t take
it. For high-speed applications, then, verbalized input or output for
computers is interesting in theory only. However, there are occasions
when it would be nice to talk to the computer and have it talk back.
In the early, difficult days of computer development, say when
Babbage was working on his analytical engine, the designer
probably often spoke to his machine. He would have been stunned
to hear a response, of course, but today such a thing is becoming
commonplace. IBM has a computer called “Shoebox,” a term both
descriptive of its size and refreshing in that it is not formed of initial
capitals from an ad writer’s blurb. You can speak figures to Shoebox,
tell it what you want done with them, and it gets busy. This is
admittedly a baby computer, and it has a vocabulary of just 16
words. But it takes only 31 transistors to achieve that vocabulary,
and jumping the number of transistors to a mere 2,000 would
increase its word count to 1,000, which is the number required for
Basic English.
The Russians are working in the field of speech recognition too, as
are the Japanese. The latter are developing an ambitious machine
which will not only accept voice instructions, but also answer in kind.
To make a true speech synthesizer, the Japanese think they will need
a computer about 5,000 times as fast as any present-day type, so for
a while it would seem that we will struggle along with “canned” words
appropriately selected from tape memory.
We have mentioned the use of such a tape voice in the
computerized ground-controlled-approach landing system for
aircraft, and the airline reservation system called Unicall in which a
central computer answers a dialed request for space in less than
three seconds—not with flashing lights or a printed message but in a
loud clear voice. It must pain the computer to answer at the snail-like
human speed of 150 words a minute, so it salves its conscience by
handling 2,100 inputs without getting flustered.
The writer’s dream, a typewriter that has a microphone instead of
keys and clacks away merrily while you talk into it, is a dream no
longer. Scientists at Japan’s Kyoto University have developed a
computer that does just this. An early experimental model could
handle a hundred Japanese monosyllables, but once the
breakthrough was made, the Japanese quickly pushed the design to
the point where the “Sonotype” can handle any language. At the
same time, Bell Telephone Laboratories works on the problem from
the other end and has come up with a system for a typewriter that
talks. Not far behind these exotic uses of digital computer techniques
are such things as automatic translation of telephone or other
conversations.
Information Retrieval
It has been estimated that some 445 trillion words are spoken in
each 16-hour day by the world’s inhabitants, making ours a noisy
planet indeed. To bear out the “noisy” connotation, someone else
has reckoned that only about 1 per cent of the sounds we make are
real information. The rest are extraneous, incidentally telling us the
sex of the speaker, whether or not he has a cold, the state of his
upper plate, and so on. It is perhaps a blessing that most of these
trillions of words vanish almost as soon as they are spoken. The
printed word, however, isn’t so transient; it not only hangs around,
but also piles up as well. The pile is ever deeper, technical writings
alone being enough to fill seven 24-volume encyclopedias each day,
according to one source. As with our speech, perhaps only 1 per
cent of this outpouring of print is of real importance, but this does not
necessarily make what some have called the Information Explosion
any less difficult to cope with.
The letters IR once stood for infra-red; but in the last year or so
they have been appropriated by the words “information retrieval,”
one of the biggest bugaboos on the scientific horizon. It amounts to
saving ourselves from drowning in the fallout from typewriters all
over the earth. There are those cool heads who decry the pushing of
the panic button, professing to see no exponential increase in
literature, but a steady 8 per cent or so each year. The button-
pushers see it differently, and they can document a pretty strong
case. The technical community is suffering an embarrassment of
riches in the publications field.
While a doubling in the output of technical literature has taken the
last twelve years or so, the next such increase is expected in half
that time. Perhaps the strongest indication that IR is a big problem is
the obvious fact that nobody really knows just how much has been,
is being, or will be written. For instance, one authority claims
technical material is being amassed at the rate of 2,000 pages a
minute, which would result in far more than the seven sets of
encyclopedias mentioned earlier. No one seems to know for sure
how many technical journals there are in the world; it can be
“pinpointed” somewhere between 50,000 and 100,000. Selecting
one set of figures at random, we learn that in 1960 alone 1,300,000
different technical articles were published in 60,000 journals. Of
course there were also 60,000 books on technical subjects, plus
many thousands of technical reports that did not make the formal
journals, but still might contain the vital bit of information without
which a breakthrough will be put off, or a war lost. Our research
expenses in the United States ran about $13 billion in 1960, and the
guess is they will more than double by 1970. An important part of
research should be done in the library, of course, lest our scientist
spend his life re-inventing the wheel, as the saying goes.
To back up this saying are specific examples. For instance, a
scientific project costing $250,000 was completed a few days before
an engineer came across practically the identical work in a report in
the library. This was a Russian report incidentally, titled “The
Application of Boolean Matrix Algebra to the Analysis and Synthesis
of Relay Contact Networks.” In another, happier case, information
retrieval saved Esso Research & Engineering Co. a month of work
and many thousands of dollars when an alert—or lucky—literature
searcher came across a Swedish scientist’s monograph detailing
Esso’s proposed exploration. Another literature search obviated tests
of more than a hundred chemical compounds. Unfortunately not all
researchers do or can search the literature in all cases. There is
even a tongue-in-cheek law which governs this phenomenon:
Mooers’ Law states, “An information system will tend not to be
used whenever it is more painful for a customer to have information
than for him not to have it.”
As a result, it has been said that if a research project costs less
than $100,000 it is cheaper to go ahead with it than to conduct a
rigorous search of the literature. Tongue in cheek or not, this state of
affairs points up the need for a usable information retrieval system.
Fortune magazine reports that 10 per cent of research and
development expense could be saved by such a system, and 10 per
cent in 1960, remember, would have amounted to $1.3 billion. Thus
the prediction that IR will be a $100 million business in 1965 does
not seem out of line.
The Center for Documentation at Western Reserve University
spends about $6.50 simply in acquiring and storing a single article
in its files. In 1958 it could search only thirty abstracts of these
articles in an hour and realized that more speed was vital if the
Center was to be of value. As a result, a GE 225 computer IR
system was substituted. Now researchers go through the entire store
of literature—about 50,000 documents in 1960—in thirty-five
minutes, answering up to fifty questions for “customers.”
International Business Machines Corp.
The document file of this WALNUT information retrieval system contains the
equivalent of 3,000 books. A punched-card inquiry system locates the desired
filmstrip for viewing or photographic reproduction.
This image converter of the WALNUT system optically reduces and transfers
microfilm to filmstrips for storage. Each strip contains 99 document images. As a
document image is transferred from microfilm to filmstrip, the image converter
simultaneously assigns image file addresses and punches these addresses into
punched cards controlling the conversion process.
“How come they spend over a million on our new school, Miss Finch, and then
forget to put in computer machines?”
“’Tis one and the same Nature that rolls on her course, and whoever has
sufficiently considered the present state of things might certainly conclude as to
both the future and the past.”
—Montaigne
11: The Road Ahead