Learn Data Mining Through Excel: A Step-by-step Approach for Understanding Machine Learning Methods, 1st Edition
Hong Zhou
The publisher, the authors and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
Let’s get right to the topic. Why do we need to learn Excel in our data
mining endeavor? It is true that there are quite a few outstanding data
mining software tools such as RapidMiner and Tableau that make the
mining process easy and straightforward. In addition, programming
languages Python and R have a large number of reliable packages
dedicated to various data mining tasks. What is the purpose of studying
data mining or machine learning through Excel?
Why Excel?
If you are already an experienced data mining professional, I would say
that you are asking the right question and probably you should not read
this book. However, if you are a beginner in data mining, or a visual
learner, or want to understand the mathematical background behind
some popular data mining techniques, or an educator, then this book is
right for you, and probably is the first book you should read before you
start your data mining journey.
Excel allows you to work with data in a transparent manner,
meaning when an Excel file is opened, the data is visible immediately
and every step of data processing is also visible. Intermediate results
are contained in the Excel worksheet and can be examined while you
are conducting your mining task. This allows you to obtain a deep and
clear understanding of how the data are manipulated and how the
results are obtained. Other software tools and programming languages
hide critical aspects of the model construction process. For most data
mining projects, the goal is to find the internal hidden patterns inside
the data. Therefore, hiding the detailed process is beneficial to the users
of the tools or packages. But it is not helpful for beginners, visual
learners, or those who want to understand how the mining process
works. Let me use the k-nearest neighbors (K-NN) method to illustrate the
learning differences between RapidMiner, R, and Excel. Before we do
that, we need to understand several terms used in data mining.
There are two types of data mining techniques: supervised and
unsupervised. Supervised methods require the use of a training dataset
to “train” the software programs or algorithms (such programs or
algorithms are often referred to as machines) first. Programs are
trained to reach an optimal state called a model. This is why a training
process is also called modeling. Data mining methods can also be
categorized into parametric and nonparametric methods. For
parametric methods, a model is just a set of parameters or rules
obtained through the training process that are believed to allow the
programs to work well with the training dataset. Nonparametric
methods do not generate a set of parameters. Instead, they dynamically
evaluate the incoming data based on the existing dataset. You may be
confused by such definitions at this time. They will make sense soon.
What is a training dataset? In a training dataset, the target variable
(also called the label, target, dependent variable, outcome variable, or
response), the value of which is to be predicted, is given or known. The
value of the target variable depends on the values of other variables,
which are usually called attributes, predictors, or independent
variables. Based on the attribute values, a supervised data mining
method computes (that is, predicts) the value of the target
variable. Some computed target values might not match the known
target values in the training dataset. A good model is an optimal
set of parameters or rules that minimizes the mismatches.
A model is usually constructed to work on future datasets with
unknown target values in a supervised data mining method. Such
future datasets are commonly called scoring datasets. In an
unsupervised data mining method, however, there is no training
dataset and the model is an algorithm that can be applied directly to
the scoring datasets. The K-nearest neighbors method is a supervised data
mining technique.
Suppose we want to predict if a person is likely to accept a credit
card offer based on the person’s age, gender, income, and number of
credit cards they already have. The target variable is the response to
the credit card offer (assume it is either Yes or No), while age, gender,
income, and number of existing credit cards are the attributes. In the
training dataset, all variables including both the target and attributes
are known. In such a scenario, a K-NN model is constructed through the
use of the training dataset. Based on the constructed model, we can
predict the responses to the credit card offer of people whose
information is stored in the scoring dataset.
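The credit card scenario above can be sketched in a few lines of Python. This is only a minimal illustration, not the book's Excel procedure: the six training records, the 0/1 encoding of gender, the income unit, and the choice of k = 3 with Euclidean distance are all assumptions made for the example, and no attribute scaling is applied.

```python
import math
from collections import Counter

# Hypothetical training dataset: the target (response) is known for every record.
# Attributes: (age, gender encoded as 0/1, income in $1000s, existing credit cards).
TRAINING = [
    ((25, 0, 40, 1), "Yes"),
    ((30, 1, 45, 2), "Yes"),
    ((28, 0, 38, 1), "Yes"),
    ((55, 1, 90, 4), "No"),
    ((60, 0, 95, 5), "No"),
    ((52, 1, 85, 3), "No"),
]

def euclidean(a, b):
    """Straight-line distance between two attribute tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(training, record, k=3):
    """Predict the target of `record` by majority vote of its k nearest neighbors."""
    nearest = sorted(training, key=lambda row: euclidean(row[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical scoring dataset: attributes are known, the response is not.
scoring = [(27, 1, 42, 1), (58, 0, 88, 4)]
predictions = [knn_predict(TRAINING, person) for person in scoring]
print(predictions)  # ['Yes', 'No']
```

Note that nothing is fitted in advance: each scoring record is compared directly against the stored training records, which is what makes K-NN a nonparametric method in the sense described earlier.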
In RapidMiner, one of the best data mining tools, the prediction
process is as follows: retrieve both the training data and scoring data
from the repository ➤ set role for the training data ➤ apply the K-NN
operator on the training data to construct the model ➤ connect the
model and the scoring data to the Apply Model operator. That’s it! You
can now execute the process and the result is obtained. Yes, very
straightforward. This is shown in Figure 1-1. Be aware that there is no
model validation in this simple process.
Formula
Formula is the most important feature of Excel. Writing a formula is like
writing a programming statement. In Excel, a formula always starts
with an equal sign (“=” without quotation marks).
Upon opening an Excel file, we are greeted with a table-like
worksheet. Yes, every worksheet is a huge table. One reason why Excel
is naturally suitable for data storage, analysis, and mining is that
data are automatically arranged in a table format in Excel. Each cell in
the big table has a name, called a reference. By default, each column
is labeled with a letter, while each row is labeled with a number. For
example, the very first cell at the top-left corner is cell A1, that is,
column A and row 1. The content in a cell, whatever it is, is represented
by the cell reference.
Enter number 1 in cell A1. The value of cell A1 is 1 and A1
represents 1 at this moment.
Enter the formula “=A1*10” (without the double quotation marks)
in cell B1 and hit the Enter key. Note that the formula starts with “=”. Be
aware that this is the only time a formula is presented inside a pair of
double quotation marks in this book. From now on, all formulas are
presented directly without quotation marks.
Enter the text “A1 * 10” in cell C1. Because the text does not start
with “=”, it is not a formula.
Our worksheet looks like Figure 1-3.
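For readers who think in code, the three cells above can be mimicked with a small Python sketch. It is only an analogy under stated assumptions: the `cells` dictionary and the `evaluate` helper are hypothetical names, and the evaluator handles nothing beyond simple arithmetic on cell references. It does not reflect how Excel is implemented.

```python
import re

# The worksheet from the steps above: a number, a formula, and plain text.
cells = {"A1": "1", "B1": "=A1*10", "C1": "A1 * 10"}

def evaluate(ref):
    """Return the value a cell displays: formulas are computed, text is shown as-is."""
    content = cells[ref]
    if not content.startswith("="):
        # No leading "=": the content is a number if it parses, otherwise text.
        try:
            return float(content)
        except ValueError:
            return content
    # Leading "=": replace every cell reference with that cell's value, then compute.
    expression = re.sub(
        r"[A-Z]+[0-9]+",
        lambda match: repr(evaluate(match.group(0))),
        content[1:],
    )
    # eval is acceptable for this toy example only; never use it on untrusted input.
    return eval(expression, {"__builtins__": {}})

print(evaluate("B1"))  # 10.0
print(evaluate("C1"))  # A1 * 10
```

The only difference between B1 and C1 is the leading equal sign, which is exactly what decides whether the content is computed or displayed as text.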
6. Press down the left mouse button and drag down to cell A6.
3. Drag down the mouse cursor to cell B6. Our worksheet looks like
Figure 1-5.
Absolute Reference
Assume that there is a ledger keeping the fixed interest rate and the
amounts of loans lent to customers. Our job is to compute the interest
each customer owes. Follow these instructions to complete the
experiment:
1. Open a blank Excel worksheet. Enter the text “Loan”, “Interest”, and
“Rate” in cells A1, B1, and D1, respectively. Leave C1 blank. Enter “5%” inside
cell D2 (without quotation marks).
4. In cell B2, enter the formula =A2*D2. This formula calculates how
much interest is owed on the loan in A2. Everything should be perfectly
fine at this moment.
5. Let’s try another quick autofill skill: select cell B2 ➤ move the
mouse cursor to the lower-right corner of cell B2 until the cursor
becomes a black cross ➤ double-click. See Figure 1-6. This double-click
action automatically fills in formulas from cell B3 to cell B12
(B2 already has a formula). Note: Autofill by double-click works
only with vertical downward autofill operations.
Figure 1-6 Autofill by double-click
Our worksheet now looks like Figure 1-7.
Figure 1-7 Autofill failed on interest calculation
Except for cell B2, none of the cells in column B obtains the
correct result. The reason is that when we autofill from B2 to B12
vertically, the original formula in cell B2 (=A2*D2) changes: the row
index of every cell reference in the formula is automatically
incremented by one for each row filled. As shown in Figure 1-7, the
formula in cell B3 is =A3*D3. We can imagine that the formula in cell B4
is =A4*D4. Because cells D3 through D12 are empty, these formulas do not
use the interest rate stored in D2.
In cell B3, we need the formula to be =A3*D2, that is, when we
autofill from B2 to B3, we need the cell reference A2 to be changed to
A3, but we need the cell reference D2 to stay the same.
As mentioned before, cell reference D2 has two parts: the column
index D and the row index 2. For vertical autofill, the column index will
never change but the row index will. To keep the row index unchanged
during autofill, we need to place “$” before the row index. This means
the formula in cell B2 should be =A2*D$2. Using the symbol “$” to keep
cell reference(s) in a formula unchanged or locked in autofill operations
is called absolute reference.
With this corrected formula in cell B2, let’s autofill from cell B2 to
B12 again. We should get the correct result this time.
Note that in cell B2, the formula =A2*$D$2 works fine, too, for
this specific task. With $ placed before the column index D, the
column index won’t change even when we perform a horizontal
autofill. However, we will run into cases where we must
keep only part of a cell reference (either the row index or the column
index) unchanged in an autofill operation.
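The effect of “$” during a vertical autofill can be imitated with a short Python sketch. The function name `fill_down` and the regular expression are assumptions made for illustration; the sketch only rewrites simple references such as A2, D$2, or $D$2 and is not a description of Excel's internals.

```python
import re

# Matches a cell reference with an optional "$" before the column and/or the row.
REFERENCE = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def fill_down(formula, rows):
    """Rewrite `formula` as if it were autofilled `rows` rows downward.

    Row indices without a leading "$" are shifted; locked rows stay unchanged.
    Column indices never change in a vertical autofill.
    """
    def shift(match):
        col_lock, col, row_lock, row = match.groups()
        new_row = int(row) if row_lock else int(row) + rows
        return f"{col_lock}{col}{row_lock}{new_row}"
    return REFERENCE.sub(shift, formula)

print(fill_down("=A2*D2", 1))    # =A3*D3   (the failed autofill)
print(fill_down("=A2*D$2", 1))   # =A3*D$2  (row of D2 locked)
print(fill_down("=A2*$D$2", 2))  # =A4*$D$2 (row and column locked)
```

The first call reproduces the failure described above, while the second and third show why locking the row index of D2 keeps the interest rate in place.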
2. Click cell A15; under the main tab Home, click Paste ➤ select Paste
Special. This process is explained in Figure 1-8. Note that if your
operating system has right-click enabled, you can also right-click cell
A15 ➤ select Paste Special.
Figure 1-8 Select Paste Special
3. A small menu shows up. On this menu, choose Values under Paste
as shown in Figure 1-9. Be aware of another available feature
“Transpose” on this menu. It is a good idea to practice this
Transpose feature as it is very useful for data preparation.
Figure 1-9 Paste Values
Part of our worksheet looks similar to Figure 1-10.
Figure 1-10 After Paste Values
IF Function Series
The IF statement is said to be the most used statement in programming,
and this also holds true for our effort to learn data mining through
Excel. As we are going to make use of the function IF and other
IF-related functions very often, it is a good idea for us to get some basic
understanding of them first.
This book comes with a number of sample Excel files. We are going
to make use of them often. These files are available at
https://github.com/hhohho/Learn-Data-Mining-through-Excel.
These Excel files are examples designed for different
data mining practices. There are two ways to download them.
1. Download them together as a compressed file and decompress
them into a folder (directory) on your computer.
2. A sale amount of more than $50000 but less than or equal to
$100000 gains 10% commission.
=IF(C2<=50000,0,IF(C2<=100000,(C2-50000)*10%,50000*0.1+(C2-100000)*20%))
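Reading the nested IF formula above from the outside in may be easier with an equivalent Python function. This is a sketch based only on the formula as written: the function name `commission` is an assumption, and the three tiers are inferred from the formula itself rather than from the full list of rules.

```python
def commission(sale):
    """Mirror of the nested IF formula, with `sale` playing the role of cell C2."""
    if sale <= 50000:
        # Outer IF, true branch: no commission.
        return 0.0
    if sale <= 100000:
        # Inner IF, true branch: 10% of the amount above 50000.
        return (sale - 50000) * 0.10
    # Inner IF, false branch: 10% of the full 50000 band plus 20% above 100000.
    return 50000 * 0.10 + (sale - 100000) * 0.20

for amount in (40000, 80000, 150000):
    print(amount, commission(amount))
```

Each Python `if` corresponds to one IF in the formula, and the final `return` corresponds to the innermost value-if-false argument.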