

The questions below cover Chapters 1-5. Please use this document as a template and answer the questions on this form.

1. For each of the following datasets, note whether data privacy is an important issue.
a. Census data collected from 1900 to 1950.
b. IP addresses and visit times of web users who visit your website.
c. Images from Earth-orbiting satellites.
d. Names and addresses of people from the telephone book.
e. Names and email addresses collected from the web.

2. Classify the following attributes as binary, discrete, or continuous. Also classify them as qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have more than one interpretation, so briefly indicate your reasoning if you think there may be some ambiguity.
Example: Age in years. Answer: Discrete, quantitative, ratio

(a) Time in terms of AM or PM.
(b) Brightness as measured by a light meter.
(c) Brightness as measured by people’s judgments.
(d) Angles as measured in degrees between 0° and 360°.
(e) Bronze, Silver, and Gold medals as awarded at the Olympics.
(f) Height above sea level.
(g) Number of patients in a hospital.
(h) ISBN numbers for books. (Look up the format on the Web.)
(i) Ability to pass light in terms of the following values: opaque, translucent, transparent.
(j) Military rank.
(k) Distance from the center of campus.
(l) Density of a substance in grams per cubic centimeter.
(m) Coat check number. (When you attend an event, you can often give your coat to someone who, in turn, gives you a number that you can use to claim your coat when you leave.)

3. Which of the following quantities is likely to show more temporal autocorrelation: daily rainfall or daily temperature? Why?
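
As a hint for how temporal autocorrelation could be measured, the sketch below computes the sample autocorrelation at lag 1. The function name `lag_autocorrelation` and both toy series are assumptions invented for illustration; they are not real rainfall or temperature measurements, and the sketch does not by itself answer the question.

```python
from statistics import mean

def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mu = mean(series)
    # Total variation around the mean (denominator).
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Co-variation between each value and the value `lag` steps later.
    numer = sum((series[t] - mu) * (series[t + lag] - mu) for t in range(n - lag))
    return numer / denom

# Hypothetical toy series, invented for illustration only.
smooth = [10, 11, 12, 13, 14, 15, 16, 17]  # changes gradually from day to day
spiky = [0, 9, 0, 0, 12, 0, 7, 0]          # isolated bursts between zeros

print(lag_autocorrelation(smooth))  # 0.625
print(lag_autocorrelation(spiky))   # negative for this toy series
```

A value near 1 means that consecutive days tend to have similar values; a value near 0 or below means that one day says little about the next.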

4. Distinguish between noise and outliers. Be sure to consider the following questions.
a. Is noise ever interesting or desirable? Outliers?
b. Can noise objects be outliers?
c. Are noise objects always outliers?
d. Are outliers always noise objects?
e. Can noise make a typical value into an unusual one, or vice versa?

5. Discuss the advantages and disadvantages of using sampling to reduce the number of data objects that need to be displayed. Would simple random sampling (without replacement) be a good approach to sampling? Why or why not?
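
For reference, simple random sampling without replacement can be sketched with Python's standard `random` module. The helper name `simple_random_sample` and the toy data set (990 "common" objects and 10 "rare" ones) are assumptions made up for illustration. Running it with different seeds shows how many rare objects happen to survive in a small sample, which is one of the points the question asks you to weigh.

```python
import random

def simple_random_sample(objects, k, seed=None):
    """Draw k distinct objects uniformly at random (no replacement)."""
    rng = random.Random(seed)      # private generator so runs are repeatable
    return rng.sample(objects, k)  # each object can be chosen at most once

# Hypothetical data set: mostly common objects plus a few rare ones.
data = ["common"] * 990 + ["rare"] * 10

sample = simple_random_sample(data, 50, seed=0)
print(len(sample), "objects sampled,", sample.count("rare"), "of them rare")
```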

6. How might you address the problem that a histogram depends on the number and location of the bins?
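
The dependence on bin count and bin location can be seen in a small sketch. The `histogram` helper and the sample values are hypothetical; the same ten values are binned three ways, changing first the number of bins and then only the starting edge.

```python
def histogram(values, num_bins, low, high):
    """Count values in num_bins equal-width bins spanning [low, high]."""
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        if not low <= v <= high:
            continue  # ignore values outside the plotted range
        # The right edge of the range is folded into the last bin.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical values, invented for illustration only.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

print(histogram(values, 2, 0, 10))      # [8, 2]
print(histogram(values, 5, 0, 10))      # [1, 5, 3, 0, 1]
print(histogram(values, 5, 0.5, 10.5))  # [3, 5, 1, 0, 1]
```

The second and third calls use the same bin width and differ only in where the first bin starts, yet the shape of the histogram changes.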

7. Show that the entropy of a node never increases after splitting it into smaller successor nodes.
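
Before attempting the proof, it can help to check the claim numerically on one example. The sketch below is a sanity check, not a proof: the class counts are hypothetical, and the helper names `entropy` and `weighted_child_entropy` are assumptions. It compares the entropy of a parent node with the size-weighted average entropy of its successor nodes.

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a node given its per-class instance counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def weighted_child_entropy(children):
    """Average entropy of the child nodes, weighted by child size."""
    total = sum(sum(child) for child in children)
    return sum((sum(child) / total) * entropy(child) for child in children)

# Hypothetical split: a parent with 10 instances of each class is divided
# into two children whose counts add up to the parent's counts.
parent = [10, 10]
children = [[7, 3], [3, 7]]

print(entropy(parent))                   # 1.0
print(weighted_child_entropy(children))  # smaller than the parent's entropy
```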

8. Compute a two-level decision tree using the greedy approach described in this chapter. Use the classification error rate as the criterion for splitting. What is the overall error rate of the induced tree?
Note: To determine the test condition at the root node, you first need to compute the error rates for attributes X, Y, and Z.
For attribute X the corresponding counts are:

For Y the corresponding counts are:
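
The error-rate computation described in the note can be sketched as follows. The counts in this sketch are hypothetical placeholders, not the counts from the exercise, and the names `split_error_rate`, `attribute_a`, and `attribute_b` are assumptions. Each child node predicts its majority class, so the instances outside that majority are its errors.

```python
def split_error_rate(counts_by_value):
    """Classification error rate after splitting on one attribute.

    counts_by_value maps each attribute value to a list of class counts
    for the instances that take that value.
    """
    total = sum(sum(counts) for counts in counts_by_value.values())
    # In each child, every instance outside the majority class is an error.
    errors = sum(sum(counts) - max(counts) for counts in counts_by_value.values())
    return errors / total

# Hypothetical count tables (attribute value -> [class 1 count, class 2 count]).
attribute_a = {0: [30, 10], 1: [10, 50]}
attribute_b = {0: [25, 25], 1: [15, 35]}

print(split_error_rate(attribute_a))  # 0.2
print(split_error_rate(attribute_b))  # 0.4
```

Under the greedy approach, the attribute with the lowest error rate would be chosen as the test condition, and the same computation would then be repeated within each child node.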

Answering 9 Questions about Data Mining

Topic: Introduction to Data Mining


Answer all 9 questions in the attached “Questions” document.

Your answers MUST be inside the attached “Questions” document. Use it as a template.

Reading Materials:

You will need to use Chapters 1 to 5 of the textbook.

· Textbook: “Introduction to Data Mining,” 2nd ed. ISBN: 9780133128901

· Please FULLY ANSWER all the questions. Some questions contain multiple parts, so please answer every part.
· Besides the textbook, use at least one quality scholarly (peer-reviewed) source.
· Please provide the DOI or URL for all references.
· Please use in-text citations.
· Please answer the questions directly. No introduction or conclusion is needed.
· Follow APA 7th edition guidelines.




Names: Tan, Pang-Ning, author. | Steinbach, Michael, author. | Karpatne,
Anuj, author. | Kumar, Vipin, 1956- author.

Title: Introduction to Data Mining / Pang-Ning Tan, Michigan State University,
Michael Steinbach, University of Minnesota, Anuj Karpatne, University of
Minnesota, Vipin Kumar, University of Minnesota.

Description: Second edition. | New York, NY : Pearson Education, [2019] |
Includes bibliographical references and index.

Identifiers: LCCN 2017048641 | ISBN 9780133128901 | ISBN 0133128903

Subjects: LCSH: Data mining.

Classification: LCC QA76.9.D343 T35 2019 | DDC 006.3/12–dc23 LC record
available at


Preface to the Second Edition
Since the first edition, roughly 12 years ago, much has changed in the field of
data analysis. The volume and variety of data being collected continues to
increase, as has the rate (velocity) at which it is being collected and used to
make decisions. Indeed, the term, Big Data, has been used to refer to the
massive and diverse data s
