Friday, 21 September 2018

Statistics for Business and Economics

Data and Data Sets
  • Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.
  •  All the data collected in a particular study are referred to as the data set for the study.
Elements, Variables, and Observations
  •  Elements are the entities on which data are collected.
  •  A variable is a characteristic of interest for the elements.
  • The set of measurements obtained for a particular element is called an observation.
  • A data set with n elements contains n observations.
  • The total number of data values in a complete data set is the number of elements multiplied by the number of variables.
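
The counting rule above can be checked on a small example. This is a minimal sketch: the company names, variable names, and values are made up for illustration, and `count_data_values` is a hypothetical helper, not part of any library.

```python
# Hypothetical data set: each key is an element (a company) and each
# inner dict is that element's observation (one value per variable).
data_set = {
    "Company A": {"exchange": "NYSE", "annual_sales_musd": 120.5, "employees": 340},
    "Company B": {"exchange": "NASDAQ", "annual_sales_musd": 75.0, "employees": 150},
    "Company C": {"exchange": "NYSE", "annual_sales_musd": 310.2, "employees": 900},
}


def count_data_values(data):
    """Return (elements, variables, total data values) for a complete data set."""
    n_elements = len(data)
    # In a complete data set every observation has the same variables,
    # so the first observation is enough to count them.
    first_observation = next(iter(data.values()))
    n_variables = len(first_observation)
    return n_elements, n_variables, n_elements * n_variables


n_elements, n_variables, n_values = count_data_values(data_set)
print(n_elements, "elements x", n_variables, "variables =", n_values, "data values")
```

With 3 elements and 3 variables the data set holds 9 data values, and the number of observations equals the number of elements.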
Scales of Measurement
  • Scales of measurement include
       Nominal
       Ordinal
       Interval
       Ratio
  • The scale determines the amount of information contained in the data.
  • The scale indicates the data summarization and statistical analyses that are most appropriate.
Nominal scale  
       Data are labels or names used to identify an attribute of the element.
       A nonnumeric label or numeric code may be used.

Ordinal scale
       The data have the properties of nominal data and the order or rank of the data is meaningful.
       A nonnumeric label or numeric code may be used.

Interval scale
       The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.
       Interval data are always numeric.

Ratio scale
       The data have all the properties of interval data, and the ratio of two values is meaningful.
       Ratio data are always numeric.
       The scale has a true zero: a value of zero means that none of the quantity is present.
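
The difference between the interval and ratio scales shows up when the unit of measure changes. The sketch below uses temperature in degrees Celsius as an assumed example of interval data and cost in dollars as an assumed example of ratio data; the numbers are illustrative only.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


# Interval data: zero degrees Celsius is an arbitrary reference point,
# so the ratio of two temperatures depends on the unit chosen.
low_c, high_c = 10.0, 20.0
ratio_c = high_c / low_c
ratio_f = celsius_to_fahrenheit(high_c) / celsius_to_fahrenheit(low_c)

# Ratio data: zero dollars means no cost at all, so the ratio of two
# costs is the same whatever unit is used.
low_usd, high_usd = 10.0, 20.0
ratio_usd = high_usd / low_usd
ratio_cents = (high_usd * 100) / (low_usd * 100)

print("temperature ratio in C:", ratio_c, "in F:", ratio_f)
print("cost ratio in dollars:", ratio_usd, "in cents:", ratio_cents)
```

The temperature ratio changes from 2.0 to 1.36 under the conversion, so "twice as hot" is not a meaningful statement, while the cost ratio stays at 2.0 in either unit.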

Categorical and Quantitative Data
       Data can be further classified as being categorical or quantitative.
       The statistical analysis that is appropriate depends on whether the data for the variable are categorical or quantitative.
       In general, there are more alternatives for statistical analysis when the data are quantitative.

Categorical Data
       Labels or names are used to identify an attribute of each element
       Often referred to as qualitative data
       Use either the nominal or ordinal scale of measurement
       Can be either numeric or nonnumeric
       Appropriate statistical analyses are rather limited

Quantitative Data
       Quantitative data indicate how many or how much.
       Quantitative data are always numeric.
       Ordinary arithmetic operations are meaningful for quantitative data.
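
As a rough illustration of why the appropriate analysis depends on the type of data, the sketch below summarizes one categorical and one quantitative variable. The variable names and values are hypothetical.

```python
from collections import Counter
from statistics import mean

# Categorical variable (nominal scale): labels identify an attribute.
exchange = ["NYSE", "NASDAQ", "NYSE", "NYSE", "NASDAQ"]

# Quantitative variable (ratio scale): values indicate how much.
annual_sales = [120.5, 75.0, 310.2, 98.3, 45.0]

# Categorical data are summarized by counting how often each label occurs.
exchange_counts = Counter(exchange)
exchange_shares = {label: n / len(exchange) for label, n in exchange_counts.items()}

# Arithmetic is meaningful for quantitative data, so an average can be computed.
mean_sales = mean(annual_sales)

print(exchange_counts)
print(exchange_shares)
print(mean_sales)
```

Averaging the exchange labels would make no sense, even if they were stored as numeric codes, which is why categorical data offer fewer options for analysis.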

Cross-Sectional Data
Cross-sectional data are collected at the same or approximately the same point in time.

Time Series Data
Time series data are collected over several time periods.
Graphs of time series data help analysts
       understand what happened in the past,
       identify any trends over time, and
       project future levels for the time series.

Data Sources - Existing Sources
       Internal company records – almost any department
       Business database services – Dow Jones & Co.
       Government agencies – U.S. Department of Labor
       Industry associations – Travel Industry Association of America
       Special-interest organizations – Graduate Management Admission Council (GMAC)
       Internet – more and more firms make data available online

Data Acquisition Considerations
Time Requirement
       Searching for information can be time consuming.
       Information may no longer be useful by the time it is available.
Cost of Acquisition
       Organizations often charge for information even when it is not their primary business activity.
Data Errors
       Using any data that happen to be available or were acquired with little care can lead to misleading information.

Descriptive Statistics
       Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand.
       Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.

Statistical Inference
Population: The set of all elements of interest in a particular study.
Sample: A subset of the population.
Statistical inference: The process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population.
Census: Collecting data for the entire population.
Sample survey: Collecting data for a sample.
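
The relationship between a census, a sample survey, and statistical inference can be sketched with simulated data. The population below is generated artificially (an assumption made only for illustration), and the sample mean serves as an estimate of the population mean.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simulated population: one value for each of 10,000 elements of interest.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Census: use the data for every element in the population.
population_mean = mean(population)

# Sample survey: use the data for a subset of 100 elements,
# drawn at random without replacement.
sample = rng.sample(population, 100)

# Statistical inference: the sample mean estimates the population mean.
sample_mean = mean(sample)

print("population mean (census):", round(population_mean, 2))
print("sample mean (estimate):  ", round(sample_mean, 2))
```

The two values will generally differ somewhat. Measuring how far an estimate is likely to be from the population value is part of statistical inference.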

Analytics
Analytics is the scientific process of transforming data into insight for making better decisions.
Techniques:
       Descriptive analytics: Techniques that describe what has happened in the past.
       Predictive analytics: Techniques that use models constructed from past data to predict the future or to assess the impact of one variable on another.
       Prescriptive analytics: Techniques that yield a best course of action.

Big Data and Data Mining
Big data: A large and complex data set.
Three V’s of big data:
  • Volume: Amount of available data
  • Velocity: Speed at which data are collected and processed
  • Variety: Different data types
Data warehousing
Data warehousing is the process of capturing, storing, and maintaining the data.
       Organizations obtain large amounts of data on a daily basis by means of magnetic card readers, bar code scanners, point of sale terminals, and touch screen monitors.
       Wal-Mart captures data on 20-30 million transactions per day.
       Visa processes 6,800 payment transactions per second.
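
To put the two figures above on a common scale, the per-second and per-day rates can be converted with simple arithmetic. This sketch assumes, purely for illustration, that transactions arrive at a constant rate throughout a 24-hour day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Figures quoted in the notes above.
visa_per_second = 6_800
walmart_per_day_low = 20_000_000
walmart_per_day_high = 30_000_000

# Per-second rate scaled up to a full day (constant-rate assumption).
visa_per_day = visa_per_second * SECONDS_PER_DAY

# Per-day range scaled down to an average per-second rate.
walmart_per_second_low = walmart_per_day_low / SECONDS_PER_DAY
walmart_per_second_high = walmart_per_day_high / SECONDS_PER_DAY

print(f"Visa: about {visa_per_day:,} transactions per day")
print(f"Wal-Mart: about {walmart_per_second_low:.0f} to "
      f"{walmart_per_second_high:.0f} transactions per second")
```

Under that assumption, 6,800 transactions per second corresponds to 587,520,000 per day, and 20-30 million per day corresponds to roughly 231 to 347 per second.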

Data Mining
       Methods for developing useful decision-making information from large databases.
       Using a combination of procedures from statistics, mathematics, and computer science, analysts “mine the data” to convert it into useful information.
       The most effective data mining systems use automated procedures to discover relationships in the data and predict future outcomes, prompted by general and even vague queries from the user.

Data Mining Applications
  • The major applications of data mining have been made by companies with a strong consumer focus such as retail, financial, and communication firms.
  • Data mining is used to identify related products that customers who have already purchased a specific product are also likely to purchase (and then pop-ups are used to draw attention to those related products).
  • Data mining is also used to identify customers who should receive special discount offers based on their past purchasing volumes.
Data Mining Requirements
       Statistical methods such as multiple regression, logistic regression, and correlation are heavily used.
       Also needed are computer science technologies involving artificial intelligence and machine learning.
       A significant investment in time and money is required as well.

Data Mining Model Reliability
       Finding a statistical model that works well for a particular sample of data does not necessarily mean that it can be reliably applied to other data.
       With the enormous amount of data available, the data set can be partitioned into a training set (for model development) and a test set (for validating the model).
       There is, however, a danger of overfitting the model to the point that misleading associations and conclusions appear to exist.
       Careful interpretation of results and extensive testing are important.
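
The partitioning step described above can be sketched in a few lines. This is a minimal illustration using only the standard library: the `partition` helper, the 30% test fraction, and the integer stand-in records are assumptions, not a prescribed procedure.

```python
import random


def partition(records, test_fraction=0.3, seed=0):
    """Randomly split records into a training set and a test set."""
    shuffled = list(records)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for repeatability
    n_test = int(round(len(shuffled) * test_fraction))
    test_set = shuffled[:n_test]
    training_set = shuffled[n_test:]
    return training_set, test_set


# Stand-in for a real data set: 100 record identifiers.
records = list(range(100))
training_set, test_set = partition(records)

print(len(training_set), "records for model development")
print(len(test_set), "records held back for validating the model")
```

The model is developed using only the training set. Its performance on the test set, which it has not seen, gives a more honest indication of how it will behave on other data and helps reveal overfitting.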

Ethical Guidelines for Statistical Practice
       In a statistical study, unethical behavior can take a variety of forms including:
       Improper sampling
       Inappropriate analysis of the data
       Development of misleading graphs
       Use of inappropriate summary statistics
       Biased interpretation of the statistical results
       One should strive to be fair, thorough, objective, and neutral when collecting, analyzing, and presenting data.
       As a consumer of statistics, one should also be aware of the possibility of unethical behavior by others.

Ethical Guidelines for Statistical Practice
       The American Statistical Association developed the report “Ethical Guidelines for Statistical Practice”.
       It contains 67 guidelines organized into 8 topic areas:
       Professionalism
       Responsibilities to Funders, Clients, Employers
       Responsibilities in Publications and Testimony
       Responsibilities to Research Subjects
       Responsibilities to Research Team Colleagues
       Responsibilities to Other Statisticians/Practitioners
       Responsibilities Regarding Allegations of Misconduct
       Responsibilities of Employers Including Organizations, Individuals, Attorneys, or Other Clients

