Monday, December 5, 2011

Alter Type: It’s Not What You Think, by Steven J. Fink

As I was reviewing a colleague’s SPSS syntax code the other day, I came across a command called “Alter Type.”  It sounded like a new scary movie, a psychiatric DSM code, or an abnormal personality attribute.  


I looked up the command in the Command Syntax Reference (available through the Help menu), and there it was: a very useful command with many applications.  

In brief, it does exactly what the name implies: it changes the Variable Type (string or numeric) or Format of variables, including the Width of string variables.  As I read the explanation, it struck me as a new and improved Format statement, a so-called Format on steroids! 

Format statements are often used to change the width and decimals of numeric variables or the display format of a date variable, but they cannot turn a string variable into a numeric one or vice versa.  The Alter Type command changes the Variable Type of any variable in one short command: no need to write elaborate workaround code, just one easy statement. 

As an example, the dataset below comprises 3 variables and 2 lines of data.
DATA LIST FREE
/Numvar (F2)     StringVar (A5)   Datevar (Adate10).

BEGIN DATA
1 1234 10/28/2007
4 5678 10/28/2007
End data.

  • To change a numeric variable to a string (alphanumeric) variable, the command is:
Alter Type Numvar (A2).

  • To change a string (alphanumeric) variable to a numeric variable, the command is:
Alter Type Stringvar (F6.0).

  • To change a date variable to a string variable, the command is:
Alter Type Datevar (A10).

One note of caution: The Alter Type command converts the variable in place; it does not let you write the result to a new variable.  So you may want to save your data first or create a copy of the variable before converting it.
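Here is a minimal sketch of the copy-first approach, using the three variables from the example dataset above. The *_orig names are made up for illustration. COMPUTE makes numeric copies (including the date, which SPSS stores as a number), while a string copy has to be declared with STRING before it can be filled.

```spss
* Copy each variable before converting it in place.
* The _orig names are illustrative only.
COMPUTE Numvar_orig = Numvar.
STRING StringVar_orig (A5).
COMPUTE StringVar_orig = StringVar.
COMPUTE Datevar_orig = Datevar.
FORMATS Datevar_orig (ADATE10).
EXECUTE.

* Now convert the originals; the copies are left untouched.
ALTER TYPE Numvar (A2).
ALTER TYPE StringVar (F6.0).
ALTER TYPE Datevar (A10).
```

If a conversion does not turn out as expected, the *_orig variables still hold the original values.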

So, the next time you need to perform a calculation or merge data and a variable is not in the right Format or Type, use the Alter Type command.  After all, it’s free, you are not crazy, and it’s cool!  







Monday, October 31, 2011

Using SPSS to Calculate Response Rates Blog By Steven Fink, EvansAnalytics

In my previous blog, I showed you how to calculate response rates by subgroup using SPSS.  As some of you probably noticed, a Total row was missing.  That’s because it’s a little more complicated, though not difficult, to program.  Of course, one can perform this calculation in EXCEL, but SPSS can do it better and faster (just like Steve Austin, the Six Million Dollar Man!).  To create the Total row, we’ll use the AGGREGATE command and Merge Files (the ADD FILES command).

Below are 10 responses from three departments.



First, we’ll Aggregate the file by Department (/Break=Dept) and create a new dataset, called One.

DATASET DECLARE One.
AGGREGATE
  /OUTFILE='One'
  /BREAK=Dept
  /Respondents=N.

Make this new dataset the active dataset, then create a population variable representing the total number of employees in each department.

DATASET ACTIVATE One.
If (dept=1) Population=20.
If (dept=2) Population=10.
If (dept=3) Population=15.
Exe.

Now, create another dataset that contains the sums of Population and Respondents.  Notice the empty /BREAK subcommand: with no break variable, AGGREGATE summarizes the entire file as a single case.

DATASET DECLARE Two.
AGGREGATE
  /OUTFILE='Two'
  /BREAK=
  /Respondents=SUM(Respondents)
  /Population=SUM(Population).



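Activate the new dataset Two, then use the Compute command to create a variable called Dept with a value of 4, representing all respondents. (We’ll add the label later.)  Without DATASET ACTIVATE, the Compute would overwrite Dept in dataset One, which is still active.

DATASET ACTIVATE Two.
Compute Dept=4.
Exe.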

The Two dataset will look like this:


Next, merge the two files.  Reactivate One, the first aggregated file, and append the one-line dataset Two to it.

DATASET ACTIVATE One.
ADD FILES /FILE=*
  /FILE='Two'.
Exe.

Now, add a value label for the value 4.  (Use ADD VALUE LABELS rather than VALUE LABELS; VALUE LABELS replaces the existing labels, so the first three departments would be left with no label!) 

Add Value labels
       Dept
            4 'Overall'.
Exe.

Next, calculate the response rate by dividing the number of respondents by the population and multiplying by 100.

Compute Resp_Rate=(Respondents/Population)*100.
Exe.

Optional code to format Population:
Format Population (F4.0).
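As a quick check on the Overall row, here is the arithmetic, assuming the 10 responses are the same ones used in the previous blog (2 from Sales, 3 from Marketing, 5 from HR) with the populations set above (20, 10, and 15). Because AGGREGATE sums respondents and populations before dividing, the Overall rate is not the simple average of the three departmental rates:

```latex
\text{Resp\_Rate}_{\text{Overall}}
  = \frac{2 + 3 + 5}{20 + 10 + 15} \times 100
  = \frac{10}{45} \times 100 \approx 22.22
\qquad\text{whereas}\qquad
\frac{10.00 + 30.00 + 33.33}{3} \approx 24.44
```

Averaging the departmental rates gives every department equal weight regardless of size, which is why the Total has to be built from the summed counts.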

Your final data file, called One, will look like this.



Now, use Custom Tables to generate the final results.

CTABLES
  /VLABELS VARIABLES=Dept Respondents Population Resp_Rate DISPLAY=LABEL
  /TABLE Dept [C] BY Population [S][MEAN] + Respondents [S][MEAN] + Resp_Rate [S][MEAN 'Response Rate'   F40.2]
  /SLABELS VISIBLE=NO
  /CATEGORIES VARIABLES=Dept ORDER=A KEY=VALUE EMPTY=INCLUDE
  /TITLES
    TITLE='Response Rates by Department and Overall'.


While this dataset was very small, the exact same program will work on datasets comprising hundreds or thousands of records.  And, to think, writing and executing this program code didn’t cost us 6 million dollars!





Saturday, October 29, 2011

Leveraging Text Analytics to Help Answer Your Business Questions, by Dawn Marie Evans, Evans Analytics


Even though the word “Analytics” has exploded across the business scene, the field is really still in its infancy.  One of the problems with the word is that “Analytics” means different things to different people.  For example, “Google Analytics” generally refers to web traffic, represented in counts, charts, frequencies, etc.  For statisticians and data miners, “Analytics” means taking data (financial records, customer data, behavioral data, etc.) and building predictive models: models that describe not just past or current phenomena but likely future behavior.  The purpose is to develop a model that answers important and actionable business questions.

“Analytics” may also refer to using open-ended fields, or textual data, to create categories that can be joined back to structured datasets through a technique known as Natural Language Processing (NLP).  It is important to point out that these methods are sensitive to context.  For example, if the word being examined is “football,” the algorithms can determine whether it is being used in a negative, positive, or even neutral way, such as “He hates football” (negative) versus “They were excited about the football game” (positive).  Throughout the process, the analyst, just as with structured data, makes many important choices.


One of the questions I am frequently asked is: what types of textual data can be analyzed?  The answer is almost any type, and very large datasets are desirable.  Examples include RSS feeds from the web, Twitter feeds, blogs, PDF documents, and open-ended survey questions.  Analyzing these sources by hand can be very labor-intensive and time-consuming.  We are in an age where information has become overwhelming; processing and analyzing it may be difficult, non-standardized, and expensive.  Text analytics (text mining) is a standardized, less expensive approach to gleaning competitive intelligence and gaining a better understanding of the voice of the customer.  Once a data mining stream is built, it can be rerun continuously to refresh the results and surface new and important findings at regular intervals.

What does it take to build a text analytics model? Evans Analytics uses SPSS Modeler, which has a set of premier text analytics tools. SPSS Modeler comes with libraries built into the software.  A library is a predefined set of terms, together with algorithms, that can identify and categorize words and phrases. These libraries are a great place to start a new project. 

Many clients will ask the analyst to take the project a step or two further. The next step is for the analyst to build custom libraries, developed specifically for the industry, the company, or the project being analyzed, so that the most relevant terms are captured.  These libraries can be saved and reused as needed. 

Some clients may just want simple counts.  For example, a client may only want to know the percentage of customers who preferred product X to product Y, or whether customers made more positive than negative comments about a particular service.  Other clients may want the newly created categories joined back to their structured data for predictive modeling or customer segmentation. They may want to learn, for instance, that customers who preferred product X were also more likely to live in a specific region, be in a certain age range, and drive a minivan!  Text analytics becomes more powerful when combined with other data to examine whether differences occur by subgroup.

So, how can you leverage text analytics for your business?  Are your competitors blogging or tweeting, or are there news or RSS feeds out there with competitive intelligence you haven’t yet gleaned?  Do you have open-ended survey responses that have overwhelmed you, even though you know they contain important information? Do you have research that was previously handled through qualitative methods but would be stronger if it were analyzed and joined with your structured data?  If you answered yes to any of these questions, you have a strong case for considering text analytics!

In my next installment, I will explain how to bring previously constructed categories into SPSS Modeler and reuse old qualitative research in a quantitative way.

Friday, October 28, 2011

Trainer Tip: Using SPSS to Calculate Response Rates Blog By Steven Fink, Evans Analytics

Part 1:

You are managing a large data collection project, and your director has asked you for weekly response rate reports by subgroup, such as department, customer type, or region.  Many researchers would enter the results into EXCEL, write a formula, and format the results.  Skip these extra steps and do it all in SPSS!  Here’s how.

Below is a 10-record dataset from three departments. (The survey responses are not shown.)

ID   Dept   Dept_name
1    1      Sales
2    1      Sales
3    2      Marketing
4    2      Marketing
5    2      Marketing
6    3      HR
7    3      HR
8    3      HR
9    3      HR
10   3      HR


1. Using IF statements, create a population variable representing the total number of employees in each department.  (The syntax is below; you can also perform this task through the GUI.)

If (dept=1) Population=20.
If (dept=2) Population=10.
If (dept=3) Population=15.
Exe.
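If you would rather not write one IF statement per department, RECODE with INTO is a common alternative. It is a different command from the IF approach above and is shown here only as a sketch, assuming Dept is coded 1 to 3 as in the table:

```spss
* Map each department code to its headcount in a single command.
RECODE Dept (1=20) (2=10) (3=15) INTO Population.
EXECUTE.
```

With INTO, any Dept value not listed leaves Population system-missing, which makes a department without a headcount easy to spot before the response rates are computed.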

2. Using the AGGREGATE command, create another variable containing the total number of respondents in each department.  This new variable is added to the active dataset. (Note the MODE=ADDVARIABLES keyword.)

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=Dept
  /Respondents=N.

3. Next, calculate the response rate by dividing the number of respondents by the population and multiplying by 100.

Compute Resp_Rate=(Respondents/Population)*100.
Exe.

Optional code to format Population:
Format Population (F4.0).



Your data file will look like this:

ID   Dept   Dept_name   Population   Respondents   Resp_Rate
1    1      Sales       20           2             10.00
2    1      Sales       20           2             10.00
3    2      Marketing   10           3             30.00
4    2      Marketing   10           3             30.00
5    2      Marketing   10           3             30.00
6    3      HR          15           5             33.33
7    3      HR          15           5             33.33
8    3      HR          15           5             33.33
9    3      HR          15           5             33.33
10   3      HR          15           5             33.33


4. Now, use Custom Tables to generate the final results.

CTABLES
  /VLABELS VARIABLES=Dept Respondents Population Resp_Rate DISPLAY=LABEL
  /TABLE Dept [C] BY Population [S][MEAN] + Respondents [S][MEAN] + Resp_Rate [S][MEAN 'Response Rate'  F40.2]
  /SLABELS VISIBLE=NO
  /CATEGORIES VARIABLES=Dept ORDER=A KEY=VALUE EMPTY=INCLUDE
  /TITLES
    TITLE='Response Rates by Department'.

The Custom Tables output looks like this:


Remember to save the syntax!  You can rerun this code to generate the results every week, with no retyping of results into EXCEL and no transcription errors.

The next blog will show you how to create a total row for all respondents.  (You can’t just add a Total in Custom Tables!) 

Sunday, October 9, 2011

From Survey Questions to Business Applications By Dawn Marie Evans & Steven J. Fink

*This is a re-post from the Statistics & Analytics Consultants Blog.


As a manager, you have important business questions that need answers, and with the explosion of analytics, managers are expected to use data to drive decisions.  Buzzwords like “Voice of the Customer,” “Customer Segmentation,” “Competitive Intelligence,” and “Business Intelligence” are bandied about, but how can you nail down a definitive methodology to answer your important question?

One tool for hearing the voice of your customers, employees, or other population of interest is a survey. How do you know when it is time to launch a survey?  The short answer is: when the data you have on hand (generally within your company’s databases) fall short of answering your most pressing business questions.  Why hire an expert?  Because a survey that is not properly constructed or sampled will most likely yield results that tell you very little of importance, cannot be joined back to your own data with confidence, or are not representative of your population of interest.  You want confidence both in the tool itself and in the results it yields.

Below are two business case examples where surveys have been used to answer important business questions.  You may find these of interest within your own business context:

Customer Segmentation for an Online Company

We worked with a company whose products were sold exclusively online and which had a database of customer records on hand.  However, these records were incomplete with respect to certain attitudinal information, as well as behavioral information about how customers shopped with competitors, both online and in-store.  Launching a survey to a large sample of customers gave us insight into customers’ attitudes and behaviors.  Using a clustering technique, we grouped customers into several key segments with very different characteristics based on attitudes, shopping preferences, demographics, etc.

Using principal components analysis, we then reduced the survey to just a few main questions.  When new customers registered on the site and answered these few questions, along with key demographics, they were placed into one of the segments, where they would receive targeted marketing messages. This survey helped answer these business questions: Who are our customers?  What are their motivations for shopping with us?  What are their buying behaviors by segment and demographics? Who are the major competitors by segment?  From here, the marketing department was able to develop creative messages targeted specifically to each segment.

What Does a Survey Have to Do With Your Salary?

In another key application, an association requested the administration of an annual Compensation Survey to collect data from its members about how much they earn, how much extra they receive in cash bonuses, and how much they receive in deferred compensation.  Survey results may be disaggregated by level of education, position, region of the country, academic vs. non-academic, public vs. private, etc.  Associations may also examine trends among their members over 2, 3, or 5 years.  When asking workers for such sensitive information, it is important to hire people who are skilled at constructing surveys so that respondents are likely to follow through to the end.  If you start with questions that are too sensitive or too complex early on, those taking the survey are unlikely to finish.  It is also important that the survey be conducted by evaluators external to the respondents’ place of business: there needs to be a buffer, a sense of safety in answering attitudinal questions about their work, salary, work environment, and so forth.

Who uses this information?  Human Resources departments use this information to figure out how much to offer prospective employees or to determine whether their employees are in line with industry practices.   Similarly, prospective employees may use this information to know how much they can expect to earn.  Current employees may also use this information to compare their compensation to their peers. 

So, the next time you want to know whether you are being paid fairly, go to an association website to compare how much you could be earning.  Where did they get this information? From a survey, of course! 

If you have an important business question and your current data cannot provide all the answers, ask Evans Analytics at info@evansanalytics to design and analyze a survey for you.

Sunday, September 25, 2011

Including Don’t Know, Not Sure, Not Applicable, or No Opinion Response Categories By Steven Fink, EvansAnalytics

In my previous three blogs, I addressed the ideal number of points on a Likert scale, providing an odd vs. even number of points, and whether to include a middle category, such as “neither agree nor disagree.”  This final segment addresses whether to include Other response categories, such as “don’t know,” “not sure,” “not applicable,” or “no opinion,” in the list of responses.  If you offer any of these Other categories, more respondents will choose it than if you do not offer it.  On the other hand, respondents may be irritated if that is what they want to say and there is no easy way to express it.  Here are some suggestions to consider.

First, when you are asking knowledge-type questions, most survey practitioners recommend providing a “don’t know” or “not sure” response.  It may be useful to know how many (or what percentage of) respondents “don’t know” the location of a country, the names of famous people, financial information, etc.  Another use of “don’t know” is when only two response categories, such as “agree” or “disagree,” are given; people may be reluctant to take such an all-or-nothing stand.  An alternative solution is to offer more categories closer to the middle, making it easier to capture people who are “leaning” one way or the other.  Of note, on a telephone survey, “don’t know” may be accepted as a volunteered response even though it is not read aloud as an option.  This is somewhat problematic because some respondents will not even consider it an option, so the items are not administered in a standardized way. 

Second, when asking attitudinal questions, a “don’t know” response could mean “no opinion,” “I have mixed feelings about the issue,” or “None of your business.”  If you ask questions on which respondents have not had the experience to form an opinion, then “not applicable” or “no opinion” are important response categories to include.  For example, a survey may ask about your most recent visit to a doctor when you have not seen a doctor for several years.  Asking people about their past behaviors or experiences (especially illegal or embarrassing ones) may also elicit a higher percentage of “no opinion” responses.  Some survey practitioners suggest first asking whether people have an opinion on (or experience with) the topic or issue and, if they do, then asking what it is. Before implementing this suggestion, consider how many people are likely to answer “don’t know” or “not applicable” and how useful it is for you to have this information explicitly. 
Finally, findings from previous research on the number of points on a scale, an odd vs. even number of points, and the inclusion of a middle category or Other response categories are sometimes contradictory, making it hard to decide how to proceed.  Perhaps this is where the Art of Asking (and Answering) Questions comes into play.  Most survey practitioners would recommend the following steps when writing questions:

  1. Try to use the same response scale for most or all of your items.
  2. If conducting benchmark or peer analysis research, use the same scale to optimize comparisons.
  3. Consider reverse-coding some items to encourage respondents to read each question and to check for any response bias (pattern).
  4. Place “Don’t Know,” “Not Sure,” “Not Applicable,” or “No Opinion” response categories at the end of the scale, not the middle.
  5. Pretest the survey to make sure respondents use the full range of responses to an item.  If responses are clustered on one end, either the question is biased or the question itself is not informative. Expert panels, cognitive interviews, and asking a small group of respondents to complete the questionnaire are commonly used techniques to pretest surveys.
  6. Explore issues of reliability and validity of the items prior to analysis.  Be prepared to address these issues when reporting or publishing your findings.
If there are other topics or questions you would like to see addressed about designing surveys, please e-mail me at Steven@evansanalytics.com.  

Friday, September 16, 2011

Providing an Odd vs. Even Number of Points on a Likert Scale By Steven J. Fink

Whether you read the news on the web or in print, one can’t help noticing the latest political poll, market research result, health survey, or employee satisfaction study, to name just a few.  While many of us just want to “cut to the chase” and read only the executive summary or first paragraph of an article, I head straight for the actual questions asked. Why?   I want to examine the questions that were asked and the response categories provided to respondents (a bit geeky, yes).  Commonly used response scales include the following:

  • 4 points (Strongly Disagree, Disagree, Agree, Strongly Agree);
  • 5 points (Strongly Disagree, Disagree, Neither, Agree, Strongly Agree);
  • 6 points (Strongly Disagree, Disagree, Slightly Disagree, Slightly Agree, Agree, Strongly Agree);
  • 7 points (where 1 is Strongly Disagree and 7 is Strongly Agree);
  • 9, 10, or “even” 100 points (bad joke, sorry). 

As trivial as it may sound, providing an odd number of response categories vs. an even number of response categories on a Likert scale can make a difference in the results.  Which is right? And more importantly, which one should you use?

Research on this topic began in the 1950s, and there is still little consensus regarding rating scale length.  Much of the literature focuses on whether to provide a middle or “indifferent” point for respondents who are undecided or indifferent between the two ends of the continuum.  What is more obvious, however, is that providing a middle category draws responses into it, shrinking the share of every other category.  Here’s an example using hypothetical data:


                    Excluding Middle   Including Middle
Strongly Agree      25%                20%
Agree               25%                20%
Neither             --                 20%
Disagree            25%                20%
Strongly Disagree   25%                20%

Fifty percent of respondents agreed with the statement when the middle category was excluded, compared with only 40% when the middle category (“Neither”) was included.  Similarly, the percentage of respondents disagreeing with the statement is higher when the middle category is excluded than when it is included.  It is important to note, however, that adding the middle category does not change the ratio of “pro” (agree) to “con” (disagree) responses.
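The arithmetic behind that comparison, using the hypothetical percentages from the table above:

```latex
\begin{aligned}
\text{Excluding middle:}\quad & \text{Agree} = 25\% + 25\% = 50\%, \quad \text{Disagree} = 25\% + 25\% = 50\%, \quad \tfrac{50}{50} = 1\\
\text{Including middle:}\quad & \text{Agree} = 20\% + 20\% = 40\%, \quad \text{Disagree} = 20\% + 20\% = 40\%, \quad \tfrac{40}{40} = 1
\end{aligned}
```

Both the agree and disagree totals drop by 10 points, yet the agree-to-disagree ratio stays at 1 in both versions.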

Should you use an odd or even number of points on your Likert scale?  Think about the purpose of the research and how you intend to analyze and report your data.  And don’t be fooled by someone telling you, “If I want positive results, I’ll design the questions/response categories so that I get them.”  You now know better.

I’ll have more to say about the pros/cons of providing a middle category using a Likert scale in my next blog.