Survey Reports

The Office of Institutional Research & Planning administers several major surveys on a regular basis to support top-level decision-making at Ohio State.

The results of these surveys and descriptions of their administration are provided below.

Culture Survey

OHR and IRP partnered in 2008, 2011 and 2014 to administer surveys of faculty and staff about their work environment. IRP produces reports comparing the responses across the years. OHR works with the campus to build on strengths and address opportunities for improvement identified in the reports.

No. The majority of items are not included in the summary reports.

We wanted to condense the somewhat overwhelming amount of data into a report containing only relevant and important information. Our approach was to group items into dimensions (or topics), determine which dimensions seemed to relate to outcomes like Overall Satisfaction, and limit our reporting to those dimensions (and the items they comprise). All of the items contain useful information, but that information is often more useful for answering a specific question than for providing an understanding of the "big picture."

Are there definitions of the dimensions? What do "Faculty Resources" or "Accountability" mean? Why do we think that these are the important things to be looking at?

In 2008, we took responses from all of the scaled items (items with multiple choices arranged on a continuum from positive to negative or vice versa), subjected them to a statistical technique that groups items together based on how similar an individual's responses to them tend to be, and then made sure that we could see a conceptual thread running through all of the items in each group. We then combined the responses to the items in each group into a single group score. Next, we used these group scores to predict positive outcomes (such as Overall Satisfaction).

Groups of items that didn't help predict any of our outcomes of interest were discarded. For each of the dimensions that were left, we looked at all of the items, came up with a description of what they had in common, and used that description to name the dimension. So, when all of the items in a group asked about attributes of the employee's supervisor, we called the resulting dimension the "Supervisor" dimension. Finally, we tested the items in each group to see which ones were statistically redundant, so that measurement and reporting could be as concise as possible. If an item wasn't providing much new information beyond the other items in the group, we removed it.

Finally, we used a statistical technique to see if the dimension model we came up with using the responses in 2008 still made sense when used with the 2011 data. It did, so we retained it.
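
For readers who like to see things concretely, here is a rough Python sketch of the general kind of workflow just described. The item names and data are invented, and the specific techniques shown (a factor analysis followed by a regression, using NumPy, pandas, and scikit-learn) are just one common way of carrying out steps like these; this is an illustration, not the actual analysis.

```python
# A rough, hypothetical sketch of the dimension-building workflow described above.
# Item names, data, and modeling choices are invented for illustration only.
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500

# Simulate two latent "dimensions" driving six items, plus an outcome.
supervisor = rng.normal(size=n)   # latent "Supervisor" dimension
resources = rng.normal(size=n)    # latent "Resources" dimension
items = pd.DataFrame({
    "sup_listens":  supervisor + rng.normal(scale=0.7, size=n),
    "sup_fair":     supervisor + rng.normal(scale=0.7, size=n),
    "sup_feedback": supervisor + rng.normal(scale=0.7, size=n),
    "res_space":    resources + rng.normal(scale=0.7, size=n),
    "res_budget":   resources + rng.normal(scale=0.7, size=n),
    "res_staff":    resources + rng.normal(scale=0.7, size=n),
})
overall_satisfaction = 0.6 * supervisor + 0.3 * resources + rng.normal(scale=0.8, size=n)

# Step 1: group items by the factor they load on most strongly.
fa = FactorAnalysis(n_components=2, random_state=0).fit(items)
loadings = pd.DataFrame(fa.components_.T, index=items.columns)
groups = loadings.abs().idxmax(axis=1)   # maps each item to a factor (0 or 1)

# Step 2: combine the items in each group into a single dimension score.
dim_scores = pd.DataFrame({
    f"dimension_{g}": items[groups[groups == g].index].mean(axis=1)
    for g in groups.unique()
})

# Step 3: check how well the dimension scores predict the outcome.
model = LinearRegression().fit(dim_scores, overall_satisfaction)
print("R^2 predicting overall satisfaction:",
      round(model.score(dim_scores, overall_satisfaction), 2))
```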

There are no explicit definitions of these dimensions; they're defined by the items they comprise. So, if you disagree with a dimension name because you think the items are generally asking about something else, you could well be right, and we welcome your input. If you wonder why a seemingly relevant item wasn't included in a dimension, it is either because the item behaved fundamentally differently from the items we are reporting, or because it behaved so similarly that it wasn't really necessary to include it in our analyses. If you wonder why other dimensions weren't added, it is because they didn't tell us anything about outcomes at the broadest levels (e.g., University or College).

Aren't we interested in the answers to specific questions?

We might find the answers to specific individual questions interesting, but those answers can also be confusing and misleading, because the data from a single question may or may not mean what one thinks it means. This can be a bit counterintuitive, as questions and responses seem quite straightforward. When you look at the questions under a dimension heading, however, they tend to be correlated within individuals. What this means in practice is that when we ask a specific question like "Can the Ohio State University become a top ten public university?" the response depends on two things: the respondent's beliefs concerning the top ten issue in particular, AND the respondent's overall perception of OSU. If we only reported that one question, we would have no idea how much somebody's response depended on the former, and how much on the latter. When we report multiple questions that depend on the perception of OSU, we can make a reasonable guess that the degree to which a response to one question DOESN'T go along with the others depends mainly on that specific question.

Reporting single questions without knowing anything about what goes into the responses can, as stated before, be quite misleading. For instance, people who are dissatisfied with their salary, benefits, and start-up funds also tend to be dissatisfied with the cost of parking. What this implies is that when people say that parking is too expensive, it is often just another way of saying that the university expects too much of them for too little, and that lowering the cost of parking would not necessarily make them respond more positively. On the other hand, if a unit tended to be more satisfied than other units with all of those other things, but NOT with the cost of parking, one might think that there is a better chance that parking really is the issue.
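
As a rough illustration of the "shared perception" point, the snippet below simulates responses in which every item is driven partly by the same overall attitude and partly by noise. The numbers and item names are invented, and nothing parking-specific is built into the data, yet the parking item still comes out strongly correlated with the salary and benefits items.

```python
# Hypothetical illustration: every response is driven partly by a shared
# "overall perception" component, so even an item with no specific signal
# (parking, here) ends up correlated with the others.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
overall = rng.normal(size=n)                          # shared perception of the employer
salary   = overall + rng.normal(scale=0.8, size=n)
benefits = overall + rng.normal(scale=0.8, size=n)
parking  = overall + rng.normal(scale=0.8, size=n)    # nothing parking-specific simulated

print("corr(salary, parking):  ", round(np.corrcoef(salary, parking)[0, 1], 2))
print("corr(benefits, parking):", round(np.corrcoef(benefits, parking)[0, 1], 2))
```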

The responses to each of these questions were given on a scale with an odd number of possible responses (almost always 5 choices, but occasionally 3 or 7). These scales always had a positive side (satisfied, agree with a positive statement, experience no stress, etc.) and a negative side (dissatisfied, agree with a negative statement like "I feel ignored" or "I am likely to leave in the next three years," experience a great deal of stress, etc.). With an odd number of responses, there is always one choice exactly in the middle ("neither satisfied nor dissatisfied," "neither agree nor disagree," etc.). For percent positive, we throw out everyone who skipped the question (or gave a non-answer like "not applicable") and find the percentage who gave responses on the positive side. The remaining responses are in the middle or on the negative side of the scale. In this way, higher percentages always represent better results.
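
As an illustration, here is a minimal sketch of a percent-positive calculation for a single 5-point item. The response coding is an assumption made for the example (1-2 negative, 3 neutral, 4-5 positive, with anything else treated as a skip or "not applicable"):

```python
# Minimal sketch of a percent-positive calculation for one 5-point item.
# The coding below is an assumption: 1-2 negative, 3 neutral, 4-5 positive;
# anything else (None, "NA", ...) counts as skipped / not applicable.
def percent_positive(responses, positive=(4, 5)):
    answered = [r for r in responses if r in (1, 2, 3, 4, 5)]
    if not answered:
        return None
    return 100.0 * sum(r in positive for r in answered) / len(answered)

# 4 positive, 2 neutral, 2 negative, 2 skipped -> 50.0 (skips are excluded entirely)
print(percent_positive([5, 4, 4, 4, 3, 3, 2, 1, None, "NA"]))
```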

We have included the survey instruments here, so you can see all of the questions and the contexts in which they were asked.

Our priority is producing reports focused on the dimensions and key outcomes. In the past, after those have been assembled and distributed, we have created tables that show the responses to all items (frequency tables), and host them on our website. We have no intention of hiding or preventing access to the data, and we strive to make these available in a timely manner. We do, however, like to know what the audience for these reports is. Therefore, the site is password protected, but you may contact IRP to gain access to the tables. As of this writing, the release timeline for the 2014 frequency tables is not known.

IRP strives for total accuracy in all of the numbers we report, but errors and omissions do unfortunately occur. When errors are discovered, or when additions or modifications to the data presentation are made, we replace the older versions of reports with the most up-to-date revisions. We hope that a minimal number of revisions will be necessary, and we will try to notify all relevant parties when a revision has taken place.

It means that, when asked how likely it was that they would leave OSU in the next three years, the employee answered "very unlikely" or "somewhat unlikely." You can't simply say that the rest of the employees are "likely to leave," because a great number of responses are actually "neither likely nor unlikely," and that category is lumped in with the negative responses on this report.

Yes, but our summary reports typically include only the responses from faculty. Because most faculty departures are voluntary (particularly among tenure-stream faculty), we see a relationship between our dimensions (topics) and faculty responses to this question, especially when controlling for retirement likelihood. For staff, on the other hand, we don't see a strong relationship between likelihood of leaving and other outcome indicators like satisfaction. The hypothesized reason is that, because many staff departures are or are perceived as involuntary, negative responses may reflect both the possibility of voluntarily moving on and the possibility of being laid off or terminated. As those are very different things, the data are muddled.

Generally speaking, there are no 'good' or 'bad' survey numbers in an absolute sense. The quick response to this question is always another question: "compared to what?" In these reports we try to provide two comparisons: the same group surveyed in another year, and other groups surveyed during the same year. A good number is one that is significantly better than the number observed for the comparison group.

Short Answer:  A difference is labeled “statistically significant” when a difference that big is unlikely to be entirely a fluke.

Longer Answer: When you're making a statistical comparison between two numbers, you usually have to start with the assumption that those two numbers are actually the same number, plus or minus some random noise. For example, if we randomly assigned every employee who responded to the survey into a "heads" group and a "tails" group based on flipping a coin for everyone, and then ran a report comparing the survey responses for the "heads" employees and the "tails" employees, there's no reason to expect that one group would respond more positively than the other, but we probably would see differences, mostly small, some big, on most of the items.

Those differences wouldn't mean anything: having your coin land on heads has nothing to do with how satisfied you are with, for example, your benefits at OSU. In fact, we're pretending that we flipped the coin after people responded, so being in the "heads" group or the "tails" group couldn't possibly have affected the responses. And if we did the whole coin-flipping exercise again, the new "tails" group would come out higher on about half of the items where it looked lower the first time, and those differences would be just as meaningless.

So, how do you know if the differences on the report you're looking at represent real differences between the groups, or just the random differences that you see when you compare any two groups of people? You test for statistical significance. When something is reported as significant, it typically means that there is less than a 5% chance you'd see a difference as big as the one reported if there wasn't actually any real difference between the groups. When a difference is not statistically significant, it means that even if we were just picking the groups at random, we'd expect to see a difference that large more than 5% of the time. Put a slightly different way, when you see a significant difference, it means that some difference very probably (but never absolutely definitely) exists between the groups. If you don't see a significant difference, it means that there certainly might be a real difference between the groups, but with the data we have, we simply can't rule out dumb luck as the source of the apparent difference.

Of course, it's worth bearing in mind that those 5% chances add up, and you'd probably see a couple of individual items with "significant differences" even on the report comparing the "heads" group to the "tails" group. Statistical significance is important information, but it is not an infallible indicator.
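
The coin-flip thought experiment is easy to simulate. The sketch below uses made-up data and SciPy's chi-squared test (not our actual reporting code) to assign people to random "heads" and "tails" groups, compare them on 100 unrelated yes/no items, and count how many items come out "significant" at the 5% level purely by chance:

```python
# Rough simulation of the coin-flip thought experiment: random "heads"/"tails"
# groups, 100 unrelated yes/no items, and a count of how many items look
# "statistically significant" at the 5% level purely by chance.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(2)
n_people, n_items, alpha = 2000, 100, 0.05

coin = rng.integers(0, 2, size=n_people)                # 0 = heads, 1 = tails
answers = rng.integers(0, 2, size=(n_people, n_items))  # 0 = negative, 1 = positive

flagged = 0
for item in range(n_items):
    # Build the 2x2 table of (group, response) counts for this item.
    table = np.array([[np.sum((coin == g) & (answers[:, item] == r)) for r in (1, 0)]
                      for g in (0, 1)])
    _, p, _, _ = chi2_contingency(table)
    flagged += p < alpha

print(f"{flagged} of {n_items} items look 'significant' by luck alone")
```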

What test is being used for testing statistical significance?

Short Answer: We're using a chi-squared test with one degree of freedom, which is the most well-known test for the kind of data we have here.

Longer Answer: When you compare two groups' responses to a yes/no (or positive/negative) type of question, you have four numbers to look at: the number of 'yes' responses from the first group, the number of 'yes' responses from the second group, the number of 'no' responses from the first group, and the number of 'no' responses from the second group. If we know how many people said "yes" overall (regardless of what group they're in) and how many people are in each group, we can make a good guess about each of those four numbers. For example, if there is no real difference between the groups, and if twice as many people said 'yes' as said 'no,' then we'd expect twice as many people in group A to say 'yes' as 'no,' and the same for group B, and that's how we'd make our guesses.

If there is a real difference between the groups, our guesses will probably be pretty bad. When they are bad enough that it would be hard to guess that poorly by dumb luck, we say that the difference, according to the chi-squared test, is significant.
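
As an illustration, the sketch below runs a chi-squared test on an invented 2x2 table of positive/negative counts; the "expected" counts it prints are exactly the guesses described above:

```python
# Minimal sketch: a chi-squared test on an invented 2x2 table of counts.
# The "expected" counts are the guesses described above.
import numpy as np
from scipy.stats import chi2_contingency

#                  yes  no
table = np.array([[60, 20],   # group A
                  [45, 35]])  # group B

chi2, p, dof, expected = chi2_contingency(table)
print("guessed (expected) counts if the groups don't really differ:")
print(expected)                          # 52.5 yes / 27.5 no in each group
print(f"chi-squared = {chi2:.2f}, p = {p:.3f}")
```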

There are two reasons a difference that looks large may not be reported as statistically significant. If you haven't read the previous answer ("What test is being used for testing statistical significance?"), please read that first and then come back to this response.

The chi-squared test looks at how well each of our four guesses matches each of the four actual numbers. When one of our guesses is less than 5, however, the test can no longer reliably tell us how good or bad that guess was. If we guess 1, and the number is 2, our guess looks good in one way (it's only 1 different!) and bad in another way (the real number was twice as big as our guess!). If we can't tell whether our guesses are good or bad, the test can't tell us anything, and we can't say that the difference is significant (no matter how improbable it seems).

When a group has 9 or fewer people, we'll always guess that one of the numbers is less than 5, so no difference will be reported as significant, even 0% vs. 100%. As the size of the smaller group grows, we have a better chance of being able to test for significance. It might seem confusing, but even if a group is much bigger than 9, our guesses can still include small numbers that can't be tested. Imagine we're comparing two departments with 40 respondents each (warning: arithmetic ahead). We start off knowing that 90% of those 80 people (so 72) answered 'positive' to a question. So we guess that 9/10 of the 40 people in group A were 'positives' and 1/10 were 'negatives,' and we make the same guesses for group B. We then look at the actual percentages, and they're not 90% and 90%, but 100% for group A and 80% for group B. This difference looks huge, but when we go to test our guesses, we see that we guessed that 1/10 of 40 (in other words, 4) people in each group would be 'negatives.' When our guesses get that small, we can't test them, and therefore that 20-point difference between groups with 40 employees each won't be reported as significant.
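
Working through that 40-and-40 example in a few lines (the numbers are the hypothetical ones from the paragraph above):

```python
# The 40-vs-40 example above: 90% positive overall, so the guessed (expected)
# number of negatives in each group is 4 -- below the usual threshold of 5,
# so the test isn't considered reliable no matter how big the gap looks.
group_size, total, total_positive = 40, 80, 72

expected_negatives = group_size * (total - total_positive) / total   # 40 * 8 / 80
print("guessed negatives per group:", expected_negatives)            # 4.0
print("big enough to test?", expected_negatives >= 5)                # False
```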

The second reason also has to do with small numbers (more arithmetic ahead…). Imagine each group has 15 people, and we're guessing that each one has 9 positives and 6 negatives. It turns out that one group has 10 positives and 5 negatives, and the other has 8 positives and 7 negatives. Our guesses are all big enough to test, and since we're only off by one for each of them (9 vs. 10, 9 vs. 8, 6 vs. 5, and 6 vs. 7), our guesses all look very good, meaning that we won't see a significant difference. On the report, however, we would have seen 67% vs. 53%, which looks like a big difference.
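
That 15-and-15 example can also be checked directly (again with the hypothetical counts above; the small-sample correction is turned off here so the expected counts print as exactly 9 and 6):

```python
# The 15-vs-15 example above: 10/5 vs. 8/7. Every observed count is within
# one of the guessed (expected) count, so the chi-squared statistic is tiny.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[10, 5],   # 67% positive
                  [ 8, 7]])  # 53% positive

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(expected)                                   # [[9. 6.] [9. 6.]] -- the guesses
print(f"chi-squared = {chi2:.2f}, p = {p:.2f}")   # roughly 0.56 and 0.46
```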

When groups are small, every single person represents a big percentage. If there are 12 people in your unit, and 9 were positive this time, whereas 8 were positive last time, you’ll have improved from 67% to 75%. That might look like real improvement, but it’s simply a single person changing their mind.
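
In numbers, using the hypothetical 12-person unit above:

```python
# The hypothetical 12-person unit above: one person changing their answer
# moves "percent positive" from 67% to 75%.
unit_size = 12
positives_last_year, positives_this_year = 8, 9
print(round(100 * positives_last_year / unit_size), "->",
      round(100 * positives_this_year / unit_size))
# prints: 67 -> 75
```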

Clinical Assistant Professor, Clinical Associate Professor, Clinical Inst House Staff, Clinical Instructor, Clinical Professor, Lecturer, Lecturer-BE, Senior Lecturer, Senior Lecturer-BE, Visiting Assistant Professor, Visiting Assoc Professor-Be, Visiting Associate Professor, Visiting Asst Professor-Be, Visiting Instrctor-Benefit Elig, Visiting Instructor, Visiting Professor, Visiting Professor-Benefit Elig.

Regular Clinical Faculty were identified based on their inclusion in the Tenure Database maintained by OAA.  See University Rule 3335-5 section 19, "Faculty," for a brief description of the various types of Regular Faculty (Tenure, Clinical and Research) as well as Auxiliary titles.

Generally, Regular Clinical Faculty have the following titles: Professor – Clinical, Associate Professor – Clinical, Assistant Professor – Clinical, Instructor – Clinical. However, Regular Clinical Faculty members may also have titles such as Associate Dean, Chair, Vice Chair, etc.

National Survey of Student Engagement

The National Survey of Student Engagement (NSSE) has been administered at Ohio State on a three-year cycle since 2004. The data are used to identify aspects of the undergraduate experience that can be improved through changes in policies and practices.

Graduate & Professional Student Survey

The Graduate and Professional Student Survey is conducted every three years by Institutional Research and Planning. All graduate and professional students are invited to respond to questions about their satisfaction with their academic program, teaching and research experiences, career & professional development, campus resources, and quality of life.

Doctoral Exit Survey

The Doctoral Exit Survey report shows the percentage of respondents who responded positively to a number of prompts, covering satisfaction with support and research, type of post-graduation plans, type of employer, and more.

2015-16 Overall Report
Survey Instrument

Gallup Alumni Survey

In 2015, Purdue surveyed nearly 30,000 college graduates and Gallup surveyed more than 8,000 Ohio State alumni. The results revealed that Ohio State alumni gave stronger favorable ratings than alumni of other universities in the three areas surveyed: workplace engagement, well-being, and attachment.

"Great Jobs, Great Lives" Outcome Report University Comparison Report Press Release