Project Everest

Proposal (Operational/Other)

Survey Design and Data Structure

We collect a lot of data at PE, but it isn't always useful. Useful data comes down to data structure, and to how we think about that structure when designing our surveys. Here are some things to incorporate into survey design.

  • Uniformity is key - if we are collecting data on similar things across projects or even countries, let's give it the same structure. For example, both the Energy and Agriculture teams will likely collect data on demographics - let's ensure that both teams ask the same questions in the same manner (e.g. age, sex, income, location, occupation).
  • Ask about demographics - it's useful to know how tendencies differ between types of people for the things we are looking to find out. If we don't ask or record demographics, we won't be able to analyse this. For example, what kind of people are more likely to adopt technological solutions? With demographic data, we could answer that question as "people aged X-Y, with an income of $Z or above, who are full-time farmers".
  • When a question could be answered in many different ways, we should reduce the number of options. For example, "what is your main source of income?" could have answers like "I grow maize", "I grow wheat", "I have some chickens", "I sell electronics at a market", "I drive a bus", etc. We would reduce this to a few options: "Farmer", "Market Stall", "Transport". Think MECE - "mutually exclusive, and collectively exhaustive".
  • Make sure that answers are formatted in the same way: don't have "Farmer", "farmer", "a farmer", and "Farmers". They all need to be uniform - "Farmer".
  • Each column should only have one type of data (e.g. text, number, or date), NOT a mixture. For example, if a respondent earns $30-50 a week for the variable "Weekly Income", it should be recorded as "40" (the midpoint), NOT "30-50 a week". Or, you would have two variables: "Income Numeric" and "Income Period", where you would record "40" and "Week" respectively.
  • If we are looking to find out things that are more qualitative, use a numeric scale. For example, "how comfortable are you using your mobile phone?" - instead of recording "he finds it difficult sometimes", we would have a scale from 1-5 where 1 = very difficult and 5 = very easy. So, this would be coded as "2".
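
If some of this tidying has to happen after collection rather than in the form itself, the rules above could be applied with a short script. This is only a rough sketch in Python, not an agreed PE tool: the field names (occupation, weekly_income, phone_comfort), the lookup table and the "Other" catch-all category are all assumptions made for illustration.

```python
import re

# Hypothetical lookup: maps free-text answers onto a few uniform categories.
OCCUPATION_CATEGORIES = {
    "farmer": "Farmer",
    "a farmer": "Farmer",
    "farmers": "Farmer",
    "i grow maize": "Farmer",
    "i grow wheat": "Farmer",
    "i have some chickens": "Farmer",
    "i sell electronics at a market": "Market Stall",
    "i drive a bus": "Transport",
}


def normalise_occupation(raw):
    """Return one uniform label; anything unrecognised becomes "Other" (assumed)."""
    key = " ".join(raw.strip().lower().split())  # trim, lowercase, collapse spaces
    return OCCUPATION_CATEGORIES.get(key, "Other")


def parse_income(raw):
    """Split an answer such as "$30-50 a week" into (numeric, period).

    A range is recorded as its midpoint; period is None when not stated.
    """
    text = raw.replace(",", "").lower()
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)][:2]
    if not numbers:
        return None, None
    amount = sum(numbers) / len(numbers)
    match = re.search(r"\b(day|week|month|year)", text)
    period = match.group(1).capitalize() if match else None
    return amount, period


def code_scale(value, low=1, high=5):
    """Check that a scale answer is a whole number between low and high."""
    score = int(value)
    if not low <= score <= high:
        raise ValueError(f"scale answer {score} is outside {low}-{high}")
    return score


def clean_row(row):
    """Apply the three rules to one survey response (a dict)."""
    amount, period = parse_income(row["weekly_income"])
    return {
        "occupation": normalise_occupation(row["occupation"]),
        "income_numeric": amount,
        "income_period": period,
        "phone_comfort": code_scale(row["phone_comfort"]),
    }


example = {"occupation": " a farmer ", "weekly_income": "$30-50 a week", "phone_comfort": "2"}
print(clean_row(example))
```

Restricting the options in the form itself (dropdowns, number-only fields) is still the better fix; a script like this only helps with data that has already been collected as free text.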

Now, this can all be made easier by using Google Forms (or similar services), which can automatically collect responses in Google Sheets (just go to "Responses" and click the green Google Sheets icon).

Gabe has also provided some great content on this, so have a read of these:

Andrew Vild Jul 3, 2018

This is excellent. We need to action this sooner rather than later so we can collect relevant and usable data.

Andrew Vild Jul 3, 2018

I would almost go so far as to suggest that if you can make all data uniform in this way, you could have a master database for each country, with filters that would allow Energy data, Health data and other data to exist side by side and be more significant, especially for common aspects such as income and other demographics.
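
As a rough illustration of the idea (a sketch only, with made-up rows and hypothetical column names, not a design for the actual database), uniform rows from each project could be stacked into one table with a "project" tag and then filtered on any shared column:

```python
def build_master(datasets):
    """Stack per-project rows into one list, tagging each row with its project."""
    master = []
    for project, rows in datasets.items():
        for row in rows:
            master.append({**row, "project": project})
    return master


def filter_rows(master, **criteria):
    """Keep the rows whose columns match every given value."""
    return [
        row for row in master
        if all(row.get(column) == value for column, value in criteria.items())
    ]


# Made-up example rows; this only works because both teams use the same columns.
datasets = {
    "Energy": [
        {"age": 34, "sex": "F", "occupation": "Farmer"},
        {"age": 51, "sex": "M", "occupation": "Transport"},
    ],
    "Health": [
        {"age": 27, "sex": "M", "occupation": "Farmer"},
    ],
}

master = build_master(datasets)
energy_only = filter_rows(master, project="Energy")
all_farmers = filter_rows(master, occupation="Farmer")
print(len(energy_only), len(all_farmers))
```

The second filter cuts across projects, which is only possible when the demographic questions are asked and recorded in the same way by every team.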

Wade Tink Jul 4, 2018

This is fantastic and is to be adopted for surveys, particularly within assessment projects. Be conscious that when empathising to define the problem and the unique value proposition, understanding emotion (qualitative) is key. How customers use words to describe their problems links to future marketing copy. It also links to offer testing to determine value proposition and market fit.

Amber Johnston Jul 4, 2018

This is super useful for you guys.

Gabriel Raubenheimer Jul 9, 2018

To add to this, I will be running a workshop on different types of data (quantitative and qualitative) in the coming week. This will be written up here, and filmed, for use across countries.

This will cover the difference between the insights gained from qualitative, low volume data, and the information gained from quantitative, high volume data. It is imperative that we understand this, because we can't gather one and expect it to behave like the other. Both are indispensable, but we have to be able to differentiate.

I'll link that here this week, and I'd be interested to hear your thoughts, Will.
