[stub] Soliciting help for experimental design

TL;DR: I am asking for help, either entirely theoretical or more concrete, with the design of a quick-and-dirty independent survey. In particular, I need to know how to estimate the sample size, the number of questions and some other things. I expect to see an effect, but 1) a small one, 2) there are going to be undercurrents & sources of heterogeneity that I cannot predict right now, 3) the whole business is money-constrained, and 4) the relevant (professional) people here have too much skin in the game.

In case this appears too involved for you, please regard it as a purely armchair exercise. Examples of such things done elsewhere are also much appreciated.

Note: sorry for errors; I would have posted this in the Open thread but for the length. I am also going from common sense here, as I see it, so maybe this is all not even wrong from the professional point of view.

Background

The last two years of non-specialized Ukrainian high-schools are going to be reformed (from 'knowledge-oriented' to 'competencies-oriented').

The children who will continue studying in common schools after the 9th grade* will be sorted into 'natural sciences' (NS), 'math' (M) or 'humanities' (H). The specialization will be accompanied by changes in the curriculum. H & M will have 3 lessons a week of 'Man and Nature' instead of a lesson each of physics, chemistry, biology, ecology (the last is currently a separate subject, perhaps optional, I'm not sure) and geography; H & NS will have less math; and M & NS will have 3 lessons a week of 'Man and Society' instead of 'humanities' as a whole, and that only in the penultimate year. There are going to be 3,5 lessons a week dedicated to remedial or additional instruction in the 'reduced' subjects.

All other considerations aside, there is a fear that this exact reform will be 'for the worse' because the level of 'basic knowledge' will lessen, and nobody knows what 'competencies' are and how to measure them. The Ministry of Education's position is that 'knowledge should be useful in life' (everybody agrees) and 'competencies are the ability to use what you learn in school to solve your problems' (everybody disagrees with everybody else), & that the new program, of course, means new textbooks (almost everybody stops doing what they were doing to shout 'Corruption!')

(Another fear (of mine) is that such a general change invites local solutions. Enthusiasts will keep organizing lectures, which will be attended by other enthusiasts, and IT types will advertise online education (for which we already have a platform with resources in Ukrainian) that will be taken up by a tiny fraction of the schoolchildren, and it will be called progress.)

I haven't seen any quantitative proposals for evaluating the reform's effect from its opponents (perhaps they have addressed the Ministry directly, without publicizing their efforts), so this is an attempt to offer one.

The question, broadly

Will the reform lead to a change in the preparedness-to-life of the students who will study for 11 years in non-specialized schools? (That the quality of, for example, NS going into colleges will deteriorate because of less math they will have learned seems to me, and my acquaintances who teach or tutor, inevitable; but this and similar particular outcomes do not exactly match the question.)

A question, narrowly, & a tentative experimental idea

Answering the broad question directly is hard. I thought that perhaps one could measure the children's ability to distinguish utter BS from somewhat-mangled truth, at the point where they leave school, by making them sort a set of quotes from advertisements.

There are difficulties with this approach.

For instance, 'flax seed contains a lot of helpful plant hormones, lignans, that would do wonders for your skin' is a) utter BS, because lignans are not hormones of the plant. However, popular sources sometimes use 'substances we get from digested plants that are similar to, in structure and possibly action, our own humoral regulatory metabolites' and 'substances that are produced by the plant to guide its own growth and development' interchangeably** (e.g., this), so b) the message-as-intended might be nearer to the truth than the message-as-written. Since our goal is to measure 'competencies' rather than 'knowledge', b) might be what we shall have to go with.

So I thought, maybe the questionnaire should have about 50 such examples, and for each one the students would have to state 1) whether the statement contains a formal error, 2) whether it as a whole contains definitive, conclusive information on the basis of which one could formulate a decision (to use the product, to not use the product, just any kind of decision; options - yes, no, don't care), and 3) whether they expect to see it outside of advertising. The same children would also take the Cognitive Reflection Test (CRT).

Test the kids finishing 9th grade and 11th grade and the first-year college students now, then wait for 3 years after the reform's implementation and test a similar sample again. Also ask the college students which school they studied at. (To be clear, I have some imperfect idea about the heterogeneity of Ukrainian schools and colleges, and given the constraints, I can sample only a tiny number of them. However, this part is something I don't think LW can help me with. If I am mistaken, please let me know.) Hopefully, there will be a chance to repeat it in later years, but right now I'm thinking of the barest minimum.

Expectations, with probabilities

This is simply what I expect to see, from a large enough sample. I will, of course, have to choose just a few of these statements (or maybe something altogether different), so I ask your advice on it.

I. The formal error question.

1. In every year, when you pool all students of the same age, their answers will hardly differ from random guesses in accuracy. - 85%

2. After the reform, M and H students will do somewhat worse than NS students in the same age group. - 80%.

- but not statistically significantly worse. - 80%.

3. After the reform, NS first-year college students will not do better than before. - 95%

- but when taken separately, students coming from the cities will do slightly better than those who came from the villages (because a major fear of the reform's opponents is that it will disadvantage children in places where the school cannot afford a non-humanities specialisation). - 90%.

(Come to think of it, maybe I should just send out the questionnaires to some of my friends from biological faculties in various parts of the country, and compare only the quality of the biology freshmen. It looks, at least, logistically doable.)

4. Something about no significant difference in the scores of Year 9 and Year 11. Not sure how to put it into words. Maybe I should limit this to the NS students. - 70%.
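A minimal sketch of how expectation I.1 ("hardly differ from random guesses") could be checked. It assumes, and this is an assumption not stated above, that each answer to the formal-error question is scored simply as right or wrong and that a pure guesser is right half of the time. The function name `binom_two_sided_p` is made up for illustration.

```python
from math import comb

def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for k correct answers out of n.

    Sums the probabilities of every outcome that is no more likely
    than the observed one under pure guessing with success rate p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))

# One student, 50 statements (the questionnaire length suggested above).
print(binom_two_sided_p(25, 50))  # exactly at chance level
print(binom_two_sided_p(35, 50))  # well above chance level
```

Note that answers from the same student are not independent, so pooling every answer from every student into one big `n` would overstate the evidence. Testing per-student scores, or the average score per class, is probably the safer reading of "pooled".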

II. The basis-for-a-decision question.

1. An absolute majority of students will answer this positively. - 80%.

The problem that I'd like to get at is that most of them might take it as 'this flax thing sounds fishy, I don't want to eat it' or 'this flax thing sounds good, I will eat heaps' (meaning question 1 +, question 2 + or 1-, 2+), some as 'maybe good, but I have no skin problems to solve, so *shrug*' (1-, 2 don't care or 1+, 2 don't care), others still will write down (1-, 2-), which I'm not sure how to interpret, and only a few would go with 'this has a formal error in it, and so has no bearing on whether I should eat flax' (1+, 2-). I would take it to mean that most people don't actually care much whether the information they get is true or not when seeing an advert for something.

This is why I'd like to ask them the third question, whether they expect to see this outside of an advertisement (or maybe it should just be "is this taken from an advertisement?") and apply the CRT.
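The answer patterns described above (1+/2+, 1-/2+, and so on) could be tallied along these lines. This is only a sketch: the category labels, the `dont_care` encoding and the function names are invented here for illustration, not part of any existing questionnaire.

```python
from collections import Counter

def classify(q1_error: bool, q2: str) -> str:
    """Map one respondent's answers about one statement to a pattern.

    q1_error: True if the respondent says the statement has a formal error.
    q2: 'yes', 'no' or 'dont_care' (can a decision be based on it?).
    """
    if q2 == "yes":
        return "decides anyway"        # (1+, 2+) or (1-, 2+)
    if q2 == "dont_care":
        return "indifferent"           # (1+ or 1-, 2 don't care)
    if q2 == "no":
        # (1+, 2-) is the pattern of interest; (1-, 2-) is ambiguous.
        return "error, so no basis" if q1_error else "hard to interpret"
    raise ValueError(f"unexpected answer to question 2: {q2!r}")

def tally(responses):
    """Count patterns over an iterable of (q1_error, q2) pairs."""
    return Counter(classify(q1, q2) for q1, q2 in responses)

sample = [(True, "yes"), (False, "yes"), (False, "dont_care"),
          (False, "no"), (True, "no")]
print(tally(sample))
```

Keeping the raw pairs and deriving the categories afterwards means the grouping can be changed later without re-entering the paper forms.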

III. The advertisement-or-not question

...not sure I should include it, actually. But it seems pretty much the point. What should I do?

IV. In toto

1. Only a small minority will 'guess' which statements are taken from the advertisements and contain formal errors (and vice versa), and decide whether it can be used as a basis for a decision (which is a matter of personal discretion). - 95%.

- it will positively correlate with high scores on CRT. - 80%.

2. And the share of this minority will decrease after the reform's implementation. - 70%.
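For a first feel of the sample size that expectation IV.2 would need, here is a sketch using the textbook normal-approximation formula for comparing two independent proportions (before vs. after the reform). The 10% and 5% shares below are purely hypothetical placeholders, as are the conventional 5% significance level and 80% power, and the formula ignores clustering by school.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1: float, p2: float,
                alpha: float = 0.05, power: float = 0.8) -> int:
    """Respondents needed in EACH wave to detect p1 vs p2 (two-sided test)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value of the test
    z_b = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2                       # pooled share under H0
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# Hypothetical: the 'discerning' minority shrinks from 10% to 5%.
print(n_per_group(0.10, 0.05))
# A smaller hypothetical drop, from 10% to 8%, needs far more respondents.
print(n_per_group(0.10, 0.08))
```

Under these made-up numbers the first case already calls for a little over 400 students per wave, so "several hundred copies" is the right order of magnitude only if the true change is fairly large.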

Undercurrents

which make estimating sample size difficult, among other things.

1) poorer schools might report having all three specializations, while in reality there will be only one.

2) children's parents will not send them to specialize as the children want, if the school that offers this opportunity is altogether considered 'weak'.

3) new textbooks will be simply unpolished, because this takes years (just as re-teaching the teachers will take years).

4) the reform will not happen in reality, but it will happen on paper, so any measurement of anything won't be connected to the Ministry's stated activity at all.

...

I realize that measuring real-life competencies is orders of magnitude more difficult. However, if we don't have any reference point obtained before any change was made, any talk of 'the Ministry ruined it!', which will undoubtedly happen in years to come regardless of what the Ministry actually does, will be pure and unadulterated politics. So - please?

But.

All of this has meaning if and only if the effect (or lack of effect) can actually be traced to the reform, and isn't just some fluke or a consequence of some other process, so I am quite ready to hear from you that this suggestion won't work or can't be meaningfully interpreted.

Thank you for your time.

Edit: there seems to be a chance that the reform will begin gradually, with only a few schools volunteering at first, but I do not know this for certain.

*as opposed to going into specialized gymnasiums, lyceums and other secondary-education establishments or not studying further.

** which is wrong. Improve your health with auxins or ethylene. And somebody just might try that.

Second edit-to-add.

0. I think that if the reform is indeed delayed, then fairness demands 'we' should test as many subjects as we are able until it is implemented, right?

Going from my personal resources, I will probably just compile a questionnaire, have several hundred copies printed out, and send packets of them to my friends in several natural-sciences departments of two (hopefully more) colleges. Then have them send me the answers and do what I can with them... I also have a friend or two working in high schools, whom I can ask to give the tests to the children, and hopefully do this for several years in a row (changing the questions, of course), but this would be pretty useless for evident reasons. I don't expect anybody else to do this, even if I draft a proposal.

If I end up doing it, I can send you the copy of the questionnaire for whatever purpose.

1. My most urgent questions are:

1) to psychologists:

- has such testing been done before, and what were the criticisms? In general, what criticisms are frequent for this kind of survey?

- how do people avoid nudging the respondents to politically correct responses?

2) to statisticians:

- how do people plan studies when they don't know the heterogeneity of the sample (the heterogeneity that doesn't yet exist, but certainly will later)? How do they estimate the sample size, etc.?

3) to everybody - please send me examples of glaringly BS messages. The more the better. I think most of them will be at least somewhat ambivalent, so I'd like a big pool to choose from.
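On question 2), one common planning device when respondents come in clusters (classes, schools, departments) is the Kish design effect, which inflates the sample size to account for students in the same cluster answering alike. The sketch below assumes equal-sized clusters and an intraclass correlation (ICC) that is simply guessed, since it cannot be known in advance; the usual workaround is to try a range of plausible values. All numbers are hypothetical.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_n(n_total: float, cluster_size: float, icc: float) -> float:
    """Size of a simple random sample carrying the same information."""
    return n_total / design_effect(cluster_size, icc)

# Hypothetical: 300 questionnaires, handed out in groups of 30 students.
for icc in (0.01, 0.05, 0.10, 0.20):
    print(f"ICC={icc:.2f}  effective n = {effective_n(300, 30, icc):.0f}")
```

The practical upshot of this kind of calculation is that adding more schools or departments usually helps more than adding more students within the same few.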
