Michael Maiello's picture

    The Data Haunted World

    The one somewhat decent idea I had during theatre history class many years ago was that Aristotle's Poetics had managed to identify nothing more than a Tragic Mean and that if a writer followed Aristotle's rules perfectly they were more likely to write something average than remarkable.  Around this time I was also taking a lot of writing classes and reading books by people who offered structural advice for writers -- it was all based on Aristotle, though updated.  Aristotle told us what all the memorable Greek tragedies had in common.  Syd Field told us what all of the Hollywood blockbusters had in common.  Art is about knowing when to follow the rules and when to break them. If you break the rules well, that's when you get something truly special.

    Timothy Egan writes about this in the Times this morning, as he tries to deal with Big Data in education (Common Core) and media (Nate Silver's new site and mission).  Silver is probably getting a bum rap: in his manifesto for fivethirtyeight, he pretty well expresses his desire to use data-driven journalism to augment, rather than supplant, a more subjective and humanist approach to reporting and analysis.  That said, he's harsh on wonks and commentators who seem to start with their conclusions, however far-fetched ("Romney will win," is Silver's prime example).

    Anyone skeptical of Big Data has to deal with the fact that when the data work, the data really work.  It won't do to maintain that the sun revolves around the Earth just because it fits better with the old stories and poems.  It won't do to insist that the universe is what we want it to be.

    The problem is that Big Data has an answer for just about everything and it will work, on average.  Nothing is perfect.  We can know, for example, how to teach most students most things and how to encourage them to follow the disciplines that will be most useful in the economy.  But that won't work for everybody.  There are a lot of extraordinary people who will be stifled by such a system.  If you're the type who responds better to free reading, creative projects and experiential learning, you are going to suffer in a binary, test-driven environment.

    Big Data can tell philanthropists what causes to support.  But there might be worthy causes with less bang for buck than the data suggest.  If everybody supports the soup kitchen, does the opera die?

    Big Data might tell us what medical procedures have the best chance of working, based on a wide sample of other people with illnesses and injuries.  But every person is unique.  If you are the rare person who responds to a certain treatment it does you no good to be told that your insurance won't cover it because it doesn't work for most other people.  On the other hand, since you don't know in advance that you are one of the outliers, Big Data might save you from fruitless and potentially harmful experimentation.

    If you choose to buck the data, you take some risk.  When you're right (with a market call, an election call, your NCAA tourney brackets) the Big Data adherents will dismiss it as luck rather than skill.  When you're wrong, and you will be, you will be tempted to rejoin the herd.  I am pretty sure that there are extraordinary people out there who are better off defying the data and who are ill-served by completely data-driven institutions.  Calling a decision or conclusion "evidence based" is fast becoming the catch-all justification for everything.  But the world is more complicated than that and our institutions need to accommodate that complexity.

     

     


    Comments

    We can know, for example, how to teach most students most things and how to encourage them to follow the disciplines that will be most useful in the economy. But that won't work for everybody.…

    Big Data can tell philanthropists what causes to support. But there might be worthy causes with less bang for buck than the data suggest. If everybody supports the soup kitchen, does the opera die? …

    Big Data might tell us what medical procedures have the best chance of working, based on a wide sample of other people with illnesses and injuries. But every person is unique. If you are the rare person who responds to a certain treatment it does you no good to be told that your insurance won't cover it because it doesn't work for most other people.

    Obviously, the answer is Bigger Data. I'm being only partially facetious. With more data, we can determine which students won't respond well to which teaching technique. We can provide deeper analysis on causes to support. We can scan your personal DNA, your gut biome DNA, and examine other epigenetic influences to determine how you will respond to certain treatments. The facetious part of that is that I don't think this will be the answer to everything, but the "partially" part is that I do think that such approaches are on the horizon and will mitigate those concerns at least somewhat.


    To paraphrase a famous quote: the fault lies not in our data but in ourselves.

    Data is just data. It exists and we use it whether or not we formally study it. Any problems in its use result from how we use it. That is, from what data we choose to collect and how we assemble, analyze and present it.

    Econometricians and stock market timers famously overemphasize the results of their own analyses, but if, or rather when, they are wrong, it is they who are wrong, not the data.

    Like VA, I favor bigger data collection and also greater transparency and availability of studies, which are really just pattern-recognition exercises. The more people (or their algorithms) are able to study the data, the more corroboration of results and, hopefully, even more patterns recognized or quicker recognition of flaws or gaps in data.

     


    One complaint is the reality that most people get their data already turned into some kind of information by the likes of CNN and Fox News. In other words, the interpretation of the data, big or small, has already been done and this interpretation is taken as the only possible interpretation. People don't delve, looking for the biases, the flaws in logic, as a general rule. What good is finding the gaps in the data, when that data has been misunderstood or warped for some other agenda?

    But the problem isn't the data, it's the interpretation. We had bad interpretations of small data, and before that we had bad interpretations of anecdotes. [Edit to add: actually, we still do.] One real problem is that people might put more faith in big data, but one reason for that is that in so many cases it does give the right answer. Of course, this gets back to Maiello's point that it can be wrong for the 10%, the 1%, or the 0.1%, and the unfortunate truth is that the more often it's right (as a percentage) the more often we will trust it even when it's wrong. That said, I don't think settling for being wrong more often is the right choice.


    "I don't think settling for being wrong more often is the right choice."

    This is important.  This has been explored a lot in terms of markets.  I have worked for extremely rich asset managers.  Whether they know something or are lucky is the subject of much debate.

    But what about medicine?  You can be lucky in the stock market.  Heck, this year, if you had a verifiably perfect NCAA bracket, Warren Buffett would give you $1 billion.  Would that be luck or skill, had someone won?  I don't know.

    But what if it's... this treatment cures cancer in only 1% of patients?  If I am a patient and have exhausted all else, would I not want to try?

    In democracy, we know there is a tyranny of the majority we must protect against.  Is there something similar for Big Data?


    I'm also thinking in terms of something along the lines of the controversy around vaccines. In the information age, it sometimes isn't the numerically superior group that wins, but the group that captures the most clicks, so to say. Passionate but misguided or misinformed individuals can quickly become the majority in regards to 'having a say' on the matter. When the mainstream media, from local to international, begin transmitting it, validated by its origins in big data, so much energy has to be spent putting out the brush fires that nothing is left to expand on the existing knowledge.

    But the vaccine example just goes back to my "bad analysis with anecdotes" point. Big data has nothing to do with it.

    Aside: the child of someone I know developed autism shortly after getting their vaccine doses. With her second child, she decided not to have him vaccinated. He, too, developed autism, shortly after the time period when he would have been vaccinated. Although this sad story by itself doesn't disprove the theory, it's an anecdote that probably won't be repeated as much as those anecdotes that appear to support the theory (post hoc ergo propter hoc and all that). I could tell you so many stories about some of the crazy things parents of children with autism have tried out of desperation, and in at least two cases those parents are medical doctors who should know better. However, I cannot judge those parents because I know their world has been turned upside down.


    "we know there is a tyranny of the majority we must protect against.  Is there something similar for Big Data?"

    Say something like teaching statistics .... which just happens to be part of Common Core math as is critical thinking. Media skills are included in the English part, also opinion reading and writing.

    You protect people by enabling them, not by shielding them from the world they will eventually have to live in.

     


    If the treatment cures cancer in only 1% of patients and is excruciatingly painful, wouldn't you want to know how likely it is to help you before you subject yourself to it? Big Data isn't saying you can't do it, it's just predicting how likely it is to be successful. Of course, Big Data might "tell" your insurance company not to pay for it. However, that 1% is the number one gets from small data analysis. Slightly better analysis would tell you that it's actually 5% for white, male, former wrestlers. (Actually, I would be highly suspicious of such an analysis without knowing more about the sample size of white, male, former wrestlers.) Near-future analysis will tell you that it's 0.1% for someone with your genomic markers. Doesn't mean it definitely won't work, but it's information that will help you to decide if it's worth the cost to you. Oh, and it's got a 2% chance of killing you outright.


    I'm going to give you a couple of clues, Mike, so you understand exactly what is going on here. 

    Common Core is not a curriculum based on big, medium or small data; Common Core is an attempt to build that data. Here is why I say this: school districts are run by school boards across the country. Every single district has its own curriculum standards, and the standards in some district in Texas are far different than the standards of our school district because our school board is filled with engineers and scientists. Common Core itself is an attempt to normalize or standardize education across the country in order to begin to collect the data you are describing. (That data doesn't exist right now here in the US because we have disparate demands from each school district in America.)

    And you might believe this is disrupting arts education, but to be very honest with you, that began to be eliminated from curriculums across the country in the 1990s and it had nothing to do with Common Core.  It did have everything to do with the expectation that we must raise the STEM skills of students in our public schools.

    I would also like to see more people in America have a better education in math and science and in the arts. I also want to see curriculum normalized, but until we fund public education differently, i.e. levies are stupid, we aren't going to be able to solve our problems. Fair funding is the very root of our problem. We know very well that children from wealthier districts are scoring the equivalent of those in other first-world nations on their math and science exams, and they still get a fairly wide breadth of education in the arts. Until we accept this and seek to change it, nothing will change here. Poor kids from poor districts will continue to get the shaft.

    I would say Democracy cannot be sustained through the massive inequality that exists in our current public education system. The statistical evidence lies in my theory, not yours. 

    http://www.wnyc.org/story/311499-wealthy-districts-continue-score-higher...

    http://en.wikipedia.org/wiki/Structural_inequality_in_education

    http://www.yale.edu/ciqle/CIQLEPAPERS/CIQLEWP2009-3.pdf

    http://www.tandfonline.com/doi/abs/10.1080/10824661003634948#.Uy7la9xH3Hg

    http://www.tbp.org/pubs/Features/W07Brown.pdf

    http://www.epi.org/publication/us-student-performance-testing/

    Hope you read some of the links.


    Good links, Tmac, and you should put them in a blog.  


    Thanks, Bruce. I think that most people just buy the hype on one side or another. I'm not an advocate per se of Common Core, but I am an advocate of curriculum normalization. I think it is important, but certainly it will do nothing to address the real problem, which is all about institutional inequality. Our efforts are better spent doing something about how our public school system(s) is funded. Bussing didn't adequately address the problem. It was a good effort, I suppose, given our system, but it didn't change anything.


    I would add that one issue then with big data is that the ones who want to try to find what the data is really saying can't keep up with the multitudes of quick misinterpretations shot out over the various media. It's a case where ignorance, growing alongside knowledge, grows at a greater rate.

    Yeah, this is a great point, Trope.  If the conveyor of data isn't good at its job, we have a problem.  Also, people have real jobs and can't spend their time trying to fact check CNN.


    I really do not get the complaint  about Common Core stifling creativity.

    Common Core is about standardizing math and English language studies. These are legitimate areas where standardization is useful. They are foundational.

    Why would anyone want to standardize art or music?

     


    The complaint isn't that people want art and music folded into Common Core, it's that the Common Core standards for math, English and science take up so much time that art and music get crowded out.


    Science is not part of Common Core, not yet anyway. 

    And there appears to be plenty of room in the English language arts section to include other arts and music.

    Funny thing I noticed while updating my memory of Common Core: your complaint, including its specific wording, was included in the criticisms section.

    Mark Naison, Fordham University Professor, and co-founder of the Badass Teachers Association, raises a similar objection: "The liberal critique of Common Core is that this is a huge profit-making enterprise that costs school districts a tremendous amount of money, and pushes out the things kids love about school, like art and music."[56]

    Badass Teachers Association?  Now there is a group every parent should be proud to entrust their child's education to. :-/

     


    I'm thoroughly confused by Common Core and don't know what to believe.  Different outcomes are presented by people I respect, and there seems to be no middle (common) ground.  (Thank you, Tmac, for the links and your own explanation.)

    Diane Ravitch wrote about it yesterday and I found both her piece and the comments interesting.  I still don't know what to believe, but she talks about transparency while developing the standards and, while she gets the need to standardize some curricula, she also questions who gets to do the standardizing and how rigid it would have to be.

