The misrepresentation of evidence

About a week ago I was involved in a heated Twitter debate about this blog post. I felt, as I said on Twitter and in my extensive comments on the blog, that it entirely misrepresented the evidence about Adverse Childhood Experiences by implying that, because of risk multipliers within particular population groups, certain negative outcomes were almost inevitable for people with multiple ACEs. The author repeatedly asks rhetorical questions like “If 1 in 5 British adults said they were abused in childhood in the last CSEW (2017), why hasn’t our population literally collapsed under the weight of suicides, chronic illness, criminality and serious mental health issues?” Likewise, she asks how anyone can be successful after childhood abuse if the ACEs research is correct. I replied to explain that this simply isn’t what the data tells us or what risk multipliers mean, so the exceptions are expected rather than proof that the finding is incorrect. For example, the claim that a 1222% increase in the risk of suicide amongst people with 4 or more ACEs meant these people were doomed in reality means that, on a baseline of 1 in 10,000, the risk rises to roughly 1 in 760, meaning that more than 99.8% of people with 4+ ACEs do not die by suicide.
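To make the arithmetic behind a risk multiplier concrete, here is a rough sketch of how a percentage increase in risk translates into absolute terms. The 1 in 10,000 baseline is the illustrative figure used above, the function names are my own, and the result depends entirely on the baseline and time window assumed.

```python
def absolute_risk(baseline_risk: float, percent_increase: float) -> float:
    """Absolute risk after a relative increase of `percent_increase` percent."""
    return baseline_risk * (1 + percent_increase / 100)


def one_in(risk: float) -> float:
    """Express a probability as the N in '1 in N'."""
    return 1 / risk


baseline = 1 / 10_000                      # assumed baseline risk
raised = absolute_risk(baseline, 1222)     # the 1222% increase quoted above

print(f"Raised risk: about 1 in {one_in(raised):.0f}")
print(f"People in the higher-risk group not affected: {100 * (1 - raised):.2f}%")
```

However large the multiplier sounds, a small baseline risk leaves the overwhelming majority of the higher-risk group unaffected.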

ACEs are a very useful population screening tool, and have provided incontrovertible evidence of the links between traumatic experiences in childhood and numerous social, psychological and medical outcomes that has been highly informative for those of us designing and delivering services. To me it seems like an example of how a simple piece of research can have a massive impact in the world that benefits hundreds of thousands of people. Yet that blog repeatedly implies ACEs are a harmful methodology that “targets” individuals and is used to “pathologise and label children, arguing that those kids with the high ACE scores are destined for doom, drugs, prison, illness and early death”. It has been my experience that ACEs are used not to pathologise individuals, but to highlight increased vulnerability, and to identify where there might be additional need for support. For example, I have used this data to argue for better mental health services for Looked After Children.

I felt that the repeated misrepresentation of the maths involved in interpreting risk multipliers undermined the entire message of the blog, to which I was otherwise sympathetic. (For the record, it is entirely appropriate to highlight bad practice in which it seems certain professionals are applying ACE scores to individuals inappropriately, and making people feel that their life chances are restricted or their parenting under scrutiny purely because of their childhood experiences of trauma.) But unfortunately the author took my polite, professional rebuttal of elements of her blog as a personal attack on her – to the extent that she misgendered and blocked me on Twitter, and refused to publish my response to her comments about my reply to her on the blog. That’s a shame, as the whole scientific method rests on us publishing our findings and observations, and then learning from the respectful challenge of our ideas by others with knowledge of the topic. But I guess we are all prone to defending opinions that fit with our personal experience, even if they don’t fit with the evidence.

Thinking about how uncomfortable it felt to see someone I considered a peer, whose expertise I respected, misrepresenting the evidence and being unwilling to correct their misconceptions when challenged, and instead trying to discredit or silence those making the challenge, it struck me that this was an example of a wider issue in the state of the world at the moment. Evidence is being constantly misrepresented all around us. Whether it is the President of the USA saying there is a migrant crisis to justify a wall (or any of the 7,644 other false or misleading statements he has made in office), the claims on the infamous big red bus that Brexit would give the NHS £350 million per week, or Yakult telling us their yoghurt drink is full of “science (not magic)” now that they can’t pretend live cultures are good for digestive health, there are false claims everywhere.

I stumbled into another example just before I started writing this blog, as I (foolishly) booked accommodation again through Booking.com, despite the horrible experience I had last time I tried to use them (which remains unresolved despite the assurances from senior managers that they would reimburse all of my costs). The room was terrible*.

So I felt like I should be able to reflect my negative experience in my review. But oh no, Booking.com don’t let you do that. You see, despite seeing that properties appear to have scores out of ten on every page when booking, you can’t score the property out of ten. What you can do is to determine whether you give a smiley that ranges from unhappy to happy for each of their five ratings (which don’t, of course, include quality of sleep or feeling safe). So if you think the location was convenient, the property gets a score above five out of ten, no matter what other qualities mean you would never wish to sleep there again. But worse than that, the Booking.com website forces reviewers to give a minimum length of both positive and negative comments, but only displays the positive comments to potential bookers. So my “It was in a quiet, convenient location” gets shown to clients, but you have to work out how to hover in the section that brings up the review score, then click the score to bring up the averages, then click again to access the full reviews, and then shift them from being ranked by “recommended” to showing them in date order to actually get an objective picture. Then you suddenly see that at least half the guests had terrible experiences there. However, there is no regulator to cover brokers, and fire regulations and legal protections haven’t caught up with private residences being divided up and let out as pseudo-hotel rooms.

But just as Boris has faced no consequences for his bus claims (even though he stretched them further still after the ONS said he had misrepresented the truth), and Trump no consequences for his lies, and the consultants selling contracts worth hundreds of thousands of pounds of public funds to children’s social care departments proudly told me they didn’t care about evidencing their claims, so the world carries on with little more than a tut of disapproval towards people and businesses who intentionally mislead others. Maybe I’m in the minority to even care. But I do care. I feel like it is the responsibility of intelligent people and critical thinkers, people in positions of power, in the professions and particularly in the sciences, to ensure that we are genuinely led by the evidence, even if that makes the picture more complicated, or doesn’t confirm our pre-existing beliefs. To counteract this age of misinformation, we all need to be willing to play our part. That is why I have always placed such a focus on evaluations and research, and have developed my screening tools so slowly and thoroughly, despite the fact that potential customers probably don’t see this as necessary. I believe that as much as possible, we should be promoting the value of evidence, educating the public (including children) to be able to think critically and evaluate the evidence for claims, and stepping up to challenge misleading claims when we see them.

*I booked a room in a property in London which they have euphemistically called “Chancery Hub Rooms” to stay over whilst I delivered some training in Holborn. It wasn’t a hostel or a hotel, but just a small terraced house. This time it had keypad entry to the property and to the individual room, which is a system that I have used successfully several times in Cambridge. Unfortunately it didn’t work so well in London, as they changed the codes twice without informing me. The first time this locked me out of the room on the night of my arrival (and the beeping on the door as I tried the various codes they sent me woke the lady in the neighbouring room, due to the total lack of sound insulation in the property), and the second time it locked me out of the property the following evening, when all my stuff was inside. It also had glass inserts above the room doors that meant your room lit up like Times Square when anyone turned the landing light on. I then discovered that the building (which I already recognised to be small, overcrowded and not complying with fire regulations) had walls like cardboard, when the couple in the next room had noisy sex, followed by noisy conversation and then a full-blown argument that lasted from 3am to 4am – despite me eventually in desperation asking them quite loudly whether they could possibly save it for a time that wasn’t keeping everyone else in the building awake. Of course Booking.com didn’t see it as their problem, and the property management company just blamed the other guests for being inconsiderate.

Communicating the value of evidence

I presented at a couple of conferences over the last few weeks about my BERRI system. And I was struck, once again, by how little weight is given to evidence when it comes to services that are commissioned in the social care sector. Various glossy marketing claims and slick consultants were successfully persuading commissioners and service managers that using their systems and “metrics” (in which people gave entirely subjective ratings on various arbitrarily chosen variables) was equivalent to using validated outcome measures. By validated outcome measures, I mean questionnaires or metrics that have been developed through a methodical process and validated with scientific rigour that explores whether they are measuring the right things, whether they are measuring them reliably, whether those measures are sensitive to change, and whether the results are meaningful. That pathway then leads to an established scientific process of critical appraisal when those studies are presented at conferences, published and made subject to peer review.

But outside of the academic/scientific community it is very hard to prove that having a proper process is worth the time and investment it takes. It means that you are running a much longer race than those who work without evidence. At one event last week, I asked a question of a consultancy firm making hundreds of thousands of pounds out of “improving children’s social care outcomes”, about their basis for what they chose to measure, how they measure it, and how they had validated their claims. The answer was that they were confident that they were measuring the right things, and that having any kind of scientific process or validation would slow down their ability to make impact (aka profit). My answer was that without it there was no evidence they were making any impact.

They couldn’t see that their process of skipping to the doing bit was equivalent to thinking that architects, structural drawings, planning permission and buildings regulation control slow down building houses, and selling houses they’d built without all that burdensome process. Thinking anyone can build a house (or a psychometric measure to track outcomes) feels like an example of the Dunning-Kruger effect, the idea that those with the least knowledge overestimate their knowledge the most. But the worst thing was that those commissioning couldn’t see the difference either. They find the language of evidence to be in the domain of academics and clinicians, and don’t understand it, or its importance. We are in an age where expertise is dismissed in favour of messages that resonate with a populist agenda, and it seems that this even applies when commissioning services that affect the outcomes of vulnerable population groups. I don’t know how we change this, but we need to.

For those who don’t know, I’ve been working on BERRI for 12 years now, on and off, with the goal of being able to map the needs of complex children and young people, such as those living in public care, in a way that is meaningful, sensitive to change and helps those caring for them to meet those needs better. For as long as I’ve worked with Looked After children, there has been a recognition of the fact that this population does worse in life along a wide range of metrics, and a desire to improve outcomes for them for both altruistic and financial reasons. Since Every Child Matters in 2003, there have been attempts to improve outcomes, defined with aspirations in five areas of functioning:

  • stay safe
  • be healthy
  • enjoy and achieve
  • make a positive contribution
  • achieve economic well-being

A lot of services, the one that I led included, tried to rate children on each of these areas, and make care plans that aimed to help them increase their chances in each area. Each was supposed to be associated with a detailed framework of how various agencies can work together to achieve it. However, whilst the goals are worthy, they are also vague, and it is hard to give any objective score of how much progress a young person is making along each target area. And in my specific area of mental health and psychological wellbeing they had nothing specific to say.

As with so much legislation, Every Child Matters was not carried forward by the next government, and with the move of children’s social care and child protection into the remit of the Department for Education, the focus shifted towards educational attainments as a metric of success. But looking primarily at educational attendance and attainments has several problems. Firstly, it assumes that children in Care are in all other ways equivalent to the general population with which they are compared (when in fact in many ways they are not, having both disproportionate socioeconomic adversity and disproportionate exposure to trauma and risk factors, as well as much higher incidence of neurodevelopmental disorder and learning disability). Secondly, it limits the scope of consideration to the ages in which education is happening (primarily 5-18, but in exceptional circumstances 3-21) rather than the whole life course. Thirdly, it doesn’t look at the quality of care that is being received – which has important implications for how we recruit, select and support the workforce of foster carers and residential care staff, and what expectations we have of placement providers (something I think critical, given we are spending a billion pounds a year on residential care placements, and more on secure provision, fostering agencies and therapy services that at the moment don’t have to do very much at all to show they are effective, beyond providing food, accommodation, and ensuring educational attendance). Finally, it masks how important attachment relationships and support to improve mental health are in this population. I can see that strategically it makes sense for politicians and commissioners not to measure this need – they don’t want to identify mental health needs that services are not resourced to meet – but that is significantly failing the children and young people involved.

In my role as a clinician lead for children in Care and adopted within a CAMH service, I kept finding that children were being referred with behaviour problems, but underlying that were significant difficulties with attachment, and complex trauma histories. I was acutely aware that my service was unable to meet demand, leading us to need some system to prioritise referrals, and that there was a lot of ambiguity about what was in the remit of CAMHS and what was in the remit of social care. I wasn’t alone in that dilemma. There were a lot of defensive boundaries going on in CAMHS around the country, rejecting referrals that did not indicate a treatable mental health condition, even if the child had significant behavioural or emotional difficulties. The justification was that many children were making a normal response to abnormal experiences, and that CAMHS clinicians didn’t want to pathologise this or locate it like an organic condition inside the child, so it should best be dealt with as a social care issue.

On the other hand, I was mindful of the fact that this population have enormous mental health needs, having disproportionately experienced the Adverse Childhood Experiences that are known to lead to adverse mental and physical health outcomes. Research done by many of my peers has shown that two thirds to three quarters of Looked After children and young people score over 17 on the SDQ (the Strengths and Difficulties Questionnaire – the government mandated and CORC recommended measure for screening mental health need in children) meaning they should be eligible for a CAMH service, and various research studies have shown that 45% of LAC have a diagnosable mental health condition, but the resources are not available to meet that need. As The Mental Health Foundation’s 2002 review entitled “Mental Health of Looked After Children” put it:

Research shows that looked-after children generally have greater mental health needs than other young people, including a significant proportion who have more than one condition and/or a serious psychiatric disorder (McCann et al, 1996). But their mental health problems are frequently unnoticed or ignored. There is a need for a system of early mental health assessment and intervention for looked-after children and young people, including those who go on to be adopted.

My initial goal was to develop a new questionnaire to cover the mental health and psychological wellbeing issues that this population were experiencing, as well as considering attachment/trauma history and the child’s ability to trust others and form healthy relationships, and the behaviours through which these were often expressed. I was also interested in what issues determined the type of placement given to a child, and the risk of placement breakdown, as well as what opened doors to specialist services such as therapy, and whether those services and interventions really made any difference. I therefore ran two focus groups to explore what concerns carers and professionals had about Looked After children and young people, and asked them about what they saw that might indicate a mental health problem, or any related concerns that led people to want my input, or that caused placements to wobble or break down. One group contained foster carers and the professional networks around them (link workers, children’s social workers, the nurse who did the LAC medicals, service managers) and one contained residential care workers and the professional networks around them (home managers, children’s social workers, the nurse who did the LAC medicals, service managers). I wrote their responses down on flip-charts, and then I sorted them into themes.

I had initially thought that the items might cluster as behavioural and emotional, or internalising and externalising, but they seemed more complex than that. In the end there were five themes that emerged:

  • Behaviour
  • Emotional wellbeing
  • Risk (to self and others)
  • Relationships/attachments
  • Indicators (of psychiatric or neurodevelopmental conditions)

The first letters gave me the name for the scale: BERRI. I then piloted the scale with various carers, and then with a group of clinical psychologists involved with CPLAAC (the national network within the British Psychological Society that contained about 300 Clinical Psychologists working with Looked After and Adopted Children that I was chair of for about six years). I then added a life events checklist to set the issues we were identifying in context.

The working group I chaired in 2007 on the state of outcome measurement for Looked After and adopted children (on the invitation of CORC) came to the conclusion that no suitable metrics were available or widely used. We therefore agreed to further develop and validate the various tools that members of the group had home-brewed, including my BERRI. There was acknowledgement that it takes a lot of work to develop a new psychometric instrument in a valid way, but a consensus that this needed to be done. So I resolved to find a way to follow that proper process to validate and norm BERRI, despite the lack of any funding, ring-fenced time or logistical support to do so. The first challenge was to collect enough data to allow me to analyse the items on the measure, and the five themes I had sorted them into. But I didn’t have the resources to run a research trial and then enter all the data into a database.

My way around this barrier was to get my peers to use the measure and give me their data. To do this I took advantage of some of the technically skilled people in my personal network and developed a website into which people could type anonymous BERRI scores and receive back a report with the scores and some generic advice about how to manage each domain. I tested this out and found my peers were quite enthused about it. We then had a formal pilot phase, where 750 BERRIs were completed by Clinical Psychologists about children and young people they were working with. I then talked about it with some young people and care leavers to check that they felt the areas we were covering were relevant and helpful to know about. Then I started to use the system in a large pilot with residential care providers and developed tools to focus in on particular concerns as goals to work on, and track them day by day or week by week, as well as creating tools to give managers an overview of the progress of the children in their care. We’ve had a lot of feedback about how useful and game-changing the system is, and how it has the potential to revolutionise various aspects of commissioning and decision-making in children’s social care.

But I really wanted the process to be one in which we were truly scientific and based our claims on evidence. I’ve never marketed the BERRI or made claims about what it can do until very recently, when I finally reached a point where we had evidence to substantiate some modest claims*. But to me the process is critical and there is still a long way to go in making the data as useful as it can be. So from day one a process of iterative research was built in to the way we developed BERRI. As soon as it was being used by large numbers of services and we had collected a large data set we were able to look closely at how the items were used, the factor structure, internal consistency and which variables changed over time. We ran a series of validity and reliability analyses including correlations with the SDQ, Conners, and the child’s story – including ACEs, placement information and various vulnerability factors in the child’s current situation. But even then I worried about the bias, so a doctoral student is now running an independent study of inter-rater reliability and convergent/divergent validity across 42 children’s homes.
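For readers unfamiliar with internal consistency, one commonly used statistic is Cronbach's alpha, which compares the variance of individual items with the variance of the total score. The sketch below is purely illustrative: the ratings are made up, the function name is my own, and it is not taken from the BERRI analyses.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(ratings[0])
    if n_items < 2 or len(ratings) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*ratings)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in ratings])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up ratings: five respondents scoring four items from 0 to 4.
example_ratings = [
    [3, 4, 3, 4],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [4, 4, 3, 4],
    [0, 1, 1, 0],
]
print(f"alpha = {cronbach_alpha(example_ratings):.2f}")
```

Values closer to 1 suggest the items in a scale are measuring something in common, though alpha on its own says nothing about whether they are measuring the right thing.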

BERRI will always be developed hand in hand with research, so that there is an ongoing process of refining our outputs in light of the data. The first step in that is getting age and gender norms. But the data can also indicate what we need to do to improve the measure, and the usefulness of the output reports. For example, it seems that it might be meaningful to treat two aspects of “Relationships” as distinct from each other. If the evidence continues to show this, we will change the way we generate the reports so that they talk about social skills deficits and attachment difficulties separately. We might also tweak which items fall into which of the five factors. We also want to check that the five-factor model is not an artefact of the a priori sorting of the items under the five headings, so we are planning a study in which the item order is randomised on each use, and will then repeat our factor analysis. We also want to explore whether there are threshold scores in any factor, or critical items within factors, that indicate which types of placements are required or predict placement breakdown. We might also be able to model the risk of child sexual exploitation (CSE).
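As a rough sketch of what randomising the item order on each use could look like in practice, the idea is to shuffle the items for presentation and then map the answers back to a fixed order before analysis. The item identifiers and function names below are hypothetical and do not describe how the BERRI website is actually built.

```python
import random

# Hypothetical item identifiers, listed in a fixed order for analysis.
CANONICAL_ITEMS = [
    "behaviour_1", "behaviour_2",
    "emotional_1", "emotional_2",
    "risk_1", "risk_2",
]


def presentation_order(items, rng):
    """Return a freshly shuffled copy of the items for one administration."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled


def to_canonical_order(responses, items):
    """Re-order responses (keyed by item id) into the fixed analysis order."""
    return [responses[item] for item in items]


rng = random.Random()  # pass a seed here only if reproducibility is needed
order = presentation_order(CANONICAL_ITEMS, rng)

# Stand-in for a rater answering the items in the order shown to them.
responses = {item: 0 for item in order}
row = to_canonical_order(responses, CANONICAL_ITEMS)
```

Because every completed form ends up in the same column order, the factor analysis can be repeated on the randomised data without the headings or the position of an item nudging the rater.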

The results to date have been really exciting. I have begun to present them at conferences and we are currently preparing some articles to submit for publication. For example, I am currently writing up a paper about the ADHD-like presentation so many traumatised children have, and how we have learnt from our BERRI research that this reflects early life ACEs priming readiness for fight-or-flight rather than proximal events or a randomly distributed organic condition. But the findings depend on all the groundwork of how BERRI was developed, our rigorous validation process and the data we have collected. It is the data that gives us the ability to interpret what is going on, and to give advice at the individual and organisational level.

So you’ll forgive me if I’m somewhat cynical about systems that request a subjective Likert rating of five domains from Every Child Matters, or an equally subjective score out of 100 for twelve domains pulled from the personal experience of the consultant when working in children’s social care services, that then claim to be able to map needs and progress without any validation of their methodology, areas to rate, sensitivity to change or the meaning of their scores. Going through the process the long way, rather than going straight to marketing, might put me at a commercial disadvantage, but I like my houses built on the foundations of good evidence. I can feel confident that the load-bearing beams will keep the structure sound for a lifetime when they are placed with precision and underpinned by the calculations and expertise of architects, structural engineers, surveyors and building control, rather than cobbled together as quickly as possible, marketed with amorphous claims and sold on rapidly to anyone who will pay for them. After all, I’m not in it to make a quick buck. I know my work is a slow and cumulative thing, and BERRI still has a long way to go before it can create the greatest impact. But my goals are big: I want to improve outcomes for children and young people who have experienced adversity, and I want that impact to influence the whole culture of children’s social care provision in the UK and to continue to be felt through the generations. And to do that, I need to build the thing properly.

* that carers, therapists and managers find it useful and easy to use, that using the BERRI pathway demonstrated an improvement of 14% over 6 months for the first 125 children placed on the system, and that BERRI has a robust factor structure, good reliability between raters, and the basic statistical qualities that suggest sufficient validity for use. We also have some testimonials, including a commissioner who used BERRI to map the needs of 15 high tariff children and found four suitable to move to foster or family placements with support, saving nearly half a million pounds per year from his budget – a finding we would like to replicate with a much larger study, given the opportunity.