Feb. 8, 2017

Avoiding rambling data visualization

Three links that underscore why analysis is the key first step to data viz

In the true serendipitous fashion of the social Web, I came across a few really neat and thematically related posts on visualization that I think are just masterful ways of demonstrating how to use data to explain how the world works.

I'm normally more interested in discussions of data analysis than data visualization, but these three links demonstrate the importance of knowing your data well enough to craft a thesis before jumping to the step of creating something beautiful.

"Created by Du Bois and his students at Atlanta, the charts, many of which focus on economic life in Georgia, managed to condense an enormous amount of data into a set of aesthetically daring and easily digestible visualisations."

  • The classic starting point of a data journalism project (that is, once you actually have your data) is to "interview your data." You treat data like you would any other source, attempting to understand why it might be wrong or flawed or lead you to inaccurate conclusions. You're also trying to suss out how the information might answer one of your big questions, or what kinds of interesting and new things it has to say about reality. Fail to do this, and your visualization step could muddy the waters, because there are so many different options for presenting your conclusions. Case in point, from FlowingData: One dataset, visualized 25 different ways. In addition to illustrating his point brilliantly with, well, illustrations, author Nathan Yau also underscores his argument in a way I can certainly relate to:

"You have to guide the conversation though. You must help the data focus and get to the point. Otherwise, it just ends up rambling about what it had for breakfast this morning and how the coffee wasn’t hot enough."

Special thanks to BuzzFeed Reporter Peter Aldhous for sharing the links on Rosling, NYU professor and data journalist Meredith Broussard for the link on Du Bois and to BuzzFeed Data Editor Jeremy Singer-Vine for the FlowingData link. If you haven't signed up already, I highly recommend Jeremy's Data is Plural newsletter, which delivers all kinds of neat stuff to you every Wednesday.

 

Dec. 30, 2016

Takeaways: Weapons of Math Destruction

Statistical paradoxes, teaching ethics and more

I picked up a copy of Weapons of Math Destruction a few months back after hearing author Cathy O'Neil give a talk at the Shorenstein Center on Media, Politics and Public Policy.

It's a really compelling and quick read, and O'Neil leans heavily on her experience as a former financial quant and practicing data scientist to identify how algorithms embedded in industries from insurance to the criminal court can create harmful feedback loops. Those loops cause real harm to people -- especially the poor -- without their knowledge.

There's a ton of fascinating information in the book, but here are a few things I flagged so I wouldn't forget:

  • The origins of broken-window policing: In her examination of New York City's stop-and-frisk program, O'Neil goes back to the original 1982 work of George Kelling and James Q. Wilson in The Atlantic. She points out what I didn't know: that Kelling and Wilson weren't arguing for an uptick in enforcement of nuisance crimes or an embrace of zero-tolerance policies. Instead, they examined a Newark, New Jersey, policing initiative that encouraged beat cops to be highly tolerant: "Their job was to adjust to the neighborhood's own standards of order and help uphold them."

  • Unconstitutional hiring practices: In 1971, the Supreme Court ruled in Griggs v. Duke Power Company that employers could not use intelligence tests for hiring unless the tests were demonstrably related to job performance. This is relevant in light of moves by workforce management firms to use predictive analytics to screen job candidates.

  • Simpson's Paradox: As a non-stats person, this one was new to me. Simpson's Paradox, O'Neil notes, is "when a whole body of data displays one trend, yet when broken into subgroups, the opposite trend comes into view for each of those subgroups." She emphasizes that stratifying data is crucial to spotting this paradox, a move statisticians failed to make in their 1983 A Nation at Risk report that faulted teachers for plummeting SAT scores. In fact, scores for every group of kids were rising, and the downturn in the overall average was the result of more students taking the test.

  • Teaching ethics in data science: I really love O'Neil's strategy for walking through some of the morally dubious questions of building models with new data scientists. The task is to build an "e-score," a scoring system to predict whether a person will default on a loan. She starts by asking whether to include race, which is clearly off-limits. She then asks about zip code: "It doesn't take long for the students to see that they are codifying past injustices into their model."

  • Auditing algorithms: O'Neil points to Princeton's Web Transparency and Accountability Project as one group working out how to audit algorithms with software. "They create software robots that masquerade online as people of all stripes -- rich, poor, male, female, or suffering from mental health issues. By studying the treatment these robots receive, the academics can detect biases in automated systems..." Neat.
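
For anyone else who's new to Simpson's Paradox, here's a minimal sketch in Python of how the arithmetic can work out. The group names and numbers are entirely made up for illustration (they are not the real SAT figures from the book or the report); the only assumption is that the mix of test takers shifts toward the lower-scoring group between the two years.

```python
# Hypothetical numbers, invented purely for illustration (not real SAT data).
# Each entry maps a group to (number of test takers, mean score).
earlier = {"group_a": (900, 520), "group_b": (100, 400)}
later = {"group_a": (600, 530), "group_b": (400, 420)}


def overall_mean(cohort):
    """Weighted mean across groups: each group's mean counts in proportion to its size."""
    total_points = sum(n * mean for n, mean in cohort.values())
    total_takers = sum(n for n, _ in cohort.values())
    return total_points / total_takers


# Every group's mean score goes up...
for group in earlier:
    print(group, earlier[group][1], "->", later[group][1])

# ...but the overall mean goes down, because the mix of test takers changed.
print("overall", overall_mean(earlier), "->", overall_mean(later))
# prints: overall 508.0 -> 486.0
```

Both groups improve, yet the overall average falls from 508 to 486, simply because the lower-scoring group makes up a bigger share of test takers in the later year. That's why looking only at the combined number can point you to exactly the wrong conclusion.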

ProPublica's already leading the way with great reporting on algorithms on a national scale through their "Machine Bias" series. I'm really interested in how smaller news orgs like mine might embrace some of these techniques to do investigations on the state and local level. Nick Diakopoulos, a journalism professor at the University of Maryland, has done extensive writing on the topic, including a 2014 report for the Tow Center for Digital Journalism at Columbia.

I suspect that report will be a good next stop.

Aug. 16, 2016

Let's do this

Starting something brand new up North

Parapets for days up there.

More than three months ago, I was selected for a Nieman Fellowship at Harvard. Between prepping for the move to Cambridge, Mass., and finishing up projects at WRAL News, those three months have been something of a blur.

Now we're here, and I really cannot wait to get started.

Given the time that's passed since I first sent in my application back in January, I figured it was worth revisiting my project pitch — the rough idea of what I wanted to work on for the next nine months.

I'm told it's pretty typical for fellows to alter and evolve their research plans as they progress through the program. I can already see that happening with my own pitch, and I imagine I'll continue to tweak it in the weeks ahead.

But I think it's important to remember where the general idea for the research began, and to check whether anything I add or change stays faithful to the original idea of spreading the practice of data journalism to more newsrooms.

Have ideas about how to do that? I would love to chat. Email me at tyler.dukes@gmail.com or hit me up on Twitter at @mtdukes.

In the meantime, I'll track the progress of that work in this space, as publicly as possible.

Here goes nothing.

Nieman study plan

Submitted Jan. 31, 2016

With the data-driven investigative techniques now at its disposal, the capabilities of the fourth estate have in some ways never been stronger.

But this technical expertise is largely consolidated among the world's most powerful newsrooms. And they can't be everywhere all the time.

When they miss something, like The New York Times and other national institutions did with the water crisis in Flint, Mich., that failure can have profound impacts on the public we serve.

I want to find a way to fill these gaps, to distribute the most effective data-driven investigative techniques more uniformly among regional news organizations and the reporters who know their communities best.

My theory is that strengthening local journalism requires us to improve how we teach journalism students and train reporters. By equipping journalism students all over the country with data literacy skills and bringing some of the most advanced data journalism techniques to bear in small newsrooms, we can make impactful investigative journalism more ubiquitous in underserved communities that need it most.

I want to spend a year at Harvard studying how we get there.

I want to better understand the barriers to entry and how to break them down. I want to explore the challenges j-schools face in designing an education for the industry's next generation and the obstacles to expanding a time-constrained daily reporter's skillset. This could mean developing new tools for journalists and educators, new models for partnerships between universities and news organizations or best practices for building enterprise teams when resources are scarce.

At its core, this research will be about exploring the intersection of the classroom and the newsroom.

That's what makes Cambridge a perfect place for this work.

MIT's Center for Civic Media has valuable lessons to teach about uniting computer science, statistics and media disciplines to equip students with techniques familiar to the country's most tech-savvy reporters. Coursework at Harvard's Kennedy School on leading organizations through change could provide guidance on the practicality of evolving an industry often resistant to course correction. Even Harvard's Graduate School of Education could be helpful in modeling a curriculum that's easy to replicate at universities across the country. 

Right around the corner, leaders at the Globe have built a team of renowned investigative journalists and protected it amid cutbacks. And on the periphery of the greater Boston area, dozens of community news organizations can provide diverse perspectives on the obstacles to enhancing their enterprise reporting.

My theory may be dead wrong. It could be flawed in dozens of ways and impractical in dozens more.

But a Nieman Fellowship would give me the resources, access and time to figure that out — and hopefully learn more about how to strengthen local journalism in the process.
