Data set complete

Dear all,
Bad news and great news. The bad news: this past week, my daughter (who loves math) caught a nasty stomach bug and had to stay home from school. So I stayed home from work to hang out with her while she sipped juice and watched Minecraft videos. The great news: I was able to use much of this time to compile our final data set. (Also, my daughter is better now!)
We have over 13,000 records, each consisting of:
* editor's name
* editor's gender
* editor's title on editorial board
* editor's institutional affiliation and address (only available in some cases)
* journal on which they serve
* publishing entity (e.g., Springer, Elsevier, etc.)
* 5-year impact factor of journal
Here's what's in store next for Shilad and me.
1. Our database contains many different editor titles: "associate editor," "handling editor," "member of editorial board," "editor in chief," and many others. We need to group these titles into a manageable number of categories that we can use in our analysis.
2. In some cases, an editor's country of residence is available in our database. In other cases, only the name of their institution is available, so we need to geocode these institutions to determine the corresponding countries of residence.
3. We need to resolve variant names of the same publishing entity. For instance, our database includes publishers such as "Elsevier," "Elsevier BV LTD," and "Elsevier Science LTD," which all boil down to Elsevier.
4. We'd like to de-duplicate our data. In other words, some individuals may appear in our data set more than once because they serve on more than one journal, and we'd like to identify these individuals. The challenge is that a single individual's name might appear differently on different journals. In my case, it might be "C. Topaz" on one, "Chad M. Topaz" on another, and "Chad Higdon-Topaz" on a third. For small data sets, these cases can be handled by manual inspection, but for larger data sets like ours, we need automated de-duplication.
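Items 1 and 3 both come down to mapping many raw strings onto a few canonical labels. A minimal sketch of one way to do this, assuming simple keyword rules are enough: the rule lists `TITLE_RULES` and `PUBLISHER_RULES`, the labels such as "chief editor" and "board member", and the choice to lump "handling editor" in with "associate editor" are all hypothetical illustrations, not the groupings we have settled on.

```python
import re
from typing import Optional, Sequence, Tuple

# Hypothetical rules: (keyword, canonical label). The first match wins,
# so more specific keywords are listed before broader ones.
TITLE_RULES = [
    ("editor in chief", "chief editor"),
    ("chief", "chief editor"),
    ("associate", "associate editor"),
    ("handling", "associate editor"),  # assumption, for illustration only
    ("board", "board member"),
]

PUBLISHER_RULES = [
    ("elsevier", "Elsevier"),
    ("springer", "Springer"),
]


def clean(text: str) -> str:
    """Lowercase, turn punctuation (including hyphens) into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def normalize(raw: str, rules: Sequence[Tuple[str, str]], default: Optional[str] = None) -> str:
    """Map a raw string to the label of the first keyword found in it.

    Falls back to `default`, or to the stripped raw string if no default is given.
    """
    cleaned = clean(raw)
    for keyword, label in rules:
        if keyword in cleaned:
            return label
    return default if default is not None else raw.strip()


for title in ["Editor-in-Chief", "Handling Editor", "Member of Editorial Board"]:
    print(title, "->", normalize(title, TITLE_RULES, default="other"))
for publisher in ["Elsevier", "Elsevier BV LTD", "Elsevier Science LTD"]:
    print(publisher, "->", normalize(publisher, PUBLISHER_RULES))
```

Because the first matching keyword wins, rule order matters. Strings that match no rule fall through to the default, which makes it easy to list the leftovers and decide by hand whether they need a new rule.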
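For item 4, comparing every pair among 13,000 records would mean roughly 84 million comparisons. A common way to cut this down is blocking: only compare names that share a cheap key. The sketch below is a hypothetical illustration, not necessarily the method we will use. It assumes names are written given name first and keys each name on its first initial plus each part of a possibly hyphenated surname, so "C. Topaz," "Chad M. Topaz," and "Chad Higdon-Topaz" land in the same block. The function names `name_key` and `candidate_pairs` are made up for this example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple


def name_key(name: str) -> Optional[Tuple[str, FrozenSet[str]]]:
    """Return (first initial, surname parts), assuming given-name-first order.

    Returns None for names with no usable tokens.
    """
    tokens = name.lower().replace(".", " ").split()
    if not tokens:
        return None
    # Split a hyphenated surname such as "higdon-topaz" into its parts.
    surname_parts = frozenset(p for p in tokens[-1].split("-") if p)
    return tokens[0][0], surname_parts


def candidate_pairs(names: Iterable[str]) -> Set[Tuple[str, str]]:
    """Pair up names that share a first initial and at least one surname part."""
    blocks: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for name in names:
        key = name_key(name)
        if key is None:
            continue  # skip empty or punctuation-only names
        initial, parts = key
        for part in parts:
            blocks[(initial, part)].add(name)

    pairs: Set[Tuple[str, str]] = set()
    for members in blocks.values():
        # Only names inside the same block are ever compared.
        pairs.update(combinations(sorted(members), 2))
    return pairs


names = ["C. Topaz", "Chad M. Topaz", "Chad Higdon-Topaz", "Jane Doe"]
for a, b in sorted(candidate_pairs(names)):
    print(f"{a!r} may be the same person as {b!r}")
```

The output is a set of candidate pairs, not confirmed duplicates: a different person who also appears as "C. Topaz" would be paired too, so each candidate would still need a second check, for example comparing institutional affiliations.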
Once we clean up the data in these ways, we'll be ready to start running analyses.
Thanks to amazing support, we met our original $6000 funding goal some time ago. Our data collection turned out to be more expensive than anticipated, so we have set a stretch goal of $8200. Right now, we are about $1500 short. If you have not yet backed our project, please consider doing so. Whether or not you are a backer, please share our funding link with friends and colleagues:
 https://experiment.com/projects/gender-representat...

Again, many thanks for everything. I'll continue to share progress updates here!
With enthusiasm,
Chad

Women are grievously underrepresented in the mathematical sciences. Because publication of research is key to academic career advancement and because research has repeatedly uncovered gender bias penalizing women in professional circumstances, we use tools of data science to study 600 mathematical sciences journal editorial boards. We quantify gender representation on these boards and examine its association with characteristics such as impact factor, publishing house, and mathematical subfield.

2 comments

Chad Topaz, Researcher

Denny, thanks for your enthusiasm! Shilad and I should talk this over. We haven't even discussed any of the ethical/social issues of posting the data set (complete with individuals' names) and will want to give that careful consideration. Stay tuned!

May 02, 2016

Denny Luan, Backer

Hey Chad, super interested in your dataset. Would you consider uploading the raw data at some point? We're thinking of adding data viz tools into Experiment's lab note platform, so just wondering what your data looks like.
