Data set complete
Dear all,
Bad news and great news. The bad news: this past week, my daughter (who loves math) caught a nasty stomach bug and had to stay home from school. So I stayed home from work to hang with her while she sipped juice and watched Minecraft videos. The great news: I was able to use much of this time to compile our final data set. (Also, my daughter is better now!)
We have over 13,000 records consisting of:
* editor's name
* editor's gender
* editor's title on editorial board
* editor's institutional affiliation and address (only available in some cases)
* journal on which they serve
* publishing entity (e.g., Springer, Elsevier)
* 5-year impact factor of journal
Here's what's in store next for Shilad and me.
1. There are various editor titles appearing in our database: "associate editor," "handling editor," "member of editorial board," "editor in chief," and many others. We need to work on grouping some of these categories together so that we get down to a manageable number of categories that we can use in our analysis.
2. In some cases, an editor's country of residence is available in our database. In other cases, only the name of their institution is available. We need to geocode these institutions in order to come up with countries of residence.
3. We need to resolve the names of the publishing entities. For instance, in our database, we have publishers such as "Elsevier," "Elsevier BV LTD," and "Elsevier Science LTD," which all boil down to Elsevier.
4. We'd like to de-duplicate our data. In other words, some individuals may appear in our data set more than once because they serve on more than one journal, and we'd like to identify these individuals. The challenge is that for such an individual, their name might appear differently from one journal to the next. In my case, it would be something like "C. Topaz" versus "Chad M. Topaz" versus "Chad Higdon-Topaz." For small data sets, these cases can be handled by manual inspection, but for larger data sets like ours, we need automated de-duplication.
Once we clean up the data in these ways, we'll be ready to start running analyses.
Thanks to amazing support, we met our original $6000 funding goal some time ago. Our data collection turned out to be more expensive than anticipated, so we have set a stretch goal of $8200. Right now, we are about $1500 short. If you have not yet backed our project, please consider doing so. Whether or not you are a backer, please share our funding link with friends and colleagues:
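For the curious, here is a minimal Python sketch of how steps 1, 3, and 4 might be approached. It is purely illustrative and not the project's actual pipeline: the keyword table, the list of corporate suffixes, and every function name are hypothetical assumptions. The name matching in particular is only a coarse first pass that proposes candidate duplicates (it would also lump together two different people who share a first initial and surname), so its output would still need review.

```python
import re
from collections import defaultdict

# Hypothetical keyword -> category table for grouping editor titles (step 1).
TITLE_KEYWORDS = [
    ("chief", "editor in chief"),
    ("associate", "associate editor"),
    ("handling", "handling editor"),
    ("board", "editorial board member"),
]

# Hypothetical corporate suffixes to drop from publisher names (step 3).
PUBLISHER_SUFFIXES = {"bv", "ltd", "inc", "science"}


def group_title(title):
    """Map a raw editor title to a coarse category by keyword lookup."""
    lowered = title.lower()
    for keyword, category in TITLE_KEYWORDS:
        if keyword in lowered:
            return category
    return "other"


def canonical_publisher(name):
    """Strip assumed corporate suffixes, e.g. 'Elsevier BV LTD' -> 'Elsevier'."""
    words = re.findall(r"[^\W\d_]+", name.lower())
    kept = [w for w in words if w not in PUBLISHER_SUFFIXES]
    # Fall back to the original string if every word was stripped.
    return " ".join(kept).title() if kept else name.strip()


def name_key(name):
    """Coarse matching key for step 4: (first initial, last surname token)."""
    tokens = re.findall(r"[^\W\d_]+", name)
    if not tokens:
        return ("", "")
    return (tokens[0][0].lower(), tokens[-1].lower())


def candidate_duplicates(names):
    """Group names sharing a key; return only groups with more than one name."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print(group_title("Member of Editorial Board"))
    print(canonical_publisher("Elsevier Science LTD"))
    print(candidate_duplicates(
        ["C. Topaz", "Chad M. Topaz", "Chad Higdon-Topaz", "A. Smith"]
    ))
```

Splitting on non-letter characters means the hyphenated "Higdon-Topaz" yields "Topaz" as its last token, which is why all three spellings of the name land in the same group. Step 2 (geocoding institutions) is left out because it depends on an external geocoding service.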
https://experiment.com/projects/gender-representat...
Again, many thanks for everything. I'll continue to share progress updates here!
With enthusiasm,
Chad