Andrew Nute

Aug 28, 2015


When someone asked about his game plan, Mike Tyson said, "Everyone has a plan till they get punched in the mouth."

Side note:

It has been a while since I posted a lab note, and for that I apologize! I will quickly tell everyone what data I got, what I am doing with it, and what I hope to get out of it, because my plan has changed since I last posted. I think you will find it interesting nonetheless.

What happened to the original plan:

Unfortunately, after I arrived in the Dominican Republic, it quickly became apparent that the data I was seeking was not as straightforward as I had thought. My original plan would not provide the statistical power needed to confidently estimate the impact that World Water Relief is having, at least not without further data, and the Ministry of Health did not grant me access to that data. The Ministry has started collecting the needed data digitally, but the health surveillance system is new and holds highly confidential patient information. It is not surprising that an intern (me) at a small, young non-profit organization (WWR) would not be granted access to data this sensitive.

So, at this point, I decided to gather as much data as possible using my original plan: scan paper records and digitize them with Optical Character Recognition (OCR) software. The goal of my time there became to start building a database that World Water Relief can use in future studies to make scientific judgments about their impact. Once this database is built, WWR will have something to bring to the table epidemiologically when they request access to the Ministry of Health's data in the future. Starting this conversation and building this relationship was the primary objective of the first month and a half of my internship in Barahona. I had some success and was allowed to pursue any paper records that I found at the Batey clinics.

What data I gathered and why:

The second half of my summer was dedicated entirely to collecting two data sources that I identified as both eligible for digitization using OCR software and valuable to the intended database. The school attendance records were an obviously good candidate for inclusion in the database, but the clinic records proved to be poorly organized, missing, or written in freehand form, which OCR software cannot recognize. In fact, it is partially for those reasons that the Ministry of Health is switching to computer-based patient file management. Instead, I identified the census records (which are recorded for each individual household) as invaluable information to start a database. With this information, WWR will have much more precise numbers of target beneficiaries in their chosen communities. Armed with this information, future WWR studies will be able to do a couple of important things. First, they will be able to weight each community by its population and number of beneficiaries, which is important because not all of the Batey communities are equal in size. Second, and more importantly, they will be able to design their future studies to control for factors that could affect both the children's exposure to the WWR interventions and the children's school attendance rates. These two steps will add legitimacy to future studies, as WWR will be able to paint a clearer picture of what they are achieving. A blank copy of the census record can be found at the bottom of this lab note.
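To illustrate the first point, here is a minimal sketch of a population-weighted attendance rate in Python. The community names, populations, and rates are made up for illustration, and `weighted_attendance_rate` is a hypothetical helper, not part of any existing WWR tool.

```python
def weighted_attendance_rate(communities):
    """Average the attendance rate across communities,
    weighting each community by its population."""
    total_population = sum(c["population"] for c in communities)
    if total_population == 0:
        raise ValueError("total population must be greater than zero")
    weighted_sum = sum(
        c["attendance_rate"] * c["population"] for c in communities
    )
    return weighted_sum / total_population


# Made-up example values, not real census or school data.
example = [
    {"name": "Community A", "population": 100, "attendance_rate": 0.90},
    {"name": "Community B", "population": 300, "attendance_rate": 0.70},
]
print(round(weighted_attendance_rate(example), 3))
```

A plain average would treat both communities equally; the weighted version pulls the result toward the larger community's rate.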

What I am doing now:

Now that I am home, I am beginning the data processing step. I have scanned both the census records and the school attendance records that I found during my visits to six communities with a WWR filtration system and one community that will have a system installed in the coming year. Currently, I am reworking the file names, after which I will upload the images to Captricity's website. Captricity will be asked to identify the names and dates of birth written on both data sources, because handwritten text is the most difficult for OCR to identify correctly. Captricity's professional experience and manpower will help ensure that the two data sources provide the most matches possible from the records scanned. Separately, I am working with my brother, Michael Nute, who is a Ph.D. student in statistics at the University of Illinois. Having worked extensively with image processing research as part of his degree, he will build a program in Python to recognize the easier parts of the data sources, such as the absence/presence fields represented by either an "A" or a "P" on the attendance records. This program will also identify the numbers recorded in the table on the first page of the census, which correspond to the codes below the table.
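Once the names and dates of birth are transcribed, the two data sources still have to be matched to each other. A rough sketch of one way to do that in Python is below. It is my own illustration, not Captricity's or Michael's method: the record layout, the helper names, and the 0.85 similarity threshold are all assumptions, and the names in the example are invented.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name):
    """Lowercase, strip accents, and collapse whitespace so that
    'José  Pérez' and 'jose perez' compare as equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(no_accents.lower().split())


def name_similarity(a, b):
    """Similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def match_records(census, attendance, threshold=0.85):
    """Pair each attendance record with the most similar census record
    that shares its date of birth. The threshold is an assumed cutoff."""
    matches = []
    for child in attendance:
        best, best_score = None, threshold
        for person in census:
            if person["dob"] != child["dob"]:
                continue  # require an exact date-of-birth match
            score = name_similarity(person["name"], child["name"])
            if score >= best_score:
                best, best_score = person, score
        if best is not None:
            matches.append((child["name"], best["name"], best_score))
    return matches


# Invented example records, not real people.
census = [{"name": "José Pérez", "dob": "2006-03-14"}]
attendance = [{"name": "Jose Perez", "dob": "2006-03-14"}]
print(match_records(census, attendance))
```

Requiring an exact date of birth keeps false matches down, but it also means a single misread digit drops the pair, so the cutoff and the date rule would need tuning against real scans.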

After the data processing:

Once I have everything established in a relational database, I will attempt to use the information on the first page of the census records to build a regression model to predict school attendance rates among children found on both the census and school records. Information of primary interest for this model on the census record includes access to toilet facilities (latrine vs. toilet, exclusive vs. shared), access to water (tap in the house, tap outside the house, or somewhere in the community), and water source for the community (river, lake, aqueduct, etc.). I will be sure to update these lab notes more often now that I have a reliable internet connection, and I appreciate everyone's patience.
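For anyone curious what the relational database might look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names are my own placeholders rather than the final design, and the sketch stops at producing one row per child with an attendance rate and household characteristics; the regression model itself would be fit on a table like this and is not shown.

```python
import sqlite3

# Hypothetical schema: one row per household, child, and attendance mark.
SCHEMA = """
CREATE TABLE household (
    household_id INTEGER PRIMARY KEY,
    community    TEXT NOT NULL,
    toilet_type  TEXT,
    water_access TEXT,
    water_source TEXT
);
CREATE TABLE child (
    child_id     INTEGER PRIMARY KEY,
    household_id INTEGER NOT NULL REFERENCES household(household_id),
    name         TEXT NOT NULL,
    dob          TEXT
);
CREATE TABLE attendance (
    child_id   INTEGER NOT NULL REFERENCES child(child_id),
    school_day TEXT NOT NULL,
    status     TEXT NOT NULL CHECK (status IN ('A', 'P'))
);
"""


def create_database(path=":memory:"):
    """Create the tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def attendance_with_household(conn):
    """Return one row per child: id, household characteristics, and the
    share of recorded school days marked 'P' (present)."""
    query = """
        SELECT c.child_id,
               h.toilet_type,
               h.water_access,
               h.water_source,
               AVG(CASE WHEN a.status = 'P' THEN 1.0 ELSE 0.0 END)
        FROM attendance AS a
        JOIN child AS c ON c.child_id = a.child_id
        JOIN household AS h ON h.household_id = c.household_id
        GROUP BY c.child_id, h.toilet_type, h.water_access, h.water_source
        ORDER BY c.child_id
    """
    return conn.execute(query).fetchall()
```

Keeping attendance as one row per child per school day makes it easy to recompute rates for any date range later, for example before and after a filtration system is installed.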

If you have any questions, please feel free to ask!

Andrew


About This Project

World Water Relief (WWR)

Clean water is often unavailable in the Dominican Republic, and while the non-profit World Water Relief (WWR) has taken measures to provide access and education to rural communities, the effects of cleaner water on school attendance and illness rates have not been quantified. To evaluate WWR's impact, I will digitize data of school attendance and clinic patient records before and after WWR, and analyze the connections between clean water and education to better inform infrastructure investment.
