This document describes the data collection, format, storage, access, and public sharing of The Bee Informed Partnership, Inc.SM (BIP Inc.) honey bee health database. It also delineates the roles and responsibilities of those who have developed this software, those who contribute data, and those who use the data. This document details the levels of security and encryption that BIP Inc. practices to protect all data. BIP Inc. continually invests in and follows the latest updates in technology to ensure the security of our participants' data.
- Expected data type
Honey bee health and peripheral data includes, but is not limited to, colony health and descriptive measures, lab diagnostic analysis, management actions, environmental variables such as dates, location, weather, proximity to landscape features, and participant contact information.
- Data format
This data may initially be collected by web and mobile applications, custom electronic hardware such as hive scales, or paper records that are later transcribed into a web application. Additionally, data from external sources can be accessed through RESTful APIs. REST (representational state transfer) is an architectural style for communications that is often used in web services development. An API (application programming interface) is a set of tools used to build software applications.
- Data storage and preservation
Data is stored in the Bee Informed Partnership database, an online relational PostgreSQL database of honey bee health information from numerous federally funded projects. The public interface for this database is at bip2.beeinformed.org. Our approach is a customer-oriented data management interface in which data is entered directly into individual customer accounts from the various labs and locations where BIP works. The site uses modern, secure web technology including SSL, so user session security is comparable to that of e-commerce sites people use on a daily basis.
A backup of the database is generated automatically each night. Daily backups are kept for 25 days, one backup each week is kept for 25 weeks, and one backup per year is kept for 25 years. The server hard drive containing these backups is additionally backed up weekly, and those copies are stored for 4 weeks. The server is hosted through DigitalOcean.com, a top-tier, reliable SSD cloud server provider based in New York.
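The retention schedule above can be illustrated with a short sketch. This is not BIP's backup program; it only shows one way the stated windows (25 days, 25 weeks, 25 years) could be applied to a list of nightly backup dates. The function name `backups_to_keep` is hypothetical, and the choice of which backup represents a week or a year (here, the earliest one in each ISO week or calendar year) is an assumption, since the text does not say. The separate weekly copy of the server hard drive is not modeled.

```python
from datetime import date, timedelta

DAILY_DAYS = 25    # daily backups are kept for 25 days
WEEKLY_WEEKS = 25  # one backup each week is kept for 25 weeks
YEARLY_YEARS = 25  # one backup per year is kept for 25 years


def backups_to_keep(backup_dates, today):
    """Return the set of backup dates retained under the stated windows.

    Assumption: the weekly and yearly representatives are the earliest
    backup found in each ISO week / calendar year within the window.
    """
    keep = set()
    weekly = {}  # (ISO year, ISO week) -> earliest backup in that week
    yearly = {}  # calendar year -> earliest backup in that year
    for d in sorted(backup_dates):
        age_days = (today - d).days
        if age_days < 0:
            continue  # ignore dates in the future
        if age_days < DAILY_DAYS:
            keep.add(d)
        if age_days < WEEKLY_WEEKS * 7:
            iso = d.isocalendar()
            weekly.setdefault((iso[0], iso[1]), d)
        if today.year - d.year < YEARLY_YEARS:
            yearly.setdefault(d.year, d)
    keep.update(weekly.values())
    keep.update(yearly.values())
    return keep


if __name__ == "__main__":
    today = date(2024, 6, 30)
    nightly = [today - timedelta(days=i) for i in range(400)]
    kept = backups_to_keep(nightly, today)
    print(f"{len(kept)} of {len(nightly)} nightly backups retained")
```

Everything not returned by the function would be eligible for deletion under this reading of the schedule.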
- Data sharing and public access
The web server uses SSL encryption to protect data transferred over the network. Login accounts carry levels of access that database administrators configure to allow or disallow each user to view, update, create, or delete records as appropriate. Only a few individuals have access to manage records collected through labs such as the University of Maryland and the Tech Transfer Teams. These data are not publicly accessible in their original form, but aggregate views of the data may be distributed to create a broader understanding of trends in bee health. In these cases, no personally identifiable information is released and steps are taken to abstract identities and locations. For example, varroa levels may be reported at the county or state level, provided it is not generally known that a single beekeeper is participating in that county for the particular program (example: APHIS State Reports https://bip2.beeinformed.org/state_reports/). Extra attention is given to protecting the identities of commercial beekeepers, because we believe that reporting the number of colonies owned by an individual could identify that individual. For example, in the Colony Loss Survey, losses by state are redacted if fewer than 5 beekeepers respond in a state (example: Colony Loss Map https://bip2.beeinformed.org/geo/).
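The state-level redaction rule for the Colony Loss Survey can be sketched as a small aggregation step. This is a minimal illustration under stated assumptions, not BIP's actual code: the function name `summarize_losses_by_state`, the input layout, and the use of a simple mean as the reported statistic are all hypothetical. Only the threshold of 5 respondents comes from the text.

```python
MIN_RESPONDENTS = 5  # threshold stated for the Colony Loss Survey


def summarize_losses_by_state(responses, min_respondents=MIN_RESPONDENTS):
    """Aggregate loss percentages by state, redacting small groups.

    responses: iterable of (state, beekeeper_id, loss_percent) tuples
               (a hypothetical layout chosen for this sketch).
    Returns {state: mean loss percent, or None when redacted}.
    """
    by_state = {}
    for state, beekeeper_id, loss_percent in responses:
        # Keyed by beekeeper so each respondent is counted once;
        # a repeated beekeeper_id overwrites the earlier value.
        by_state.setdefault(state, {})[beekeeper_id] = loss_percent

    summary = {}
    for state, per_beekeeper in by_state.items():
        if len(per_beekeeper) < min_respondents:
            summary[state] = None  # redacted: too few respondents
        else:
            summary[state] = sum(per_beekeeper.values()) / len(per_beekeeper)
    return summary


if __name__ == "__main__":
    sample = [("MD", i, 10.0 * (i + 1)) for i in range(5)]
    sample += [("DE", i, 25.0) for i in range(4)]
    print(summarize_losses_by_state(sample))
```

Counting distinct respondents rather than rows keeps one beekeeper who submits twice from pushing a state over the threshold.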
In addition to providing ‘enterprise’ solutions for bee labs and coordinated projects such as the BIP Tech Transfer Teams, the BIP database also provides individual beekeeper data management. Any user can create an account on our system and use a selection of the services we provide. At this writing, the most advanced feature for individuals is the hive scale platform at hivescales.beeinformed.org, where users can map hive locations and collect hive scale data. In the future, users may be able to access their own personal lab records or collect additional metrics on colonies. These data share the same personal protection as other data in the database; however, we build in the ability for individuals to share their data if desired. Individuals are free to obfuscate or “blur” their own location data if they wish by, for example, mapping their apiary at a nearby road intersection instead of its actual location. However, we discourage false or intentionally misleading records in datasets such as our loss survey, and we look for such records and flag them as invalid.
In cases where a researcher requests data to perform analysis, the BIP Data Management Committee will evaluate the request. The committee will determine whether we have the data being requested, whether another researcher is already working on the topic, and what steps would need to be taken to protect individual identities in order to provide the data. If the request is approved, the researcher will have to agree to the BIP Data Request Protocol (COMING SOON), which defines how individual identities should be protected and what constitutes fair use of the data. The researcher will also be required to take and pass the online ethics class (Collaborative Institutional Training Initiative (CITI)) required of all our team members. Costs will also be incurred for preparation of the data. In most cases, it will be more feasible to fund BIP to do the analysis of interest than for external researchers to obtain the data and conduct the analysis themselves (example: a company that produces varroacides wants to better understand what products people are using).
- Roles and responsibilities
University of Tennessee, Appalachian State University, and Grand Valley University co-operators all share responsibility for the IT and data management of this project. This provides redundancy in case one co-operator needs to leave the project. The IT roles work closely with the University of Maryland and the Bee Informed Inc. non-profit to ensure the database serves the needs of the partnership. Anyone who creates an account through our online services must agree to our EULA (end user license agreement) to use the site.
- Monitoring and reporting
Create, update, and delete actions by users are recorded via an audit system that stores the details of each change as a record in the database. The code behind the data interface is managed through GitHub, a hosting service for Git version control, where each change in the code is tracked and associated with the programmer who made it. Aggregate data available on the public site are updated within 24 hours as new entries are made. Google Analytics is in place so that we can determine the reach of these outputs. Custom database queries will be run at reporting times if needed to describe the data generated, such as the number of records, locations, and participants.
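The audit trail described at the start of this section could look something like the following sketch. It is an illustration only: the table name `audit_log`, its columns, and the helper functions are hypothetical, and SQLite (from the Python standard library) stands in for the PostgreSQL database so that the example is self-contained.

```python
import json
import sqlite3
from datetime import datetime, timezone


def create_audit_table(conn):
    """Create a hypothetical audit table; the schema is an assumption."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS audit_log (
            id         INTEGER PRIMARY KEY,
            changed_at TEXT NOT NULL,
            username   TEXT NOT NULL,
            action     TEXT NOT NULL
                       CHECK (action IN ('create', 'update', 'delete')),
            table_name TEXT NOT NULL,
            record_id  INTEGER NOT NULL,
            details    TEXT NOT NULL
        )
        """
    )


def record_change(conn, username, action, table_name, record_id, details):
    """Store one create/update/delete action as a row in audit_log."""
    conn.execute(
        "INSERT INTO audit_log "
        "(changed_at, username, action, table_name, record_id, details) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            datetime.now(timezone.utc).isoformat(),
            username,
            action,
            table_name,
            record_id,
            json.dumps(details),  # details of the change, stored as JSON text
        ),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_audit_table(conn)
    record_change(conn, "lab_tech", "update", "samples", 42,
                  {"varroa_per_100": {"old": 3, "new": 4}})
    for row in conn.execute("SELECT username, action, details FROM audit_log"):
        print(row)
```

Keeping the audit rows in the same database as the data means that a reporting query can join changes to the records they describe.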