What is the Center for Embedded Networked Sensing (CENS)?
The Center for Embedded Networked Sensing (CENS) was the site of collaborative research on data practices, in which sensing systems were developed, tested, and deployed for scientific applications.
CENS was a National Science Foundation Science and Technology Center, funded from 2002 to 2012 and devoted to developing sensing systems for scientific and social applications through collaborations between engineers, computer scientists, and domain scientists.
By partnering across disciplinary boundaries, participants had to articulate their research practices, methods, and expectations explicitly.
Sensor-networked science, in the field research areas of CENS, is a canonical example of “little science,” in Price’s (1963) terms. While Price might view this type of science as less mature than the big science of astronomy owing to the inconsistency of research methods, it is best viewed as adaptive.
These are rigorous methods that meet the standards of evidence for their respective fields. Such data may be high in validity but are not readily replicated.
CENS launched in 2002 with a core of investigators located at four universities in California, with a fifth added later. These investigators, in turn, had collaborators at other institutions. Membership varied from year to year as projects began and ended, and as the rosters of students, faculty, post-docs, and staff evolved.
The Center had about three hundred participants at its peak, with a diverse array of data practices. On average over the life of the Center, about 75 percent of CENS participants were concerned with the development and deployment of sensing technologies;
the rest were in science, medical, or social application domains. Technology researchers were developing new technologies for science and other application domains, whereas the scientists sought new technologies to advance their research methods.
The data collected in CENS field deployments were big in variety, if not in absolute volume or velocity. As volume and velocity increased, however, science teams experienced scaling problems. Sensor networks produced far more data than did the hand-sampling methods that dominated these domains.
In a biology study of root growth, for example, scientists had collected and manually coded about one hundred thousand images in a period of seven years, using cameras in clear plastic tubing that were placed in the ground near the plants of interest.
By automating the cameras and sending the images over the sensor network, they could capture up to sixty thousand images per day, totaling about ten gigabytes. Transferring the manual methods to automated coding was problematic for several reasons.
Manual coding relied on the expertise of graduate and undergraduate students, some of whom coded for hours on end. Coding was difficult because the roots were very small and grew slowly. When roots first touched the tube, they appeared only as tiny dots.
Once they grew enough to be visible along the tube, coders could study prior images to determine when the roots might first have appeared. Identifying the origins of some of those observations required digging through field notebooks, videotapes of root growth, and other records.
They did some testing of inter-coder reliability, but the margins of error made it difficult to codify the practices algorithmically. In marine biology studies, science teams usually captured water samples three to four times in each twenty-four-hour period.
Those observations were correlated as a time series. Sensor networks, however, sampled the water at five-minute intervals. Simple correlations and time series analyses did not suffice for these data rates, which led to the adoption of complex modeling techniques.
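To make the scale mismatch concrete, here is a minimal sketch, with invented readings and a simple daily aggregate (not CENS data or methods), of reducing five-minute sensor readings to summaries comparable to a few hand samples per day:

```python
from datetime import datetime, timedelta

# Invented example: one day of five-minute sensor readings (288 samples)
# versus the three or four hand samples a science team might take per day.
start = datetime(2006, 7, 1)
readings = [(start + timedelta(minutes=5 * i), 20.0 + (i % 12) * 0.1)
            for i in range(288)]  # 288 five-minute intervals in 24 hours

# Reduce the high-rate series to a daily summary that can sit alongside
# the sparse hand samples; the real analyses used far richer models.
values = [v for _, v in readings]
daily_summary = {
    "date": start.date().isoformat(),
    "n_readings": len(values),
    "mean": sum(values) / len(values),
    "min": min(values),
    "max": max(values),
}
```

Even this toy reduction shows why simple correlation against a handful of hand samples breaks down: nearly three hundred readings per day must be modeled, not merely averaged.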
When Are Data?
The notion of data in CENS was a moving target throughout the ten years of the Center’s existence. Each investigator, student, and staff member interviewed offered personalized descriptions of his or her data. These descriptions evolved in subsequent interviews, site visits, and examination of their publications.
Individuals on the same team gave differing accounts of what the team's data were, varying by their role on the team, their experience, and the stage of research activity. Notions of data also evolved as the sensor-network technology improved and as the collaborations and research methods matured.
CENS research as a whole was a mix of exploratory, descriptive, and explanatory research methods. Scientists might formulate a hypothesis in the laboratory and test it in the field, and vice versa.
Technology researchers could test some theories in the lab and others in the field. Theoreticians, such as the electrical engineers who were modeling sensor networks, similarly could test theories in field deployments of sensor networks.
Scientists in biology, seismology, environment, and other areas brought their research questions and methods with them to CENS. Particularly in the biological and environmental applications addressed in this case study, their science practices exhibited characteristics of ecology identified by Bowen and Roth:
(1) research design has a highly emergent character; (2) tools and methods are developed in situ, often from locally available materials, and are highly context-specific;
(3) studies are not easily replicable because of the dynamic nature of ecological systems; and (4) social interactions between members of the community are highly important.
Researchers in computer science and engineering similarly brought their research questions and methods to CENS. Few of the technology researchers had experience in designing hardware or software for scientific applications, particularly for field-based research that took place in unpredictable real-world settings.
Technology design was particularly daunting when the requirements and evaluation criteria remained fluid, adapting to scientific questions. Teams had to learn enough about the domain area of their partners to design, deploy, and evaluate the technologies and resulting data.
CENS research often combined commercially purchased and locally developed equipment. Decisions made in the design of sensing technologies, both hardware and software, influence the types of data that can be acquired. As with telescope instruments in astronomy, design decisions made long before field research takes place may determine what can become data.
Although science and technology teams worked together in the field, neither had a full grasp of the origins, quality, or uses of the data collected jointly.
Some of the data collected in field deployments were of exclusive interest to the science teams, such as physical samples of water, whereas others were of exclusive interest to technology teams, such as proprioceptive data from robotic devices.
Thus data from the sensors were of mutual interest but applied to different research questions. Because these were new scientific applications of research-grade technologies, “ground truthing” the data against real-world benchmarks was a continual challenge.
Sources and Resources CENS researchers collected most of their own data, having few repositories or other external sources on which to draw.
The teams gathered an array of data from sensor networks and physical samples. Software code and models were essential to the design of instruments and interpretation of data. These sometimes were treated as data.
Embedded Sensor Networks Embedded sensor networks were not new technologies even at the time that CENS was founded in 2002. Sensor networks are used to run industrial processes such as chemical and petroleum plants and to monitor water flow and water quality.
What was new about CENS was the use of embedded sensor networks to ask new questions in science, and for science and technology researchers to collaborate on the design of technologies with real-world applications (Committee on Networked Systems of Embedded Computers 2001).
CENS was able to combine commercially available technologies with new devices and new research designs to collect new kinds of data.
A generation earlier, remote-sensing technologies revolutionized the environmental sciences. The ability to view the Earth from satellites, at levels of granularity that continue to improve, made possible a far more integrated view of environmental phenomena than was ever before possible.
Sensor networks could be deployed on land and in water, depending on the technology. Sensors could be buried in soil, hung from buoys or boats in the water, attached to poles or other fixtures, or hung from cables to be moved in three dimensions over land or water.
Sensors were used to detect indicators of nitrates in water, arsenic in rice fields, wind speed and direction, light levels, physical movements of earth or animals, and various other phenomena. Data were collected from the sensors either by hand, such as copying them to a flash drive, or by sending them to a node with Internet access. The uses of sensor data and the means by which they were captured varied by the application, choice of technologies, and remoteness of the location.
Some deployments took place in urban areas with ready access to wireless networks, but many were in remote mountains, islands, and deserts.
CENS technology researchers used sensor data for multiple purposes: (1) observations of physical and chemical phenomena, including sounds and images; (2) observations of natural phenomena used to actuate or to guide the robotic sensors to a place in the environment;
(3) performance data by and about the sensors, such as the time that sensors are awake or asleep, the faults they detect, battery voltage, and network routing tables; and (4) proprioceptive data collected by the sensors—data to guide robotic devices, such as motor speed, heading, roll, pitch, yaw, and rudder angle.
Differences between the participating disciplines were most apparent in their criteria for evidence. Biologists, for example, measured variables such as temperature according to established metrics for their field.
Engineers and computer scientists tended to be unaware of, or unconcerned with, those international standards. For their purposes, a local baseline of consistent measurements might suffice for calibration. When asked about measurement practices, one technology researcher stated simply, “temperature is temperature.”
When a partner biologist was asked independently about how to measure temperature, he gave a long and nuanced response involving the type of instrument, when and where the measurement was taken, the degree of control over the environment, the accuracy of the instrument, and calibration records.
The latter biologist installed three types of temperature instruments side by side at a field site, recording measurements for a full year before he trusted the instruments and their resulting data.
Physical Samples CENS science teams continued to collect physical samples of water, sand, and soil. These included observations of living organisms such as the distribution of phytoplankton and zooplankton in a lake. Samples were tested in wet labs on site, and some were tested further on campus, after the deployment.
Software, Code, Scripts, and Models Sensors do not measure wind, arsenic, nitrates, or other scientific variables directly; rather, they measure voltage and other detectable indicators.
Most sensor outputs are binary signals that must be interpreted through statistical models. Some are images from cameras. Statistical models of physical or chemical phenomena were used to interpret these indicators.
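A minimal illustration of this interpretive step, assuming a hypothetical linear calibration (the slope and intercept are invented, not CENS values, and real calibrations are often more complex):

```python
# Hypothetical linear calibration curve mapping raw sensor voltage to a
# nitrate concentration (mg/L). The coefficients are invented; real
# calibrations are often nonlinear and temperature-dependent.
SLOPE = 12.5      # mg/L per volt (assumed)
INTERCEPT = -1.0  # mg/L offset (assumed)

def voltage_to_nitrate(volts: float) -> float:
    """Interpret a raw voltage reading as a nitrate concentration."""
    return SLOPE * volts + INTERCEPT

# Raw voltages become scientific observations only through the model.
concentrations = [voltage_to_nitrate(v) for v in (0.20, 0.35, 0.50)]
```

The point of the sketch is that the scientific variable exists only through the model: change the calibration and the "same" voltages become different data.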
The technology teams sometimes used external information resources such as software code repositories. Code, software, and models tended to be viewed synonymously as data among the computer science and engineering researchers studied.
Background Data The scientific teams used some data from external resources to plan their collection of new data at specific field sites. Because teams tended to return repeatedly to the same field sites, they needed extensive baseline data and background context about those sites.
Data collected by public agencies such as the Department of Fish and Game were important resources, as were data the team collected on earlier visits to the lake.
Valuable background information about the lake included peak months for algae, a topology of the lake bed, phytoplankton and zooplankton species they were likely to see, and nutrient presence and concentration. Engineering teams sometimes obtained calibration data from external sources.
Knowledge Infrastructures Whereas astronomy has accrued a sophisticated knowledge infrastructure to coordinate data, publications, tools, and repositories over the course of many decades of international cooperation, CENS was at the opposite end of the infrastructure spectrum. The Center itself served an essential convening function to assemble researchers with common interests.
It provided technical infrastructure in the form of equipment, networks, and staffing, but made little investment in shared information resources. The Center's publication records were contributed to the University of California eScholarship system, forming one of its largest repositories.
The CENS research trajectory, as originally proposed, was based on autonomous sensor networks and “smart dust” technologies. Had that trajectory continued, standardized structures for data and metadata would have been far more feasible.
As the Center matured, participants gained a better understanding of the science and technology issues involved in exploratory research of this nature.
Experimental technologies were deemed too fragile and temperamental to be left unattended in field conditions. CENS research shifted toward “human-in-the-loop” approaches that were adaptable on site.
Metadata Scientific investigation using sensor networks is a research area, not a field with a history comparable to that of astronomy. Each of the partners brought their own disciplinary practices, including their uses of metadata, to the collaboration.
With the exception of seismology and genomics, few metadata standards existed for the data being produced by CENS teams.
Those that did exist were not necessarily adopted by the scientific community or by local research teams—a characteristic of small science research. Formal XML-based standards existed for environmental data and sensor data, for example, but were not used.
Some teams assigned metadata for their own purposes, although not from these XML standards. Researchers created records that described the context of data collection, including precise times, locations, local conditions, the position of sensors, which sensors (make, model, serial number, and other characteristics), and scientific variables. File-naming conventions were the most common form of metadata.
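A sketch of how a file-naming convention can carry such metadata; the pattern and field names here are hypothetical, not an actual CENS scheme:

```python
import re

# Hypothetical file-naming convention embedding deployment metadata:
#   <site>_<sensor id>_<variable>_<YYYYMMDD>T<HHMM>.csv
# The pattern and fields are illustrative, not an actual CENS scheme.
PATTERN = re.compile(
    r"(?P<site>[a-z]+)_(?P<sensor>[a-z0-9-]+)_(?P<variable>[a-z]+)_"
    r"(?P<date>\d{8})T(?P<time>\d{4})\.csv"
)

def parse_filename(name: str) -> dict:
    """Recover the metadata encoded in a data file's name."""
    match = PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"does not follow the naming convention: {name}")
    return match.groupdict()

meta = parse_filename("eastlake_nsensor-042_nitrate_20060715T0630.csv")
# Yields the site, sensor identifier, variable, date, and time as fields.
```

Such conventions work well within a team but, as the chapter notes, they are fragile as long-term metadata: the scheme lives in the heads of those who devised it.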
Rarely were teams satisfied with the quality of the metadata they used to manage their data, and they often were concerned about the difficulty of locating or reusing their older data.
Provenance Each CENS team maintained its own records of the origins and handling of their data. Teams often used data from prior deployments or from laboratory studies for comparisons.
CENS science teams tended to return repeatedly to the same field research sites, often over a period of many years. They developed cumulative knowledge and cumulative data about a site, enabling them to make longitudinal comparisons.
Technology researchers were far less dependent on authentic field sites or longitudinal comparisons. Initial testing of their instruments could be conducted in bathtubs, swimming pools, backyards, and open spaces on campus.
Lacking a common data pool as in astronomy, researchers in the fields affiliated with CENS had no equivalent of data papers to establish the provenance of a dataset. Rather, provenance information was locally maintained, as were the associated data. Those wishing to reuse CENS data usually contacted the authors of papers in which data were reported.
External Influences Sensor-networked science attracts scholars from many different academic backgrounds. Each individual and group brought its own set of economic values, property rights concerns, and ethical issues to the collaboration. Some of the most interesting and unexpected issues arose in dealing with data at the intersection of research domains.
Economics and Value The ways in which data were obtained and exchanged in CENS varied by domain and by the circumstances of each project. Researchers in seismology and marine biology had some common-pool resources for data.
Those in environmental research drew on local, state, and US government records for meteorology, water flow, and other aspects of field conditions. For investigators working inside the United States, most of these records were public goods.
The same records may be licensed for use outside the United States, in which case they may become toll goods or common-pool resources, depending on how they are governed. CENS scientists collected observational data in many other countries, and their access to background data on local conditions varied accordingly.
Computer science and engineering researchers sometimes obtained software from, or contributed code to, open software repositories such as GitHub and SourceForge, which serve as common-pool resources for this community.
The seismology community in the United States is supported by the Incorporated Research Institutions for Seismology (IRIS), a consortium of universities that operate science facilities for the acquisition, management, and distribution of seismological data (Incorporated Research Institutions for Seismology 2013).
Seismic data are used not only for scholarly research and education but also for earthquake hazard mitigation and for verification of the Comprehensive Nuclear-Test-Ban Treaty.
Data resulting from National Science Foundation grants must be made available in the IRIS repository within a specified period after the last piece of equipment from a grant project is removed from the field.
While rules on proprietary periods were respected, researchers had considerable flexibility about when to remove all remaining seismic equipment from a field site. Researchers might delay removing sensors to gain additional time to analyze their data.
CENS collected very little genomic data, but some of the research in harmful algal blooms and in water quality included DNA analyses. When required by funding agencies or journals, these data were contributed to GenBank, the Protein DataBank, or other archives.
Some environmental data collected by remote sensing on satellites have great commercial value, such as tracking schools of fish or weather conditions that predict crop yields.
Most of the CENS projects in environmental sciences collected small amounts of data about individual research sites. Few of the data were organized in ways that they could be readily combined or compared. In the aggregate, however, the potential existed for such data to be valuable to others.
Property Rights As with astronomy, property rights in sensor-networked science and technology are associated more with instruments than with data. Whereas astronomers share large instruments, CENS scientists and technology researchers tended to purchase or build their own small devices for data collection.
Equipment purchased through grant funding usually becomes the property of the university to which the grant was given.
Several companies partnered with CENS, providing expertise, equipment, and additional funding. A few small companies were formed to sell some of the equipment, algorithms, and methods developed in CENS research. None became major commercial successes; their goal was largely technology transfer.
The Center’s overall approach was toward open science, preferring to release software as open source code. Among the most successful continuing ventures is a nonprofit enterprise founded by CENS alumni to design networked sensor technologies for environmental, health, and economic development applications.
Ethics Issues in the creation of scientific and technical data from sensor networks arose in decisions about what, where, and how phenomena were studied, and in how precisely the findings were to be reported.
For example, some CENS researchers studied endangered species or habitats. Published findings included sufficient detail to validate the research but not enough for others to identify the exact location of the sites. Research that took place at protected natural reserves often was sensitive.
Research reserves usually are in isolated locations, although some may be open to the public for educational activities. Recreational and research visitors alike were expected to respect the habitats and ecosystem to ensure that flora, fauna, and phenomena could be studied under natural conditions.
Computer science and engineering researchers in CENS were presumed to respect the ethical standards of their fields.
The code of ethics of the Association for Computing Machinery (ACM), the largest professional organization for computer science, covers general moral imperatives, professional responsibility (high-quality work, know and respect applicable laws, evaluate systems and risks, etc.), leadership, and compliance with the code.
IEEE, which is a large professional organization for engineering, has a similar but less detailed code of ethics that mentions responsibilities such as “making decisions consistent with the safety, health, and welfare of the public,” “to acknowledge and correct errors,” and to avoid injury to others (Institute of Electrical and Electronics Engineers 2013).
Per these guidelines, researchers are to collect data responsibly, but notions of “responsibility” varied by domain. Engineers and biologists worked together to repurpose weapons-targeting algorithms to identify the location of bird sounds. Scientists who had avoided involvement with military applications found themselves deploying weapons technology for peaceful purposes.
A team of computer scientists adapting sensor cameras to visualize the movement of birds and animals in the field mounted their cameras in campus hallways for testing purposes—then found themselves challenged on human subjects grounds for capturing people’s behavior in hallways without consent.
As CENS expanded into other applications of sensor networks, expertise in scientific usage of these technologies turned to social uses of mobile devices. Cell phones became an important platform for data collection and for the study of network topology.
When participants began tracking their own behavior via applications on mobile devices for food intake, commuting routes, bicycle usage, and other purposes, privacy concerns became paramount.
Computer scientists and engineers faced a challenging set of decisions about what data they could collect versus what data they should collect. CENS became the site of a multiyear study on building values into the design of mobile technologies.
Conducting Research with Embedded Sensor Networks
CENS supported many independent projects at any one time, although some people, equipment, and practices were shared between projects.
These collaborations took researchers out of their comfort zones: technologists had to test new equipment in highly unpredictable field settings, and scientists had to rely on technologists to ensure that field excursions were successful.
Small teams of researchers—a mix of students, faculty, and research staff from multiple research areas—would conduct research together on a field site for periods ranging from a few hours to two weeks.
These events were known as “deployments,” since the technology was deployed to collect data of various sorts. Participation varied from day to day, with a maximum of twenty or so people in total.
The following composite scenario of a typical CENS field research deployment, published in more detail elsewhere, illustrates a set of activities commonly associated with the collection, management, use, and curation of these types of data.
This scenario involves a harmful algal bloom (HAB), a phenomenon in which a particular alga suddenly becomes dominant in the water; blooms can occur in freshwater and in oceans.
The bloom creates toxic conditions that kill fish and other animals such as sea lions by consuming the available dissolved oxygen that fish need or by releasing domoic acid, a harmful neurotoxin that affects large mammals.
HABs are important phenomena to study because they can cause severe damage, potentially killing tens of thousands of fish in a day. The HAB deployments were sited at a lake known for summer blooms.
Sensor networks enable marine biologists to study more variables than is possible with hand-sampling techniques and to collect a much larger number of observations. Data collection can be adapted to local conditions through the choice and location of sensors.
Sensor network studies of HABs enable computer scientists and engineers to test the ability of physical and biological sensors to collect large numbers of variables.
Roboticists find HABs of particular interest because the flow of observations can be used to trigger sensing systems on robotic boats, buoys, helicopters, cameras, and autonomous vehicles.
Research Questions The overall goal of CENS research was the joint development, or co-innovation, of new instruments that would enable new kinds of science (Committee on Networked Systems of Embedded Computers 2001). Science and technology efforts were symbiotic, as in Licklider’s metaphor of the wasp and the fig tree. One could not proceed without the other; they were mutually interdependent, each influencing the other.
Despite the interdependence of the science and technology teams, the long-term goals of their research were aligned with their respective disciplines rather than with the Center.
The biology researchers continued their study of biological phenomena associated with HAB before, during, and after CENS, and the technology researchers continued to improve their instruments, algorithms, and models for targeting phenomena before, during, and after CENS.
Their choices of data, how to manage their data, and where to publish their findings were more aligned with their respective research agendas, despite their substantial commitment to mutual participation.
In the HAB field research, the science team studied the distribution of phenomena in the lake, whereas the technology team studied robotic vision. The science team’s requirements provided a means to conduct technology research on algorithms for robotic guidance, network health, sensor fault detection, and the design of sensor technology interfaces.
The computer science and engineering researchers relied on discussions with the science team to guide their choices of equipment, specific sensors, and the time, place, and length of deployment of each.
Collecting Data The number of people and the distribution of skill sets varied considerably from deployment to deployment. In the four-day lake deployment to study HABs, participation varied from day to day.
On the first day, students and research staff arrived to set up equipment. On the second day, faculty investigators arrived to guide the data collection.
In this example, about twenty people came and went over the course of four days. These included about eight to ten from electrical engineering who built the sensing system, about four or five from the robotics team, two from statistics, and six to eight members of the marine biology team. Responsibilities overlapped; thus, the figures for participation are approximate.
Although all parties came to the site with a set of research questions and associated instrumentation, data collection depended heavily upon local field conditions. Researchers chose and placed sensors carefully, as each sensor is suitable for certain types of conditions and can gather data at certain densities.
The placement of sensors is itself a topic of research. Factors such as soil moisture and pH levels of the lake would influence where to place sensors. Sensors might be moved multiple times during the deployment based on interim findings.
Some positions were based on conditional probabilities, such as the ability of roboticists to move sensors automatically to positions where HABs were predicted.
For the HAB research, both teams needed observations of chemical and physical phenomena (e.g., nitrate concentrations by time, location, and depth in the lake) and observations of natural phenomena (e.g., distribution of phytoplankton and zooplankton) that could be used to guide robotic sensors.
The science teams also needed physical samples of water that contained living organisms; these were tested in wet labs on site and some were tested further on campus, after the deployment. The technology teams also needed performance and proprioceptive data about the sensors.
Sensors varied considerably in reliability, which was a source of much frustration in the early years of CENS. Sensors would function erratically, cease to function, or reboot themselves seemingly at random. In the latter case, time clocks would be reset, making it impossible to reconcile data collection from sensors across the network.
Sensor networks for the field-based science of the form conducted in CENS were much less reliable as autonomous networks than expected.
After a devastating loss of data from a field site on another continent, the Center shifted its research focus to human-in-the-loop methods. The latter approach was more suitable for assessing data quality in real time.
In practice, the sensor technology was always research grade, continually adapting to new science questions and new technical capabilities. It never stabilized sufficiently to employ standardized data collection procedures. These were among the small science characteristics of CENS research.
Analyzing Data Data were processed before, during, and after field deployments. CENS teams devoted considerable effort to ground truthing the sensor instruments and data to ensure that they were trustworthy. Roughly speaking, ground truthing is the use of known measurement methods to test the validity of new measurement methods.
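A toy illustration of ground truthing, with invented numbers: readings from a new sensor are compared against co-located measurements made with a trusted method, and the disagreement is summarized.

```python
import math

# Invented paired observations: (trusted hand measurement, new sensor
# reading) taken at the same time and place.
pairs = [(20.1, 20.4), (20.8, 20.6), (21.5, 21.9), (22.0, 21.8)]

# Root-mean-square error of the new sensor against the trusted benchmark;
# a small, stable RMSE builds confidence that the sensor data can be
# trusted in place of (or alongside) the established method.
rmse = math.sqrt(
    sum((hand - sensor) ** 2 for hand, sensor in pairs) / len(pairs)
)
```

In practice, as the biologist's year-long side-by-side instrument comparison earlier in the chapter suggests, such checks were repeated across seasons and conditions before the new instruments were trusted.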
The science teams had full control over the physical samples of water, soil, and other materials they collected in the field. Some of these materials were processed on site, others back in campus labs. Technology teams had little interest in these data.
Making scientific sense of sensor data requires scientific models, which are rendered as statistical algorithms. These models, developed jointly by the science and technology teams, were considered to be among the most important products of CENS research.
In field deployments, such as those for harmful algal blooms, sensor data went to the computers of the technology teams that operated the sensor networks.
These teams calibrated and cleaned the sensor data, which included reconciling variant timestamps among sensors, removing artifacts such as sensor restarts due to computer faults, and adding notes for field decisions such as when, how, and where a sensor was moved.
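A rough sketch of this kind of cleaning pass; the clock offsets, field names, and readings are invented for illustration:

```python
# Invented cleaning pass: shift each sensor's timestamps by an offset
# estimated against a reference clock, and drop readings flagged as
# artifacts of sensor reboots. Values are illustrative only.
clock_offsets = {"node3": -42, "node7": 15}  # seconds (assumed estimates)

raw = [
    {"sensor": "node3", "t": 1000, "value": 21.4, "reboot": False},
    {"sensor": "node3", "t": 1300, "value": 0.0,  "reboot": True},
    {"sensor": "node7", "t": 1005, "value": 21.6, "reboot": False},
]

cleaned = [
    {**r, "t": r["t"] + clock_offsets[r["sensor"]]}  # reconcile clocks
    for r in raw
    if not r["reboot"]  # discard restart artifacts
]
```

The harder, undocumented work lay in estimating those offsets and recognizing artifacts in the first place, which is why field notes about when and where sensors were moved mattered so much.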
Cleaned and calibrated data were provided to participating science teams. The scientists then compared those sensor data to their models and to other trusted data sources, such as the calibration curves they had established in laboratory and field testing. Most data from a field site were retained, if at all, only by the team that collected them.
Differences in long-term goals of the science and technology teams were most apparent in the processing of their data. Frictions arose in areas such as conflicting standards, data-sharing practices, and availability of support for managing data.
In science-only and technology-only deployments, the teams that collected the data also processed them. In joint science-technology deployments, handling and disposition varied by the type of data.
The research from CENS field deployments was published in the journals and conferences of each participating field. In many cases, the scientific findings and the technological findings of field deployments were published separately, each aimed at the audience of their respective fields.
In other cases, findings were published jointly with authors from multiple fields. A study of the authorship and acquaintanceship patterns of CENS researchers showed how new collaborations formed and how they evolved over the decade of the Center’s existence.
Participants in CENS research activities included faculty, postdoctoral fellows, graduate and undergraduate students, and full-time research staff. Students and staff involved in the design and deployment of research equipment were considered part of the team, although staff did not always receive publication credit for their roles.
Authorship was a particularly sensitive topic for those who designed and maintained instruments for the project. Instrumentation in CENS changed continually since co-innovation of science and technology was central to the goals of the Center.
Curating, Sharing, and Reusing Data

At the end of a field deployment, teams dispersed, each taking the data for which they were responsible.
As a result, data from joint field deployments were widely distributed and unlikely ever to be brought back together again. Few provenance records exist that could be used to reconstruct or replicate a particular field deployment.
Each team documented their data sufficiently for their own use. Some teams, especially in the sciences, maintained data for future comparisons. Others, more often in engineering, had little need for data from deployments after papers were published.
File-naming conventions were often the most elaborate form of data management employed by individual research teams. Spreadsheets tended to be the lowest common denominator for data exchange within and between groups.
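A file-naming convention of this kind can carry surprising amounts of structure. The sketch below assumes a hypothetical pattern of the form `site_parameter_YYYYMMDD.csv`; the pattern and the example filename are invented for illustration.

```python
import re

# A hypothetical naming convention of the kind a team might use as its
# main data-management tool: site_parameter_YYYYMMDD.csv
PATTERN = re.compile(
    r"(?P<site>[a-z0-9]+)_(?P<parameter>[a-z0-9]+)_(?P<date>\d{8})\.csv"
)

def parse_name(filename):
    """Recover site, parameter, and date from a conventionally named file."""
    match = PATTERN.fullmatch(filename)
    return match.groupdict() if match else None

info = parse_name("fieldsite_nitrate_20060715.csv")
```

Such conventions work well within a team that knows them, but the embedded structure is invisible to anyone who lacks the informal key, which is one reason these datasets were hard to combine.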
The adaptive nature of CENS data collection methods led to datasets that were local in use and not easily combined with other datasets. Few researchers contributed their data to repositories, partly because of the lack of standardization and partly because of the lack of repositories to which they could contribute their data.
In principle, existing metadata standards could be used individually or in combination to describe much of the Center’s data. Ecological observations being collected by sensors or by hand sampling could be described with a common structure and vocabulary.
Similarly, characteristics of sensors could be captured automatically if these XML standards were embedded in the algorithms for data collection.
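The idea of capturing sensor characteristics automatically can be sketched as a data-collection routine that emits an XML description alongside each reading. The element names below are invented for illustration and are not drawn from the Ecological Metadata Language or any other published standard.

```python
import xml.etree.ElementTree as ET

# Sketch: a collection routine records each sensor's characteristics as
# XML automatically, instead of relying on hand-written documentation.
def describe_sensor(sensor_id, quantity, units, interval_seconds):
    """Build an XML description of a sensor (hypothetical schema)."""
    sensor = ET.Element("sensor", id=sensor_id)
    ET.SubElement(sensor, "quantity").text = quantity
    ET.SubElement(sensor, "units").text = units
    ET.SubElement(sensor, "samplingInterval").text = str(interval_seconds)
    return ET.tostring(sensor, encoding="unicode")

record = describe_sensor("nitrate_1", "nitrate concentration", "mg/L", 300)
```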
However, formal metadata structures do not fit well into these local and adaptive forms of research activities. Metadata languages intended for use by professional catalogers and indexers are not easily adapted to lightweight use by researchers.
The Ecological Metadata Language, for example, is accompanied by an instructional manual more than two hundred pages long (Knowledge Network for Biocomplexity 2013).
A vocabulary identified for water research had more than ten thousand entries, with four hundred entries for nitrates alone. CENS researchers found the scale of these metadata languages to be daunting.
They could not justify the effort that would be required to implement them, even on a small scale. The Center was not staffed to support the level of professional data management that would be required.
Efforts to build a data repository that would support the interdisciplinary research with sensor networks were only minimally successful, largely due to the heterogeneity of data and a variety of local practices for data management.
Despite the informal nature of data management in most CENS teams, researchers were generally willing to share their data. Conditions under which they were comfortable releasing data varied considerably, ranging from releasing all the raw data immediately to requiring coauthorship on papers resulting from the data.
Most were willing to release data after the resulting papers were published (Borgman, Wallis, and Enyedy 2006). Some data and some software code were contributed to public repositories, but the most common form of data sharing by the researchers was personal exchanges upon request.
Astronomy and embedded sensor networks are contrasting cases that illustrate the diversity of scholarship, research practices, and data.
Astronomy is a long-established field with a score of journals and conferences. Scientific applications of embedded sensor networks are more of a problem area than a field, but they too have several journals and conferences.
Scholars in astronomy and sensor network research rely on shared instrumentation.
However, telescopes and data archives are large infrastructure investments, governed with the expectation of spawning thousands of papers, shared knowledge, and data that may be valuable indefinitely. By comparison, infrastructure investments in sensor-networked science and technology are minimal.
The use of embedded networked sensing technologies to study emergent phenomena in field conditions represents an opposite extreme of scientific data scholarship from astronomy.
Sensor networks deployed by CENS were largely research-grade technologies. Some instruments were too delicate to be left unattended. Others were moved frequently to adapt to field conditions, based on human-in-the-loop research designs.
The convening function of CENS, which brought together the necessary scientific and technical expertise to address these research problems, was a component of its knowledge infrastructure. Part of the convening function was to provide additional technical expertise, administrative support, and collaborative spaces. Otherwise, participating scholars relied on the infrastructures of their home departments or domains.
Despite the infrastructure investments in the Center, participating researchers lacked data standards, archives, and classification mechanisms to facilitate and integrate the exchange of data resources.
Data management responsibility fell to the investigators, with little apparatus on which to build. This is a chicken-and-egg problem, however. Sensor network projects tend to be exploratory, problems are emergent, and field situations are inherently dynamic.
Research teams may have little need to reconcile their own data from one deployment to the next, since each field trip may address new problems with new instruments. Although they do make comparisons between places and over time, rarely do they have data integration demands of the sort faced by the COMPLETE Survey.
Differences in the data scholarship of these two scientific domains contribute mightily to differences in who owns, controls access to, or sustains research data.
Common-pool resources dominate astronomy, whereas privately held data will be the norm in embedded sensor network research for the foreseeable future. Within the domain areas of CENS, demand for common-pool resources exists only in specialized areas such as seismology and genomics.
These differences in data scholarship also underlie the contrasting ways in which data are exchanged, if at all. Common-pool resources, both instrumentation and information systems, rely on shared standards and on institutions that support interoperability such as the Astrophysics Data System, CDS, SIMBAD, and NED.
Celestial objects are linked to the publications in which they are mentioned; linking papers and datasets occurs much less frequently.
Data exchange in sensor network research relies largely on personal contacts, and no formal means exist to link publications, data, and other research objects. However, both domains invest extensive human labor in making these knowledge infrastructures work effectively.
Temporal characteristics of these domains also influence the evolution of their knowledge infrastructures and relationships among stakeholders. Astronomy is among the oldest and most established fields in the sciences.
Today’s infrastructures span space and time, allowing legacy data to be integrated with observations taken today and far into the future. Astronomers today are less dependent upon private patronage than in centuries and decades past, but the support of wealthy donors is still sought for major instrumentation endeavors.
The growth of public funding contributed to the internationalization of the field, investments in common-pool resources, and greater equity in access to instruments and to data. Astronomy is the rare field that speaks with a unified voice, via the decadal survey.
Sensor-networked science and technology, in contrast, is a crossroads where scientists in need of better technology meet technology researchers in need of worthy application domains. Participants come from fields old and new, keeping a foot in each world for the duration of the collaboration.
While the problem area is exciting and has reached critical mass with conferences and journals, prospects for hiring, tenure, and promotion are centered elsewhere. Motivations to build a more extensive knowledge infrastructure or common-pool resources for sensor-networked science and technology have yet to emerge.