NGC 3718, NGC
3729 and other galaxies have been analyzed using machine learning
algorithms that can be "taught" to recognize astrophysical similarities.
The same technology is now being applied to cancer images, as well.
Image Credit: Catalina Sky Survey, U of Arizona, and Catalina Realtime
Transient Survey, Caltech.
A lung specimen that was analyzed using the same machine learning
algorithms that were originally developed for space research. Image
Credit: Early Detection Research Network/University of Colorado.
JPL and National Cancer Institute Renew Big Data Partnership
Every day, NASA spacecraft beam down hundreds of petabytes of data,
all of which has to be codified, stored and distributed to scientists
across the globe. Increasingly, artificial intelligence is helping to
"read" this data as well, highlighting similarities between datasets
that scientists might miss.
For the past 15 years, the big data techniques pioneered by NASA's
Jet Propulsion Laboratory in Pasadena, California, have been
revolutionizing biomedical research. On Sept. 6, 2016, JPL and the
National Cancer Institute (NCI), part of the National Institutes of
Health, renewed a research partnership through 2021, extending the
development of data science that originated in space exploration and is
now supporting new cancer discoveries.
The NCI-supported Early Detection Research Network (EDRN)
is a consortium of biomedical investigators who share anonymized data
on cancer biomarkers: chemical or genetic signatures related to specific
cancers. Their goal is to pool all their research data into a single,
searchable network and to translate their collective work
into techniques for early diagnosis of cancer or cancer risk.
In the time they've worked together, the efforts of JPL and EDRN have led
to the discovery of six new Food and Drug Administration-approved cancer
biomarkers and nine biomarkers approved for use in laboratories certified
under the Clinical Laboratory Improvement Amendments (CLIA). These
agency-approved biomarkers support cancer research and diagnosis and have
been used in more than 1 million patient diagnostic tests worldwide.
"After the founding of EDRN in 2000, the network needed expertise to
take data from multiple studies on cancer biomarkers and create a
single, searchable network of research findings for scientists," said
Sudhir Srivastava, chief of NCI's Cancer Biomarkers Research Group and
head of EDRN. JPL had decades of experience doing similar work with
the data returned by NASA spacecraft.
Dan Crichton, the head of JPL's Center for Data Science and
Technology, a joint initiative with Caltech in Pasadena, California,
helped establish a JPL-based informatics center
dedicated to supporting EDRN's big data efforts. In the renewed
partnership, JPL is expanding its data science work to research and
apply technologies across several NCI-funded programs. Those programs
include EDRN, the Consortium for Molecular and Cellular
Characterization of Screen-Detected Lesions, and the Informatics
Technology for Cancer Research initiative.
"From a NASA standpoint, there are significant opportunities to
develop new data science capabilities that can support both the mission
of exploring space and cancer research using common methodological
approaches," Crichton said. "We have a great opportunity to perfect
those techniques and grow JPL's data science technologies, while serving
our nation."
Crichton said JPL has led the way when it comes to taking data from
raw observations to scientific conclusions. One example: JPL often deals
with measurements from a variety of sensors, such as cameras and mass
spectrometers. Both can be used to study a star, planet or similar
target object. But it takes special software to recognize that readings
from very different instruments relate to one another.
There's a similar problem in cancer research, where readings from
different biomedical tests or instruments require correlation with one
another. For that to happen, data have to be standardized, and
algorithms must be "taught" to know what they're looking for.
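To make the idea concrete, here is a minimal Python sketch of the simplest form of that correlation step: pairing readings from two very different instruments when they share a target identifier and were taken close together in time. The `Reading` record, the `link_readings` function and the 10-minute tolerance are hypothetical, chosen only for illustration; real mission and biomedical pipelines rely on much richer metadata and on trained algorithms.

```python
# Hypothetical sketch: pairing readings from two very different instruments
# that observed the same target. Record layout, target IDs and the time
# tolerance are illustrative assumptions, not a JPL data format.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Reading:
    instrument: str
    target_id: str     # identifier shared by every instrument that saw the target
    taken_at: datetime
    payload: str       # stand-in for the instrument-specific data


def link_readings(
    camera: List[Reading],
    spectrometer: List[Reading],
    tolerance: timedelta = timedelta(minutes=10),
) -> List[Tuple[Reading, Reading]]:
    """Pair camera and spectrometer readings of the same target taken within `tolerance`."""
    # Index spectrometer readings by target so each camera reading is matched quickly.
    by_target: Dict[str, List[Reading]] = {}
    for s in spectrometer:
        by_target.setdefault(s.target_id, []).append(s)

    pairs: List[Tuple[Reading, Reading]] = []
    for c in camera:
        for s in by_target.get(c.target_id, []):
            if abs(c.taken_at - s.taken_at) <= tolerance:
                pairs.append((c, s))
    return pairs


t0 = datetime(2020, 1, 1, 12, 0)
camera = [
    Reading("camera", "target-A", t0, "image-1"),
    Reading("camera", "target-B", t0, "image-2"),
]
spectrometer = [
    Reading("mass_spec", "target-A", t0 + timedelta(minutes=4), "spectrum-1"),
    Reading("mass_spec", "target-A", t0 + timedelta(hours=2), "spectrum-2"),
    Reading("mass_spec", "target-C", t0, "spectrum-3"),
]
pairs = link_readings(camera, spectrometer)
# Only image-1 and spectrum-1 share a target and fall inside the 10-minute window.
```

Matching on an explicit shared identifier is the crudest possible rule; it only works once both instruments label their data consistently, which is the same standardization problem the biomedical sites face.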
Since its founding, EDRN's major challenge has been
access. Research centers all over the United States had large numbers of
biomarker specimens, but each had its own way of labeling, storing and
sharing its datasets. Ten sites may have high-quality specimens for
study, but if their common data elements (patient age, cancer type
and other characteristics) aren't recorded uniformly, the specimens can't be
studied as a whole.
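A minimal Python sketch of what recording common data elements uniformly can involve: each site's records are mapped onto one shared set of fields and vocabularies, after which specimens from different sites can be queried together. The site names, field names, vocabularies and the `harmonize` function are all hypothetical; they are not EDRN's actual data model.

```python
# Hypothetical sketch: mapping site-specific specimen records onto shared
# common data elements (CDEs). Site formats, field names and vocabularies
# are illustrative assumptions, not EDRN's actual data model.
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class CommonRecord:
    site: str
    patient_age: Optional[int]         # years
    cancer_type: Optional[str]         # shared vocabulary term
    stage: Optional[str]               # "early", "late" or None if unknown
    prior_treatment: Optional[bool]    # None if not recorded


# Each (hypothetical) site names the same concepts differently.
SITE_FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "site_a": {"age": "patient_age", "dx": "cancer_type",
               "stage": "stage", "treated": "prior_treatment"},
    "site_b": {"AgeYears": "patient_age", "Diagnosis": "cancer_type",
               "TumorStage": "stage", "PriorTx": "prior_treatment"},
}

# Local terms mapped onto one shared vocabulary (illustrative only).
CANCER_TYPES = {"lung": "lung", "nsclc": "lung", "pancreas": "pancreas", "pancreatic": "pancreas"}
STAGE_GROUPS = {"i": "early", "ii": "early", "iii": "late", "iv": "late"}
YES_NO = {"y": True, "yes": True, "true": True, "n": False, "no": False, "false": False}


def _lookup(table: Dict[str, Any], value: Any) -> Any:
    """Normalize a raw value and look it up; unknown or missing values become None."""
    if value is None or value == "":
        return None
    return table.get(str(value).strip().lower())


def harmonize(site: str, raw: Dict[str, Any]) -> CommonRecord:
    # Rename site-specific fields to CDE names.
    cde = {name: raw.get(local) for local, name in SITE_FIELD_MAPS[site].items()}
    age = cde["patient_age"]
    return CommonRecord(
        site=site,
        patient_age=int(age) if age not in (None, "") else None,
        cancer_type=_lookup(CANCER_TYPES, cde["cancer_type"]),
        stage=_lookup(STAGE_GROUPS, cde["stage"]),
        prior_treatment=_lookup(YES_NO, cde["prior_treatment"]),
    )


records = [
    harmonize("site_a", {"age": "61", "dx": "NSCLC", "stage": "II", "treated": "n"}),
    harmonize("site_b", {"AgeYears": 58, "Diagnosis": "Lung", "TumorStage": "IV", "PriorTx": True}),
]
# Once harmonized, specimens from both sites can be queried as one collection.
early_lung = [r for r in records if r.cancer_type == "lung" and r.stage == "early"]
```

Unknown or missing values deliberately become `None` rather than a guess, so a query can tell "not recorded" apart from "no", which is exactly the gap Srivastava describes next.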
"We didn't know if they were early-stage or late-stage specimens, or
if any level of treatment had been tried," Srivastava said. "And JPL
told us, 'We do this type of thing all the time! That's how we manage
our Planetary Data System.'"
As the network has developed, it has added members from dozens of
institutions, including Dartmouth College's Geisel School of Medicine;
Harvard Medical School's Massachusetts General Hospital; Stanford's NIST
Genome-Scale Measurements Group; University of Texas' MD Anderson
Cancer Center; and numerous others.
Christos Patriotis, program director at NCI's Cancer Biomarkers
Research Group, said the network's members now include international
researchers from the U.K., China, Japan, Australia, Israel and Chile.
"The more we expand, the more data we integrate," Patriotis said.
"Instead of being silos, now our partners can integrate their findings.
Each system can speak to the others."
As the JPL-NCI collaboration advances, next steps include image
recognition technology, such as helping EDRN archive images of cancer
specimens. Those images could be analyzed with computer vision, which is
currently used to spot similarities in star clusters and in other
astrophysics research.
In the near future, Crichton said, machine learning algorithms could
compare a CT scan with an archive of similar images, searching for early
signs of cancer based on a patient's age, ethnic background and other
demographics.
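The kind of archive comparison Crichton describes can be sketched, very roughly, as a nearest-neighbor search. The Python sketch below assumes each image has already been turned into a fixed-length feature vector by some upstream model (not shown), narrows the archive to patients of similar age, and ranks the remaining images by cosine similarity. This plain similarity search stands in for whatever trained machine learning model a real system would use; the `ArchivedImage` record, the vectors, the scan IDs and the five-year age window are hypothetical, and age is the only demographic filter shown.

```python
# Hypothetical sketch: retrieving the archived images most similar to a new
# scan. Feature vectors are assumed to come from an upstream model (not shown);
# IDs, vectors and the age window are illustrative, not a JPL or NCI system.
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ArchivedImage:
    image_id: str
    patient_age: int
    features: Tuple[float, ...]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    query: Sequence[float],
    query_age: int,
    archive: List[ArchivedImage],
    k: int = 3,
    age_window: int = 5,
) -> List[Tuple[str, float]]:
    # Compare only against patients of similar age (the single demographic used here).
    candidates = [img for img in archive if abs(img.patient_age - query_age) <= age_window]
    scored = [(img.image_id, cosine_similarity(query, img.features)) for img in candidates]
    # Highest similarity first; keep the top k.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


archive = [
    ArchivedImage("scan-001", 62, (0.9, 0.1, 0.0)),
    ArchivedImage("scan-002", 60, (0.1, 0.9, 0.2)),
    ArchivedImage("scan-003", 45, (0.9, 0.1, 0.0)),  # same features, but outside the age window
    ArchivedImage("scan-004", 64, (0.7, 0.3, 0.1)),
]
matches = most_similar((1.0, 0.0, 0.0), query_age=61, archive=archive, k=2)
```

Filtering before ranking keeps scan-003 out of the results even though its features match the query; a production system would likely weigh several demographics and work with far larger vectors and archives.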
"As we develop more automated methods for detecting and classifying
features in images, we see great opportunities for enhancing data
discovery," Crichton said. "We have examples where algorithms for
detection of features in astronomy images have been transferred to
biology and vice versa."
For more information on the research, visit: http://edrn.cancer.gov
Caltech manages JPL for NASA.
News Media Contact
Andrew Good
Jet Propulsion Laboratory, Pasadena, Calif.
818-393-2433
andrew.c.good@jpl.nasa.gov