What You Should Know:
– AWS announced the launch of Amazon Omics, a new service that helps bioinformaticians, researchers, and scientists store, query, and analyze genomic, transcriptomic, and other omics data, and turn that data into insights that improve health and advance scientific discovery.
– The explosion of “omics” data, such as genomic, transcriptomic, and proteomic data, is driving a new understanding of biology at the molecular level. Combined with clinical information, this understanding is being applied to drug discovery, vaccine development, and predicting genetic predisposition to disease. But the size, rapid accumulation, complexity, and heterogeneity of omics data make it difficult to store, process, and analyze.
Amazon Omics Key Benefits
Amazon Omics supports large-scale analysis and collaborative research without requiring customers to provision the underlying infrastructure. It reduces the time customers spend setting up and running complex extract-transform-load (ETL) pipelines by natively storing data in optimized, query-ready formats (for example, Apache Parquet) with just a few API calls.
Customers can bring their own bioinformatics workflows, and Amazon Omics manages the infrastructure to run them. This further reduces undifferentiated heavy lifting and lets customers work in a secure environment with built-in access control, logging, and audit trails, while supporting compliance with HIPAA, GDPR, and other regulations.
Amazon Omics enables customers to import their own data and easily combine it with publicly available reference datasets in the Registry of Open Data on AWS. These include the 1000 Genomes Project, which can serve as a control for understanding disease risk; the Genome Aggregation Database (gnomAD), which supplies population allele frequencies that aid disease detection; and more than 60 other genomic datasets.
Amazon Omics: Three Components
Amazon Omics provides customers with three components:
– Omics Storage, which provides omics-aware object storage to store, discover, and share raw sequence data efficiently, securely, and at low cost.
– Omics Workflows, which lets customers run reproducible bioinformatics workflows that process raw sequence data at scale, whether the data resides in Omics Storage or in Amazon S3, and removes the undifferentiated heavy lifting of running those workflows.
– Omics Analytics, which simplifies analysis by making variants (or mutations) and annotations query-ready.
These components are often used together, but customers can also use each one on its own.
Why It Matters
Amazon Omics enables end-to-end omics storage, processing, and analysis by removing the need for organizations to set up and maintain specialized tools, workflows, and infrastructure. Because it integrates natively with analytics services such as AWS Lake Formation and Amazon Athena, customers can maintain governance over omics data that is part of their multimodal data lake. With a few API calls, customers can deploy reproducible, production-grade infrastructure that accelerates innovation and shortens the time to medical insights.