Add a first (real) Method#
At the most basic level, a QIIME 2 Action is simply an annotation of a Python function that describes in detail the inputs and outputs to that function. Here weâll create a powerful action that illustrates this idea.
The type of action that weâll create is a method, meaning that it will take zero or more QIIME 2 artifacts as input, and it will generate one or more QIIME 2 artifacts as output.
tl;dr
The complete code that I developed to add this action to my plugin can be found here: caporaso-lab/q2-dwq2.
Pairwise sequence alignment#
One of the most fundamental algorithms in bioinformatics is pairwise sequence alignment. Pairwise sequence alignment forms the basis of BLAST, many genome assemblers, phylogenetic inference from molecular sequence data, assigning taxonomy to environmental DNA sequences, and so much more. The first real action that weâll add to our plugin is a method that performs pairwise sequence alignment using the Needleman-Wunsch global pairwise alignment algorithm [2].
You donât need to understand how the NW algorithm works interally to implement our method, because weâre going to work with a Python function that implements the algorithm for us. But, youâll need to understand what its input and outputs are to be able to annotate them, and youâll need to understand at a basic level what it does so that you can test that your code is working as expected.
If you do want to learn more about pairwise sequence alignment, the specific algorithm and implementation that weâre going to work with here is presented in detail in the Pairwise Sequence Alignment chapter of An Introduction to Applied Bioinformatics [3]. Briefly:
The goal of pairwise sequence alignment is, given two DNA, RNA, or protein sequences, to generate a hypothesis about which sequence positions derived from a common ancestral sequence position. [3]
Our method will take two DNA sequences as input.
It will attempt to align like positions with each other, inserting gap (i.e., -
) characters where it seems likely that insertion/deletion events have occurred over the course of evolution.
The output will be a pairwise sequence alignment (or more briefly, an alignment), which is a special case of a multiple sequence alignment that contains exactly two sequences.
Our method will also take a few parameters as input.
These parameters include things like the score that should be assigned when a pair of positions contains matching characters (weâll call this match_score
), or the score penalty that is incurred when a new gap is opened in the alignment (gap_open_penalty
).
The scikit-bio library implements Needleman-Wunsch global pairwise alignment algorithm as skbio.alignment.global_pairwise_align_nucleotide
.
Weâre going to make this accessible through our plugin by writing a simple wrapper of this function.
Write a wrapper function#
Technically we donât need to write a wrapper function of skbio.alignment.global_pairwise_align_nucleotide
, but rather we could register that function directly.
But for our first pass at this action there are a couple of things weâre going to need to do to get data from QIIME 2 into skbio.alignment.global_pairwise_align_nucleotide
.
Letâs add this wrapper to our q2-dwq2/q2_dwq2/_methods.py
.
(Remember that my package name is q2-dwq2
, and my module name is q2_dwq2
.
If yours are different, youâll need to adjust that relative file path accordingly.)
The _
at the beginning of _methods.py
is convention that conveys that this is intended to be a private submodule: in other words, consummers of this code outside of the this Python package shouldnât access anything in this file directly.
Since the duplicate_table
method that was defined in this plugin was only intended to get us started, remove it now, replacing all of the content of that file with the following code:
# ----------------------------------------------------------------------------
# Copyright (c) 2024, Greg Caporaso.
#
# Distributed under the terms of the Modified BSD License.
#
# The full license is in the file LICENSE, distributed with this software.
# ----------------------------------------------------------------------------
from skbio.alignment import global_pairwise_align_nucleotide, TabularMSA
from q2_types.feature_data import DNAIterator
def nw_align(seq1: DNAIterator,
seq2: DNAIterator,
gap_open_penalty: float = 5,
gap_extend_penalty: float = 2,
match_score: float = 1,
mismatch_score: float = -2) -> TabularMSA:
seq1 = next(iter(seq1))
seq2 = next(iter(seq2))
msa, _, _ = global_pairwise_align_nucleotide(
seq1=seq1, seq2=seq2, gap_open_penalty=gap_open_penalty,
gap_extend_penalty=gap_extend_penalty, match_score=match_score,
mismatch_score=mismatch_score
)
return msa
Here I defined a new function, nw_align
(for Needleman-Wunsch Alignment).
At its core, what this function is doing is passing some inputs through to skbio.alignment.global_pairwise_align_nucleotide
.
The aspects of this that might make this look different from what you typically see in a Python function definition are the type hints, which are not used by Python directly, but are intended to be used by other tools (like QIIME 2).
The type hints define the data type associated with each input and output from our function, and are one of the ways that we annotate a function so that QIIME 2 knows how to interact with it.
Some of these are built-in types (in this case, we are providing four float
values).
The others, DNAIterator
and TabularMSA
, are defined in the q2-types QIIME 2 plugin and in scikit-bio, respectively.
A DNAIterator
is a Python object that enables iteration over a collection of zero or more skbio.DNA
objects (which represent DNA sequences), and a TabularMSA
object represents a multiple sequence alignment.
A little bit later weâll come back to how you decide what type hints to provide here.
The first couple of lines in this function are a little bit odd, and stem from the fact that (as of this writing) there isnât an existing QIIME 2 artifact class for individual DNA sequences, but rather only for collections of DNA sequences.
We are therefore going to work-around this right now, and in a subsequent lesson weâll define our own QIIME 2 artifact class to represent a single DNA sequence.
We need two sequences as input to pairwise sequence alignment (by definition), and weâll take those as inputs through the seq1
and seq2
parameters.
These will come in as DNAIterator
objects, and our work-around is that we read the first sequence from each of these two input sequence collections by getting the next()
item from each collection, one time each, and reassigning them to the variables seq1
and seq2
.
Next, we call skbio.alignment.global_pairwise_align_nucleotide
, passing in all of our inputs.
skbio.alignment.global_pairwise_align_nucleotide
returns three outputs.
For now, weâre only going to concern ourselves with the first: the multiple sequence alignment, which weâll store in a varriable called msa
.
By convention in Python, unused return values are assigned to a variable named _
.
Finally, weâll return the msa
variable.
So for the most part, this works like a normal Python function.
The unusual aspects are the type hints, and our workaround for getting the first sequence out of each of our input DNAIterators
.
Note
Note that in defining this wrapper function, we havenât yet touched the concept of QIIME 2 Artifacts. The underlying functions that we register as methods or visualizers donât know anything about Artifacts. You can also use this function just as you would any other Python function - as mentioned above, Python itself ignores the type hints, so you could just consider these detailed documentation of the input and output types of your function.
Register the wrapper function as a plugin action#
Now that we have a function, nw_align
, that we want to register as an action in our plugin, letâs do it.
Define a citation for this action#
If the action that youâre performing has a relevant citation, adding that during action registration will allow your users to discover what they should be citing when they use your action. This is a good way, for example, to ensure that your users know that they should be citing your work (and not just citing QIIME 2).
To associate a citation with our new action, the first thing weâll do is add the bibtex-formatted citation to q2-dwq2/q2_dwq2/citations.bib
.
Bibtex is a standard format recognizes by nearly all (or all) citation managers, including Paperpile and EndNote.
You can generally export a bibtex citation from those tools, or alternatively get one from Google Scholar.
The relevant citation for Needleman-Wunsch alignment is titled A general method applicable to the search for similarities in the amino acid sequence of two proteins, and was published in 1970.
Find this citation using your favorite tool.
If you use Google Scholar, search for the title of the article, and then identify it in the search results (it should be the first one).
As of this writing, you would next click âCiteâ under the search result, and then click âBibtexâ.
That should bring you to a page that contains the following bibtex-formatted citation:
@article{needleman1970general,
title={A general method applicable to the search for similarities in the amino acid sequence of two proteins},
author={Needleman, Saul B and Wunsch, Christian D},
journal={Journal of molecular biology},
volume={48},
number={3},
pages={443--453},
year={1970},
publisher={Elsevier}
}
I copied this and then pasted it at the bottom of the q2-dwq2/q2_dwq2/citations.bib
file.
I also changed the citation key on the first line of this bibtext record (needleman1970general
) to Needleman1970
.
This is how weâll reference this citation when we associate it with our action, and my version of the key is just a little easier for me to remember.
Register the action in plugin_setup.py
#
Now we have what we need to register our function as a plugin method.
By convention, this is done in q2-dwq2/q2_dwq2/plugin_setup.py
, and thatâs what weâll do here.
Start by removing the registration of the duplicate_table
action, now that we removed the underlying function from the plugin.
To register a function as a method, youâll use plugin.methods.register_function
, where plugin
is the qiime2.plugin.Plugin
action that is instantiated in this file.
Add the following code to your q2-dwq2/q2_dwq2/plugin_setup.py
file, and then weâll work through it line by line.
Note that this wonât work yet - we still need to add some import
statements to the top of the file, but weâll add those as we work through the code where the imported functionality is used.
plugin.methods.register_function(
function=nw_align,
inputs={'seq1': FeatureData[Sequence],
'seq2': FeatureData[Sequence]},
parameters={
'gap_open_penalty': Float % Range(0, None, inclusive_start=False),
'gap_extend_penalty': Float % Range(0, None, inclusive_start=False),
'match_score': Float % Range(0, None, inclusive_start=False),
'mismatch_score': Float % Range(None, 0, inclusive_end=True)},
outputs={'aligned_sequences': FeatureData[AlignedSequence]},
input_descriptions={'seq1': 'The first sequence to align.',
'seq2': 'The second sequence to align.'},
parameter_descriptions={
'gap_open_penalty': ('The penalty incurred for opening a new gap. By '
'convention this is a positive number.'),
'gap_extend_penalty': ('The penalty incurred for extending an existing '
'gap. By convention this is a positive number.'),
'match_score': ('The score for matching characters at an alignment '
'position. By convention, this is a positive number.'),
'mismatch_score': ('The score for mismatching characters at an '
'alignment position. By convention, this is a '
'negative number.')},
output_descriptions={
'aligned_sequences': 'The pairwise aligned sequences.'
},
name='Pairwise global sequence alignment.',
description=("Align two DNA sequences using Needleman-Wunsch (NW). "
"This is a Python implementation of NW, so it is very slow! "
"This action is for demonstration purposes only. đ"),
citations=[citations['Needleman1970']]
)
First, we call plugin.methods.register_function
.
This function call takes a number of parameters, and you can find full detail by following the [source]
link from the Plugin
API documentation.
Hereâs what each is:
function
: This is the Python function to be registered as a plugin action. We defined ours above asnw_align
. If you add the import statementfrom q2_dwq2._methods import nw_align
to the top of this file, you can providenw_align
.inputs
: This is a Pythondict
mapping the variable names of the inputs to the plugin action to the artifact classes of the inputs. As mentioned above, QIIME 2 doesnât define an artifact class for a single DNA sequence, so weâre going to use the type that is commonly used for defining collections of DNA sequences, and weâll just end up working with the first sequence in each input. The type we use here isFeatureData[Sequence]
. A little bit later weâll come back to how you identify the artifact classes that should be assigned to your input, and how to define your own artifact class if there isnât already a relevant one. Inputs to QIIME 2 actions are data in the form of Artifacts, and these are different than Parameters. It is at this stage, when registering a function as an action, that this distinction is made. The artifact classes weâre using here need to be imported from q2-types. To do this, add the linefrom q2_types.feature_data import FeatureData, Sequence
to the top of the file.parameters
: This is a Pythondict
mapping the names of parameters to their Primitive Type. This information is used to validate input provided by users of your plugin, but more importantly to allow QIIME 2 interfaces to determine how this information should be collected from a user. For example, in a graphical interface the value of aBoolean
parameter could be collected from a user using a checkbox, while aFloat
parameter could be collected using a text field that only accepts numbers. Our four parameters are allFloats
, and each has aRange
that values must fall in. Import these primitive types for use here by adding the linefrom qiime2.plugin import Float, Range
to the top of the file. Thegap_open_penalty
parameter, for example, is defined here as taking a floating point value greater than 0.outputs
: This is a Pythondict
mapping the variable names of the outputs from the plugin action to their artifact classes. The type we use here isFeatureData[AlignedSequence]
, representing a collection of aligned DNA sequences. A more appropriate type might represent a pair of aligned sequences specifically, rather than one or more aligned sequences which is what this artifact class implies, but again weâll come back to that a later. Weâll need to importAlignedSequence
here as well, which you can do by adding the linefrom q2_types.feature_data import AlignedSequence
to the top of the file (or addingAlignedSequence
to the imports you already added fromq2_types.feature_data
).input_descriptions
,parameter_descriptions
, andoutput_descriptions
: These are Pythondicts
that provide descriptions of each input, parameter, and output, respectively, for use in help text through different interfaces.name
: A brief name for this action. This shows up, for example, when listing the actions that are available in a plugin.description
: A longer description of the action. This is generally presented to a user when they request more detail on an action (for example, by passing a--help
parameter through a command line interface).citations
: A list of the citations that should be associated with this action. Earlier in theplugin_setup.py
file we instantiated acitations
lookup, and we can now use that to associate the citation we added tocitations.bib
with this action.
After adding this code and the corresponding import
statements to your plugin_setup.py
, you should be ready to try this action out.
My import
statements now look like the following:
from qiime2.plugin import Citations, Plugin, Float, Range
from q2_types.feature_data import FeatureData, Sequence, AlignedSequence
from q2_dwq2 import __version__
from q2_dwq2._methods import nw_align
Calling the action with q2cli and the Python 3 API#
Activate your development environment and run qiime dev refresh-cache
.
If your code doesnât have any syntax errors, and you addressed all of the additions described in this document, you should then be able to run qiime dwq2 --help
(replacing dwq2
with your pluginâs name, if itâs different), and see your new nw-align
action show up in the list of actions associated with your plugin.
It should look something like this:
$ qiime dwq2 --help
Usage: qiime dwq2 [OPTIONS] COMMAND [ARGS]...
Description: A prototype of a demonstration plugin for use by readers of
*Developing with QIIME 2* (DWQ2).
Plugin website: https://develop.qiime2.org/
Getting user support: Please post to the QIIME 2 forum for help with this
plugin: https://forum.qiime2.org
Options:
--version Show the version and exit.
--example-data PATH Write example data and exit.
--citations Show citations and exit.
--help Show this message and exit.
Commands:
nw-align Pairwise global sequence alignment.
If you call qiime dwq2 nw-align --help
, youâll see the more detailed help text for the nw-align
action.
It should look something like this:
$ qiime dwq2 nw-align --help
Usage: qiime dwq2 nw-align [OPTIONS]
Align two DNA sequences using Needleman-Wunsch (NW). This is a Python
implementation of NW, so it is very slow! This action is for demonstration
purposes only. đ
Inputs:
--i-seq1 ARTIFACT FeatureData[Sequence]
The first sequence to align. [required]
--i-seq2 ARTIFACT FeatureData[Sequence]
The second sequence to align. [required]
Parameters:
--p-gap-open-penalty NUMBER Range(0, None, inclusive_start=False)
The penalty incurred for opening a new gap. By
convention this is a positive number. [default: 5]
--p-gap-extend-penalty NUMBER Range(0, None, inclusive_start=False)
The penalty incurred for extending an existing gap.
By convention this is a positive number.
[default: 2]
--p-match-score NUMBER Range(0, None, inclusive_start=False)
The score for matching characters at an alignment
position. By convention, this is a positive number.
[default: 1]
--p-mismatch-score NUMBER Range(None, 0, inclusive_end=True)
The score for mismatching characters at an
alignment position. By convention, this is a
negative number. [default: -2]
Outputs:
--o-aligned-sequences ARTIFACT FeatureData[AlignedSequence]
The pairwise aligned sequences. [required]
Miscellaneous:
--output-dir PATH Output unspecified results to a directory
--verbose / --quiet Display verbose output to stdout and/or stderr
during execution of this action. Or silence output
if execution is successful (silence is golden).
--example-data PATH Write example data and exit.
--citations Show citations and exit.
--help Show this message and exit.
Similarly, if you start an iPython session by calling ipython
in your activated development environment, you can access the nw_align
method through its Python 3 API as follows.
In [1]: import qiime2.plugins.dwq2
In [2]: qiime2.plugins.dwq2.actions.nw_align?
This call should produce the following help text:
Call signature:
qiime2.plugins.dwq2.actions.nw_align(
seq1: FeatureData[Sequence],
seq2: FeatureData[Sequence],
gap_open_penalty: Float % Range(0, None, inclusive_start=False) = 5,
gap_extend_penalty: Float % Range(0, None, inclusive_start=False) = 2,
match_score: Float % Range(0, None, inclusive_start=False) = 1,
mismatch_score: Float % Range(None, 0, inclusive_end=True) = -2,
) -> (FeatureData[AlignedSequence],)
Type: Method
String form: <method qiime2.plugins.dwq2.methods.nw_align>
File: ~/miniconda3/envs/dwq2/lib/python3.8/site-packages/qiime2/sdk/action.py
Docstring: QIIME 2 Method
Call docstring:
Pairwise global sequence alignment.
Align two DNA sequences using Needleman-Wunsch (NW). This is a Python
implementation of NW, so it is very slow! This action is for demonstration
purposes only. đ
Parameters
----------
seq1 : FeatureData[Sequence]
The first sequence to align.
seq2 : FeatureData[Sequence]
The second sequence to align.
gap_open_penalty : Float % Range(0, None, inclusive_start=False), optional
The penalty incurred for opening a new gap. By convention this is a
positive number.
gap_extend_penalty : Float % Range(0, None, inclusive_start=False), optional
The penalty incurred for extending an existing gap. By convention this
is a positive number.
match_score : Float % Range(0, None, inclusive_start=False), optional
The score for matching characters at an alignment position. By
convention, this is a positive number.
mismatch_score : Float % Range(None, 0, inclusive_end=True), optional
The score for mismatching characters at an alignment position. By
convention, this is a negative number.
Returns
-------
aligned_sequences : FeatureData[AlignedSequence]
The pairwise aligned sequences.
There are now two interfaces to your method, and you didnât have to write either of them â cool! đ
Take a minute to review both the command line and Python help text, and relate it to the parameters we set when we registered the action.
Write unit tests#
Your code is not ready for use until you write unit tests, to ensure that itâs doing what you expect.
Weâll write unit tests for nw_align
in q2_dwq2/tests/test_methods.py
.
QIIME 2 provides a class, TestPluginBase
, that facilitates unit testing plugins.
Warning
Developing with QIIME 2 assumes that you have some background in software engineering. If writing unit tests or software testing in general are new to you, you should learn about these topics before developing software that you intend to use for ârealâ analysis. Small errors in code can have huge implications, including angry users, paper retractions, and even clinical errors that could impact someoneâs medical treatment.
I highly recommend reading The Pragmatic Programmer: Your Journey to Mastery (20th Anniversary Edition) [4]. Topic 41 discusses software testing, but the whole book is worth reading if youâre serious about developing high-quality software.
What to test and what not to test#
When testing a QIIME 2 plugin, your goal is to confirm that the functionality that you developed works as expected. You canât test that every possible input produces its expected output, so instead you need to think about what tests will convince you that itâs working across the range of inputs that would be expected. Itâs also a good idea to test that invalid input results in a failure, and ideally also provides an informative error message.
If youâre simply wrapping a function, like we are here, you donât need to test the underlying function in detail as that should have been tested already in the library that provides that function. (If thatâs not the case, you should reconsider whether this function is the one that you want to use!)
You also donât need to test things such as whether your method works through q2cli, the Python 3 API, and Galaxy. That is functionality that you get for free when developing QIIME 2 plugins: the developers of the QIIME 2 framework and the other related tools have already tested this, and this should work as long as youâre not adopting any plugin development antipatterns.
A first test of our plugin action#
The following is a first test of our nw_align
method.
Remove the tests of duplicate_table
that in this file, since we removed that action from the plugin.
from skbio.alignment import TabularMSA
from skbio.sequence import DNA
from qiime2.plugin.testing import TestPluginBase
from qiime2.plugin.util import transform
from q2_types.feature_data import DNAFASTAFormat, DNAIterator
from q2_dwq2._methods import nw_align
class NWAlignTests(TestPluginBase):
package = 'q2_dwq2.tests'
def test_simple1(self):
# test alignment of a pair of sequences
sequence1 = DNA('AAAAAAAAGGTGGCCTTTTTTTT')
sequence1 = DNAIterator([sequence1])
sequence2 = DNA('AAAAAAAAGGGGCCTTTTTTTT')
sequence2 = DNAIterator([sequence2])
observed = nw_align(sequence1, sequence2)
aligned_sequence1 = DNA('AAAAAAAAGGTGGCCTTTTTTTT')
aligned_sequence2 = DNA('AAAAAAAAGG-GGCCTTTTTTTT')
expected = TabularMSA([aligned_sequence1, aligned_sequence2])
self.assertEqual(observed, expected)
First, we import some functions and classes that weâll use in our tests.
Then, we define a class, NWAlignTests
, that inherits from TestPluginBase
, a class used to facilitate testing of QIIME 2 plugins.
TestPluginBase
has you define a class variable, package
, defining the submodule that weâre working in - weâll come back to how this is used shortly.
Finally, weâre ready to start defining unit tests.
The first test that I typically define for a function is a test of its default behavior on some very simple input where itâs easy for me to determine what the expected outcome should be.
In the example here, Iâm creating two skbio.DNA
sequence objects, turning them into DNAIterators
(remember that that is what nw_align
expects as input), and then calling nw_align
on those two inputs.
I call the return value from nw_align
observed
, because it is my observed output.
I very specifically chose the sequences here because I could tell based on my knowledge of Needleman-Wunsch alignment what the output would be: itâs clear that the two sequences differ only in that there appears to have either been an insertion of a T
character in sequence1
, or a deletion of a T
character in sequence2
since the ancestral sequence.
Alternatively, in a case like this, itâs also fair game to generate the expected output by calling the underlying function (skbio.alignment.global_pairwise_align_nucleotide
) directly.
This is because weâre not testing that skbio.alignment.global_pairwise_align_nucleotide
does what itâs supposed (again, we trust that it is, or we wouldnât be using it).
Weâre only testing that our wrapper generates the output that we would expect from skbio.alignment.global_pairwise_align_nucleotide
.
After getting my observed
output, I define my expected
output (i.e., what nw_align
should return if itâs working as expected).
In this case, thatâs a pairwise alignment of sequence1
and sequence2
with a -
character added where the insertion/deletion event is hypothesized to have occurred.
Finally, we compare our observed
output to our expected
output.
An important thing to note here is that I didnât need to load any data from file or use any QIIME 2 artifacts when testing my method.
Because my method is just a Python function that I registered with my Plugin
, I can provide input objects as I would to any other Python function.
It is possible to store QIIME 2 Artifacts in the repository and load them for use in the tests, but that gets a little clunky so itâs best avoided when possible.
For example, youâll often want to test multiple minor variations on the input to test edge cases (i.e., boundary conditions).
Thatâs much easier to do if youâre working with Python objects as input, rather than if you need to create a whole bunch of different QIIME 2 artifacts and store them in the repository.
Storing artifacts in the repository to use as inputs in unit tests can also increase the repository size, and itâs not straight-forward to compare how inputs have changed across different revisions of the code.
We can now run the test using make test
on the command line from your q2-dwq2
directory.
This should look something like the following:
$ make test
...
q2_dwq2/tests/test_methods.py::NWAlignTests::test_simple1
...
==== 1 passed, 5 warnings in 0.17s ====
At the moment weâre not concerned about the warnings that are being reported. We see from this output that we defined one test, and that one test passed. So weâre ready to move on.
A second test of our action#
All of that said, sometimes you do want to store data that you use in tests in files in the repository (for example, if they are large - in which case itâs a pain to store the test data in your unit test Python files). The following unit test illustrates how this can be achieved.
def test_simple2(self):
# test alignment of a different pair of sequences
# loaded from file this time, for demonstration purposes
sequence1 = transform(
self.get_data_path('seq-1.fasta'),
from_type=DNAFASTAFormat,
to_type=DNAIterator)
sequence2 = transform(
self.get_data_path('seq-2.fasta'),
from_type=DNAFASTAFormat,
to_type=DNAIterator)
observed = nw_align(sequence1, sequence2)
aligned_sequence1 = DNA('ACCGGTGGAACCGG-TAACACCCAC')
aligned_sequence2 = DNA('ACCGGT--AACCGGTTAACACCCAC')
expected = TabularMSA([aligned_sequence1, aligned_sequence2])
self.assertNotEqual(observed, expected)
In this example, data is loaded from a file path that is relative to the TestPluginBase.package
variable that we set above.
In this case, the data needs to be transformed from a fasta file, which QIIME 2 represents as an instance of q2-type
âs DNAFASTAFormat
object to a DNAIterator
, so it can be provided as input to nw_align
.
Here we use the qiime2.plugin.util.transform
method to perform the transformation.
After loading the files and transforming them, the test looks identical to the previous test case that we defined, except that the expected
output is different, because the sequences we loaded differ from those used in the test_simple1
method.
In order for this test to pass, you must actually have the two .fasta
files that are being loaded in the submodule as specified here (i.e., you should have the files q2-dwq2/q2_dwq2/tests/data/seq-1.fasta
and q2-dwq2/q2_dwq2/tests/data/seq-2.fasta
in your Python package).
Save the following text in a new file, q2-dwq2/q2_dwq2/tests/data/seq-1.fasta
.
>example-sequence-1
ACCGGTGGAACCGGTAACACCCAC
Save the following text in a new file, q2-dwq2/q2_dwq2/tests/data/seq-2.fasta
.
>example-sequence-2
ACCGGTAACCGGTTAACACCCAC
While adding these files, also remove q2-dwq2/q2_dwq2/tests/data/table-1.biom
, which weâre not using any more since we removed the duplicate_table
action and its tests, using the command:
rm q2_dwq2/tests/data/table-1.biom
Note
If you initialized a git repository for this plugin when you started working on it, you should instead remove the file with:
git rm q2_dwq2/tests/data/table-1.biom
After defining this test in your plugin, run the unit tests and confirm that you now have two passing tests.
A few additional tests#
Because we want to test that our function generates the results that we would expect from skbio.alignment.global_pairwise_align_nucleotide
, itâs good to check that the parameter values that we provide impact the results in the expected ways.
I did this with four additional unit tests, each focused on a different input parameter.
def test_alt_match_score(self):
s1 = DNA('AAAATTT')
sequence1 = DNAIterator([s1])
s2 = DNA('AAAAGGTTT')
sequence2 = DNAIterator([s2])
# call with default value for match score
observed = nw_align(sequence1, sequence2)
aligned_sequence1 = DNA('--AAAATTT')
aligned_sequence2 = DNA('AAAAGGTTT')
expected = TabularMSA([aligned_sequence1, aligned_sequence2])
self.assertEqual(observed, expected)
sequence1 = DNAIterator([s1])
sequence2 = DNAIterator([s2])
# call with non-default value for match_score
observed = nw_align(sequence1, sequence2, match_score=10)
# the following expected outcome was determined by calling
# skbio.alignment.global_pairwise_align_nucleotide directly. the
# goal isn't to test that the underlying library code (i.e.,
# skbio.alignment.global_pairwise_align_nucleotide) is working, b/c
# I trust that that is already tested (or I wouldn't use it). rather,
# the goal is to test that my wrapper of it is working. in this case,
# specifically, i'm testing that passing an alternative value for
# match_score changes the output alignment
aligned_sequence1 = DNA('AAAA--TTT')
aligned_sequence2 = DNA('AAAAGGTTT')
expected = TabularMSA([aligned_sequence1, aligned_sequence2])
self.assertEqual(observed, expected)
def test_alt_gap_open_penalty(self):
s1 = DNA('AAAATTT')
sequence1 = DNAIterator([s1])
s2 = DNA('AAAAGGTTT')
sequence2 = DNAIterator([s2])
observed = nw_align(sequence1, sequence2, gap_open_penalty=0.01)
aligned_sequence1 = DNA('AAAA-T-TT-')
aligned_sequence2 = DNA('AAAAG-GTTT')
expected = TabularMSA([aligned_sequence1, aligned_sequence2])
self.assertEqual(observed, expected)
sequence1 = DNAIterator([s1])
sequence2 = DNAIterator([s2])
observed = nw_align(sequence1, sequence2)
aligned_sequence1 = DNA('--AAAATTT')
aligned_sequence2 = DNA('AAAAGGTTT')
expected = TabularMSA([aligned_sequence1, aligned_sequence2])
self.assertEqual(observed, expected)
def test_alt_gap_extend_penalty(self):
s1 = DNA('AAAATTT')
sequence1 = DNAIterator([s1])
s2 = DNA('AAAAGGTTT')
sequence2 = DNAIterator([s2])
observed = nw_align(sequence1, sequence2, gap_open_penalty=0.01)
aligned_sequence1 = DNA('AAAA-T-TT-')
aligned_sequence2 = DNA('AAAAG-GTTT')
expected = TabularMSA([aligned_sequence1, aligned_sequence2])
self.assertEqual(observed, expected)
sequence1 = DNAIterator([s1])
sequence2 = DNAIterator([s2])
observed = nw_align(sequence1, sequence2, gap_open_penalty=0.01,
gap_extend_penalty=0.001)
aligned_sequence1 = DNA('AAAA--TTT')
aligned_sequence2 = DNA('AAAAGGTTT')
expected = TabularMSA([aligned_sequence1, aligned_sequence2])
self.assertEqual(observed, expected)
def test_alt_mismatch_score(self):
s1 = DNA('AAAATTT')
sequence1 = DNAIterator([s1])
s2 = DNA('AAAAGGTTT')
sequence2 = DNAIterator([s2])
observed = nw_align(sequence1, sequence2, gap_open_penalty=0.01)
aligned_sequence1 = DNA('AAAA-T-TT-')
aligned_sequence2 = DNA('AAAAG-GTTT')
expected = TabularMSA([aligned_sequence1, aligned_sequence2])
self.assertEqual(observed, expected)
sequence1 = DNAIterator([s1])
sequence2 = DNAIterator([s2])
observed = nw_align(sequence1, sequence2, gap_open_penalty=0.1,
mismatch_score=-0.1)
aligned_sequence1 = DNA('-AAA-ATTT')
aligned_sequence2 = DNA('AAAAGGTTT')
expected = TabularMSA([aligned_sequence1, aligned_sequence2])
self.assertEqual(observed, expected)
Wrapping up testing#
When these tests are all in place (or in the process of putting them in place), you can run them by calling make test
on the command line.
If everything is working as expected, you should see something like the following:
$ make test
...
q2_dwq2/tests/test_methods.py::NWAlignTests::test_alt_gap_extend_penalty
q2_dwq2/tests/test_methods.py::NWAlignTests::test_alt_gap_open_penalty
q2_dwq2/tests/test_methods.py::NWAlignTests::test_alt_match_score
q2_dwq2/tests/test_methods.py::NWAlignTests::test_alt_mismatch_score
q2_dwq2/tests/test_methods.py::NWAlignTests::test_simple1
q2_dwq2/tests/test_methods.py::NWAlignTests::test_simple2
...
==== 6 passed, 23 warnings in 1.17s ====
If your tests pass, and you can see the action on the command line, you should be in good shape so letâs try running the method.
Trying the new action#
To run the new action, youâll need two input files, each containing a DNA sequence to align.
At this stage, your inputs need to be QIIME 2 artifacts.
You should be able to do this with any FeatureData[Sequence]
artifacts you have access to.
If you donât have any, you can use the ones that we used in test_simple2
by importing.
To do this, change to a temporary directory and copy the two files referenced in test_simple2
to that directory.
You should then be able to run the following commands to import those files:
qiime tools import --input-path seq-1.fasta --type "FeatureData[Sequence]" --output-path seq-1.qza
qiime tools import --input-path seq-2.fasta --type "FeatureData[Sequence]" --output-path seq-2.qza
Then, you can apply your new action to these two inputs as follows:
qiime dwq2 nw-align --i-seq1 seq-1.qza --i-seq2 seq-2.qza --o-aligned-sequences aligned-seqs.qza
This should create a new output, aligned-seqs.qza
.
Since this is a method, itâs generating a new artifact as an output.
As always, artifacts arenât intended for human consumption, but rather to be used as input to other QIIME 2 actions or exported for use with other (non-QIIME 2) tools.
If you want to take a peek at whatâs in there, you can export it:
qiime tools export --input-path aligned-seqs.qza --output-path aligned-seqs
Then, you can view the file contents as you would with any .fasta file. For example:
$ cat aligned-seqs/aligned-dna-sequences.fasta
>example-sequence-1
ACCGGTGGAACCGG-TAACACCCAC
>example-sequence-2
ACCGGT--AACCGGTTAACACCCAC
So there you have it - a first (real) action in our QIIME 2 plugin. â
As a next step, letâs make this a little more user-friendly by adding a Visualizer that will let us look at the outcome of our pairwise alignment directly, without having to export it from QIIME 2.
An optional exercise#
Now that you have this method working, try adding a method for local pairwise alignment of nucleotide sequences using the Smith-Waterman (SW) algorithm.
scikit-bio provides an implementation of SW as skbio.alignment.local_pairwise_align_nucleotide
.
Donât forget to write your unit tests!
Throughout the next few chapters, additional exercises will build on this functionality.