Generating a gene id to gene name mapping

It is often useful to perform analyses with gene names rather than gene identifiers. The convert command of pyroe lets you specify an id to name mapping so that the converted output matrix is labeled with gene names rather than identifiers. However, you must provide it with a 2-column tab-separated file mapping IDs to names. The id-to-name command generates that file from an annotation file.
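To make the expected file layout concrete, here is a minimal sketch that reads a 2-column tab-separated mapping into a dictionary. It assumes the file has no header line, and the helper name `read_id_to_name` is hypothetical (it is not part of pyroe). The two rows are illustrative stand-ins for a real file.

```python
import csv
import io


def read_id_to_name(handle):
    """Read a 2-column tab-separated mapping into a dict (ID -> name)."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        gene_id, gene_name = row[0], row[1]
        mapping[gene_id] = gene_name
    return mapping


# Illustrative content standing in for a real mapping file.
example = io.StringIO("ENSG00000000003\tTSPAN6\nENSG00000000005\tTNMD\n")
id_to_name = read_id_to_name(example)
```

For a file on disk, the same function could be called on an open file handle, for example `open("id_to_name.tsv", newline="")`.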

The id-to-name command takes two positional arguments: an annotation file and the path where the output file should be written. The annotation file can be either a GTF or a GFF3 file, and either may optionally be gzipped. The program attempts to infer the format from the file suffix(es); if it cannot, you can pass the --format parameter with the value gtf or gff3 to specify the input format explicitly. The id-to-name command uses the PyRanges package to parse the file and extract the gene id to gene name mapping, and writes this mapping to the provided output path.
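The suffix-based format detection described above can be pictured with a small sketch. This is not pyroe's actual code: the function name `guess_format` is hypothetical, and the set of recognized suffixes (`.gtf` and `.gff3`, optionally followed by `.gz`) is an assumption based only on the description here.

```python
from pathlib import Path


def guess_format(path):
    """Guess 'gtf' or 'gff3' from the file suffixes, or return None."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]  # ignore the compression suffix
    if not suffixes:
        return None
    last = suffixes[-1].lstrip(".")
    # Assumption: only these two suffixes are recognized.
    return last if last in ("gtf", "gff3") else None
```

When a function like this returns `None`, that corresponds to the case where the format would need to be given explicitly with --format.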

Note: The id-to-name sub-command assumes that the input annotation file has a field named gene_id and another named gene_name. Please make sure the input has these fields to produce a proper output mapping. We may allow specifying a custom gene_id identifier and gene_name identifier in the future.
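To illustrate why both fields are needed, the sketch below extracts the mapping from GTF-style lines using only the Python standard library. It is a simplified stand-in and not how pyroe works internally (pyroe uses PyRanges); the function name `extract_mapping` is hypothetical, and the attribute parsing assumes the common GTF form `key "value";`. Records missing either `gene_id` or `gene_name` are skipped in this sketch.

```python
import re

# GTF attributes are assumed to look like: gene_id "G1"; gene_name "A";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def extract_mapping(lines):
    """Collect gene_id -> gene_name from GTF-formatted lines."""
    mapping = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        # Only records carrying both fields contribute to the mapping.
        if "gene_id" in attrs and "gene_name" in attrs:
            mapping[attrs["gene_id"]] = attrs["gene_name"]
    return mapping
```

Running a check like this on a few lines of your annotation file is a quick way to confirm that the required fields are present before calling id-to-name.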

id-to-name full usage

usage: pyroe id-to-name [-h] [--format FORMAT] gtf_file output

positional arguments:
gtf_file         The GTF input file.
output           The path to where the output tsv file will be written.

optional arguments:
-h, --help       show this help message and exit
--format FORMAT  The input format of the file (must be either GTF or GFF3). This will be inferred from the filename, but if that fails it can be provided explicitly.
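Based on the usage text above, an invocation from a Python script could be assembled as in the following sketch. The helper names `build_id_to_name_command` and `run_id_to_name` and the file names are hypothetical; running the command assumes the `pyroe` executable is installed and on your PATH.

```python
import subprocess


def build_id_to_name_command(annotation, output, fmt=None):
    """Build the argument list for `pyroe id-to-name`."""
    cmd = ["pyroe", "id-to-name"]
    if fmt is not None:
        cmd += ["--format", fmt]  # only needed if inference from the filename fails
    cmd += [annotation, output]
    return cmd


def run_id_to_name(annotation, output, fmt=None):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_id_to_name_command(annotation, output, fmt), check=True)


# Hypothetical file names, shown for illustration only.
cmd = build_id_to_name_command("genes.gtf.gz", "id_to_name.tsv", fmt="gtf")
```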