[Linux] Find all filenames pointing to the same file
In Linux/Unix systems, the same file can be reached through multiple names, called hard links. All of these names point to the same data on disk (the same inode), so modifying the file through any one name changes it for all of them. Today, I will introduce how to find all the hard links of a file.
How many names (hard links) does a file have? One can find this information with the following command:
```bash
ls -l sample-file.txt
# output
# -rwxrw-r--. 2 user hacker 3351 Oct 23 11:11 sample-file.
```
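In the `ls -l` output, the number right after the permission string (2 here) is the file's link count, i.e. how many names point to it. On the command line, GNU find can list those names with `find <dir> -samefile <file>`. As a minimal Python sketch of the same idea (not necessarily the method the full post uses), one can compare every file's device and inode numbers with those of the target. The file names in the demo are hypothetical. Since hard links cannot span filesystems, the search root should cover the whole filesystem holding the file.

```python
import os
import tempfile


def find_hardlinks(target, search_root):
    """Return every path under search_root that refers to the same inode as target."""
    st = os.stat(target)
    key = (st.st_dev, st.st_ino)  # a file is identified by (device, inode)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                s = os.lstat(path)  # lstat: a symlink has its own inode, so it is not counted
            except OSError:
                continue  # unreadable or vanished entry; skip it
            if (s.st_dev, s.st_ino) == key:
                matches.append(path)
    return matches


# Demo (hypothetical names): create a file plus a second hard link in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "sample-file.txt")
    with open(original, "w") as fh:
        fh.write("hello\n")
    os.link(original, os.path.join(tmp, "another-name.txt"))
    link_count = os.stat(original).st_nlink  # same number as the 2nd column of ls -l
    names = sorted(os.path.basename(p) for p in find_hardlinks(original, tmp))
    print(link_count, names)
```

Walking the tree is slow on large filesystems; the design choice here is simplicity, and `find -samefile` (optionally with `-xdev` to stay on one filesystem) is usually the quicker option in practice.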
[Python] Generators
Definition
Generators are special Python objects with the following features:
- They are iterable, so one can get their elements with next() or a for loop.
- They return elements only when asked, one element at a time; unlike a list, the elements are not generated until requested.
- Their local state is saved after each request, so execution can resume when the next request arrives.
Advantages
Because of these features, generators are more memory-efficient than lists and are thus used for handling large data streams.
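A small sketch illustrating the three features and the memory advantage. The function name `count_up_to` is made up for illustration, and `sys.getsizeof` only measures each container's own object, not the integers inside it, which is enough to show that the generator does not hold its elements.

```python
import sys


def count_up_to(n):
    """Yield 0, 1, ..., n - 1, producing one value per request."""
    i = 0
    while i < n:
        yield i  # pause here; the local variable i is kept until the next request
        i += 1


gen = count_up_to(3)
first = next(gen)   # 0: runs the body up to the first yield
second = next(gen)  # 1: resumes right after the previous yield
rest = list(gen)    # [2]: consumes what is left; the generator is now exhausted

# A list materializes every element up front; a generator expression does not.
squares_list = [x * x for x in range(1_000_000)]
squares_gen = (x * x for x in range(1_000_000))
list_bytes = sys.getsizeof(squares_list)  # the list's own array of references (roughly 8 MB on 64-bit)
gen_bytes = sys.getsizeof(squares_gen)    # a small, fixed-size generator object
print(list_bytes, gen_bytes)

# Both produce the same values when consumed.
same_total = sum(squares_gen) == sum(squares_list)
```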
[Linux Tips] Get absolute paths
In my research, I frequently run into small tasks that I want to solve, or solve more efficiently. In such situations, the first thing I do is a Google search. From today on, however, I will record these solutions in my blog: for one thing, I can revisit them when I need them in the future (very likely), and for another, others may also benefit from the compiled list of tips.
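For the task in the title, GNU coreutils' `realpath` and `readlink -f` do the job in the shell. As a hedged Python equivalent (the helper name and the example paths are hypothetical), `os.path.abspath` joins a relative path with the current working directory and normalizes it, while `os.path.realpath` additionally resolves symbolic links:

```python
import os


def absolute_paths(paths, resolve_symlinks=False):
    """Return the absolute form of each path.

    abspath: prepend the current working directory and normalize ("." and "..").
    realpath: the same, but also resolve symbolic links.
    In their default mode neither requires the path to exist.
    """
    convert = os.path.realpath if resolve_symlinks else os.path.abspath
    return [convert(p) for p in paths]


# Hypothetical relative paths, used only for illustration.
relative = ["data/reads.fq", "./results/../notes.txt"]
absolute = absolute_paths(relative)
for rel, ab in zip(relative, absolute):
    print(rel, "->", ab)
```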
UCSC Genome Mappability: a resource for analyzing NGS data
Mapping sequenced reads to a reference genome is often the first step in analyzing next-generation sequencing (NGS) data. However, a genome may contain many similar regions, and reads derived from them are difficult to map back because one cannot tell which of the similar regions they came from. Knowing where such regions are, one can pay special attention to them and make the data analysis clearer.
In fact, the UCSC Genome Browser provides such resources: the Mappability and Uniqueness tracks of genomes.
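To make the idea of mappability concrete, here is a toy Python sketch. It is my own simplification, not how the UCSC tracks are actually computed: each position of a made-up sequence gets the score 1 / (number of times the k-mer starting there occurs), so 1.0 means a read of length k from that position is unique. It assumes exact matches on the forward strand only; real mappability tracks are computed genome-wide and may also allow mismatches and consider the reverse strand.

```python
from collections import Counter


def kmer_mappability(seq, k):
    """Toy score per start position: 1 / (occurrences of the k-mer starting there).

    Exact matches on the forward strand only (a simplifying assumption).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return [1.0 / counts[km] for km in kmers]


genome = "ACGTACGTTTGCA"  # made-up sequence; "ACGT" occurs twice
scores = kmer_mappability(genome, 4)
print(scores)  # positions 0 and 4 share the k-mer ACGT, so they score 0.5
```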
My software CUA is cited by a Cell Reports paper
Very exciting: my software CUA has been cited by a paper published in Cell Reports. This encourages me to further improve the software.
The paper describing the software was published on bioRxiv. You can also find its documentation (including a tutorial) on CPAN.
Recently, more and more studies have shown that codon usage is relevant not only to translation but also to mRNA stability, transcription, and more, which makes the codon usage bias observed in many genomes even more mysterious.
[Python] Parse command-line arguments with argparse
Parsing command-line arguments is common in programming. In Bash, one can use getopts. In Perl, one can use the module Getopt::Long. In Python, we can use the module argparse.
Using the module involves three major steps:
Create a parser.
Add arguments.
Parse the arguments.
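A minimal sketch of the three steps with the standard-library argparse module. The argument names (`input`, `--min-length`, `--verbose`) are hypothetical and only serve as illustration:

```python
import argparse

# Step 1: create a parser.
parser = argparse.ArgumentParser(description="Summarize reads in a FASTQ file.")

# Step 2: add arguments (hypothetical names, for illustration only).
parser.add_argument("input", help="path to the input file")
parser.add_argument("--min-length", type=int, default=10,
                    help="minimum read length to keep")  # stored as args.min_length
parser.add_argument("-v", "--verbose", action="store_true",
                    help="print progress messages")

# Step 3: parse the arguments. A list is passed here so the example runs as-is;
# in a real script, call parser.parse_args() with no arguments to read sys.argv.
args = parser.parse_args(["reads.fq", "--min-length", "25", "-v"])
print(args.input, args.min_length, args.verbose)  # reads.fq 25 True
```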
Here I will describe each step in detail, particularly the options of the functions.
Create a parser
Below is a simple example of creating a parser object.
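A sketch of creating a parser with a few commonly used `ArgumentParser` options: `prog` (the program name shown in messages), `description` (shown above the argument list), `epilog` (shown below it), and `formatter_class` (here `ArgumentDefaultsHelpFormatter`, which appends each argument's default value to its help text). The program name and texts are hypothetical:

```python
import argparse

parser = argparse.ArgumentParser(
    prog="summarize_reads",  # hypothetical program name used in usage/help output
    description="Summarize reads in a FASTQ file.",
    epilog="Example: summarize_reads --min-length 25",
    formatter_class=argparse.ArgumentDefaultsHelpFormatter,  # show "(default: ...)" in help
)
parser.add_argument("--min-length", type=int, default=10, help="minimum read length")

usage = parser.format_usage()  # the one-line usage string
print(parser.format_help())    # full help: usage, description, options, epilog
```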
WSL -- a perfect replacement for Cygwin and MinGW
As a programmer, I love working on the Linux command line for its wealth of convenient tools, such as sed, awk, grep, vi, etc.
Every time I bought a new Windows computer, I would install Cygwin or MinGW to use these Linux tools. With Windows 10 (Fall Creators Update or later), one can install the Windows Subsystem for Linux (WSL) from the Microsoft Store, just like installing an app.
Can we put all genes in one chromosome?
Organisms often have more than one chromosome to carry their genetic material, such as genes. For example, humans have 23 pairs of chromosomes, and the baker's yeast Saccharomyces cerevisiae has 16 chromosomes in its haploid state.
Why does an organism have many chromosomes rather than one? In particular, if one assumes that only the genes on chromosomes are useful and the rest may be junk, one might think that putting all genes on a single chromosome would be perfectly fine, and more efficient, since it would save the energy spent on synthesizing the “useless” portions of chromosomes.
Science without (applying for) funds
Nowadays, scientific research is often done at institutions such as universities and companies.
Research often costs a lot of money, while its results mostly don’t generate profit. Therefore, researchers often suffer from a lack of research funds, especially in recent years.
I often ponder how research can be sustainable and whether one can do research without applying for funds.
Today, while searching the literature, I happened to find Dr. Robert C. Edgar’s web page.
Blog with "blogdown"
I am very excited that I can now write blog posts in Markdown (even better, R Markdown) and simply push them to GitHub. That's it! The posts then show up at https://zhenguozhang.zone/, hosted at https://netlify.com. I don’t need to worry about deploying the website.
For this to work, I used the excellent R package blogdown, written by the R celebrity Yihui Xie. Since the process is so amazing, I would like to share it with you, so you can also benefit from it.