Zhenguo Zhang's Blog

[Linux] Caution: using command 'less' on gzipped file

A couple of years ago, I found that the Linux command less can view gzipped files directly, for example: less in.gz. This is a very convenient way to view the content of a gzipped file. However, I recently noticed a problem with it. For a file that has 42196516 lines, opening it with less gave a count of only 30356021 lines: less test.fq.gz | wc -l; # giving 30356021 gzip -dc test.

2019-12-27 28 s
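
One way to cross-check a line count like the one above without going through less is to decompress the file directly. The following is a minimal Python sketch using the standard gzip module; the function name count_lines is hypothetical and not from the original post.

```python
import gzip


def count_lines(path):
    """Count lines in a gzip-compressed file by streaming its decompressed content."""
    n = 0
    # gzip.open decompresses on the fly; binary mode avoids text-decoding issues.
    with gzip.open(path, "rb") as handle:
        for _ in handle:
            n += 1
    return n
```

For a file whose last line ends with a newline, count_lines("test.fq.gz") should match the count obtained by piping the decompressed stream into wc -l.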

[Tips] Get the OS version of Linux

There are many variants of Unix/Linux systems, such as Ubuntu and Red Hat, as well as Unix-like environments on Windows such as MinGW/MSYS and Cygwin. Today I am going to introduce a simple way to find out which one you are on: run the command uname -o. Running this command on Ubuntu or Red Hat yields GNU/Linux, while running it under MinGW/MSYS yields Msys. Running uname -r gives you the kernel release.

2019-11-02 36 s
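
The same kind of information can be read from a script. Below is a small Python sketch using the standard platform module; note that platform.system() is closer to uname -s than to uname -o, so it is an approximation of the command in the post, and the function name os_summary is hypothetical.

```python
import platform


def os_summary():
    """Return the OS name and kernel release as reported by the platform module."""
    return {
        "system": platform.system(),    # OS name, similar to `uname -s`
        "release": platform.release(),  # kernel release, similar to `uname -r`
    }


print(os_summary())
```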

[Bioinfo] Extract data using NCBI E-utilities

Today I am going to introduce a powerful tool to retrieve data from NCBI – E-utilities. This is a REST API for NCBI databases. I used the API several years ago and recently picked it up again for my projects, so this is a good opportunity to write some notes for my own future use and for other readers. So let’s start. Introduction The use of the API has the following format:

2019-10-16 2 min 55 s
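
As a rough illustration of what an E-utilities request looks like, the Python sketch below only builds a request URL and does not send it. The excerpt is cut off before it shows the format, so this follows NCBI's public documentation rather than the post: the base URL and the esearch parameters db, term, and retmax come from there, while the helper name build_eutils_url and the search term are hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service.
BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_eutils_url(tool, **params):
    """Build a request URL for an E-utilities tool such as esearch or efetch."""
    return f"{BASE_URL}{tool}.fcgi?{urlencode(params)}"


# Illustrative query: search PubMed for an arbitrary term, at most 5 hits.
url = build_eutils_url("esearch", db="pubmed", term="nextflow", retmax=5)
print(url)
```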

[Linux] Input password automatically

In this post, I will introduce an approach to entering a password programmatically, which is useful when a repeatedly run command needs a password. Note: this approach risks exposing your password to others, so use it with caution. Let’s use sudo as an example. Normally, when you type sudo ls, it will prompt for a password if you have set one. However, if you type

2019-10-07 28 s

[Windows] Get running processes using PowerShell

As a computational biologist, I sometimes want to see the commands running on the current system. This is simple on Unix-like systems (using the ps -ef or top command), but takes some effort on Windows. Today, I’d like to introduce two ways to get the running processes using Windows PowerShell. The first is get-process, which lists running processes in a table like the one below: Handles NPM(K) PM(K) WS(K) CPU(s) Id SI ProcessName 162 12 1660 5160 0.

2019-10-07 1 min 15 s

[R] format paragraph output

When programming in R, one may need to print messages such as usage information. For this, the base function strwrap() is very convenient. It wraps strings into lines of at most a given width and can add indentation or prefixes to lines. Here I will demonstrate some features. First, let’s create a sample string by reading the first 5 lines of the ‘THANKS’ document in R documentation. # read 5 lines from the 'THANKS' document x <- paste(readLines(file.

2019-09-26 2 min 13 s

[Bioinfo] Nextflow framework

Recently, I was introduced to an amazing pipeline-writing framework – Nextflow. It has the following features: it abstracts a pipeline, which can be written in any language and run on many computing platforms such as Linux Slurm, PBS, AWS, etc.; it is composed of many processes, whose execution order is determined by the dependencies between the input and output channels of each process; the Nextflow language is an extension of the Groovy programming language, which is a programming language for the Java virtual machine.

2019-09-19 39 s

[R] fit polynomial regression in R

In R, we can fit a polynomial model by combining two functions, poly() and lm(). Setup Let’s start with the built-in R dataset cars: data(cars) x<-cars$speed # use the speed as predictor y<-cars$dist # use the dist as response Fit the models Now, let’s get the polynomials for the variable x using the degree 3: x1<-poly(x, 3, raw=TRUE) x2<-poly(x, 3, raw=FALSE) # check the resulting polynomials head(x) ## [1] 4 4 7 7 8 9 head(x1) ## 1 2 3 ## [1,] 4 16 64 ## [2,] 4 16 64 ## [3,] 7 49 343 ## [4,] 7 49 343 ## [5,] 8 64 512 ## [6,] 9 81 729 head(x2) ## 1 2 3 ## [1,] -0.

2019-07-12 3 min 19 s
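
The raw polynomial columns shown in the excerpt above (raw=TRUE) are simply the powers of x. The Python sketch below reproduces them for the six speed values printed in the post; the function name raw_poly is hypothetical, and the sketch does not cover the orthogonal polynomials produced with raw=FALSE.

```python
def raw_poly(values, degree):
    """Return rows [v, v**2, ..., v**degree]: the raw (non-orthogonal) polynomial terms."""
    return [[v ** d for d in range(1, degree + 1)] for v in values]


# First six speed values from the cars data, as shown by head(x) in the excerpt.
x_head = [4, 4, 7, 7, 8, 9]
for row in raw_poly(x_head, 3):
    print(row)
```

Each printed row corresponds to one row of head(x1), for example [4, 16, 64] for a speed of 4.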

test Rmarkdown

This post tests whether Rmarkdown can be deployed correctly. data(cars) plot(dist ~ speed, data=cars)

2019-07-12 4 s

[Resource] Open courses

With the rapid growth of the internet, we can now take courses remotely. Better still, many of the most reputable universities offer free courses online. Starting today, I will keep a list of open courses updated in this post. General Name URL Description MIT OpenCourseWare https://ocw.mit.edu/index.htm the main site hosting MIT open courses Harvard Online Learning https://online-learning.harvard.edu/ not all courses free Class central https://www.

2019-03-11 20 s