[R] format paragraph output
When programming in R, one often needs to print messages such as usage information. For this, the base function strwrap() is very convenient: it wraps strings into lines of a given width and can add indentation or prefixes to each line. Here I will demonstrate some of its features. First, let’s create a sample string by reading the first 5 lines of the ‘THANKS’ document in the R documentation. # read 5 lines from the 'THANKS' document x <- paste(readLines(file.
2019-09-26   2 min 13 s
[Bioinfo] Nextflow framework
Recently, I was introduced to an amazing pipeline-writing framework: Nextflow. It has the following features. It abstracts a pipeline, which can be written in any language and run on many computing platforms such as Linux Slurm, PBS, AWS, etc. A pipeline is composed of many processes, whose execution order is determined by the dependencies between the input and output channels of each process. The Nextflow language is an extension of Groovy, a programming language for the Java virtual machine.
2019-09-19   39 s
[R] fit polynomial regression in R
In R, we can fit a polynomial model by combining two functions, poly() and lm(). Setup Let’s start with R’s built-in dataset cars: data(cars) x<-cars$speed # use the speed as predictor y<-cars$dist # use the dist as response Fit the models Now, let’s get the polynomials for the variable x using degree 3: x1<-poly(x, 3, raw=TRUE) x2<-poly(x, 3, raw=FALSE) # check the resulting polynomials head(x) ## [1] 4 4 7 7 8 9 head(x1) ## 1 2 3 ## [1,] 4 16 64 ## [2,] 4 16 64 ## [3,] 7 49 343 ## [4,] 7 49 343 ## [5,] 8 64 512 ## [6,] 9 81 729 head(x2) ## 1 2 3 ## [1,] -0.
2019-07-12   3 min 19 s
test Rmarkdown
This post tests whether R Markdown can be deployed correctly. data(cars) plot(dist ~ speed, data=cars)
2019-07-12   4 s
[Resource] Open courses
With the rapid growth of the internet, we can now take courses remotely. Better still, many of the most reputable universities offer free courses online. Starting today, I will keep an updated list of open courses in this post.
General
Name | URL | Description
MIT OpenCourseWare | https://ocw.mit.edu/index.htm | the main site hosting MIT open courses
Harvard Online Learning | https://online-learning.harvard.edu/ | not all courses are free
Class Central | https://www.
2019-03-11   20 s
[Biology] Data resources
In the big-data era, data are key to driving innovation. On this page, I will keep updating the biological resources I find in my research, hoping they help others.
Databases
Name | URL | Category | Description
OncoKB | http://oncokb.org | Cancer | compilation of mutations, drugs, and other tumor data
factorbook | http://www.factorbook.org | Regulation | TF binding and histone marks compiled by the ENCODE consortium
ClinGen | https://www.
2018-11-29   30 s
[Linux] Find all filenames pointing to the same file
In Linux/Unix systems, the same file can have multiple names, termed hard links. Because every name refers to the same underlying data, modifying the file through any one name changes it for all of them. Today, I will show how to find all the hard links to a file. How many hard links does a file have? One can find this information by using the following command: ls -l sample-file.txt # output # -rwxrw-r--. 2 user hacker 3351 Oct 23 11:11 sample-file.
2018-10-25   1 min 4 s
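As a rough companion to the hard-link post above, the link count that `ls -l` prints in its second column can also be read from Python as `st_nlink`. The sketch below is not the post's own method, only one possible approach: it walks a directory and collects every path that shares the target's device and inode numbers. The function name `find_hard_links` and the file name `alias.txt` are hypothetical, chosen for illustration.

```python
import os
import tempfile
from typing import List


def find_hard_links(target: str, search_root: str) -> List[str]:
    """Return all paths under search_root that share target's inode."""
    t = os.stat(target)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                s = os.lstat(path)  # lstat: do not follow symbolic links
            except OSError:
                continue  # file vanished or cannot be inspected
            # Same device and same inode means the same underlying file
            if (s.st_dev, s.st_ino) == (t.st_dev, t.st_ino):
                matches.append(path)
    return sorted(matches)


# Demo in a throwaway directory (file names are hypothetical)
with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "sample-file.txt")
    alias = os.path.join(root, "alias.txt")
    with open(original, "w") as fh:
        fh.write("hello\n")
    os.link(original, alias)  # create a second name for the same file
    link_count = os.stat(original).st_nlink  # the count `ls -l` displays
    found = [os.path.basename(p) for p in find_hard_links(original, root)]
```

Hard links cannot span file systems, so comparing the device number together with the inode number avoids false matches when the search root crosses a mount point.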
[Python] Generators
Definition Generators are special Python objects with the following features: they are iterable, so one can use next() or a for loop to get the elements; they return elements on request, one at a time, so unlike a list, the elements are not generated until asked for; and the local state is saved after each request, so the generator can resume when the next call arrives. Advantages Because of these features, generators are more memory-efficient than lists and are thus used to handle large data streams.
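The three features listed above can be seen in a minimal sketch. The function name `squares` and the sizes used are illustrative assumptions, not taken from the post.

```python
import sys
from typing import Iterator


def squares(n: int) -> Iterator[int]:
    """Yield the squares of 0..n-1 lazily, one per request."""
    for i in range(n):
        # Execution pauses at yield; the local state (i) is kept
        # until the next element is requested.
        yield i * i


gen = squares(5)
first = next(gen)   # request one element explicitly
second = next(gen)  # resumes where the previous call stopped
rest = list(gen)    # iteration consumes the remaining elements

# A generator object stays small no matter how many items it can produce,
# whereas a list stores every element up front.
lazy = squares(1_000_000)
eager = [i * i for i in range(1_000)]
lazy_is_smaller = sys.getsizeof(lazy) < sys.getsizeof(eager)
```

Here `first` and `second` are 0 and 1, and `rest` picks up at 4 because the generator remembered its position. Note that a generator can be consumed only once; calling `squares(5)` again creates a fresh one.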
2018-10-17   1 min 16 s
[Linux Tips] get absolute paths
In my research, I frequently run into situations where I want to solve some small task, or to solve a task more efficiently. In those situations, the first thing I did was a Google search. However, from today on, I will record these solutions in my blog: for one thing, I can revisit them when I need them in the future (very likely), and for another, others may also benefit from the compiled list of tips.
2018-08-31   41 s
UCSC Genome Mappability: a resource for analyzing NGS data
Often, mapping sequenced reads to a reference genome is the first step in analyzing next-generation sequencing data. However, a genome may contain many similar regions, which makes reads derived from them difficult to map back, because it is unclear which region they came from. With knowledge of these similar regions, one can treat them with care and make the data analysis clearer. In fact, the UCSC Genome Browser provides such resources: Mappability and Uniqueness of genomes.
2018-08-31   1 min 13 s