Process bib to remove multiple URLs.

BibTeX doesn't seem to like bib entries containing multiple URLs, so
now we preprocess the bib file to keep only the first URL for each
entry.
Ryan C. Thompson, 5 years ago
Parent
Current commit
2ec14fc88e
4 files changed, with 40 insertions and 33 deletions
  1. Snakefile (+18 -2)
  2. code-refs.bib (+6 -15)
  3. refs.bib (+15 -15)
  4. thesis.lyx (+1 -1)

+ 18 - 2
Snakefile

@@ -139,10 +139,13 @@ def lyx_bib_deps(lyxfile):
     with open(lyxfile) as f:
         lyx_text = f.read()
     bib_names = regex.search('bibfiles "(.*?)"', lyx_text).group(1).split(',')
+    # Unfortunately LyX doesn't indicate which bib names refer to
+    # files in the current directory and which don't. Currently that's
+    # not a problem for me since all my refs are in bib files in the
+    # current directory.
     for bn in bib_names:
         bib_path = bn + '.bib'
-        if os.path.exists(bib_path):
-            yield bib_path
+        yield bib_path
 
 def lyx_gfx_deps(lyxfile):
     '''Return an iterator over all graphics files included by a LyX file.'''
@@ -190,6 +193,19 @@ rule lyx_to_pdf:
     output: pdf='{basename,(?!graphics/).*}.pdf'
     shell: '{LYXPATH:q} --export-to pdf4 {output.pdf:q} {input.lyxfile:q}'
 
+rule process_bib:
+    '''Preprocess bib file for LaTeX.
+
+Currently, this just filters out additional URLs from the url field,
+since the BibTeX setup in LyX can't handle them.'''
+    input: '{basename}.bib'
+    output: '{basename,.*(?<!-PROCESSED)}-PROCESSED.bib'
+    run:
+        with open(input[0]) as infile, open(output[0], 'w') as outfile:
+            for line in infile:
+                line = regex.sub('url = {(.*?) .*},', 'url = {\\1},', line)
+                outfile.write(line)
+
 rule pdf_extract_page:
     '''Extract a single page from a multi-page PDF.'''
     # Input is a PDF whose basename doesn't already have a page number
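
The substitution used by `process_bib` above can be tried outside Snakemake. What follows is a minimal sketch, not the commit's code: it uses the standard-library `re` module in place of `regex`, the helper name `keep_first_url` is hypothetical, and the sample lines are made up for illustration. It assumes, as the rule does, that multiple URLs are separated by spaces inside a single lowercase `url = {...},` line.

```python
import re

# Same pattern as the process_bib rule: capture everything up to the first
# space inside the braces and drop the rest of the field.
URL_FIELD = re.compile(r'url = \{(.*?) .*\},')

def keep_first_url(line):
    """Return the line with a multi-URL url field cut down to its first URL."""
    return URL_FIELD.sub(r'url = {\1},', line)

# Illustrative input lines (assumed, not taken from refs.bib).
multi = '\turl = {https://example.org/a https://example.org/b},\n'
single = '\turl = {https://example.org/a},\n'

print(keep_first_url(multi), end='')
print(keep_first_url(single), end='')
```

A line with a single URL passes through unchanged, because the pattern only matches when there is a space inside the braces.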

+ 6 - 15
code-refs.bib

@@ -1,7 +1,7 @@
 %% This BibTeX bibliography file was created using BibDesk.
 %% http://bibdesk.sourceforge.net/
 
-%% Created for Ryan C. Thompson at 2019-08-01 02:17:26 -0700 
+%% Created for Ryan C. Thompson at 2019-08-28 09:54:42 -0700 
 
 
 %% Saved with string encoding Unicode (UTF-8) 
@@ -9,19 +9,15 @@
 
 
 @misc{gh-cd4-csaw,
-	Abstract = {epic is a software package for finding medium to diffusely enriched domains in chip-seq data. It is a fast, parallel and memory-efficient implementation of the incredibly popular SICER algorithm. By running epic on a set of data ("ChIP") files and control ("Input") files, epic is able to quickly differentially enriched regions.
-
-epic is an improvement over the original SICER by being faster, more memory efficient, multicore, and significantly much easier to install and use.},
 	Author = {Ryan C. Thompson},
 	Date-Added = {2019-08-01 02:15:39 -0700},
-	Date-Modified = {2019-08-01 02:15:39 -0700},
+	Date-Modified = {2019-08-28 09:49:36 -0700},
 	Howpublished = {\url{https://github.com/DarwinAwardWinner/CD4-csaw}},
 	Keywords = {chipseq, rnaseq},
 	Month = {nov},
 	Publisher = {GitHub, Inc.},
 	Title = {Reproducible reanalysis of a combined ChIP-Seq \& RNA-Seq data set},
-	Year = {2018},
-	Bdsk-Url-1 = {https://doi.org/10.5281/zenodo.806811}}
+	Year = {2018}}
 
 @manual{greylistchip,
 	Author = {Gord Brown},
@@ -44,19 +40,14 @@ epic is an improvement over the original SICER by being faster, more memory effi
 	Month = {nov},
 	Publisher = {GitHub, Inc.},
 	Title = {epic: diffuse domain ChIP-Seq caller based on SICER},
-	Year = {2018},
-	Bdsk-Url-1 = {https://doi.org/10.5281/zenodo.806811}}
+	Year = {2018}}
 
 @misc{gh-hg38-ref,
-	Abstract = {epic is a software package for finding medium to diffusely enriched domains in chip-seq data. It is a fast, parallel and memory-efficient implementation of the incredibly popular SICER algorithm. By running epic on a set of data ("ChIP") files and control ("Input") files, epic is able to quickly differentially enriched regions.
-
-epic is an improvement over the original SICER by being faster, more memory efficient, multicore, and significantly much easier to install and use.},
 	Author = {Ryan C. Thompson},
 	Date-Added = {2019-08-01 01:44:09 -0700},
-	Date-Modified = {2019-08-01 02:17:22 -0700},
+	Date-Modified = {2019-08-28 09:49:47 -0700},
 	Howpublished = {\url{https://github.com/DarwinAwardWinner/hg38-ref}},
 	Month = {dec},
 	Publisher = {GitHub, Inc.},
 	Title = {Workflow to download/generate various mapping indices for the human hg38 genome},
-	Year = {2016},
-	Bdsk-Url-1 = {https://doi.org/10.5281/zenodo.806811}}
+	Year = {2016}}

+ 15 - 15
refs.bib

File diff suppressed because it is too large

+ 1 - 1
thesis.lyx

@@ -11467,7 +11467,7 @@ Check in-text citation format.
 \begin_inset CommandInset bibtex
 LatexCommand bibtex
 btprint "btPrintCited"
-bibfiles "refs,code-refs"
+bibfiles "code-refs,refs-PROCESSED"
 options "bibtotoc,unsrt"
 
 \end_inset
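
To see how this `bibfiles` change reaches the build, here is a rough sketch of the parsing that `lyx_bib_deps` performs, applied to a string instead of a file. The function name `bib_paths_from_lyx_text` is hypothetical and `re` stands in for the `regex` module used in the Snakefile. As the Snakefile comment notes, every name is assumed to refer to a `.bib` file in the current directory. The reading that the existence check was dropped so that a not-yet-generated `refs-PROCESSED.bib` is still reported as a dependency is an inference from the diff, not something the commit message states.

```python
import re

def bib_paths_from_lyx_text(lyx_text):
    """Yield a .bib path for every name in the LyX bibfiles setting."""
    names = re.search(r'bibfiles "(.*?)"', lyx_text).group(1).split(',')
    for name in names:
        # No existence check (assumed rationale): a generated file such as
        # refs-PROCESSED.bib may not exist yet when dependencies are listed.
        yield name + '.bib'

# Excerpt modelled on the bibtex inset in thesis.lyx after this commit.
lyx_text = (
    'LatexCommand bibtex\n'
    'bibfiles "code-refs,refs-PROCESSED"\n'
    'options "bibtotoc,unsrt"\n'
)
print(list(bib_paths_from_lyx_text(lyx_text)))
```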
