arXiv:1405.1072v2 [astro-ph.CO] 5 Dec 2014

Accepted for publication by ApJ
Preprint typeset using LaTeX style emulateapj v. 5/2/11

IGM CONSTRAINTS FROM THE SDSS-III/BOSS DR9 Lyα FOREST TRANSMISSION PROBABILITY DISTRIBUTION FUNCTION

Khee-Gan Lee$^{1,2}$, Joseph F. Hennawi$^{1}$, David N. Spergel$^{2}$, David H. Weinberg$^{3}$, David W. Hogg$^{4}$, Matteo Viel$^{5,6}$, James S. Bolton$^{7}$, Stephen Bailey$^{8}$, Matthew M. Pieri$^{9}$, William Carithers$^{8}$, David J. Schlegel$^{8}$, Britt Lundgren$^{10}$, Nathalie Palanque-Delabrouille$^{11}$, Nao Suzuki$^{12}$, Donald P. Schneider$^{13,14}$, Christophe Yeche$^{11}$


ABSTRACT

The Lyα forest transmission probability distribution function (PDF) is an established probe of the astrophysics of the intergalactic medium (IGM), especially the temperature-density relationship of the IGM. We measure the transmission PDF from 3393 Baryon Oscillation Spectroscopic Survey (BOSS) quasars from SDSS Data Release 9, and compare with mock spectra that include careful modeling of the noise, continuum, and astrophysical uncertainties. The BOSS transmission PDFs, measured at 〈z〉 = [2.3, 2.6, 3.0], are compared with PDFs created from mock spectra drawn from a suite of hydrodynamical simulations that sample the slope of the IGM temperature-density relationship, γ, and the temperature at mean density, $T_0$, where $T(\Delta) = T_0 \Delta^{\gamma-1}$. We find that a significant population of partial Lyman-limit systems with a column-density distribution slope of $\beta_{\rm pLLS} \sim -2$ is required to explain the data at the low-transmission end of the transmission PDF, while uncertainties in the mean Lyα forest transmission affect the high-transmission end. After modelling the LLSs and marginalizing over mean-transmission uncertainties, we find that γ = 1.6 best describes the data over our entire redshift range, although constraints on $T_0$ are affected by systematic uncertainties. Within our model framework, isothermal or inverted temperature-density relationships (γ ≤ 1) are disfavored at a significance of over 4σ, although this could be somewhat weakened by cosmological and astrophysical uncertainties that we did not model.

Subject headings: intergalactic medium; quasars: emission lines; quasars: absorption lines; methods: data analysis

$^{1}$ Max Planck Institute for Astronomy, Königstuhl 17, D-69117 Heidelberg, Germany
$^{2}$ Department of Astrophysical Sciences, Princeton University, Princeton, New Jersey 08544, USA
$^{3}$ Department of Astronomy and Center for Cosmology and Astro-Particle Physics, Ohio State University, Columbus, OH 43210, USA
$^{4}$ Center for Cosmology and Particle Physics, New York University, 4 Washington Place, Meyer Hall of Physics, New York, NY 10003, USA
$^{5}$ INAF, Osservatorio Astronomico di Trieste, Via G. B. Tiepolo 11, 34131 Trieste, Italy
$^{6}$ INFN/National Institute for Nuclear Physics, Via Valerio 2, I-34127 Trieste, Italy
$^{7}$ School of Physics and Astronomy, University of Nottingham, University Park, Nottingham NG7 2RD, UK
$^{8}$ E.O. Lawrence Berkeley National Lab, 1 Cyclotron Rd., Berkeley, CA, 94720, USA
$^{9}$ Institute of Cosmology & Gravitation, University of Portsmouth, Dennis Sciama Building, Portsmouth PO1 3FX, UK
$^{10}$ Department of Astronomy, University of Wisconsin, Madison, WI 53706, USA
$^{11}$ CEA, Centre de Saclay, Irfu/SPP, F-91191 Gif-sur-Yvette, France
$^{12}$ Kavli Institute for the Physics and Mathematics of the Universe (IPMU), The University of Tokyo, Kashiwano-ha 5-1-5, Kashiwa-shi, Chiba, Japan
$^{13}$ Department of Astronomy and Astrophysics, The Pennsylvania State University, University Park, PA 16802, USA
$^{14}$ Institute for Gravitation and the Cosmos, The Pennsylvania State University, University Park, PA 16802, USA

1. INTRODUCTION

Remarkably soon after the discovery of the first high-redshift ($z_{\rm qso} \gtrsim 2$) quasars (Schmidt 1965), Gunn & Peterson (1965) realized that the amount of resonant Lyman-α (Lyα) scattering off neutral hydrogen structures observed in the spectra of these quasars could be used to constrain the state of the intergalactic medium (IGM) at high redshifts: they deduced that the hydrogen in the intergalactic medium had to be highly photo-ionized (neutral fractions of $n_{\rm HI}/n_{\rm H} < 10^{-4}$) and hot (temperatures $T > 10^4$ K).

Lynds (1971) then discovered that this Lyα absorption could be separated into discrete absorption lines, i.e., the Lyα “forest”. Over the next two decades, it was recognized that the individual Lyα forest lines have Voigt absorption profiles corresponding to Doppler-broadened systems with $T \sim (1-3) \times 10^4$ K (see, e.g., Rauch et al. 1992; Ricotti et al. 2000; Schaye et al. 2000; McDonald et al. 2001; Tytler et al. 2004; Lidz et al. 2010; Becker et al. 2011) and neutral column densities of $N \sim 10^{13}-10^{17}\,{\rm cm}^{-2}$ (Petitjean et al. 1993; Penton et al. 2000; Janknecht et al. 2006; Rudie et al. 2013), and increasingly precise measurements of the mean Lyα forest transmission have been carried out (Theuns et al. 2002; Bernardi et al. 2003; Faucher-Giguere et al. 2008; Becker et al. 2013). However, the exact physical nature of these absorbers was unclear for many years (see Rauch 1998 for a historical review of the field).

Beginning in the 1990s, detailed hydrodynamical simulations of the intergalactic medium led to the current physical picture of the Lyα forest arising from baryons in the IGM which trace fluctuations in the dark matter field induced by gravitational collapse, in ionization balance with a uniform ultraviolet ionizing background (see, e.g., Cen et al. 1994; Miralda-Escude et al. 1996; Croft et al. 1998; Dave et al. 1999; Theuns et al. 1998). A physically motivated analytic description of this picture is the fluctuating Gunn-Peterson approximation (FGPA, Croft et al. 1998; Hui et al. 1997), in which the Lyα optical depth, τ, scales with the underlying matter density, ρ, through a power-law relationship:

$$\tau \propto \frac{T^{-0.7}}{\Gamma}\,\Delta^{2} \propto \frac{T_0^{-0.7}}{\Gamma}\,\Delta^{2-0.7(\gamma-1)}, \tag{1}$$

where Γ is the background photoionization rate, and $\Delta \equiv \rho/\langle\rho\rangle$ is the matter density relative to the mean density of the universe at the given epoch. In the second proportionality above, we have assumed that the local temperature of the gas has a power-law relationship with the local density,

$$T = T_0 \Delta^{\gamma-1}, \tag{2}$$

where $T_0$ is the gas temperature at mean density and γ parametrizes the temperature-density relation, which encodes the thermal history of the IGM (e.g., Hui & Gnedin 1997; Schaye et al. 1999; Ricotti et al. 2000; McDonald et al. 2001; Hui & Haiman 2003; see Meiksin 2009 for a detailed overview of the relevant physics).

Over the past decade and a half, the 2000-2008 Sloan Digital Sky Survey (SDSS-I and -II; York et al. 2000; Stoughton et al. 2002; http://www.sdss.org) spectroscopic data have represented a dramatic improvement in the statistical power available to Lyα forest studies: McDonald et al. (2006) measured the one-dimensional Lyα forest transmission power spectrum from ≈ 3000 SDSS quasar sightlines. This measurement was used to place significant constraints on cosmological parameters and large-scale structure (see, e.g., McDonald et al. 2005b; Seljak et al. 2005; Viel & Haehnelt 2006).

The McDonald et al. (2006) quasar sample, which
in its time represented a ∼100-fold increase in sample size over previous data sets, is superseded by the Baryon Oscillation Spectroscopic Survey (BOSS, part of SDSS-III; Eisenstein et al. 2011; Dawson et al. 2013) quasar survey. This spectroscopic survey, which operated between fall 2009 and spring 2014, is aimed at taking spectra of ∼150,000 $z_{\rm qso} \gtrsim 2.2$ quasars (Dawson et al. 2013) with the goal of constraining dark energy at z > 2 using transverse correlations of Lyα forest absorption (see, e.g., Slosar et al. 2011) to measure the baryon acoustic oscillation (BAO) scale$^{15}$. At the time of writing, the full BOSS survey is complete, with ∼170,000 high-redshift quasars observed, although this paper is based on the earlier sample of ∼50,000 BOSS quasars from SDSS Data Release 9 (DR9; Ahn et al. 2012; Paris et al. 2012; Lee et al. 2013).

$^{15}$ There is also a simultaneous effort to observe ∼1.5 million luminous red galaxies, to measure the BAO at z ∼ 0.5. See, e.g., Anderson et al. (2014).

The quality of the individual BOSS Lyα forest spectra might appear at first glance inadequate for studying the astrophysics of the IGM, which has to date been carried out largely with high-resolution, high-S/N spectra: the typical BOSS spectrum has S/N ∼ 2 per pixel$^{16}$, since the BAO analysis is optimized with large numbers of low signal-to-noise-ratio sightlines, densely sampled on the sky (McDonald & Eisenstein 2007; McQuinn & White 2011). It is therefore interesting to ask whether it is possible to model the various instrumental and astrophysical effects seen in the BOSS Lyα forest spectra to sufficient accuracy to exploit the unprecedented statistical power.

In this paper, we will measure the probability distribution function (PDF) of the Lyα forest transmission, $F \equiv \exp(-\tau)$, from BOSS. This one-point statistic, which was first studied by Jenkins & Ostriker (1991), is sensitive to astrophysical parameters such as the amplitude of matter fluctuations and the thermal history of the IGM. However, the transmission$^{17}$ PDF is also highly sensitive to effects such as pixel noise level, resolution of the spectra, and systematic uncertainties in the placement of the quasar continuum level, especially in moderate-resolution spectra such as SDSS or BOSS. Desjacques et al. (2007) studied the transmission PDF from a sample of ∼3500 Lyα forest spectra from SDSS Data Release 3 (Abazajian et al. 2005). Using mock spectra generated from a log-normal model of the Lyα forest with parameters tuned to reproduce high-resolution, high-S/N spectra, they fitted for the estimated pipeline noise level and continuum-fitting errors in the SDSS spectra. They concluded that the noise levels reported by the SDSS pipeline were underestimated by ∼10%, consistent with the findings of McDonald et al. (2006). They also found that the quasar continuum level was systematically lower by ∼10% in comparison with a power law extrapolated from redwards of the quasar Lyα line, with an RMS variance of ∼20%, although certain aspects of their study, e.g., the noise modelling and quasar continuum model, were rather crude.

We intend to take an approach distinct from that of
Desjacques et al. (2007): instead of treating the noise and continuum as free parameters, we will attempt to measure the BOSS Lyα forest transmission PDF using a rigorous treatment of the noise and continuum-fitting, and then adopt a “forward-modeling” approach of trying to model the various instrumental effects as accurately as possible in mock spectra generated from detailed hydrodynamical simulations. Using the raw individual exposures and calibration data from BOSS, we will first implement a novel probabilistic method for co-adding the exposures, which will yield more accurate noise estimates as well as enable self-consistent noise modelling in mock spectra. Similarly, we will use a new method for continuum estimation called mean-flux regulated/principal component analysis (MF-PCA; Lee et al. 2012). This technique provides unprecedented continuum accuracy for noisy Lyα forest spectra: < 10% RMS errors for S/N ∼ 2 and < 5% RMS errors for S/N ≳ 5 spectra.

On the modeling side, we will use the detailed hydrodynamical IGM simulations of Viel et al. (2013a) as a basis. The mock spectra are then smoothed to BOSS resolution, have Lyman-limit systems (LLS) and metal contamination added, followed by the introduction of pixel noise based on our improved noise estimates. We will then self-consistently introduce continuum errors by applying our continuum-estimation procedure on the mock spectra.

$^{16}$ All spectral signal-to-noise ratios quoted in this paper are per 69 km s$^{-1}$ SDSS/BOSS pixel unless noted otherwise.

$^{17}$ The Lyα forest transmitted flux fraction is sometimes also referred to as ‘flux’ in the literature; we nevertheless use the variable F to refer to this quantity.

With the increase in statistical power from the sheer
number of BOSS spectra, and our improved modeling of the noise and continuum, we expect to significantly reduce the errors on the measured transmission PDF in comparison with Desjacques et al. (2007). This should enable us to place independent constraints on the shape of the underlying transmission PDF, and the thermal history of the IGM as parametrized by the power-law temperature-density relation, γ and $T_0$.

The IGM temperature-density relationship is a topic
of recent interest, as Bolton et al. (2008) and Viel et al. (2009) have found evidence of an inverted temperature-density relation, γ < 1, implying that voids are hotter than overdensities, in the IGM at z ∼ 2−3 from the transmission PDF of high-resolution, high-S/N Lyα forest spectra (Kim et al. 2007). This result is in contrast with theoretical expectations of γ ≈ 1.6 (Miralda-Escude & Rees 1994; Hui & Gnedin 1997; Theuns et al. 1998; Hui & Haiman 2003), which arise from the balance between adiabatic cooling in the lower-density IGM and photoheating in the higher-density regions. Even inhomogeneous He II reionization, which is expected to flatten the IGM temperature-density relation (see, e.g., Furlanetto & Oh 2008; Bolton et al. 2009; McQuinn et al. 2009), is insufficient to account for the extremely low values of γ ∼ 0.5 estimated by the aforementioned authors (although inversions could occur at higher densities; see, e.g., Meiksin & Tittley 2012).

Indeed, earlier papers studying the temperature-density relationship, either using the transmission PDF (McDonald et al. 2001) or by measuring the Doppler parameters and hydrogen column densities of individual forest absorbers (the so-called $b-N_{\rm HI}$ relation, e.g., Schaye et al. 1999; Ricotti et al. 2000; Rudie et al. 2012), have found no evidence of an inverted γ. In recent years, the decay of blazar gamma rays via plasma instabilities (Broderick et al. 2012; Chang et al. 2012; although see Sironi & Giannios 2014) has been invoked as a possible mechanism to supply the heat necessary to flatten γ to the observed levels (Puchwein et al. 2012).

It would be desirable to perform an independent
re-analysis of high-resolution data taking into account continuum-fitting bias (Lee 2012), to place these claims on a firmer footing. However, Lee & Spergel (2011) have argued that the complete SDSS DR7 (Abazajian et al. 2009) Lyα forest data set could have sufficient statistical power to place interesting constraints on γ, even assuming continuum-fitting errors at the ∼10% RMS level. Therefore, with the current BOSS data, we hope to model noise and resolution, as well as astrophysical systematics, at a sufficient precision to place interesting constraints on the IGM thermal history.

This paper is organized as follows: we first give a broad
overview of the BOSS Lyα forest data set, followed by our measurement of the BOSS transmission PDF with detailed descriptions of our method of combining multiple raw exposures and continuum estimation. We then discuss how we include various instrumental and astrophysical effects in our modeling of the transmission PDF, starting with hydrodynamical simulations. The model transmission PDF is then compared with the observed PDF to obtain constraints on the thermal parameters governing the IGM.
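The scalings in Equations 1 and 2 can be illustrated with a short numerical sketch. The Python below is not taken from the analysis pipeline of this paper; it is a minimal illustration in which the function names, the arbitrary normalization `amplitude`, and the example value of $T_0$ are assumptions made for the demonstration. It evaluates the FGPA optical depth up to a constant and checks numerically that the logarithmic slope of τ with ∆ is $2 - 0.7(\gamma - 1)$.

```python
import math


def temperature(delta, t0, gamma):
    """Equation 2: T = T0 * Delta**(gamma - 1)."""
    return t0 * delta ** (gamma - 1.0)


def fgpa_tau(delta, t0, gamma, amplitude=1.0, photoionization_rate=1.0):
    """Equation 1 up to a constant: tau = amplitude * T**-0.7 * Delta**2 / Gamma.

    `amplitude` is a hypothetical normalization that absorbs every constant
    left unspecified by the proportionality in Equation 1.
    """
    t = temperature(delta, t0, gamma)
    return amplitude * t ** -0.7 * delta ** 2 / photoionization_rate


def tau_density_slope(gamma):
    """Logarithmic slope d ln(tau) / d ln(Delta) = 2 - 0.7 * (gamma - 1)."""
    return 2.0 - 0.7 * (gamma - 1.0)


def transmission(tau):
    """Transmitted flux fraction F = exp(-tau)."""
    return math.exp(-tau)


if __name__ == "__main__":
    t0 = 1.0e4  # K; illustrative value only
    for gamma in (0.5, 1.0, 1.6):
        # Measure the slope numerically between Delta = 1 and Delta = 2.
        ratio = fgpa_tau(2.0, t0, gamma) / fgpa_tau(1.0, t0, gamma)
        numerical = math.log(ratio) / math.log(2.0)
        print(f"gamma={gamma}: analytic slope={tau_density_slope(gamma):.3f}, "
              f"numerical slope={numerical:.3f}")
```

In this sketch an inverted relation (γ < 1) steepens the dependence of τ on density, while γ = 1.6 flattens it, which is one way to see why the shape of the transmission PDF responds to γ.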

2. DATA

2.1. Summary of BOSS

BOSS (Dawson et al. 2013) is part of SDSS-III (Eisenstein et al. 2011; the other surveys are SEGUE-2, MARVELS, and APOGEE). The primary goal of the survey is to carry out precision baryon acoustic oscillation measurements at z ∼ 0.5 and z ∼ 2.5, from the luminous red galaxy distribution and Lyα forest absorption field, respectively (see, e.g., Anderson et al. 2014; Busca et al. 2013; Slosar et al. 2013). Its eventual goal is to obtain spectra of ∼1.5 million luminous red galaxies and ∼170,000 z > 2.15 quasars over 4.5 years of operation.

BOSS is conducted on upgraded versions of the twin
SDSS spectrographs (Smee et al. 2013) mounted on the 2.5 m Sloan telescope (Gunn et al. 2006) at Apache Point Observatory, New Mexico. One thousand optical fibers mounted on a plug-plate at the focal plane (spanning a 3° field of view) feed the incoming flux to the two identical spectrographs, of which 160-200 fibers per plate are allocated to quasar targets (see Ross et al. 2012; Bovy et al. 2011, for a detailed description of the quasar target selection). Both spectrographs split the light into a blue and a red camera that together cover 3610−10140 Å, with the dichroic overlap region occurring at around 6000 Å. The resolving power $R \equiv \lambda/\Delta\lambda$ ranges from 1300 at the blue end to 2600 at the red end.

Each plate is observed for sufficiently long to achieve
the S/N requirements set by the survey goals; typically, 5 individual exposures of 15 minutes are taken. The data are processed, calibrated, and combined into co-added spectra by the “idlspec2d” pipeline, followed by a pipeline which operates on the 1D spectra to classify objects and assign redshifts (Bolton et al. 2012). However, as described later in this paper, we will generate our own co-added spectra from the individual exposures and other intermediate data products.
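To put the quoted resolving powers in the velocity units used for the pixel size, the following minimal sketch converts R into the full width at half maximum (FWHM) of a resolution element. It assumes that ∆λ in $R \equiv \lambda/\Delta\lambda$ denotes the FWHM; the function names are illustrative and not part of any BOSS software.

```python
C_KM_S = 299792.458  # speed of light in km/s


def fwhm_velocity(resolving_power):
    """Velocity FWHM of a resolution element, assuming R = lambda / FWHM."""
    return C_KM_S / resolving_power


def fwhm_wavelength(wavelength_angstrom, resolving_power):
    """Wavelength FWHM in Angstrom at a given wavelength, under the same assumption."""
    return wavelength_angstrom / resolving_power


if __name__ == "__main__":
    for r in (1300.0, 2600.0):
        print(f"R = {r:.0f}: FWHM = {fwhm_velocity(r):.1f} km/s")
```

Under this assumption the blue-end resolution element (R = 1300) is roughly 231 km s$^{-1}$ wide, i.e., several 69 km s$^{-1}$ pixels.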

2.2. Data Cuts

In this paper we use data from the publicly available SDSS Data Release 9 (DR9; Ahn et al. 2012). This includes 87,822 quasars at all redshifts that have been confirmed by visual inspection as described in Paris et al. (2012). In Lee et al. (2013), we defined a further subset of 54,468 quasars with $z_{\rm qso} \geq 2.15$ that are suitable for Lyα forest analysis, and provided, in individual FITS files for each quasar, various products such as sky masks, masks for damped Lyα absorbers (DLAs), noise corrections, and continua; these are designed to ameliorate systematics in the BOSS spectra and aid in Lyα forest analysis (see Table 1 in Lee et al. 2013 for a full listing). While we use this Lee et al. (2013) catalog as a starting point, in this paper we will generate our own custom co-added spectra and noise estimates.

The typical signal-to-noise ratio of the BOSS Lyα forest quasars is low: 〈S/N〉 ≈ 2 per pixel within the Lyα forest. This is driven by a strategy of observing a large number of sightlines over a large area in order to optimize the 3D Lyα forest BAO analysis (McDonald & Eisenstein 2007; McQuinn & White 2011), rather than increasing the S/N in individual spectra. However, for our analysis we wish to select a subset of BOSS Lyα forest sightlines with reasonably high S/N in order to reduce the sensitivity of our PDF measurement to inaccuracies in our modeling of the noise and continuum of the BOSS spectra. We therefore make a cut on S/N, including only sightlines that have a median 〈S/N〉 ≥ 6 per pixel within the Lyα forest$^{18}$, defined with respect to the pipeline noise estimate (see Lee et al. 2013); this selects only the ∼10% of spectra with the highest S/N. The 1041−1185 Å Lyα forest region of each quasar must also include at least 30 pixels ($\Delta v = 2071\,{\rm km\,s^{-1}}$) within one of our absorption redshift bins of 〈z〉 = 2.3, 〈z〉 = 2.6, and 〈z〉 = 3.0, with bin widths of ∆z = 0.3 (see § 3.3).

We discard spectra with identified DLAs in the sightline, as listed in the ‘DLA Concordance Catalog’ used in the Lee et al. (2013) sample. This DLA catalog (W. Carithers 2014, in prep.) includes objects with column densities $N_{\rm HI} > 10^{20}\,{\rm cm}^{-2}$; however, the completeness of this catalog is uncertain below $N_{\rm HI} = 10^{20.3}\,{\rm cm}^{-2}$. We therefore discard only sightlines containing DLAs with $N_{\rm HI} \geq 10^{20.3}\,{\rm cm}^{-2}$, and take into account lower column-density absorbers in our subsequent modelling of mock spectra. At the relatively high S/N that we will work with (see below), the detection efficiency of DLAs is essentially 100% (see, e.g., Prochaska et al. 2005; Noterdaeme et al. 2012) and thus we expect our rejection of $N_{\rm HI} \geq 10^{20.3}\,{\rm cm}^{-2}$ DLAs to be quite thorough.

Measurements of the Lyα forest transmission PDF are
known to be sensitive to the continuum estimate (Lee 2012), but in this paper we use an automated continuum-fitter, MF-PCA (Lee 2012), that is less susceptible to biases introduced by manual continuum estimation. Moreover, unlike the laborious process of manually fitting continua on high-resolution spectra, the automated continuum estimation can be used to explore various biases in continuum estimation. For this purpose, we will use the same MF-PCA continuum estimation used in Lee et al. (2013), albeit with minor modifications as described in § 3.2. We select only quasars that appear to be well described by the continuum basis templates, based on the goodness of fit to the quasar spectrum redwards of Lyα. This is flagged by the variable CONT_FLAG = 1 as listed in the Lee et al. (2013) catalog (see Table 3 in that paper). Broad Absorption Line (BAL) quasars, whose continua are difficult to estimate due to broad intrinsic absorption troughs, have already been discarded from the Lee et al. (2013) sample.

$^{18}$ Defined as the 1041−1185 Å region in the quasar restframe.

Another consideration is that the shape of the transmission PDF is affected by the resolution of the spectrum, especially since the BOSS spectrographs do not resolve the Lyα forest. The exact spectral resolution of a BOSS spectrum at a given wavelength varies as a function of both observing conditions and row position on the BOSS CCDs. The BOSS pipeline reports the wavelength dispersion at each pixel, $\sigma_{\rm disp}$, in units of the co-added wavelength pixel size (binned such that $\Delta\lambda/\lambda = 10^{-4}\ln 10$, i.e., $\Delta\log_{10}\lambda = 10^{-4}$). This is related to the resolving power by $R \approx (2.35 \times 10^{-4} \ln 10\; \sigma_{\rm disp})^{-1}$. Palanque-Delabrouille et al. (2013) have recently found, using their own analysis of the width of the arc-lamp lines and bright sky emission lines, that the spectral dispersion reported by the pipeline had a bias that depended on the CCD row and increased with wavelength, up to 10% at λ ≈ 6000 Å. We will correct for this bias when creating mock spectra to compare with the data, as described in § 4. Figure 1 shows the (uncorrected) pixel dispersions from 236 BOSS quasars from the 〈z〉 = 2.3, S/N = 6−8 bin, as a function of wavelength at the blue end (λ = 3700−4200 Å) of the spectrograph. At fixed wavelength, there are outliers that contribute to the large spread in $\sigma_{\rm disp}$, e.g., ranging from $\sigma_{\rm disp} \approx 0.9-1.8$ at 3700 Å. We therefore discard spectra with outlying values of $\sigma_{\rm disp}$ based on the following criterion: we first rank-order the spectra based on their $\sigma_{\rm disp}$ value evaluated at the central wavelength of each PDF bin (i.e., λ = [4012, 4377, 4863] Å at 〈z〉 = [2.3, 2.6, 3.0]), and then discard spectra below the 5th percentile and above the 90th percentile. This is illustrated by the red-dashed lines in Figure 1.

Fig. 1. Wavelength dispersions, $\sigma_{\rm disp}$, for 236 BOSS quasar spectra randomly selected from the 〈z〉 = 2.3, 6 < S/N < 8 PDF bin. The ordinate axis on the right shows the equivalent spectral resolution, $R \equiv \lambda/\Delta\lambda$. The dashed-red lines are objects that have been discarded from the analysis on account of being outliers in spectral dispersion.

Finally, since our noise estimation procedure uses the
individual BOSS exposures, we discard objects that have fewer than three individual exposures available.

Our final data set comprises 3373 unique quasars with
redshifts ranging from $z_{\rm qso} = 2.255$ to $z_{\rm qso} = 3.811$, and a median of S/N = 8.08 per pixel. This data set represents only a small subsample of the BOSS DR9 quasar spectra, but is over two orders of magnitude larger than high-resolution quasar samples previously used for transmission PDF analysis. Table 1 summarizes our data sample, and the statistics of the redshifts and S/N bins for which we measure the transmission PDF. Figure 2 shows histograms of the pixels used in our analysis, as a function of absorption redshift.
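The selection described in this subsection can be summarized as a sequence of simple filters. The sketch below is only an illustration of that logic: the `Sightline` record and its field names are hypothetical (they are not the column names of the Lee et al. 2013 catalog), the percentile is computed by linear interpolation, which is an assumption, and the dispersion cut is applied to a single list of sightlines rather than separately within each PDF bin.

```python
import math
from dataclasses import dataclass


@dataclass
class Sightline:
    """Hypothetical per-quasar summary; field names are illustrative only."""
    median_snr: float        # median S/N per pixel within the Lya forest
    n_forest_pixels: int     # forest pixels falling inside a redshift bin
    max_dla_log_nhi: float   # log10 N_HI of strongest DLA, -inf if none
    cont_flag: int           # 1 if the continuum templates fit well
    n_exposures: int         # number of individual exposures available
    sigma_disp: float        # dispersion at the bin's central wavelength


def resolving_power(sigma_disp):
    """R ~ 1 / (2.35 * 1e-4 * ln(10) * sigma_disp), with sigma_disp in pixels."""
    return 1.0 / (2.35 * 1.0e-4 * math.log(10.0) * sigma_disp)


def passes_basic_cuts(s):
    """S/N, path length, DLA, continuum-quality and exposure-count cuts."""
    return (s.median_snr >= 6.0
            and s.n_forest_pixels >= 30
            and s.max_dla_log_nhi < 20.3
            and s.cont_flag == 1
            and s.n_exposures >= 3)


def percentile(sorted_values, q):
    """q-th percentile of an ascending list, using linear interpolation."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def select_sightlines(sightlines):
    """Apply the basic cuts, then keep the 5th-90th percentile range in sigma_disp."""
    candidates = [s for s in sightlines if passes_basic_cuts(s)]
    if not candidates:
        return []
    values = sorted(s.sigma_disp for s in candidates)
    lower, upper = percentile(values, 5.0), percentile(values, 90.0)
    return [s for s in candidates if lower <= s.sigma_disp <= upper]
```

With this relation, a dispersion of one pixel corresponds to R ≈ 1850, and larger values of $\sigma_{\rm disp}$ correspond to proportionally lower resolving power.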

3. MEASURING THE TRANSMISSION PDF FROM BOSS


TABLE 1
Binning of BOSS Lyα Forest Transmission PDFs

Lyα Forest Redshift   S/N (a)       N_spec (b)   N_pix (c)   ∆v (d)         ∆z (e)   ∆X (f)
                      (per pixel)                            (km s^-1)
2.15 < z < 2.45       6-8           1109         288442      1.99 × 10^7    219      704
                      8-10          501          129141      8.90 × 10^6    97.9     315
                      > 10          561          146478      1.01 × 10^7    111      357
2.45 < z < 2.75       6-8           1004         229898      1.59 × 10^7    191      646
                      8-10          490          107001      7.38 × 10^6    88.6     300
                      > 10          604          140843      9.71 × 10^6    117      396
2.85 < z < 3.15       6-8           511          108443      7.48 × 10^6    99.7     358
                      8-10          326          72448       5.00 × 10^6    66.7     239
                      > 10          341          74284       5.12 × 10^6    68.3     245

(a) Median S/N within Lyα forest.
(b) Number of contributing spectra.
(c) Number of ∆v = 69 km s^-1 pixels.
(d) Velocity path length.
(e) Redshift path length.
(f) Absorption distance, where $dX/dz = (1+z)^2\,[\Omega_M (1+z)^3 + \Omega_\Lambda]^{-1/2}$. For this conversion, we assume $\Omega_M = 0.3$ and $\Omega_\Lambda = 0.7$.
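The absorption-distance conversion in note (f) can be checked with a few lines of Python. This is a minimal sketch rather than the code used to build Table 1: the function names and the midpoint-rule integration are assumptions, and it integrates over a contiguous redshift interval, whereas the table sums over the pixels actually used.

```python
import math


def dx_dz(z, omega_m=0.3, omega_lambda=0.7):
    """dX/dz = (1 + z)**2 / sqrt(Omega_M * (1 + z)**3 + Omega_Lambda)."""
    return (1.0 + z) ** 2 / math.sqrt(omega_m * (1.0 + z) ** 3 + omega_lambda)


def absorption_distance(z_min, z_max, n_steps=1000, omega_m=0.3, omega_lambda=0.7):
    """Integrate dX/dz from z_min to z_max with the midpoint rule."""
    step = (z_max - z_min) / n_steps
    total = 0.0
    for k in range(n_steps):
        total += dx_dz(z_min + (k + 0.5) * step, omega_m, omega_lambda)
    return total * step


if __name__ == "__main__":
    for z in (2.3, 2.6, 3.0):
        print(f"z = {z}: dX/dz = {dx_dz(z):.3f}")
```

For the 2.15 < z < 2.45, S/N = 6-8 bin, Table 1 gives ∆X/∆z = 704/219 ≈ 3.2, which agrees with dX/dz evaluated near the bin center at z = 2.3.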

Fig. 2. Pixel distribution of Lyα absorber redshifts in the BOSS Lyα forest sample used in this paper, shown in bin sizes of ∆z = 0.05. The different colors and line-styles denote the three redshift bins used in this paper. We have chosen these redshift bins, with the gap at 2.75 < z < 2.85, to match the simulation redshifts (§ 4.1).

In this section, we will measure the Lyα forest transmission PDF from BOSS. In principle, the transmission PDF is simply the histogram of the transmitted flux in the Lyα forest after dividing by the quasar continuum. However, with the comparatively noisy BOSS data we need to ensure an accurate estimate of the pixel noise. We will therefore first describe a new probabilistic method for co-adding the individual BOSS exposures that will enable us to have an accurate noise estimate. We will also describe the continuum-estimation method with which we normalize the forest transmission.
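The "histogram of the transmitted flux after dividing by the continuum" can be written down directly. The sketch below is an illustration under stated assumptions, not the measurement code of this paper: the function name, the default bin range and bin width, and the choice to ignore pixels outside the binned range are all hypothetical. Because noisy pixels can scatter below F = 0 or above F = 1, the assumed range extends beyond the physical interval.

```python
import math


def transmission_pdf(flux, continuum, f_min=-0.2, f_max=1.5, bin_width=0.05):
    """Normalized histogram of F = flux / continuum.

    The bin range and width are illustrative defaults. Pixels outside
    [f_min, f_max) or with a non-positive continuum are ignored. The returned
    values p_k satisfy sum(p_k) * bin_width == 1.
    """
    n_bins = int(round((f_max - f_min) / bin_width))
    counts = [0] * n_bins
    for f, c in zip(flux, continuum):
        if c <= 0.0:
            continue  # skip pixels without a usable continuum estimate
        k = math.floor((f / c - f_min) / bin_width)
        if 0 <= k < n_bins:
            counts[k] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no pixels fall inside the binned transmission range")
    return [n / (total * bin_width) for n in counts]


if __name__ == "__main__":
    observed = [0.4, 1.8, 2.1, 0.9, 1.5]
    continua = [2.0, 2.0, 2.0, 2.0, 2.0]
    pdf = transmission_pdf(observed, continua)
    print(len(pdf), sum(pdf) * 0.05)
```

Any error in the continuum rescales every F value in a sightline, and any error in the noise model changes how far pixels scatter between bins, which is why the rest of this section concentrates on those two ingredients.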

3.1. Co-addition of Multiple Exposures and Noise Estimation

Since we intend to model BOSS spectra with modest S/N, we need an accurate estimate of the pixel noise that also allows us to separate out the contributions from Poisson noise due to the background and sky as well as read noise from the detector. In this subsection, we will construct an accurate probabilistic model of the flux and noise of the BOSS spectrograph, based on the individual exposure data that BOSS delivers.

The basic BOSS spectral data consists of a spectrum
of each raw exposure, $f_{\lambda i}$ (inclusive of noise), an estimate of the sky, $s_{\lambda i}$, and a calibration vector, $S_{\lambda i}$, where $i$ indexes the $n_{\rm exp}$ exposures taken$^{19}$. The quantity $s_{\lambda i}$ is the actual sky model that was subtracted from the fiber spectra in the extraction. The calibration vector is defined as $S_{\lambda i} \equiv f_{\lambda i}/f_{N i}$, with $f_{N i}$ being the flux of exposure $i$ in units of photoelectrons. The idlspec2d pipeline then estimates the co-added spectrum of the true object flux, $F_\lambda$, from the raw individual exposures, sky estimates, and calibration vectors.

The BOSS data reduction pipeline also delivers noise estimates in the form of variance vectors, which are however known to be inaccurate (McDonald et al. 2006; Desjacques et al. 2007; Lee et al. 2013; Palanque-Delabrouille et al. 2013).

To quantify the fidelity of the BOSS noise estimate,
we used the so-called ‘side-band’ method described in Lee et al. (2014a) and Palanque-Delabrouille et al. (2013), which compares the variance in flat, absorption-free regions of the quasar spectra with the estimated noise. First, we randomly selected 10,000 BOSS quasars (omitting BAL quasars) from the Paris et al. (2012) catalog in the redshift range $1.4 \leq z_{\rm qso} < 3.4$, evenly distributed into 20 redshift bins of width $\Delta z_{\rm qso} = 0.1$ (i.e., 500 objects per bin). We then consider the flat 1460 Å < $\lambda_{\rm rest}$ < 1510 Å spectral region in the quasar restframe, which is dominated by the smooth power-law continuum and relatively unaffected by broad emission lines (e.g., Vanden Berk et al. 2001; Suzuki 2006) or absorption lines. The pixel variance in this flat portion of the spectrum should therefore be dominated by spectral noise, allowing us to examine whether the noise estimate provided by the pipeline is accurate. We then evaluate the ratio of $\sigma_{\rm side}$, the pixel flux RMS in the restframe 1460 Å < $\lambda_{\rm rest}$ < 1510 Å region, to the average pipeline noise estimate, $\sigma_\lambda$:

$$\left\langle \frac{\sigma_{\rm side}}{\sigma_\lambda} \right\rangle = \frac{\left[\sum f_\lambda^2 - \bar{f}_\lambda^2\right]^{1/2}}{\sum \sigma_\lambda}, \tag{3}$$

where the summations and average flux are evaluated in the quasar restframe 1460 Å < $\lambda_{\rm rest}$ < 1510 Å.

$^{19}$ Typically there are $n_{\rm exp} = 5$ exposures of 15 minutes each, although this can vary due to the requirements to achieve a given (S/N)$^2$ over each individual plug-plate, as determined by the overall BOSS survey strategy (see Dawson et al. 2013).

Fig. 3. A quantitative test of the noise estimation fidelity in the spectra. Each point shows the ratio of the pixel variance divided by the estimated noise variance, averaged over the restframe 1460 Å < $\lambda_{\rm rest}$ < 1510 Å flat spectral region of 500 BOSS quasars within redshift bins of $\Delta z_{\rm qso} = 0.1$ and plotted as a function of the corresponding observed wavelength of the flat spectral region. If there is no bias in the noise estimation, this ratio should be unity. The black asterisks show this quantity estimated using the BOSS pipeline co-added spectra and noise estimates, while the red triangles show the results from the MCMC co-addition and noise estimation procedure described in § 3.1. The MCMC method clearly provides a better noise estimation than the BOSS pipeline.

In Figure 3, this quantity is averaged over the 500
individual quasars per redshift bin and plotted as a function of the observed wavelength corresponding to $\lambda = (1 + \langle z_{\rm qso}\rangle)\,1485$ Å. With a perfect noise estimate, $\langle\sigma_{\rm side}/\sigma_\lambda\rangle$ should be unity at all wavelengths, but we see that the BOSS pipeline underestimates the true noise in the spectra at λ ≲ 5000 Å, by up to ∼15% at the blue end of the spectra, with an overall tilt that changes over to an overestimate at λ ≳ 4500 Å. Lee et al. (2013) and Palanque-Delabrouille et al. (2013) provide a set of correction vectors that can be applied to the pipeline noise estimates to bring the latter to within several percent of the true noise level across the wavelength coverage of the blue spectrograph.

Unfortunately, these noise corrections are inadequate
for our purposes, since we want to generate realistic mock spectra that have different realizations of the Lyα forest transmission field from the actual spectra, i.e., a different $F_\lambda$. We therefore require a method that not only accurately estimates the noise in a given BOSS spectrum, but also separates out the photon-counting and CCD terms in the variance that results from applying the Horne (1986) optimal spectral extraction algorithm:

$$\sigma_\lambda^2 = S_\lambda\,(F_\lambda + s_\lambda) + S_\lambda^2\,\sigma_{\rm RN}^2, \tag{4}$$

where $\sigma_{\rm RN}$ is the CCD read noise.

To resolve this issue, we apply our own novel statistical method to the individual BOSS exposures to generate co-added spectra while simultaneously estimating the corresponding noise parameters for each individual spectrum. This procedure, which uses a Gibbs-sampled Markov-Chain Monte Carlo (MCMC) algorithm, is described in detail in the Appendix. Initially, we attempted to model the noise with just a single constant noise parameter which rescales the read-noise term of Equation 4, but this was found to be inadequate. This is likely because an optimal extraction algorithm weights by the product of the S/N and object profile, causing the corresponding variance to have a non-linear dependence on the flux and sky level. Furthermore, systematic errors in the reduction, sky-subtraction and calibration will result in additional noise contributions which could depend on sky level, object flux, or wavelength, hence deviating from this simple model.

After considerable trial-and-error to find a model that
best minimizes the bias illustrated in Figure 3, we settled on the form:

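$$\sigma_{\lambda i}^2 = A_1 \tilde{S}_{\lambda i}\,(F_\lambda + s_{\lambda i}) + A_2 \tilde{S}_{\lambda i}^2\,\sigma_{\rm RN,eff}^2\,\sigma_{\rm disp}(\lambda), \tag{5}$$

where

$$\tilde{S}_{\lambda i} = S_{\lambda i}\left(1 - \exp(-A_3\lambda + A_4)\right), \tag{6}$$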

where the $A_j$ are free parameters in our noise model, while the $\sigma_{\rm disp}(\lambda)$ factor in the second term (the pixel dispersion) provides a rough approximation for the wavelength dependence of the spot size (i.e., the size of the raw CCD image in the spatial direction). Meanwhile, $\sigma_{\rm RN,eff} = 12$ is the average CCD read noise per wavelength bin in the BOSS spectra (D.J. Schlegel et al., in preparation). The quantities $s_{\lambda i}$, $S_{\lambda i}$, and $\sigma_{\rm disp}(\lambda)$ (sky flux, calibration vector, and dispersion, respectively) are taken directly from the BOSS pipeline.

In addition, we assume that the pixel noise can be modeled as a Gaussian distribution with a variance given by Equation 5. The first, photon-counting, term in the equation should formally be modeled as a Poisson distribution, but the BOSS spectrograph always receives ≳ 30−40 counts, even at the blue end of the spectrograph where the counts are the lowest. It is therefore reasonable to use the Gaussian approximation: even in the limit of low S/N (i.e., when the spectrum is dominated by the sky flux), the moderate resolution ensures that there are at least several dozen sky photons per pixel in each exposure.

For each BOSS spectrum, we use the MCMC procedure described in the Appendix to combine the multiple exposures while simultaneously estimating the noise parameters $A_j$ and the true observed spectrum, $F_\lambda$. With the optimal estimates of $A_j$ and $F_\lambda$ for a given spectrum, the estimated noise variance is then simply given by Equation 5.

An important advantage of the form in Equation 5 is
that the object photon noise ∝ Fλ is explicitly separated out. This facilitates the construction of a mock spectrum with the same noise characteristics as a true spectrum, but with a different spectral flux. For example, a mock spectrum of the Lyα forest will have a very different transmission field than the original data, and so the variance due to object photon-counting noise can be added appropriately, in addition to contributions from the known sky and the read-noise term (Equation 5). Our empirical determination of the parameters governing this noise model for each individual spectrum forms a crucial ingredient in our forward model, which we will describe in § 4.

Our MCMC procedure works for spectra from a single camera, either red or blue; we have not yet generalized it to combine blue and red spectra of each object. However, the spectral range of the blue camera alone (≈ 3600−6400 Å) covers the Lyα forest up to z ∼ 5, i.e., most practical redshifts for Lyα forest analysis. For the purposes of this paper, we restrict ourselves to spectra from the blue camera alone.

In Figures 4 and 5, we show examples of co-added

BOSS quasar spectra, using both the MCMC procedure and the standard BOSS pipeline. In the upper panels, the MCMC co-adds are not noticeably different from the BOSS pipeline, although the numerical values are different. In the lower panels, we show the estimated noise from both methods; the differences are larger than in the fluxes but still difficult to distinguish by eye.

We therefore return to the statistical analysis by calculating ⟨σside/σλ⟩, the ratio of the pixel variance against the estimated noise from the flat 1460 Å < λrest < 1510 Å region of BOSS quasars; this ratio, computed for our MCMC co-adds, is plotted in Figure 3. With these new co-adds, we see that this ratio is within roughly ±3% of unity across the entire λ ∼ 3800−5000 Å wavelength range relevant to our subsequent analysis, with an overall bias of 1% (i.e., the noise is still underestimated by this level). Crucially, we have removed the strong wavelength dependence of ⟨σside/σλ⟩ that was present in the standard pipeline, and we suspect most of the scatter about unity is caused by the limited number of quasars (500 per bin) available for this estimate, which will be mitigated by the larger number of quasar spectra available in the subsequent BOSS data releases. In principle, we could correct the remaining 1% noise bias, but since our selected spectra have S/N > 6, this remaining 1% noise bias would smooth the forest transmission PDF by an amount roughly 1/25 of the average PDF bin width (∆F = 0.05). As we shall see, there are other systematic uncertainties in our modeling that have much larger effects than this; we therefore regard our noise estimates as adequate for the subsequent transmission PDF analysis, without requiring any further correction.
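As an illustration of how a noise model of this form can be applied to a mock spectrum, the following minimal Python sketch evaluates a variance with the structure of Equations 5 and 6 and draws Gaussian pixel noise from it. It is not the MCMC code described in the Appendix: the function names, the toy input arrays, and the values chosen for the A_j are hypothetical, and the read-noise value of 12 is simply the number quoted in the text.

```python
import numpy as np


def noise_variance(flux, sky, calib, sigma_disp, wave, A, sigma_rn_eff=12.0):
    """Per-pixel noise variance with the structure of Equations 5 and 6.

    flux, sky, calib, sigma_disp and wave are arrays on the same pixel grid;
    A = (A1, A2, A3, A4) are the free noise parameters (hypothetical values here).
    """
    A1, A2, A3, A4 = A
    # Wavelength-dependent modification of the calibration vector (Equation 6).
    calib_mod = calib * (1.0 - np.exp(-A3 * wave + A4))
    # Photon-counting term: scales with object flux plus sky flux.
    photon_term = A1 * calib_mod * (flux + sky)
    # Read-noise term: scales with the pixel dispersion.
    read_term = A2 * calib_mod**2 * sigma_rn_eff**2 * sigma_disp
    return photon_term + read_term


def add_mock_noise(mock_flux, sky, calib, sigma_disp, wave, A, rng):
    """Add Gaussian noise to a noiseless mock spectrum using the model above."""
    var = np.clip(noise_variance(mock_flux, sky, calib, sigma_disp, wave, A), 0.0, None)
    noisy = mock_flux + rng.normal(0.0, np.sqrt(var))
    return noisy, var


# Toy inputs in arbitrary units (assumptions for illustration only).
rng = np.random.default_rng(0)
wave = np.linspace(3800.0, 5000.0, 500)
sky = np.full_like(wave, 2.0)
calib = np.full_like(wave, 0.5)
sigma_disp = np.full_like(wave, 1.2)
A = (1.0, 1.0, 1e-3, 0.0)
mock_flux = np.full_like(wave, 5.0)

noisy_flux, var = add_mock_noise(mock_flux, sky, calib, sigma_disp, wave, A, rng)
```

Because the object flux enters only through the photon-counting term, the same sky, calibration, and dispersion vectors can be reused with any mock transmission field.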

3.2. Mean-Flux Regulated Continuum Estimation

In order to obtain the transmitted flux F of the Lyα forest^{20}, we first need to divide the observed flux, Fλ, by an estimate for the quasar continuum, c. We use the version of mean-flux regulated/principal component analysis (MF-PCA) continuum fitting (Lee et al. 2012) described in Lee et al. (2013). Initially, PCA fitting with 8 eigenvectors is performed on each quasar spectrum redwards of the Lyα line (λrest = 1216−1600 Å) in order to obtain a prediction for the continuum shape in the λrest < 1216 Å Lyα forest region (e.g., Suzuki et al. 2005). The slope and amplitude of this initial continuum estimate are then corrected to agree with the Lyα forest mean transmission, ⟨F⟩cont(z), at the corresponding absorber redshifts, using a linear correction function.

20 Note that the ideal/model observed flux described in the noise modelling section, Fλ, and the Lyα forest transmission F, are completely different quantities.

The only difference in our continuum-fitting with that

in Lee et al. (2013) is that here we use the latest mean-flux measurements of Becker et al. (2013) to constrain our continua. Their final result yielded the power-law redshift evolution of the effective optical depth in the unshielded Lyα forest, defined in their paper as N_HI ≤ 10^{17.2} cm^{−2} (although they only removed contributions from N_HI ≥ 10^{19} cm^{−2} absorbers). This is given by

$$ \tau_{\rm Ly\alpha,B13}(z) \equiv -\ln(\langle F \rangle(z)) = \tau_0 \left( \frac{1+z}{1+z_0} \right)^{\beta} + C, \qquad (7) $$

with best-fit values of [τ0, β, C] = [0.751, 2.90, −0.132] at z0 = 3.5.

However, the actual raw measurement made by Becker et al. (2013) is the effective total absorption within the Lyα forest region of their quasars, which also contains contributions from metals and optically-thick systems:

τeff(z) ≡ τLyα,B13(z) + τmetals + τLLS(z), (8)

where τmetals and τLLS(z) denote the IGM optical depth contributions from metals and Lyman-limit systems, respectively. For the purposes of our continuum-fitting, the quantity we require is τeff(z), since the τmetals and τLLS(z) contributions are also present in our BOSS spectra. Becker et al. (2013) did not publish their raw τeff(z), therefore we must now ‘uncorrect’ the metal and LLS contributions from the published τLyα,B13(z). The discussion below therefore attempts to retrace their footsteps and does not necessarily reflect our own beliefs regarding the actual level of these contributions.

We find τmetals = 0.02525 by simply averaging over the Schaye et al. (2003) metal correction tabulated by Faucher-Giguere et al. (2008) (i.e., the 2.2 ≤ z ≤ 2.5 values in ∆z = 0.1 bins from their Table 4), which were used by Becker et al. (2013) to normalize their relative mean-flux measurements. Note that there is no redshift dependence on τmetals in this context, because Becker et al. (2013) argued that the metal contribution does not vary significantly over their redshift range. Whether or not this is really true is unimportant to us at the moment, since we are merely ‘uncorrecting’ their measurement.

The LLS contribution to the optical depth is reintroduced by integrating over f(N_HI, b, z), the column-density distribution of neutral hydrogen absorbers:

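$$ \tau_{\rm LLS}(z) \approx \frac{1+z}{\lambda_{\rm Ly\alpha}} \int_{N_{\rm min}}^{N_{\rm max}} dN_{\rm HI} \int db \, f(N_{\rm HI}, b, z) \, W_0(N_{\rm HI}, b), \qquad (9) $$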

where b is the Doppler parameter and W0(N_HI, b) is the rest-frame equivalent width (we use the analytic approximation given by Draine 2011, valid in the saturated regime).

Following Becker et al. (2013), we adopted a fixed value of b = 20 km s^{−1} and assumed that f(N_HI, z) = f(N_HI) dn/dz, where f(N_HI) is given by the z = 3.7 broken power-law column density distribution of Prochaska et al. (2010) and dn/dz ∝ (1 + z)^2. Becker et al. (2013) had corrected for super-LLSs and DLAs in the column-density range [Nmin, Nmax] = [10^{19} cm^{−2}, 10^{22} cm^{−2}], but as discussed above we have discarded all sightlines that include N_HI ≥ 10^{20.3} cm^{−2} DLAs, therefore we reintroduce the optical depth contribution for super-LLSs, i.e., [Nmin, Nmax] = [10^{19} cm^{−2}, 10^{20.3} cm^{−2}]. We find τLLS(z) = 0.0022 × [(1 + z)/3]^3. This is a small correction, giving rise to only a 0.5% change in ⟨F⟩ at z = 3.0.

Fig. 4.— Examples of co-added BOSS spectra from the MCMC procedure described in § 3.1 (red), and from the BOSS pipeline (black) are shown in the upper panels, in the restframe interval 1035−1260 Å. The corresponding pixel noise estimates are shown in the lower panels. The blue line shows the MF-PCA continuum used to extract the Lyα forest transmitted flux, while the vertical dotted lines delineate the 1041−1185 Å restframe interval which we define as the Lyα forest. The continuum discontinuity at λrest = 1185 Å is where we have applied the ‘mean-flux regulation’ correction to the Lyα forest. In the top figure, masked pixels have had their flux and noise set to zero. The signal-to-noise ratios for the two spectra are S/N ≈ 11 (top) and S/N ≈ 6 (bottom) within the Lyα forest.

This estimate of the raw absorption, ⟨F⟩eff(z) = exp[−τeff(z)], is now the constraint used to fit the continua of the BOSS quasars, i.e., we set ⟨F⟩cont = ⟨F⟩eff(z). Note that in our subsequent modelling of the data, we will use the same ⟨F⟩cont(z) to fit the mock spectra to ensure an equal treatment between data and mocks. Since ⟨F⟩cont(z) includes a contribution from N_HI < 10^{20.3} cm^{−2} optically-thick systems, our mock spectra will need to account for these systems as we shall describe in § 4.2.
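The ‘uncorrection’ described above amounts to simple arithmetic on Equations 7 and 8, which the following short Python sketch reproduces using only the numbers quoted in the text. The function names are hypothetical and the sketch is an illustration, not the code used for the analysis.

```python
import numpy as np

# Best-fit parameters of Equation 7 (Becker et al. 2013), as quoted in the text.
TAU0, BETA, C, Z0 = 0.751, 2.90, -0.132, 3.5
# Redshift-independent metal contribution quoted in the text.
TAU_METALS = 0.02525


def tau_lya_b13(z):
    """Published optical depth of the unshielded Lya forest (Equation 7)."""
    return TAU0 * ((1.0 + z) / (1.0 + Z0)) ** BETA + C


def tau_lls(z):
    """Super-LLS contribution quoted in the text."""
    return 0.0022 * ((1.0 + z) / 3.0) ** 3


def mean_flux_cont(z):
    """Raw mean transmission used to regulate the continua (Equation 8)."""
    tau_eff = tau_lya_b13(z) + TAU_METALS + tau_lls(z)
    return np.exp(-tau_eff)


for z in (2.3, 2.6, 3.0):
    print(f"z = {z:.1f}: tau_LLS = {tau_lls(z):.4f}, <F>_cont = {mean_flux_cont(z):.3f}")
```

At z = 3.0 the LLS term is about 0.005, which corresponds to the roughly 0.5% change in ⟨F⟩ mentioned above.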

The MF-PCA technique requires spectral coverage in the quasar restframe interval 1000−1600 Å. However, as noted in the previous section, we work with co-added BOSS spectra from only the blue cameras covering λ ≲ 6400 Å; this covers the full 1000−1600 Å interval required for the PCA fitting only for z ≲ 3 quasars. Nonetheless, the differences in the fluxes between our MCMC co-adds and the BOSS pipeline co-adds are relatively small, and we do not expect the relative shape of the quasar spectrum to vary significantly. We can thus carry out PCA fitting on the BOSS pipeline co-adds, which cover the full observed range (3700−10000 Å), to predict the overall quasar continuum shape. This initial prediction is then used to perform mean-flux regulation using the MCMC co-adds and noise estimates, to fine-tune the amplitude of the continuum fits.

Fig. 5.— Same as Figure 4, but the 1050 Å < λrest < 1090 Å restframe region is expanded to better illustrate the differences between the MCMC and pipeline co-added spectra.

The observed flux, fλ, is divided by the continuum estimate, c, to derive the Lyα forest transmission, F = fλ/c. For each quasar, we define the Lyα forest as the rest wavelength interval 1041−1185 Å. This wavelength range conservatively avoids the quasar’s Lyβ/O VI emission line blend by ∆v ∼ 3000 km s^{−1} on the blue end, as well as the proximity zone close to the quasar redshift by staying ∆v ∼ 10,000 km s^{−1} from the nominal quasar systemic redshift. We are now in a position to measure the transmission PDF, which is simply the histogram of pixel transmissions F ≡ exp(−τ).

3.3. Observed transmission PDF from BOSS

Since the Lyα forest evolves as a function of redshift, we measure the BOSS Lyα forest transmission PDF in three bins with mean redshifts of ⟨z⟩ = 2.3, ⟨z⟩ = 2.6, and ⟨z⟩ = 3.0, and bin sizes of ∆z = 0.3. These redshift bins were chosen to match the simulation outputs (§ 4.1) that we will later use to make mock spectra to compare with the observed PDF; this choice of binning leads to the gap at 2.75 < z < 2.85 as seen in Figure 2. In this paper, we restrict ourselves to z ≲ 3 since the primary purpose is to develop the machinery to model the BOSS spectra. In subsequent papers, we will apply these techniques to analyze the transmission PDF in the full 2 ≲ z ≲ 4 range using the larger samples of subsequent BOSS data releases (DR10, Ahn et al. 2014).

Another consideration is that the transmission PDF is

strongly affected by the noise in the data. While we will model this effect in detail (§ 4), there is a large distribution of S/N within our subsample, ranging from S/N = 6 per pixel to S/N ∼ 20 per pixel. We therefore further divide the sample into three bins depending on the median S/N per pixel within the Lyα forest: 6 < S/N < 8, 8 < S/N < 10, and S/N > 10. The consistency of our results across the S/N bins will act as an important check for the robustness of our noise model (§ 3.1).

We now have nine redshift and S/N bins in which we evaluate the transmission PDF from BOSS; the sample sizes are summarized in Table 1. For each bin, we have selected quasars that have at least 30 Lyα forest pixels within the required redshift range, and which occupy the quasar restframe interval 1041−1185 Å. The co-added spectrum is divided by its MF-PCA continuum estimate (described in the previous section) to obtain the transmitted flux, F, in the desired pixels. We then compute the transmission PDF from these pixels.

Physically, the possible values of the Lyα forest transmission range from F = 0 (full absorption) to F = 1 (no absorption). However, the noise in the BOSS Lyα forest pixels, as well as continuum fitting errors, lead to pixels with F < 0 and F > 1. We therefore measure the transmission PDF in the range −0.2 < F < 1.5, in 35 bins with width ∆(F) = 0.05, normalized such that the area under the curve is unity. The statistical errors on the transmission PDF are estimated by the following method: we concatenate all the individual Lyα forest segments that contribute to each PDF, and then carry out bootstrap resampling over ∆v = 2 × 10^4 km s^{−1} segments with 200 iterations. This choice of ∆v corresponds to ∼ 250−300 Å in the observed frame at z ∼ 2−3; according to Rollinde et al. (2013), this choice of ∆v and number of iterations should be sufficient for the errors to converge (see also Appendix B in McDonald et al. 2000).

Fig. 6.— Lyα forest transmission probability distribution functions, p(F), measured from different subsamples of our BOSS sample, at various redshifts (with ∆z = 0.3) and S/N. Both the upper and lower panels show the PDF, but with linear and logarithmic ordinate axes, respectively. The different colors and line-styles denote our different S/N subsamples at each redshift. The error bars are estimated from bootstrap resampling over ∆v = 2 × 10^4 km s^{−1} segments from the contributing spectra. Table 1 summarizes the number of spectra and pixels which contribute to each bin.

In Figure 6, we show the Lyα forest transmission PDF

measured from the various redshift and S/N subsamples in our BOSS sample. At fixed redshift, the PDFs from the lower-S/N data have a broader shape, as expected from increased noise variance. With increasing redshift, there are more absorbed pixels, causing the transmission PDFs to shift towards lower F values. As discussed previously, there is a significant portion of F > 1 pixels due to a combination of pixel noise and continuum errors, with a greater proportion of F > 1 pixels in the lower-S/N subsamples as expected. Unlike the high-resolution transmission PDF, at ⟨z⟩ ≲ 3 there are few pixels that reach F = 0. This effect is due to the resolution of the BOSS spectrograph, which smooths over the observed Lyα forest such that even saturated Lyα forest absorbers with N_HI ≳ 10^{14}−10^{16} cm^{−2} rarely reach transmission values of F ≲ 0.3. The pixels with F ≲ 0.3 are usually contributed either by blends of absorbers or optically thick LLSs (see also Pieri et al. 2014).

An advantage of our large sample size is that we are also able to directly estimate the error covariances, Cboot, via bootstrap resampling; an example is shown in Figure 7. In contrast to the Lyα forest transmission PDF from high-resolution data, which has significant off-diagonal covariances (Bolton et al. 2008), the error covariance from the BOSS transmission PDF is nearly diagonal with just some small correlations between neighboring bins, although we also see some anti-correlation between transmission bins at F ∼ 0.8 and F ∼ 1.

It is interesting to compare the transmission PDF from

our data with that measured by Desjacques et al. (2007) from SDSS DR3. This comparison is shown in Figure 8, in which the transmission PDFs calculated from SDSS DR3 Lyα forest spectra with S/N > 4 (kindly provided by Dr. V. Desjacques) are shown for two redshift bins, juxtaposed with the BOSS transmission PDFs calculated from spectra with the same redshift and S/N cuts.

While there is some resemblance between the two PDFs, the most immediate difference is that the Desjacques et al. (2007) PDFs are shifted to lower transmission values, i.e., the mean transmission, ⟨F⟩, is considerably smaller than that from our BOSS data: ⟨F⟩(z = 2.4) = 0.73 and ⟨F⟩(z = 3.0) = 0.64 from their measurement, whereas the BOSS PDFs have ⟨F⟩(z = 2.4) = 0.80 and ⟨F⟩(z = 3.0) = 0.70. This difference arises because Desjacques et al. (2007) used a power-law continuum (albeit with corrections for the weak emission lines in the quasar continuum) extrapolated from λrest > 1216 Å in the quasar restframe; this does not take into account the power-law break that appears to occur in low-redshift quasar spectra at λrest ≈ 1200 Å (Telfer et al. 2002; Suzuki 2006). Later in their paper, Desjacques et al. (2007) indeed conclude that this must be the case in order to be consistent with other ⟨F⟩(z) measurements. Our continua, in contrast, have been constrained to match existing measurements of ⟨F⟩(z), for which there is good agreement between different authors at z ≲ 3 (e.g., Faucher-Giguere et al. 2008; Becker et al. 2013).

Another point of interest in Figure 8 is that the error bars of the BOSS sample are considerably smaller than those of the earlier measurement. This difference is largely due to the significantly larger sample size of BOSS. The proportion of pixels with F ≲ 0 appears to be smaller in the BOSS PDFs compared with the older data set, but this is because Desjacques et al. (2007) did not remove DLAs from their data.

Fig. 7.— (Top) 2D density plot of the error covariance matrix for the Lyα forest transmission PDF from the ⟨z⟩ = 2.6, S/N = 8−10 BOSS subsample as a function of transmission bins, along with (bottom) the corresponding correlation function. The covariance matrix was estimated through bootstrap resampling, and the values have been multiplied by 10^4 for clarity. The covariances are largely diagonal, except for some cross-correlations between neighboring bins.

We next describe the creation of mock Lyα absorption spectra designed to match the properties of the BOSS data.
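The PDF measurement and bootstrap error estimate of § 3.3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the analysis code: the function names and the toy input segments are hypothetical, and the bin-edge convention (edges from −0.2 to 1.5 in steps of 0.05) is one possible reading of the binning described in the text.

```python
import numpy as np

# Assumed bin edges: steps of 0.05 between F = -0.2 and F = 1.5.
BIN_EDGES = np.linspace(-0.2, 1.5, 35)


def transmission_pdf(F, edges=BIN_EDGES):
    """Histogram of pixel transmissions, normalized to unit area."""
    pdf, _ = np.histogram(F, bins=edges, density=True)
    return pdf


def bootstrap_pdf(segments, n_iter=200, edges=BIN_EDGES, rng=None):
    """Resample forest segments with replacement and recompute the PDF.

    `segments` is a list of 1D arrays, each standing in for one velocity
    segment of Lya forest pixels. Returns the mean PDF over the bootstrap
    realizations, its 1-sigma errors, and the full covariance matrix.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_seg = len(segments)
    pdfs = np.empty((n_iter, len(edges) - 1))
    for i in range(n_iter):
        pick = rng.integers(0, n_seg, size=n_seg)
        resampled = np.concatenate([segments[j] for j in pick])
        pdfs[i] = transmission_pdf(resampled, edges)
    cov = np.cov(pdfs, rowvar=False)
    return pdfs.mean(axis=0), np.sqrt(np.diag(cov)), cov


# Toy data: 50 segments of 300 noisy transmission pixels each (illustrative only).
rng = np.random.default_rng(1)
segments = [rng.normal(0.75, 0.2, size=300) for _ in range(50)]
pdf = transmission_pdf(np.concatenate(segments))
boot_mean, boot_err, boot_cov = bootstrap_pdf(segments, rng=rng)
```

Resampling whole segments, rather than individual pixels, preserves the correlations between neighboring pixels within each segment.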

4. MODELING OF THE BOSS TRANSMISSION PDF

In this section, we will describe simulated Lyα forest mock spectra designed, through a ‘forward-modelling’ process, to have the same characteristics as the BOSS spectra, for comparison with the observed transmission PDFs described in the previous section. For each BOSS spectrum which had contributed to our transmission PDFs in the previous section, we will take the Lyα absorption from randomly selected simulation sightlines, then introduce the characteristics of the observed spectrum using auxiliary information returned by our pipeline.

Fig. 8.— A comparison between the Lyα forest transmission PDFs measured from our BOSS DR9 sample (black solid lines), and the SDSS DR3 sample from Desjacques et al. (2007) (red dashed lines). Only sightlines with S/N > 4 were used in evaluating these PDFs. The lower average transmission of the DR3 PDFs is because Desjacques et al. (2007) had directly extrapolated a power-law from λrest > 1216 Å for continuum estimates, which does not take into account a flattening of the quasar continuum that occurs at λrest ∼ 1200 Å; our BOSS spectra, in contrast, have been normalized to mean-transmission values in agreement with the latest measurements and take this effect into account.

Starting with simulated spectra from a set of detailed hydrodynamical IGM simulations, we carry out the following steps, which we will describe in turn in the subsequent subsections:

1. Introduce LLS absorbers

2. Smooth the spectrum to BOSS resolution

3. Add metal absorption via an empirical method using lower-redshift SDSS/BOSS quasars

4. Add pixel noise, based on the noise properties of the real BOSS spectrum using parameters estimated by our MCMC noise estimation technique

5. Simulate continuum errors by refitting the noisy mock spectrum

Fig. 9.— Cumulative effect of various aspects of our forward model that attempts to reproduce the Lyα forest transmission PDF from BOSS. Starting with the ‘raw’ transmission PDF from the simulations (top), the black curve in each panel shows the PDF from the prior panel, while the red curve shows the effect from: (a) the addition of LLS; (b) smoothing from the finite spectrograph resolution; (c) contamination from lower-redshift metals; (d) pixel noise; (e) continuum fitting errors. The transmission PDF modeled in this figure is from the ⟨z⟩ = 2.3, 8 < S/N < 10 bin.

In the subsequent subsections, we will describe each step in detail. The effect of each step on the observed transmission PDF is illustrated in Figure 9.

4.1. Hydrodynamical Simulations

TABLE 2
Evolution of T0 in Hydrodynamical Simulations

⟨z⟩    T_COLD     T_REF      T_HOT
2.3    13000 K    18000 K    23000 K
2.6    11000 K    16000 K    21500 K
3.0     9000 K    14000 K    19000 K

As the basis for our mock spectra, we use hydrodynamic simulations run with a modification of the publicly available GADGET-2 code. This code implements a simplified star formation criterion (Springel et al. 2005) that converts all gas particles that have an overdensity above 1000 and a temperature below 10^5 K into star particles (see Viel et al. 2004). The simulations used are described in detail in Becker et al. (2011) and in Viel et al. (2013a).

The reference model that we use is a box of length 20 h^{−1} comoving Mpc with 2 × 512^3 gas and cold DM particles (with a gravitational softening length of 1.3 h^{−1} kpc) in a flat ΛCDM universe with cosmological parameters Ωm = 0.274, Ωb = 0.0457, ns = 0.968, H0 = 70.2 km s^{−1} Mpc^{−1} and σ8 = 0.816, in agreement both with WMAP-9yr (Komatsu et al. 2011) and Planck data (Planck Collaboration et al. 2013). The initial condition power spectra are generated with CAMB (Lewis et al. 2000). For the boxes considered in this work, we have verified that the transmission PDF has converged in terms of box size and resolution.

We explore the impact of different thermal histories on

the Lyα forest by modifying the ultraviolet (UV) background photo-heating rates in the simulations as done in, e.g., Bolton et al. (2008). A power-law temperature-density relation, T = T0 ∆^{γ−1}, arises in the low density IGM (∆ < 10) as a natural consequence of the interplay between photo-heating and adiabatic cooling (Hui et al. 1997; Gnedin & Hui 1998). The value of γ within a simulation can be modified by varying a density-dependent heating term (see, e.g., Bolton et al. 2008). We consider a range of values for the temperature at mean density, T0, and the power-law index of the temperature-density relation, γ, based on the observational measurements presented recently by Becker et al. (2011). These consist of a set of three different indices for the temperature-density relation, γ(z = 2.5) ∼ 1.0, 1.3, 1.6, that are kept roughly constant over the redshift range z = [2 − 6], and three different temperatures at mean density, T0(z = 2.5) ∼ [11000, 16000, 21500] K, which evolve with redshift, yielding a total of nine different thermal histories. Between z = 2 and z = 3 there is some temperature evolution and the IGM becomes hotter at low redshift; at z = 2.3, the models have T0 ∼ [13000, 18000, 23000] K. We refer to the intermediate temperature model as our ‘reference’ model, or T_REF, while the hot and cold models are referred to as T_HOT and T_COLD, respectively. The values of T0 of our simulations at the various redshifts are summarized in Table 2.

Approximately 4000 core hours were required for each simulation run to reach z = 2. The physical properties of the Lyα forest obtained from the TreePM/SPH code GADGET-2 are in agreement at the percent level with those inferred from the moving-mesh code AREPO (Bird et al. 2013) and with the Eulerian code ENZO (O’Shea et al. 2004).


For this study, the simulation outputs were saved at z = [2.3, 2.6, 3.0], from which we extract 5000 optical depth sightlines binned to 2048 pixels each. To convert these to transmission spectra, the optical depths were rescaled such that the skewers collectively yielded a desired mean transmission, ⟨F⟩Lyα ≡ exp(−τLyα). For our fiducial models, we would like to use the mean-transmission values estimated by Becker et al. (2013), which we denote as ⟨F⟩Lyα,B13 ≡ exp(−τLyα,B13). However, their estimates assume certain corrections from optically-thick systems and metal absorption. We therefore add back in the corrections they made (see discussion in § 3.2) to get their ‘raw’ measurement for ⟨F⟩ that now includes all optically thick systems and metals, and then remove these contributions assuming our own LLS and metal absorption models (see below).

Later in the paper, we will argue that our PDF analysis in fact places independent constraints on ⟨F⟩Lyα.
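The rescaling of the simulated optical depths to a desired mean transmission can be illustrated with a short Python sketch. It assumes, as is common practice but not spelled out above, that a single multiplicative factor is applied to all skewers, and it finds that factor by bisection; the function name, the search bounds, and the toy lognormal optical depths are hypothetical.

```python
import numpy as np


def rescale_tau(tau, target_mean_flux, lo=1e-3, hi=1e3, tol=1e-8, max_iter=200):
    """Find a scale factor a such that mean(exp(-a * tau)) matches the target.

    The mean transmission decreases monotonically with a, so a bisection in
    log(a) between the (assumed) bounds `lo` and `hi` is sufficient.
    """
    def mean_flux(a):
        return np.mean(np.exp(-a * tau))

    if not (mean_flux(hi) < target_mean_flux < mean_flux(lo)):
        raise ValueError("target mean transmission is outside the bracketed range")

    for _ in range(max_iter):
        mid = np.sqrt(lo * hi)
        if mean_flux(mid) > target_mean_flux:
            lo = mid  # spectra too transparent: increase the scale factor
        else:
            hi = mid  # spectra too opaque: decrease the scale factor
        if hi / lo - 1.0 < tol:
            break

    a = np.sqrt(lo * hi)
    return a, np.exp(-a * tau)


# Toy optical depth skewers (illustrative only): 50 sightlines of 2048 pixels.
rng = np.random.default_rng(2)
tau = rng.lognormal(mean=-1.0, sigma=1.0, size=(50, 2048))
scale, F = rescale_tau(tau, target_mean_flux=0.8)
```

The factor is determined from all skewers collectively, so individual sightlines retain their own mean transmission while the ensemble matches the target.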

4.2. Lyman-limit systems

In principle, all optically-thick Lyα absorbers such as Lyman-limit systems (LLSs) and damped Lyα absorbers (DLAs) should be discarded from Lyα forest analyses, since they do not trace the underlying matter density field in the same way as the optically-thin forest (Equation 1), and require radiative transfer simulations to accurately capture their properties (e.g., McQuinn et al. 2011; Rahmati et al. 2013).

While DLAs are straightforward to identify through their saturated absorption and broad damping wings even in noisy BOSS data (see, e.g., Noterdaeme et al. 2012), the detection completeness of optically-thick systems through their Lyα absorption drops rapidly at N_HI ≲ 10^{20} cm^{−2}. Even in high-S/N, high-resolution spectra, optically thick systems can only be reliably detected through their Lyα absorption at N_HI ≳ 10^{19} cm^{−2} (“super-LLS”). Below these column densities, optically-thick systems can be identified either through their restframe 912 Å Lyman-limit (albeit only one per spectrum) or using higher-order Lyman-series lines (e.g., Rudie et al. 2013). Neither of these approaches has been applied in previous Lyα forest transmission PDF analyses (McDonald et al. 2000; Kim et al. 2007; Calura et al. 2012; Rollinde et al. 2013), so arguably all these analyses are contaminated by LLSs.

Instead of attempting to remove LLSs from our observed spectra, we incorporate them into our mock spectra through the following procedure. For each PDF bin, we evaluate the total redshift pathlength of the contributing BOSS spectra (and corresponding mocks); this quantity is summarized in Table 1. This is multiplied by l_LLS(z), the number of LLSs per unit redshift, to give the total number of LLSs expected within our sample. We used the published estimate of this quantity by Ribaudo et al. (2011)^{21}, which is valid over 0.24 < z < 4.9:

$$ l_{\rm LLS}(z) = l_{z_0} (1+z)^{\gamma_{\rm LLS}}, \qquad (10) $$

where l_z0 = 0.1157 and γLLS = 1.83.

21 Note that the value l_z0 = 0.30 given in Table 6 of Ribaudo et al. (2011) is actually erroneous, and the correct normalization is in fact l_z0 = 0.1157, consistent with the data in their paper, which is used in Equation 10. Dr. J. Ribaudo, in private communication, has concurred with this conclusion.

After estimating the total number of LLSs in our mock spectra, l_LLS(z)∆z, we add them at random points within our set of simulated optical depth skewers. We also experimented with adding LLSs such that they are correlated with regions that already have high column density (e.g., Font-Ribera & Miralda-Escude 2012), but we found little significant change to the transmission PDF and therefore keep the less computationally-intensive random LLSs.

For each model LLS, we then draw a column density using the published LLS column density distribution, f(N_HI), from Prochaska et al. (2010). This distribution is measured at z ≈ 3.7, so we make the assumption that f(N_HI) does not evolve with redshift between 2 ≲ z ≲ 3.7. For our column densities of interest, this distribution is represented by the broken power-laws:

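$$ f(N_{\rm HI}) = \begin{cases} k_1 N_{\rm HI}^{-0.8} & \text{if } 10^{17.5} < N_{\rm HI} < 10^{19.0} \\ k_2 N_{\rm HI}^{-1.2} & \text{if } 10^{19.0} < N_{\rm HI} < 10^{20.3} \end{cases}. \qquad (11) $$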

For the normalizations k1 and k2, we demand that

$$ \int_{10^{17.5}}^{10^{19.0}} k_1 N_{\rm HI}^{-0.8} \, dN_{\rm HI} + \int_{10^{19.0}}^{10^{20.3}} k_2 N_{\rm HI}^{-1.2} \, dN_{\rm HI} = 1, \qquad (12) $$

and require both power-laws to be continuous at N_HI = 10^{19.0} cm^{−2}. These constraints produce k1 = 10^{−4.505} and k2 = 10^{3.095}. After drawing a random value for the column density of each LLS, we add the corresponding Voigt profile to the optical depth in the simulated skewer.

In addition to the LLS with column densities of

10^{17.5} cm^{−2} < N_HI < 10^{20.3} cm^{−2} that are defined to have τHI ≥ 2, there is also a population of partial Lyman-limit systems (pLLSs) that are not well-captured in our hydrodynamical simulations since they have column densities (10^{16.5} cm^{−2} ≲ N_HI < 10^{17.5} cm^{−2}) at which radiative transfer effects become significant (τHI ≳ 0.1). However, the incidence rates and column-density distribution of pLLSs are ill-constrained since they are difficult to detect in normal LLS searches. We therefore account for the pLLSs by extrapolating the low end of the power-law distribution in Equation 11 down to N_HI = 10^{16.5} cm^{−2}, i.e.,

$$ f(10^{16.5}\,{\rm cm^{-2}} < N_{\rm HI} < 10^{17.5}\,{\rm cm^{-2}}) = k_1 N_{\rm HI}^{-0.8}. \qquad (13) $$

This simple extrapolation does not take into account constraints from the mean-free path of ionizing photons (e.g., Prochaska et al. 2010), which predict a steeper slope for the pLLS distribution, but we will explore this later in § 5.2.

Comparing the integral of this extrapolated pLLS distribution with Equation 12 leads us to conclude that

$$ l_{\rm pLLS}(z) = 0.197 \, l_{\rm LLS}(z), \qquad (14) $$

and we proceed to randomly add pLLSs to our mock spectra in the same way as LLSs.

Fig. 10.— Simulated ⟨z⟩ = 2.3 Lyα forest skewer from our hydrodynamical simulations, without smoothing (top panel) and smoothed to BOSS resolution (bottom panel). The black curve is the simulated transmission directly extracted from the simulations, while the red curve is the same transmission field but with a LLS added at λ = 4057 Å or z = 2.337. The blue curve in the bottom panel shows the effect of the metal absorbers added using our empirical method. For illustrative purposes, we have specifically chosen this simulated sightline to have significant LLS and metal absorption; it is possible for a sightline to have neither. The dashed horizontal line denotes F = 0.3, below which our fiducial transmission PDF model disagrees with BOSS (see § 5).

The other free parameter in our LLS model is their effective b-parameter distribution. However, due to the observational difficulty in identifying N_HI ≲ 10^{18.5} cm^{−2} LLSs, the b-parameter distribution of this population has, to our knowledge, never been quantified. Due to this lack of knowledge, it is common to simply adopt a single b-value when attempting to model LLSs (e.g., Font-Ribera & Miralda-Escude 2012; Becker et al. 2013). We therefore assume that all our pLLSs and LLSs have a b-parameter of b = 70 km s^{−1}, similar to DLAs (Prochaska & Wolfe 1997), an ‘effective’ value meant to capture the blending of multiple Lyα components. However, the b-parameter for this population of absorbers is a highly uncertain quantity and, as we shall see, it will need to be modified to provide a satisfactory fit to the data, although it will turn out not to strongly affect our conclusions regarding the IGM temperature-density relationship.
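The LLS bookkeeping of this section (an expected number from Equation 10, and column densities drawn from the broken power law of Equation 11) can be sketched as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, the normalization is handled through relative segment weights rather than explicit k1 and k2 values, and the column densities are drawn by inverse-transform sampling, which is one standard way to sample a piecewise power law.

```python
import numpy as np

L_Z0, GAMMA_LLS = 0.1157, 1.83      # Equation 10 parameters quoted in the text
PLLS_FRACTION = 0.197               # Equation 14
LOG_N_EDGES = (17.5, 19.0, 20.3)    # log10 column-density breakpoints of Equation 11
SLOPES = (-0.8, -1.2)               # power-law indices of Equation 11


def expected_lls(z, path_dz):
    """Expected number of LLSs for a total redshift pathlength path_dz."""
    return L_Z0 * (1.0 + z) ** GAMMA_LLS * path_dz


def segment_weights():
    """Relative integrals of the two power-law segments, continuous at the break."""
    n0, n1, n2 = (10.0 ** x for x in LOG_N_EDGES)
    a1, a2 = SLOPES
    k1 = 1.0
    k2 = k1 * n1 ** (a1 - a2)  # continuity at the break column density
    w1 = k1 * (n1 ** (a1 + 1) - n0 ** (a1 + 1)) / (a1 + 1)
    w2 = k2 * (n2 ** (a2 + 1) - n1 ** (a2 + 1)) / (a2 + 1)
    return w1, w2


def draw_column_densities(n, rng):
    """Draw n column densities (cm^-2) from the broken power law."""
    n0, n1, n2 = (10.0 ** x for x in LOG_N_EDGES)
    w1, w2 = segment_weights()
    in_low = rng.random(n) < w1 / (w1 + w2)
    u = rng.random(n)
    out = np.empty(n)
    for mask, (lo, hi), a in ((in_low, (n0, n1), SLOPES[0]),
                              (~in_low, (n1, n2), SLOPES[1])):
        p = a + 1.0
        # Inverse CDF of a power law N^a restricted to [lo, hi].
        out[mask] = (lo ** p + u[mask] * (hi ** p - lo ** p)) ** (1.0 / p)
    return out


rng = np.random.default_rng(3)
n_lls = expected_lls(z=2.3, path_dz=1.0)
n_plls = PLLS_FRACTION * n_lls
N_HI = draw_column_densities(20000, rng)
```

In a forward model, each drawn column density would then be converted to a Voigt profile and added to a randomly chosen position in the optical depth skewers, as described above.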

4.3. Spectral Resolution

The spectral resolution of SDSS/BOSS spectra is R ≡ λ/∆λ ≈ 1500−2500 (Smee et al. 2013). The exact value varies significantly both as a function of wavelength, and across different fibers and plates depending on observing conditions (Figure 1).

For each spectrum, the BOSS pipeline provides an estimate of the 1σ wavelength dispersion at each pixel, σdisp, in units of the co-added wavelength grid size (∆ log10 λ = 10^{−4}). The spectral resolution at that pixel can then be obtained from the dispersion, through the following conversion: R ≈ (2.35 × 10^{−4} ln 10 σdisp)^{−1}. Figure 1 shows the pixel dispersions from 236 randomly-selected BOSS quasars as a function of wavelength at the blue end of the spectrograph. Even at fixed wavelength, there is a considerable spread in the dispersion, e.g., ranging from σdisp ≈ 0.9−1.8 at 3700 Å. The value of σdisp typically decreases with wavelength (i.e., the resolution increases).

In their analysis of the Lyα forest 1D transmission power spectrum, Palanque-Delabrouille et al. (2013) made their own study of the BOSS spectral resolution by directly analysing the line profiles of the mercury and cadmium arc lamps used in the wavelength calibration. They found that the pipeline underestimates the spectral resolution as a function of fiber position (i.e., CCD row) and wavelength: the discrepancy is < 1% at blue wavelengths and near the CCD edges, but increases to as much as 10% at λ ∼ 6000 Å near the center of the blue CCD (cf. Figure 4 in Palanque-Delabrouille et al. 2013). Our analysis is limited to λ ≤ 5045 Å, i.e., z ≤ 3.15, where the discrepancy is under 4%. Nevertheless, we implement these corrections to the BOSS resolution estimate to ensure that we model the spectral resolution to an accuracy of < 1%.

For each BOSS Lyα forest segment that contributes

to the observed transmission PDFs discussed in § 3.3,we concatenate randomly-selected transmission skew-ers from the simulations described in the previous sec-tion. This is because the simulation box size of L =20 h−1 Mpc (∆v ∼ 2, 000 km s−1) is significantly shorterthan the path length of our redshift bins (∆z = 0.3, or∆v ≈ 27, 000 km s−1). This ensures that each BOSSspectrum in our sample has a mock spectrum that is ex-actly matched in pathlength.We then directly convolve the simulated skewers

by a Gaussian kernel with a standard deviation thatvaries with wavelength, using the estimated reso-lution from the real spectrum, multiplied by thePalanque-Delabrouille et al. (2013) resolution correc-tions. The effect of smoothing on the transmissionPDF is illustrated by the red-dashed curve in Figure 9b.Smoothing decreases the proportion of pixels with high-transmission (F ≈ 1) and with high-absorption (F ≈ 0),and increases the number of pixels with intermediatetransmission values.
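The dispersion-to-resolution conversion and the wavelength-dependent smoothing described above can be sketched in Python as follows. This is a minimal illustration rather than the code used in this work: the function names are hypothetical, the mock skewer is assumed to be already sampled on the BOSS pixel grid so that σ_disp can serve directly as the kernel width in pixels, and any resolution correction is assumed to have been multiplied into `sigma_pix` beforehand.

```python
import numpy as np

DLOG10_LAMBDA = 1e-4  # BOSS co-added pixel size in log10(wavelength)

def resolving_power(sigma_disp):
    """R = lambda / FWHM from the pipeline dispersion (in pixel units)."""
    sigma_disp = np.asarray(sigma_disp, dtype=float)
    return 1.0 / (2.35 * DLOG10_LAMBDA * np.log(10.0) * sigma_disp)

def smooth_variable_gaussian(flux, sigma_pix):
    """Smooth `flux` with a Gaussian whose width (in pixels) varies per pixel.

    Assumption of this sketch: flux is sampled on the same pixel grid on
    which sigma_pix is defined.
    """
    flux = np.asarray(flux, dtype=float)
    sigma_pix = np.broadcast_to(np.asarray(sigma_pix, dtype=float), flux.shape)
    idx = np.arange(flux.size)
    # Row k holds the normalized kernel centred on output pixel k.
    offsets = idx[None, :] - idx[:, None]
    kernel = np.exp(-0.5 * (offsets / sigma_pix[:, None]) ** 2)
    kernel /= kernel.sum(axis=1, keepdims=True)
    return kernel @ flux
```

For σ_disp = 1 the conversion gives R ≈ 1850. The dense kernel matrix scales as the square of the number of pixels, which is acceptable for a single forest segment but would call for a banded implementation on long spectra.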

4.4. Metal Contamination

Metal absorption along our observed Lyα forest sightlines acts as a contaminant since its presence alters the observed statistics of the Lyα forest. In high-resolution data, this contamination is usually treated by directly identifying and masking the metal absorbers, although in the presence of line blending it is unclear how thorough this approach can be.

Fig. 11.— An illustration of our empirical ‘sideband’ model of metal contamination in our mock Lyα forest spectra. The lower panel shows the z_qso = 2.7 quasar along with its Lyα forest region (red) which we wish to model. To its corresponding mock spectrum, we add metals observed in the λ_rest ≈ 1260 − 1390 Å region of a lower-redshift (z_qso = 2.0) quasar (blue region in top panel).

With the lower S/N and moderate resolution of the BOSS data, direct metal identification and masking is not a viable approach. Furthermore, most of the weak metal absorbers seen in high-resolution spectra are not resolved in the BOSS data.

Rather than removing metals from the BOSS Lyα forest spectra, we instead add metals as observed in lower-redshift quasar spectra. In other words, we add absorbers observed in the restframe λ_rest ≈ 1260 − 1390 Å region of lower-redshift quasars with 1 + z_qso ≈ (1216 Å/1300 Å)(1 + 〈z〉), such that the observed wavelengths are matched to the Lyα forest segment with average redshift 〈z〉. Figure 11 is a cartoon that illustrates this concept. This method makes no assumption about the nature of the metal absorption in the Lyα forest, and includes all resolved metal absorption spanning the whole range of redshifts down to z ∼ 0. The disadvantage of this method is that it does not include metals with intrinsic wavelengths λ ≲ 1300 Å, but the relative contribution of such metal species towards the transmission PDF should be small²² since most of the metal contamination comes from low-redshift (z ≲ 2) C IV and Mg II.

We use a metal catalog generated by B. Lundgren et al. (in prep; see also Lundgren et al. 2009), which lists absorbers in SDSS (Schneider et al. 2010) and BOSS quasar spectra (Paris et al. 2012). The SDSS spectra were included in order to increase the number of z_qso ≈ 1.9 − 2.0 quasars needed to introduce metals into the 〈z〉 = 2.3 Lyα forest mock spectra, which are not well sampled by the BOSS target selection (Ross et al. 2012). We emphasize that we work with the ‘raw’ absorber catalog, i.e. the individual absorption lines have not been identified in terms of metal species or redshift. For each quasar, the catalog provides a line list with the observed wavelength, equivalent width (EW, W_r), full-width at half-maximum (FWHM), and detection S/N, W_r/σ_Wr.

²² Si III is an obvious exception, but we will later account for this omission in our error bars (§5.3).

To ensure a clean catalog, we use only W_r/σ_Wr ≥ 3.5 absorbers that were identified from quasar spectra with S/N > 15 per angstrom redwards of Lyα. The latter criterion ensures that even relatively weak lines (with EW ≳ 0.5 Å) are accounted for in our catalog. Figure 12 shows an example of the lower-redshift quasar spectra that we use for the metal modelling.

However, we want to add a smooth model of the metal-line absorption to our mock spectra, rather than adding in a noisy spectrum. We therefore use a simple model as follows. For each Lyα forest segment we wish to model at redshift 〈z〉, we select an absorber line-list from a random quasar with 1 + z_qso ≈ (1216 Å/1300 Å)(1 + 〈z〉). We next assume that all resolved metals in the SDSS/BOSS spectra are saturated and thus in the flat regime of the curve-of-growth. The equivalent width is then given by

\begin{equation}
W_r \approx \left(\frac{2b}{c}\right) \sqrt{\ln(\tau_0/\ln 2)}, \tag{15}
\end{equation}

where τ_0 is the optical depth at line center, b is the velocity width and c is the speed of light. In the saturated regime, W_r is mostly sensitive to changes in b while being highly insensitive to changes in τ_0. We can thus adopt τ_0 as a global constant and solve for b, given the W_r of each listed absorber in the selected ‘sideband’ quasar. We have found that τ_0 = 3 provides a good fit for most of the absorbers.

We then add the Gaussian profile into our simulated optical depth skewers:

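\begin{equation}
\tau = \tau_0 \exp\left[-\left(\frac{c}{b}\right)^2 \left(\frac{\Delta\lambda}{\lambda}\right)^2\right] \tag{16}
\end{equation}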

centered at the same observed wavelength, λ, as the real absorber. The red curve in Figure 12 shows our model for the observed absorbers, using just the observed wavelength, λ, and equivalent width, W_r, from the absorber catalog.

Our method for incorporating metals is somewhat crude since one should, in principle, first deconvolve the spectrograph resolution from the input absorbers, and then add the metal absorbers into our mock spectra prior to convolving with the BOSS spectral resolution. In contrast, we fit b-parameters to the absorber catalog without spectral deconvolution, so these b-parameters can be thought of as combinations of the true absorber width, b_abs, and the spectral dispersion, σ_disp, i.e. b^2 ∼ b_abs^2 + σ_disp^2. While technically incorrect, this seems reasonable since the template quasar spectra and forest spectra that we are attempting to model both have approximately the same resolution, and in practical terms this ad hoc approach does seem to be able to reproduce the observed metals in the lower-redshift quasar spectra (Figure 12). The other possible criticism of our approach is that it does not incorporate weak metal absorbers, although we attempted to mitigate this by setting a very high S/N threshold on the template quasars for the metals. However, we have checked that such weak metals do not significantly change the forest PDF (and indeed metals in general do not seriously affect the PDF, c.f. Figure 9c).

We also tried adding metals with similar redshifts to, and correlated with, forest absorbers (e.g., absorption by Si II and Si III) measured in Pieri et al. (2010) and Pieri et al. (2014) using a method described in the appendix of Slosar et al. (2011). We found a negligible impact on the transmission PDF owing mainly to the fact that these correlated metals contribute only ∼ 0.3% to the overall flux decrement, so we neglect this contribution in our subsequent analysis.

Fig. 12.— A continuum-normalized spectrum of a BOSS quasar showing the metal absorbers in the 1300 Å < λ_rest < 1390 Å ‘sideband’ region, which would be used to add metals to 〈z〉 = 2.6 mock Lyα forest spectra. The red curve shows our metal model for this spectrum, generated from the observed wavelengths and equivalent widths in the absorber catalog generated by the automatic algorithm of Lundgren et al. (2009). We also assume that the absorbers all lie on the saturated portion of the curve-of-growth and have τ_0 = 3, with the equivalent width (labeled above each absorption line) proportional to the b-parameter. The model absorption profiles represented by the red curve would be added to our mock Lyα forest spectra. We have chosen to plot this particular ‘sideband’ because it has more absorbers than average; the typical spectrum has less metal absorption than this.
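The saturated-line metal model of Equations 15 and 16 can be illustrated with a short Python sketch. It is not the code used in this work. The function names are hypothetical, and we read Equation 15 as giving W_r as a fraction of the line wavelength, so the equivalent width and wavelength passed in must share the same units and frame (an assumption of this sketch). The profile is taken to be Gaussian in velocity with width b.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s
TAU0 = 3.0          # line-centre optical depth adopted in the text

def b_from_equivalent_width(w, lam, tau0=TAU0):
    """Invert Eq. 15 for b (km/s); w and lam share units (assumption)."""
    return 0.5 * C_KMS * (w / lam) / np.sqrt(np.log(tau0 / np.log(2.0)))

def metal_optical_depth(wave, lam, b_kms, tau0=TAU0):
    """Gaussian optical-depth profile in velocity, centred on wavelength lam."""
    dv = C_KMS * (np.asarray(wave, dtype=float) - lam) / lam
    return tau0 * np.exp(-(dv / b_kms) ** 2)
```

Because Equation 15 is an approximation for saturated lines, integrating 1 − exp(−τ) over the resulting profile recovers the input equivalent width only to within a few per cent for τ_0 = 3.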

4.5. Pixel Noise

It is non-trivial to introduce the correct noise to a simulated Lyα forest spectrum: given a noise estimate from the observed spectrum, one needs to first ensure that the mock spectrum has approximately the same flux normalization as the data. This is challenging, as the Lyα forest transmission at any given pixel, which ranges from 0 to 1, will vary considerably between the simulated spectrum and the real data.

The simplest method of adding noise to a mock spectrum is to introduce Gaussian deviates using the pipeline noise estimate for each spectrum; this was essentially the method used by Desjacques et al. (2007) and the BOSS mocks described in Font-Ribera et al. (2012). However, with the MCMC co-addition procedure described in § 3.1, we are in a position to model the noise in a more robust and self-consistent fashion.

Recall that the MCMC procedure returns posterior probabilities for two quantities: the true underlying spectral flux density, F_λ, and the four free parameters A_j, which parametrize the noise in each spectrum. This estimate of the A_j from each quasar spectrum allows us to accurately model the pixel noise using Equation 5.

The MF-PCA method (§ 3.2) produces an estimate of the quasar continuum, c, providing approximately the correct flux level at each point in the spectrum. We can now multiply c with the simulated Lyα forest transmission spectra, F, which had already been smoothed to the same dispersion as its real counterpart (the estimated quasar continuum is already at approximately the correct smoothing, since it was fitted to the observed spectrum).

This procedure produces a noiseless mock spectrum with the correct flux normalization and smoothing. We can now generate noisy spectra corresponding to a given BOSS quasar, using our MCMC noise estimation described in Section 3.1. First, we substitute our mock spectrum as F_λ into Equation 5, and then combine the A_j noise parameters (estimated through our MCMC procedure) as well as the calibration vectors S_λ,i and sky estimates s_λ,i. This lets us generate self-consistent noise vectors, σ_λi, corresponding to the individual exposures that make up the mock quasar spectrum. The noise vectors are then used to draw random Gaussian deviates that are added to the mock spectrum, on a per-pixel basis, to create the mock spectral flux density of each exposure, f_λi. Finally, we combine these individual mock exposures into the optimal spectral flux density for the mock spectrum, through the expression (see Appendix):

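\begin{equation}
f_{\mathrm{opt},\lambda} \equiv \sigma^2_{\mathrm{opt},\lambda} \sum_i \frac{f_{\lambda i}}{\sigma^2_{\lambda i}}, \tag{17}
\end{equation}

where

\begin{equation}
\frac{1}{\sigma^2_{\mathrm{opt},\lambda}} \equiv \sum_i \frac{1}{\sigma^2_{\lambda i}}. \tag{18}
\end{equation}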

Figure 9c illustrates the effect of adding pixel noise to the smoothed Lyα forest transmission PDF. As expected, this scatters a significant fraction of pixels to F > 1, and also to F < 0 to a smaller extent.

Fig. 13.— Simulating the noise properties and continuum errors of a BOSS quasar. The top panel shows the observed spectrum of a BOSS quasar, and its associated continuum fit, c, in blue. The middle panel shows the simulated transmission spectra (after adding LLS, smoothing and adding metals) multiplied by the quasar continuum fitted to the true spectrum. In the lower panel, we have added noise to the mock spectrum using the noise parameters estimated from the true spectrum (see § 3.1). A new continuum, c′ (red), is re-fitted to the noisy mock spectrum. The difference between the new continuum, c′, and the ‘true’ continuum, c, of the mock (blue) introduces continuum errors to our model. The vertical dotted lines indicate the range of pixels that contribute to the 〈z〉 = 3.0 subsample in our transmission PDF; a small segment between (1 + z_qso) × 1040 Å = 4461 Å and (1 + 2.75) × 1216 Å = 4560 Å also contributes to the 〈z〉 = 2.6 bin.
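The per-exposure noise generation and the inverse-variance co-addition of Equations 17 and 18 can be sketched in Python as below. This is an illustrative sketch under stated assumptions: the function name is hypothetical, and the per-exposure noise vectors are taken as given inputs rather than derived from the noise model of Equation 5, which is not reproduced here.

```python
import numpy as np

def coadd_mock_exposures(mock_flux, sigma_exposures, rng):
    """Add Gaussian noise to each exposure, then inverse-variance combine.

    mock_flux: (n_pix,) noiseless mock spectrum (continuum x transmission).
    sigma_exposures: (n_exp, n_pix) per-exposure noise vectors, assumed
        to be supplied by the caller.
    rng: a numpy.random.Generator.
    """
    mock_flux = np.asarray(mock_flux, dtype=float)
    sigma = np.asarray(sigma_exposures, dtype=float)
    # One independent Gaussian deviate per exposure and pixel.
    noisy = mock_flux[None, :] + rng.normal(size=sigma.shape) * sigma
    inv_var = 1.0 / sigma**2
    var_opt = 1.0 / inv_var.sum(axis=0)               # Eq. 18
    f_opt = var_opt * (noisy * inv_var).sum(axis=0)   # Eq. 17
    return f_opt, np.sqrt(var_opt)
```

Weighting each exposure by its inverse variance makes the weights sum to unity, so the co-added flux is unbiased and its variance is the smallest attainable for independent Gaussian errors.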

4.6. Continuum Errors

With the noisy mock spectrum in hand (see, e.g., bottom panel of Figure 13), we can self-consistently include the effect of continuum errors in our model transmission PDFs by simply carrying out our MF-PCA continuum-fitting procedure on the individual noisy mock spectra. Dividing the mock spectra by the new continuum fits then incorporates an estimate of the continuum errors (estimated by Lee et al. 2012 to be at the ∼ 4−5% RMS level) into the evaluated model transmission PDF. This estimated error includes uncertainties stemming from the estimation of the quasar continuum shape due to pixel noise, as well as the random variance in the mean Lyα forest absorption in individual lines-of-sight.

Note that regardless of the overall mean-absorption in the mock spectra (i.e. inclusive of our models for metals, LLSs, and mean forest absorption; see § 5.4), we always use 〈F〉_cont(z), the same input mean-transmission derived from Becker et al. (2013) (described in § 3.2), to fit the continua in both the data and mock spectra. While the overall absorption in our fiducial model is consistent with that from Becker et al. (2013), as we shall see later, the shape of the transmission PDF retains information on the true underlying mean-transmission even if fitted with a mean-flux regulated continuum with a wrong input 〈F〉(z).


The effect of continuum errors on the transmission PDF is shown in Figure 9e: like pixel noise, it degrades the peak of the PDF, but only near F ∼ 1.

5. MODEL REFINEMENT

In an ideal world, one would like to do a blind analysis by generating the transmission PDF model (§4) in isolation from the data, before ‘unblinding’ to compare with data; this would then in principle yield results free from psychological bias in the model building. However, as we shall see in §5.1, this does not give acceptable fits to the data, so we have to instead modify our model to yield a better agreement, in particular our LLS model (§5.2) and assumed mean-transmission (§5.4).

5.1. Initial Comparison with T_REF Models

For each of our 9 hydrodynamical simulations (sampling 3 points each in T_0 and γ), we determine the transmission PDF from the Lyα forest mock spectra that include the effects described in the previous section, for the various redshift and S/N subsamples in which we had measured the PDF in BOSS (§3.3). In Figure 14, we show the transmission PDFs for all our redshift and S/N subsamples in BOSS, compared with the corresponding simulated transmission PDFs from the T_REF simulation with γ = [1.0, 1.3, 1.6]. Note that the error bars shown are the diagonal elements of the covariance matrix estimated through bootstrap resampling on the data.

At first glance, the model transmission PDFs seem to be a reasonable match for the data, especially considering we have carried out purely forward modelling without fitting for any parameters. However, when comparing the ‘pull’, (p_data,i − p_model,i)/σ_p,i, between the data and model (bottom panels of Figure 14), we see significant discrepancies, in part due to the extremely small bootstrap error bars. Nevertheless, it is gratifying to see that the shape of the residuals is relatively consistent across the different S/N subsamples at fixed redshift and γ, since this indicates that our spectral noise model is robust.

We proceed to quantify the differences between the simulated transmission PDFs, p_model, and observed transmission PDFs, p_data, with the χ^2 statistic:

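\begin{equation}
\chi^2 = \sum_{ij} (p_{\mathrm{model},i} - p_{\mathrm{data},i})^{T} \, C^{-1}_{ij} \, (p_{\mathrm{model},j} - p_{\mathrm{data},j}), \tag{19}
\end{equation}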

where we use the bootstrap error covariance matrix, C_boot. Note that we also include a bootstrap error term that accounts for the sample variance in the model transmission PDFs, since our pipeline for generating mock spectra is too computationally expensive to include sufficiently large numbers of skewers to fully beat down the sample variance in the models²³.

We limit our model comparison to the range −0.1 ≤ F ≤ 1.2, i.e. 27 transmission bins with bin width ∆(F) = 0.05. This range covers pixels that have been scattered to ‘unphysical’ values of F < 0 or F > 1 due to pixel noise, as is expected from the low S/N of our BOSS data, and also captures > 99.8% of the pixels within each of our data subsets. In particular, it is important to retain the bins with F > 1 because the F ∼ 1 transmission bins are highly sensitive to γ (Lee 2012), and therefore we want to fully sample that region of the PDF even if it requires careful modeling of pixel noise and continuum errors.

²³ We aim for 3−4× more mock spectra than in the corresponding data sample, but later when we have to compute large model grids we are limited to models with the same size as the data.

There are two constraints on all our transmission PDFs: the normalization convention

\begin{equation}
\int p(F)\, dF = 1 \tag{20}
\end{equation}

and the imposition of the same mean transmission due to the mean-flux regulated continuum-fitting

\begin{equation}
\int F\, p(F)\, dF = \langle F \rangle_{\mathrm{cont}} \tag{21}
\end{equation}

such that all the mock spectra have the same absorption, 〈F〉_cont(z). This is because the mock spectra have been continuum-fitted (§4.6) in exactly the same way as the BOSS spectra, which assumes the same mean Lyα transmission inferred from the Becker et al. (2013) measurements (§3.2). The ‘true’ optically-thin mean-transmission, 〈F〉_Lyα, imposed on the simulation skewers is in principle a different quantity from 〈F〉_cont, since the latter includes contributions from metal contamination and optically-thick LLSs.

This leaves us with ν = 27 − 1 − 2 = 24 degrees of freedom (d.o.f.) in our χ^2 comparison. The χ^2 values for all the models shown in Figure 14 are given in the corresponding figure legends.

In this initial comparison, the χ^2 values for the models in Figure 14 are clearly unacceptable: we find χ^2 ≳ 200 for 24 d.o.f. in all cases. However, it is interesting to note that the γ = 1.6 or γ = 1.3 models are preferred at all redshifts and S/N cuts. Note that the S/N = 8−10 subsamples (middle column in Figure 14) tend to have slightly better agreement between model and data compared to the other S/N cuts at the same redshift: this simply reflects the smaller quantity of data in these subsamples (c.f. Table 1) and hence larger bootstrap errors.

A closer inspection of the residuals in Figure 14 indicates that there are two major sources of discrepancy between the models and data. Firstly, at the low-transmission end, we underproduce pixels at 0.1 ≲ F ≲ 0.4 while simultaneously over-producing F ≲ 0.1 pixels, especially at 〈z〉 = 2.3 and 〈z〉 = 2.6. This seems to affect all γ models equally. Pieri et al. (2014) found that at BOSS resolution, pixels with F ≲ 0.3 come predominantly from saturated Lyα absorption from LLS. We therefore investigate possible modifications to our LLS model in §5.2.

The other discrepancy in the model transmission PDFs manifests at the higher-transmission end in the 〈z〉 = 2.6 and 〈z〉 = 3 subsamples, where we see a sinusoidal shape in the residuals at F > 0.6 that appears consistent across different S/N. This portion of the transmission PDF depends on both γ and, as we shall see, on the assumed mean-transmission 〈F〉(z), which we shall discuss in more detail in §5.4.

Finally, our transmission PDF model includes various uncertainties in the modelling of metals, LLSs, and continuum-fitting which have not yet been taken into account. In §5.3, we will estimate the contribution of these uncertainties, by means of a Monte-Carlo method, in our error covariances.

Fig. 14.— An initial comparison between the transmission PDFs observed from BOSS Lyα forest data (error bars) and simulated PDFs generated from the T_REF hydrodynamical simulations (curves) with the method described in § 4; each row is at the same redshift, while the different columns display the different S/N cuts. The points with the error bars are the PDFs measured from the BOSS data (errors estimated from bootstrap resampling), while the black, dotted-red and dashed-blue curves denote simulated PDFs with γ = [1.6, 1.3, 1.0] respectively. The top and middle panels show the transmission PDFs with linear and logarithmic axes, while the lower panels show the pull, i.e. residuals between the simulated PDF and the data PDF, divided by the error. The χ^2 values indicated in these plots are for 24 d.o.f., and clearly indicate unacceptable fits to the data; modifications to the model are required.

Fig. 15.— LLS and pLLS column-density power-law distributions used in our initial model (black; §4.2) and steeper modification (red; §5.2). The distributions are normalized assuming the overall LLS incidence rate at z = 2.25 (c.f. Eq. 10). The vertical dashed lines denote the N_HI = 10^17.5 cm^-2 boundary between pLLS and LLS, and the N_HI = 10^19 cm^-2 boundary between LLS and super-LLS. The shaded regions show the range of possible distributions as determined by Prochaska et al. (2010), but there are few robust constraints in the 10^16.5 cm^-2 ≤ N_HI ≤ 10^17.5 cm^-2 pLLS regime. The ‘initial’ distribution was used in the preliminary data comparisons in §5.1, but all subsequent analysis (after §5.2) assumes the ‘steep’ distribution.
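The binning and χ^2 evaluation used in this comparison (Equations 19 and 20) can be sketched in Python as follows. This is a minimal illustration, not the analysis code: the names are hypothetical, and we assume that the 27 bins are centred on F = −0.10, −0.05, ..., 1.20, which is one way of realizing 27 bins of width 0.05 over the quoted range.

```python
import numpy as np

BIN_WIDTH = 0.05
# Assumed convention: 27 bin centres from -0.10 to 1.20 in steps of 0.05.
CENTERS = -0.1 + BIN_WIDTH * np.arange(27)
EDGES = np.append(CENTERS - BIN_WIDTH / 2, CENTERS[-1] + BIN_WIDTH / 2)

def transmission_pdf(flux):
    """Histogram of pixel transmissions, normalized so sum(p) * dF = 1."""
    counts, _ = np.histogram(flux, bins=EDGES)
    # Pixels outside the binned range are dropped before normalizing.
    return counts / (counts.sum() * BIN_WIDTH)

def chi_squared(p_model, p_data, cov):
    """Chi-squared of Eq. 19 for a full bin-to-bin covariance matrix."""
    resid = np.asarray(p_model, dtype=float) - np.asarray(p_data, dtype=float)
    # Solve C x = r instead of forming the explicit inverse of C.
    return float(resid @ np.linalg.solve(cov, resid))
```

Solving the linear system rather than inverting the covariance matrix is numerically more stable when neighbouring transmission bins are strongly correlated.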

5.2. Modifying the LLS Column Density Distribution

With the moderate spectral resolution of BOSS, there are few individual pixels in the optically-thin Lyα forest that reach transmission values of F ≲ 0.4. Such low-transmission pixels are typically due to either the blending of multiple absorbers (see, e.g., Figure 2 in Pieri et al. 2014), or optically-thick systems (see Figure 10 in this paper).

As we have seen in Figure 14, at low transmission values the discrepancy between data and model has a distinct shape, which is particularly clear at 〈z〉 = 2.3: the models underproduce pixels at 0.1 ≲ F ≲ 0.4 while at the same time overproducing saturated pixels with F ≈ 0.

To resolve this particular discrepancy would therefore require either drastically increasing the amount of clustering in the Lyα forest, or modifying our assumptions on the LLSs in our mock spectra. The first possibility seems rather unlikely since the Lyα forest power on relevant scales is well-constrained (Palanque-Delabrouille et al. 2013), and would in any case require new simulation suites to address, which is beyond the scope of this paper.

On the other hand, it is not altogether surprising that our fiducial column density distribution (§ 4.2), which was measured at z ≈ 3.7 (Prochaska et al. 2010), does not reproduce the BOSS data at 〈z〉 = 2.3 − 2.6. We therefore search for a LLS model that better describes the low-transmission end of the BOSS Lyα forest. Looking at the 〈z〉 = 2.3 PDFs in Figure 14, we see that our fiducial model over-produces pixels at F = 0, yet is deficient at slightly higher F. This suggests that our model is over-producing super-LLS (N_HI > 10^19 cm^-2) that contribute large absorption troughs with F = 0, while not providing sufficient lower-column density absorbers that can individually reach minima of 0.1 ≲ F ≲ 0.4 when smoothed to BOSS resolution. In other words, our fiducial model appears to have an excessively ‘top-heavy’ LLS column density distribution.

Fig. 16.— Variation of the transmission PDF as a function of LLS b-parameter. All model transmission PDFs here are computed from the T_REF, γ = 1.6 model assuming the revised pLLS/LLS distribution described in §5.2 (curves), compared with the S/N = 6−8 BOSS transmission PDF at 〈z〉 = 2.3 (error bars). The quoted χ^2 values are for 24 d.o.f., and evaluated using only bootstrap error covariances. We find that b = 45 km s^-1 gives the best fit to the data.

We therefore try a LLS column density distribution with a more ample bottom end, using the steepest power-laws within the 1σ limits estimated by Prochaska et al. (2010):

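\begin{equation}
f(N_{\mathrm{HI}}) =
\begin{cases}
k_1 N_{\mathrm{HI}}^{-1.2} & \text{if } 10^{17.5} < N_{\mathrm{HI}} < 10^{19.0} \\
k_2 N_{\mathrm{HI}}^{-1.4} & \text{if } 10^{19.0} < N_{\mathrm{HI}} < 10^{20.3}
\end{cases}. \tag{22}
\end{equation}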

We use the same l_LLS(z) as before, and obey the integral constraints from Prochaska et al. (2010) that demand that the ratios of ∫ f(N_HI) dN_HI between the two column-density regimes be fixed. This gives us k_1 = 10^2.819 and k_2 = 10^7.039, although the new distribution is no longer continuous at N_HI = 10^19 cm^-2. This new distribution is illustrated by the red power-laws in Figure 15.

Another change we have made is to the partial LLS model, which was possibly too conservative in the fiducial model. Instead of extrapolating from the LLS distribution, we now adopt the pLLS power-law slope of β_pLLS = −2.0 inferred from the total mean-free path to ionizing photons by Prochaska et al. (2010). This dramatically increases the incidence of pLLS in our spectra relative to LLS: we now have l_pLLS = 1.8 l_LLS, where l_LLS is the same value we used previously (Equation 10). This increase, while large, is not unreasonable in light of the large uncertainties in measurements of the H I column-density distribution from direct Lyα line-profile fitting (e.g., Janknecht et al. 2006; Rudie et al. 2013). Note also that even this increased pLLS incidence only amounts to, on average, less than one pLLS per quasar (∆z ∼ 0.3 − 0.4 per quasar at our redshifts).

Fig. 17.— Variation of the transmission PDF as a function of the IGM temperature at mean-density, T_0. All model transmission PDFs here have the same temperature-density relationship, γ = 1.6, and are compared with the S/N = 6−8 BOSS transmission PDF at 〈z〉 = 2.3 (error bars). The quoted χ^2 values are for 23 degrees of freedom. Note that in these models we have already implemented the improved LLS/pLLS model described in §5.2, hence the much improved χ^2 values compared with those quoted in Fig. 14.

We found that while increasing the number of pLLS relieves the tension between data and model at 0.1 ≲ F ≲ 0.4, it does not resolve the excess at the fully absorbed F ≈ 0 pixels in the models. However, changing the b-parameter of the LLS and pLLS from our original fiducial value of b = 70 km s^-1 modifies the PDF in a way that improves the agreement. This is a reasonable step, since the effective b-parameter is otherwise observationally ill-constrained for the LLS and pLLS populations. This is because LLSs are typically complexes of multiple systems separated in velocity space, and while there have been analyses of the b-parameter in these individual components, the ‘effective’ b-parameter for complete LLS systems has never been quantified to our knowledge.

We therefore search for the best-fit b-parameter with respect to the T_REF, γ = 1.3 model at 〈z〉 = 2.3, focusing primarily on the agreement in the 0 ≤ F ≤ 0.4 bins (Figure 16). Our choice of model for this purpose should not significantly affect our subsequent conclusions regarding the IGM temperature-density slope, since there is little sensitivity towards the latter in the relevant low-transmission bins (c.f. Figure 14). However, there will be some degeneracy between the LLS b-parameter and T_0 (Figure 17) since changing the latter does somewhat change the low-transmission portion of the PDF; we will come back to this point in §7.

Fig. 18.— Grey curves show 50 model transmission PDFs with a random sampling of different LLS incidence rates, metal absorption, and continuum scatter, evaluated for the 〈z〉 = 2.3, S/N = 8−10 BOSS subsample and using the T_REF simulation with γ = 1.6. The red curve shows the transmission PDF at our fiducial level of LLS incidence, metal absorption, and continuum scatter. The top panel has a linear abscissa, while the lower panel has a logarithmic abscissa.

As shown in Figure 16, a value b = 45 km s^-1 gives the best agreement with the data at 0 ≤ F ≤ 0.4. This yields χ^2 = 116 for 24 d.o.f., which is dramatically improved over those quoted in Figure 14, but still not quite a good fit. In the subsequent results, we will adopt this steeper pLLS/LLS model and b-parameter as the fiducial model in our analysis, and will correspondingly decrease the degrees of freedom in our χ^2 analysis to account for the fitting of b.

Note that while significantly improving the PDF fit, this new b-parameter still does not give a perfect fit to the low-transmission (F < 0.4) end. This is probably due to the simplified nature of our LLS model, which neglects the finite distribution of b-parameters and internal velocity dispersion of individual components. These properties are currently not well-known, and it seems likely that an improved model would allow a better fit to the low-transmission end of the PDF.
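To make the revised LLS distribution of Equation 22 concrete, the following Python sketch integrates each power-law segment analytically and draws column densities by inverse-CDF sampling. It is an illustration under stated assumptions, not the code used in this work: the names are hypothetical, only the two segments of Equation 22 are included (the pLLS extension is omitted because its normalization is not specified by that equation), and each draw would still need to be placed along a sightline according to the incidence rate.

```python
import numpy as np

# (log10 N_lo, log10 N_hi, log10 k, slope beta) for f(N) = k * N**beta
SEGMENTS = [
    (17.5, 19.0, 2.819, -1.2),  # LLS
    (19.0, 20.3, 7.039, -1.4),  # super-LLS
]

def segment_integral(log_lo, log_hi, log_k, beta):
    """Analytic integral of k * N**beta dN over one segment (beta != -1)."""
    a = beta + 1.0
    return 10.0**log_k * (10.0**(a * log_hi) - 10.0**(a * log_lo)) / a

def sample_column_densities(n, rng):
    """Draw n values of log10(N_HI) by inverse-CDF sampling per segment."""
    weights = np.array([segment_integral(*s) for s in SEGMENTS])
    which = rng.choice(len(SEGMENTS), size=n, p=weights / weights.sum())
    u = rng.uniform(size=n)
    out = np.empty(n)
    for i, (log_lo, log_hi, _, beta) in enumerate(SEGMENTS):
        a = beta + 1.0
        lo, hi = 10.0**(a * log_lo), 10.0**(a * log_hi)
        m = which == i
        # The CDF is linear in N**a, so interpolate there and invert.
        out[m] = np.log10(lo + u[m] * (hi - lo)) / a
    return out
```

With the quoted k_1 and k_2, the two segment integrals come to roughly 0.52 and 0.48, so that f(N_HI) integrates to approximately unity over the full LLS range in this sketch.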

5.3. Estimation of Systematic Uncertainties

While we have estimated the sample variance of our BOSS transmission PDFs by bootstrap resampling on the spectra, there are significant uncertainties associated with each component of our transmission PDF model as described above, e.g., the LLS incidence rate and level of continuum error. These uncertainties can be incorporated into a systematics covariance matrix, C_sys, that can then be added to the bootstrap covariance, C_boot, when computing the model likelihoods. This requires assuming that C_sys and C_boot are uncorrelated, and that the errors are Gaussian distributed.

Fig. 19.— (Top) 2D density plot of the error covariance matrix representing our systematic uncertainties in the LLS incidence rate, pLLS column-density distribution, LLS b-parameter, metal absorption, and continuum scatter, as estimated through the Monte Carlo method described in § 5.3. The bottom plot shows the corresponding correlation function. This particular covariance matrix was estimated for the 〈z〉 = 2.6, S/N = 8−10 subsample, and the values in the covariance have been multiplied by 10^4 for clarity.

We adopt a Monte Carlo approach to estimate C_sys by generating 200 model transmission PDFs that randomly vary the systematics. We then evaluate the covariance of the transmission PDFs, p_i, relative to the fiducial model, p_ref,i, at each transmission bin i. This allows us to construct a covariance matrix with the elements

\begin{equation}
C_{\mathrm{sys},ij} = \langle (p_i - p_{\mathrm{ref},i})(p_j - p_{\mathrm{ref},j}) \rangle \tag{23}
\end{equation}

that encompasses the errors from the uncertainties in the LLS model, metal absorption, and continuum scatter. Note that estimation of systematic uncertainties is typically a subjective process, and for most of these contributions we can only make educated guesses as to their uncertainty.

Our Monte Carlo iterations sample the various components of our model as follows:

LLS Incidence: We sample the uncertainty in thepower-law exponent γLLS of the redshift evolu-tion in LLS incidence rate (Equation 10), which isσγLLS

± 0.21 as reported by Ribaudo et al. (2011).We assume this uncertainty is Gaussian and drawlLLS(z) accordingly. This primarily affects the low-flux regions −0.1 . F . 0.3 of the PDF.

partial-LLS Slope: Our choice of slope for the distri-bution of partial LLS (NHI < 1017.5 cm−2 ab-sorbers is from an indirect constraint with sig-nificant uncertainty (Prochaska et al. 2010). Wetherefore vary the pLLS slope around the fiducialβpLLS = −2.0 by ±0.5 assuming a flat prior in thisrange, which primarily alters the 0 . F . 0.4 por-tion of the PDF since pLLS typically do not satu-rate at BOSS resolution.

LLS b-parameters: Also in the previous section, wefound that a global b-parameter of b = 45 km s−1

gives the best agreement with the data, but this isan ad hoc approach with significant uncertainties.In our Monte Carlo Sampling we therefore adopta conservative b = 45 km s−1 ± 20 km s−1 with auniform prior. This primarily affects the PDF at−0.1 ≤ F ≤ 0.4 as can be seen in Figure 16.

Intervening Metals: Although we used an empiricalmethod to model intervening metals (§ 4.4), wemay have missed metals with rest wavelengths λ .1300 A. Furthermore, we have a relatively smallset (∼ 300− 400) of ‘template’ quasars from whichour metal model is derived, which may contributesome sampling variance. We therefore guess at anGaussian error of ±30% for the metal incidencerate. This modulates the extent to which metalspulls the overall PDF towards lower F -values (c.f.Figure 9c).

Continuum Errors: The overall r.m.s. scatter in our continuum estimation also affects the flux PDF (Figure 9e). This can be varied in our model by rescaling the quantity c′(λ)/c(λ) − 1, where c is the ‘true’ continuum used to generate the mock spectrum, while c′ is the model continuum which we subsequently fit (Figure 13). For each iteration in our Monte Carlo systematics estimation, we dilate or reduce c′(λ)/c(λ) − 1 by a Gaussian deviate assuming ±20% scatter. This primarily affects the high-transmission (F > 0.8) end of the PDF.
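The sampling scheme listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analysis: the fiducial exponent `GAMMA_LLS_FID` is a placeholder (its value is not quoted in this section), the ±30% and ±20% errors are interpreted as the standard deviations of multiplicative scale factors centred on unity, and the clipping of the metal scale at zero is our own safeguard. All function and field names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class SystematicsDraw:
    gamma_lls: float        # power-law exponent of the LLS incidence evolution
    beta_plls: float        # slope of the pLLS column-density distribution
    b_lls_kms: float        # effective LLS b-parameter in km/s
    metal_scale: float      # multiplicative factor on the metal incidence rate
    continuum_scale: float  # multiplicative factor applied to c'/c - 1


def draw_systematics(rng, gamma_lls_fid):
    """Draw one random realization of the five systematics parameters."""
    return SystematicsDraw(
        # Gaussian, sigma = 0.21 (Ribaudo et al. 2011 uncertainty)
        gamma_lls=rng.gauss(gamma_lls_fid, 0.21),
        # flat prior of +/- 0.5 around the fiducial slope of -2.0
        beta_plls=rng.uniform(-2.5, -1.5),
        # uniform prior, 45 +/- 20 km/s
        b_lls_kms=rng.uniform(25.0, 65.0),
        # Gaussian 30% error; clipped at zero (assumption) to stay physical
        metal_scale=max(0.0, rng.gauss(1.0, 0.30)),
        # Gaussian 20% scatter on the continuum-error amplitude
        continuum_scale=rng.gauss(1.0, 0.20),
    )


GAMMA_LLS_FID = 1.5  # placeholder value, NOT taken from the paper
rng = random.Random(12345)  # fixed seed so the draws are reproducible
draws = [draw_systematics(rng, GAMMA_LLS_FID) for _ in range(50)]
```

Each element of `draws` would then parametrize one regeneration of the mock spectra, with the simulation skewers and noise seeds held fixed.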

For these Monte Carlo iterations, we used the identical thermal model (γ = 1.6, T_REF) and fixed the random number seeds used for the selection of simulation skewers and generation of noise vectors in our spectra, in order to ensure that the only variation between the different iterations comes from the randomly-sampled systematics. Figure 18 shows 50 of these Monte Carlo iterations on the transmission PDF for the 〈z〉 = 2.3, S/N = 8−10 subsample.

Figure 19 shows an example of the systematic contribution to the covariance matrix. The overall amplitude of the systematic contribution is considerably higher than that estimated from the bootstrap resampling (c.f. Figure 7), indicating that we are in the systematics-limited regime. We also see significant anti-correlations at almost the same level as the positive correlations, which are due mostly to correlations between transmission bins on either side of ‘pivot points’ as the transmission PDF varies from the systematics; these anti-correlations will somewhat counteract the increased size of the diagonal components. In the subsequent analysis, we will use an error covariance matrix, C = C_boot + C_sys, in which the systematics covariance matrix estimated in this sub-section is added to the bootstrap covariance matrix (described in § 3.3) estimated from the BOSS transmission PDFs.

We have at this point yet to address one more parameter which can significantly change the shape of our model transmission PDFs, namely the Lyα forest mean-transmission assumed in the mock spectra, 〈F〉_Lyα. However, this is an important astrophysical parameter which we did not want to treat as a ‘systematic’, so the next sub-section will describe our treatment of 〈F〉_Lyα.
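As an illustration of how the systematics covariance and the combined covariance C = C_boot + C_sys enter a χ² evaluation, the following NumPy sketch may be helpful. It makes several assumptions that are not specified in the text: the scatter is taken about the mean of the Monte Carlo iterations, the toy PDFs are random numbers rather than real model output, and the treatment of the PDF normalization constraint (which removes degrees of freedom) is ignored. The function names are hypothetical.

```python
import numpy as np


def systematics_covariance(pdf_iterations):
    """Sample covariance between PDF bins across Monte Carlo iterations.

    pdf_iterations has shape (n_iterations, n_bins); each row is the model
    transmission PDF from one random realization of the systematics.
    """
    pdfs = np.asarray(pdf_iterations, dtype=float)
    return np.cov(pdfs, rowvar=False)  # columns (bins) are the variables


def chi_squared(data_pdf, model_pdf, c_boot, c_sys):
    """chi^2 = r^T C^{-1} r with C = C_boot + C_sys and r = data - model."""
    resid = np.asarray(data_pdf, dtype=float) - np.asarray(model_pdf, dtype=float)
    c_total = np.asarray(c_boot, dtype=float) + np.asarray(c_sys, dtype=float)
    # Solve C x = r instead of forming the explicit inverse.
    return float(resid @ np.linalg.solve(c_total, resid))


# Toy example: 50 iterations of a 5-bin "PDF" (random numbers, not real data).
toy_rng = np.random.default_rng(0)
toy_pdfs = 1.0 + 0.1 * toy_rng.standard_normal((50, 5))
c_sys = systematics_covariance(toy_pdfs)
c_boot = 0.01 * np.eye(5)  # stand-in for the bootstrap covariance
toy_chi2 = chi_squared(toy_pdfs[0], toy_pdfs.mean(axis=0), c_boot, c_sys)
```

Because the off-diagonal terms of C_sys can be negative, the full matrix (rather than only its diagonal) has to be used in the solve step.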

5.4. Modifying the Mean-transmission

In the initial comparison of the model transmission PDFs shown in Figure 14, the models show a discrepancy with the data at higher transmission bins F ≳ 0.6. Such differences can be alleviated by varying the mean-transmission of the pure Lyα forest, 〈F〉_Lyα ≡ exp(−τ_Lyα), i.e. ignoring the contribution from metals and LLS. This quantity can be varied directly in the simulation skewers (Section 4.1). When we vary 〈F〉_Lyα in the simulations, the quantity 〈F〉_cont, which is used to normalize the continuum level of the mock quasar spectrum, is always kept fixed to 〈F〉_eff(z) = exp[−(τ_Lyα + τ_metals + τ_LLS)] as derived from Becker et al. (2013) (see Section 3.2). However, since we are applying the same 〈F〉_cont to both the real and mock spectra, 〈F〉_cont can be best thought of as a normalization that does not actually need to match 〈F〉_eff. Once both the real and mock spectra have been normalized by 〈F〉_cont, the transmission PDF retains information on the respective contributions from the Lyα forest, metals and LLSs regardless of the assumed 〈F〉_cont, because these contributions affect the shape of the PDF in different ways. In principle, it is possible to vary all these components to infer their relative contributions, but due to the crudeness of our metal and LLS models, we choose to have only 〈F〉_Lyα as a free parameter while keeping 〈F〉_metals = exp(−τ_metals) and 〈F〉_LLS = exp(−τ_LLS) fixed. The possible variation of these latter two components is instead incorporated into the systematic uncertainties determined in Section 5.3. The effect of varying 〈F〉_Lyα is illustrated in Figure 20, where we plot the same IGM model with different underlying values of 〈F〉_Lyα in the simulation skewers whilst keeping fixed the contribution from metals, LLSs etc.

Fig. 20. Variation of the model transmission PDFs (curves) with respect to changing the mean-transmission, 〈F〉_Lyα, of the Lyα forest simulations. The model PDFs were generated from the γ = 1.6, T_REF model, while the error bars show the corresponding transmission PDFs from BOSS data. In the bottom panel, the dashed horizontal lines indicate ±1σ discrepancies between models and data, although we caution against ‘chi-by-eye’ due to the significantly non-diagonal covariances in the errors. The central 〈F〉_Lyα value shown here corresponds to that estimated by Becker et al. (2013), while the other two are evaluated at ±1σ of their reported errors. The mean-transmission value, 〈F〉_cont, assumed in the mean-flux regulated continuum fitting is constant in all cases. Note that the χ² values, which are for 23 d.o.f., are much improved over the previous data comparisons, since they now include the improved LLS/pLLS model as well as the full covariance matrix including systematic uncertainties.

We therefore explore a range of 〈F〉_Lyα around the vicinity of that estimated by Becker et al. (2013), 〈F〉_Lyα,B13, and at each value of 〈F〉_Lyα evaluate the χ² summed over all the S/N subsamples for each 〈z〉 and γ combination. In addition, we now adopt the updated LLS/pLLS model described in §5.2, while the χ² evaluation now uses the full covariance matrix including both the bootstrap and systematics (§5.3) uncertainties to compare with the transmission PDFs measured from the BOSS data.

The models are compared with the BOSS data as we vary 〈F〉_Lyα, and for each 〈F〉_Lyα we compute the total chi-squared summed over all three S/N subsamples, where each subsample contributes 27 − 1 − 2 = 24 d.o.f. (c.f. Equations 20 and 21) along with a further reduction of one d.o.f. since we have effectively fitted for the LLS b-parameters in § 5.2, for a total of ν = 71 d.o.f. The result of this exercise is shown in Figure 21, which shows the χ² values for the T_REF models with different γ; we only vary γ and not T_0 because the F ≳ 0.6 portions of the transmission PDF that change the most with 〈F〉_Lyα do not vary as much with respect to changes in T_0 (c.f. Figure 17). Examples of the corresponding best-fit model PDFs in one S/N subsample are shown in Figure 22, where we see that varying 〈F〉_Lyα can indeed change the shape of the F ≳ 0.6 portion of the transmission PDF sufficiently, improving the fits in those transmission ranges compared to the fiducial models (Figure 14).

Fig. 21. χ² values for the T_REF models (with different γ) plotted as a function of Lyα forest mean-transmission values, 〈F〉_Lyα, used to normalize the simulation skewers. The quoted χ² values (with ν = 71 d.o.f.) were obtained by summing over the χ² for the different S/N subsamples at each redshift. The fiducial transmission values inferred from Becker et al. (2013) are shown as the solid vertical lines, while the dot-dashed vertical lines denote their 1σ errors. The dashed lines in the 〈z〉 = 3 panel denote the inflated error bars we use to account for the quasar selection bias shown in Fig. 23. In §6 we will marginalize over the uncertainties in 〈F〉_Lyα to obtain our final results.

Fig. 22. Model transmission PDFs (curves) with the best-fit Lyα forest mean-transmission 〈F〉_Lyα for different γ values from the T_REF family of models (using the improved LLS/pLLS model). These are for the S/N = 8−10 subsample and are compared with the corresponding BOSS data transmission PDFs (error bars). The upper two panels in each plot show the transmission PDFs in linear and logarithmic ordinate axes, respectively, while the bottom panels show residuals divided by the errors, with dashed horizontal lines indicating the ±1σ region relative to the data. The best-fitting 〈F〉_Lyα values correspond to the minima in Fig. 21, but here we have labeled them relative to 〈F〉_Lyα,B13, the fiducial Becker et al. (2013) values and errors. The χ² values quoted are for 23 d.o.f. (taking into account the fitting of the LLS b-parameter), and were computed using the full error covariances including both bootstrap and systematic terms.

In all our redshift bins, the best-fitting models seen in Figure 21 are γ = 1.6 with χ² = [69, 67, 54] for 70 d.o.f.^24 at 〈z〉 = [2.3, 2.6, 3.0] (for the combined data using all S/N bins), respectively implying probabilities of P = [52%, 59%, 92%] of getting larger values^25. At the higher redshifts, the best-fitting mean-transmission for the γ = 1.6 case is pushed to significantly discrepant values with respect to the fiducial Becker et al. (2013) values (Figure 21).

^24 In this particular section, when we quote the χ² for the best-fitting 〈F〉_Lyα the d.o.f. is further reduced by 1 compared to the other χ² summed over the S/N subsamples.

^25 These χ² values are very small for the degrees of freedom, suggesting that we may have overestimated the size of our systematic errors, but as we shall see this does not affect our ability to place constraints on γ and merely makes our conclusions rather conservative.

The γ = 1.3 models also provide acceptable fits to the data, with χ² = [71, 73, 58] for 70 d.o.f. (P = [43%, 40%, 84%]) at 〈z〉 = [2.3, 2.6, 3.0], but at the two higher redshift bins this requires 〈F〉_Lyα values that are increasingly discrepant compared to Becker et al. (2013) (+2.3σ and +5σ respectively at 〈z〉 = [2.6, 3.0]). The isothermal γ = 1.0 models are disfavored at the two lower redshift bins, with best-fit values of χ² = [98, 97] for 70 d.o.f. (P = [2%, 2%]) at 〈z〉 = [2.3, 2.6], whereas at 〈z〉 = 3, the error bars on the PDF are sufficiently large that acceptable fits are obtainable using γ = 1.0, with χ² = 68 for 70 d.o.f. (P = 54%). However, this requires a +5σ discrepancy in 〈F〉_Lyα with respect to Becker et al. (2013). In Figure 22, one sees that fitting for 〈F〉_Lyα allows the γ = 1.0 models to be in good agreement with the data in the F > 0.7 portion of the PDF, but gives rise to discrepancies in the 0.4 ≲ F ≲ 0.7 range which limit the goodness-of-fit, and which cannot easily be compensated by modifying the metals or LLS model.

From Figure 21, it is clear that as we move to higher redshifts, we require increasingly higher 〈F〉_Lyα relative to the fiducial Becker et al. (2013) values in order to agree with the data: at 〈z〉 = 2.3, our best-fit mean-transmission for the γ = 1.6 model agrees with Becker et al. (2013), but at 〈z〉 = 3 there is a significant deviation of +2σ with respect to the Becker et al. (2013) measurement. The same trend is true for the best-fit γ = 1.3 and γ = 1.0 models, but these require even greater discrepancies with respect to the fiducial 〈F〉_Lyα.

One possible explanation for this discrepancy is the effect on the Becker et al. (2013) measurement of u-band selection bias in the SDSS quasars. This was first noted by Worseck & Prochaska (2011), who found that the color-color criteria used to select SDSS quasars preferentially selected quasars, specifically in the redshift range 3 ≲ z_qso ≲ 3.5, that have intervening Lyman-breaks at λ_rest < 912 Å. The 3 ≲ z_qso ≲ 3.5 SDSS quasars are thus more likely to have intervening LLS in their sightlines, yielding an additional contribution to the Lyα absorption and hence causing Becker et al. (2013) to possibly underestimate 〈F〉_Lyα when stacking the impacted quasars. Becker et al. (2013) mentioned this effect in their paper but argued that it was much smaller than their estimated errors by referencing theoretical IGM transmission curves estimated by Worseck & Prochaska (2011) (Figure 17 in the latter paper).

Dr. G. Worseck has kindly provided us with these transmission curves, T_IGM(λ), which were generated for both the average IGM absorption and that extracted from SDSS quasars affected by the color-color selection bias. In Figure 23 we plot the relative difference between the biased Lyα transmission deduced from z_qso = 3.2 and z_qso = 3.4 quasars and the true mean IGM transmission, using the Worseck & Prochaska (2011) transmission curves. It is clear that at Lyα absorption redshifts of z_abs ≈ 3, the excess LLS picked up from such quasars contribute an additional ∼ 1% compared to the mean IGM decrement, a discrepancy that is of the same magnitude as the error bars in the Becker et al. (2013) measurement, indicated by the dashed line.

Fig. 23. The red and black curves show the excess Lyα absorption expected from sightlines of z_qso = 3.2 and z_qso = 3.4 quasars, respectively, relative to the mean IGM transmission. This is caused by the SDSS selection bias described in Worseck & Prochaska (2011), which yields above-average numbers of intervening LLS. These are derived from the same curves shown in Figure 17 of Worseck & Prochaska (2011), but replotted as ratios smoothed by a boxcar function over 12 pixels for clarity. The top axis labels the Lyα absorption redshift corresponding to each wavelength, while the shaded region indicates the wavelength range of our 〈z〉 = 3.0 bin. The dashed line shows, for comparison, the relative errors on the Lyα forest mean transmission estimated by Becker et al. (2013). The discrepancy due to the SDSS bias is significant compared to the Becker et al. (2013) errors.

This could partially explain the higher 〈F〉_Lyα required to make our 〈z〉 = 3 models fit the data in Figure 21. Note that we expect this UV color selection bias to be much less significant in our BOSS data, since we have selected bright quasars in the top 5% of the S/N distribution. Given that such quasars have high signal-to-noise ratio photometry, their colors separate much more cleanly from stellar contaminants. Furthermore, such bright quasars are much more likely to have been selected with multi-wavelength data (e.g., including near-IR and radio in addition to optical photometry; see Ross et al. 2012). For both of these reasons, we expect our quasars to be much less susceptible to biases in color-selection related to the presence of an LLS. A careful accounting of this bias is beyond the scope of this paper, but from now on we will inflate by a factor of two the corresponding errors on 〈F〉_Lyα at 〈z〉 = 3 to account for this possible bias in the mean transmission measurements (dashed vertical lines in bottom panel of Figure 21).

Another possibility that could explain a bias in the 〈F〉_Lyα measured by Becker et al. (2013) is their assumption that the metal contamination of the Lyα forest does not evolve with redshift. While there are few clear constraints on the aggregate metal contamination within the forest, if the metals actually decrease with increasing redshift (e.g., in the case of C IV, Cooksey et al. 2013), then the assumption of an unevolving metal contribution calibrated at z ≈ 2.3 would lead to an underestimate of 〈F〉_Lyα at higher redshifts, which could explain the trend we seem to be seeing.

It is clear from the previous discussion that there is some degeneracy between γ and 〈F〉_Lyα in our transmission PDFs. However, we are primarily interested in γ, while 〈F〉_Lyα has been extensively measured over the years, allowing strong priors to be placed. In the next section, we will therefore marginalize over 〈F〉_Lyα in order to obtain our final results.
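The probabilities P quoted in this sub-section are upper-tail probabilities of the χ² distribution for the stated degrees of freedom. The following standard-library Python sketch shows one way to evaluate them using the finite-series form of the regularized upper incomplete gamma function for integer d.o.f.; it is an illustration only, and the function name `chi2_sf` is our own.

```python
import math


def chi2_sf(chi2, dof):
    """Probability of exceeding `chi2` for a chi-squared variable with
    an integer number of degrees of freedom `dof`."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if chi2 <= 0.0:
        return 1.0
    y = 0.5 * chi2
    if dof % 2 == 0:
        # Even dof = 2m: Q(m, y) = exp(-y) * sum_{j=0}^{m-1} y^j / j!
        term, total = 1.0, 0.0
        for j in range(dof // 2):
            total += term
            term *= y / (j + 1)
        return math.exp(-y) * total
    # Odd dof = 2m + 1:
    # Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{j=0}^{m-1} y^(j+1/2) / Gamma(j + 3/2)
    term, total = math.sqrt(y) / math.gamma(1.5), 0.0
    for j in range((dof - 1) // 2):
        total += term
        term *= y / (j + 1.5)
    return math.erfc(math.sqrt(y)) + math.exp(-y) * total


# Degrees-of-freedom bookkeeping described in the text:
# three S/N subsamples of 27 - 1 - 2 = 24 d.o.f. each, minus one for the LLS b-parameter.
dof_total = 3 * (27 - 1 - 2) - 1

# Tail probabilities for the best-fit gamma = 1.6 values quoted for 70 d.o.f.
p_values = [chi2_sf(x, 70) for x in (69.0, 67.0, 54.0)]
```

A tail probability close to unity, as at 〈z〉 = 3, indicates a χ² that is low for the number of degrees of freedom, which is consistent with the remark in footnote 25 that the systematic errors may be overestimated.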

6. RESULTS

Due to the uncertainties in 〈F〉_Lyα described in the previous sub-section, for a better comparison between transmission PDFs, p, from models with different [γ, T_0] we will marginalize the model likelihoods, L = exp(−χ²/2), over the Lyα forest mean-transmission, 〈F〉:

$$ L(p \,|\, \gamma, T_0) = \int_{-\infty}^{\infty} L(p \,|\, \gamma, T_0, \langle F \rangle)\, A(\langle F \rangle)\, \mathrm{d}\langle F \rangle, \tag{24} $$

where A(〈F〉) is the prior on 〈F〉 (for clarity in these equations, 〈F〉 is used as a shorthand for 〈F〉_Lyα). We assume a Gaussian prior:

$$ A(\langle F \rangle) = \frac{1}{\sigma_F \sqrt{2\pi}} \exp\left[ -\frac{(\langle F \rangle - \langle F \rangle_{\rm B13})^2}{2\sigma_F^2} \right], \tag{25} $$

where 〈F〉_B13 and σ_F are the optically-thin Lyα forest mean-transmission and associated errors, respectively, estimated from Becker et al. (2013). Note that for 〈z〉 = 3, we have decided to dilate the error bars by a factor of 2 to account for the suspected quasar selection bias discussed in the previous section.

For each model, we generate transmission PDFs with different 〈F〉_Lyα (similar to Figure 21) and evaluate the combined χ² summed over different S/N. We interpolate the χ² over 〈F〉_Lyα to obtain a finer grid, which then allows us to numerically integrate Equation 24 using five-point Newton-Cotes quadrature.

At this stage, we also analyze models with different IGM temperatures at mean-density, T_0. Hitherto, we have been working only with the central T_REF model (T_0(z = 2.5) ∼ 16000 K), but we now also compare models from the T_COLD and T_HOT simulations, which have T_0(z = 2.5) ∼ 11000 K and T_0(z = 2.5) ∼ 21500 K, respectively. Each of these temperature models also samples temperature-density relationships of γ = [1.0, 1.3, 1.6] for a model grid of 3 × 3 parameters at each redshift.

The marginalized χ² values for all the models are tabulated in Table 3, and plotted as a function of γ in Figure 24. In general, the T_REF models with γ = 1.6 provide the best agreement with the data at all redshifts with χ² ≈ 60−70 for 69 d.o.f. The T_HOT models (with higher IGM temperatures at mean density) provide fits of comparable quality, and indeed at 〈z〉 = 2.3 the T_HOT model with γ = 1.3 gives essentially the same goodness-of-fit as the γ = 1.6 T_REF model. The cooler T_COLD models are less favored by the data, and at 〈z〉 = 2.6 give unreasonable fits to the data with χ² = 89 for 69 d.o.f. (P = 5%), but at other redshifts they are acceptable fits to the data. In other words, the transmission PDF does not show a strong sensitivity to T_0, which we shall show later is due to degeneracy with our LLS model in the low-transmission end of the transmission PDF.

TABLE 3
Marginalized χ² for ν = 71 d.o.f.

〈z〉 = 2.3
γ      T_COLD            T_REF             T_HOT
       (T_0 = 13000 K)   (T_0 = 18000 K)   (T_0 = 23000 K)
1.6    87.7              72.9              79.5
1.3    103.4             76.0              71.8
1.0    174.2             105.5             88.4

〈z〉 = 2.6
γ      T_COLD            T_REF             T_HOT
       (T_0 = 11000 K)   (T_0 = 16000 K)   (T_0 = 21500 K)
1.6    88.8              72.0              71.4
1.3    118.0             82.6              91.8
1.0    203.3             127.3             111.1

〈z〉 = 3.0
γ      T_COLD            T_REF             T_HOT
       (T_0 = 9000 K)    (T_0 = 14000 K)   (T_0 = 19000 K)
1.6    61.7              65.1              62.5
1.3    77.6              72.7              63.8
1.0    119.5             77.7              85.8

The more important question to address is the possibility of isothermal or inverted temperature-density relationships (γ ≤ 1) as suggested by some studies on the transmission PDF of high-resolution, high-S/N echelle quasar spectra (e.g., Bolton et al. 2008; Viel et al. 2009; Calura et al. 2012). It is clear from Table 3 and Figure 24 that for all T_0 models the isothermal, γ = 1.0 models disagree strongly with the BOSS data. The closest match for an isothermal IGM is the T_REF model at 〈z〉 = 3.0, which yields χ² = 78 for 69 d.o.f., or a probability of 21% of obtaining the data from this model. However, relative to the γ = 1.6 model at 〈z〉 = 3.0 which gives the minimum χ² at that redshift, we find Δχ² ≈ 16 for the isothermal model, i.e. a √(Δχ²) = 4σ discrepancy from the best-fit model. The isothermal model is also strongly disfavored at the other redshifts, where we find Δχ² ≈ [15, 40] at 〈z〉 = [2.3, 2.6] or √(Δχ²) ≈ [3.9σ, 6.3σ]. Since the shape of the transmission PDF varies continuously as a function of γ (see, e.g., Bolton et al. 2008; Lee 2012), these results imply that inverted (γ < 1) IGM temperature-density slopes are even more strongly ruled out.
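The marginalization of Equations 24 and 25 and the conversion of Δχ² into a significance can be sketched as follows. This is a minimal illustration under several assumptions that the text does not specify: the χ² is interpolated linearly in 〈F〉, the infinite integration limits are truncated to the extent of the 〈F〉 grid, the five-point Newton-Cotes rule is applied as composite Boole's rule, and the "marginalized χ²" is taken to be −2 ln of the marginalized likelihood. The toy grid values and all function names are hypothetical.

```python
import math


def gaussian_prior(f, f_b13, sigma_f):
    """Gaussian prior A(<F>) of Equation 25."""
    return math.exp(-0.5 * ((f - f_b13) / sigma_f) ** 2) / (sigma_f * math.sqrt(2.0 * math.pi))


def interp_linear(x, xs, ys):
    """Piecewise-linear interpolation on an ascending grid, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])


def boole_integrate(func, a, b, n_panels):
    """Composite five-point closed Newton-Cotes (Boole's) rule on [a, b]."""
    h = (b - a) / (4 * n_panels)
    total = 0.0
    for p in range(n_panels):
        x0 = a + 4 * p * h
        total += (2.0 * h / 45.0) * (
            7.0 * func(x0) + 32.0 * func(x0 + h) + 12.0 * func(x0 + 2 * h)
            + 32.0 * func(x0 + 3 * h) + 7.0 * func(x0 + 4 * h)
        )
    return total


def marginalized_chi2(f_grid, chi2_grid, f_b13, sigma_f, n_panels=200):
    """-2 ln of the likelihood exp(-chi2/2) marginalized over the prior on <F>."""
    def integrand(f):
        chi2 = interp_linear(f, f_grid, chi2_grid)
        return math.exp(-0.5 * chi2) * gaussian_prior(f, f_b13, sigma_f)
    likelihood = boole_integrate(integrand, f_grid[0], f_grid[-1], n_panels)
    return -2.0 * math.log(likelihood)


def sigma_from_delta_chi2(chi2_model, chi2_best):
    """Significance sqrt(delta chi^2) of a model relative to the best fit."""
    return math.sqrt(chi2_model - chi2_best)


# Toy chi^2 curve versus <F> (illustrative numbers, not from the paper).
f_grid = [0.60, 0.65, 0.70, 0.75, 0.80]
chi2_grid = [110.0, 80.0, 70.0, 80.0, 110.0]
chi2_marg = marginalized_chi2(f_grid, chi2_grid, f_b13=0.70, sigma_f=0.01)

# Table 3 at <z> = 3.0: isothermal T_REF (77.7) versus the minimum (61.7).
n_sigma = sigma_from_delta_chi2(77.7, 61.7)
```

In this convention a model whose χ² minimum lies far from the prior centre is penalized, because the prior down-weights the region of 〈F〉 where its likelihood peaks.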

7. DISCUSSION

In this paper, we have studied the 〈z〉 = 2.3−3 Lyα forest transmission probability distribution function (PDF) from 3393 BOSS DR9 quasar spectra. Although this is a relatively small subsample, selected to lie above the 95th percentile in S/N, it provides 2 orders-of-magnitude larger Lyα forest path length than the high-resolution, high-S/N data sets previously used for this purpose, providing unprecedented statistical power for transmission PDF analysis.

Fig. 24. χ² values (for 71 d.o.f.) from models with different γ and T_0 at different redshifts, after marginalizing over uncertainties in the mean-transmission 〈F〉 of the Lyα forest. Models with γ = 1.6 are generally favored, although γ = 1.3 with the T_HOT model is also acceptable at 〈z〉 = 2.3. The same quantities are also tabulated in Table 3.

In order to ensure accurate characterization and allow subsequent modelling of the spectral noise, we have introduced a novel, probabilistic method of combining the multiple exposures that comprise each BOSS observation, using the raw sky and calibration data. This method significantly improves the accuracy of the noise estimation, and additionally allows us to generate mock spectra with noise properties tailored to each individual BOSS spectrum, but self-consistently for different Lyα forest realizations. We believe that our noise modeling, which yields noise estimates accurate to ∼ 3% across the relevant wavelength range, is the most careful treatment of spectral noise in multi-object fiber spectra to date, and we invite readers with similarly stringent requirements in understanding the BOSS spectral noise to contact the authors. In the future, the spectral extraction algorithm described by Bolton & Schlegel (2010) may solve some of the issues which affected us, but this has yet to be implemented.

For the continuum estimation, we used the mean-flux regulated/principal component analysis (MF-PCA) method introduced in Lee (2012). This method, which reduces the uncertainty in the continuum estimation to σ_cont ≲ 5%, fits for a continuum such that the resulting Lyα forest has a mean-transmission 〈F〉 matched to external constraints, for which we use the precise measurements by Becker et al. (2013). While MF-PCA does require external constraints for 〈F〉, we argue that so long as both the real quasars and mock spectra are continuum-fitted in exactly the same way, the shape of the transmission PDF retains independent information on the Lyα forest mean-transmission.

To compare with the data, we used the detailed hydrodynamical simulations of Viel et al. (2013a), which explore a range of IGM temperature-density slopes (γ ≈ 1.0−1.6) and temperatures at mean density (T_0(z = 2.5) ≈ [11000, 16000, 21500] K). We processed the simulated spectra to take into account the characteristics of the individual BOSS spectra in our sample, such as spectral resolution, pixel noise, and continuum fitting errors. We also incorporate the effects of astrophysical ‘nuisance’ parameters such as Lyman-limit systems (LLSs) and metal contamination. The LLSs are modeled by adding 10^16.5 cm⁻² ≲ N_HI ≲ 10^20.3 cm⁻² absorbers into our mock spectra, based on published measurements of the observed incidence l_LLS(z) (Ribaudo et al. 2011) and H I column density distribution f(N_HI) (Prochaska et al. 2010). Meanwhile, contamination from lower-redshift metals is modeled in an empirical fashion by inserting λ_rest > 1216 Å absorbers observed in lower-redshift SDSS/BOSS quasars into the same observed wavelengths of our mock spectra.

Our initial models did not provide satisfactory agreement with the transmission PDF measured from the BOSS spectra, with discrepancies at both the high-transmission and low-transmission bins. However, the differences between data and models were consistent across the different S/N subsamples, indicating that our noise modelling is robust. To resolve the discrepancies at the low-transmission end of the PDF, we explored various modifications to our LLS model. Firstly, we steepened the column-density distribution slope of partial LLS (16.5 < log10(N_HI) < 17.5 systems) to β_pLLS = −2, a value suggested from the mean-free path of ionizing photons (Prochaska et al. 2010). This change relieved the tension between model and data in the F ≈ 0.1−0.4 bins. It implies increasing the number of pLLS by nearly an order of magnitude, but this is not unreasonable given the current uncertainties on this population (Janknecht et al. 2006; Prochaska et al. 2010). We believe that the necessity of a pLLS distribution with β_pLLS ≈ −2 to fit the BOSS Lyα transmission PDF supports the claims of Prochaska et al. (2010) regarding the column-density distribution of this population.

However, after adding pLLSs a major discrepancy remained in the saturated F ≈ 0 bins, which we addressed by adjusting the effective b-parameter assumed in all the optically-thick systems in our model. We found that an effective value of b = 45 km s⁻¹ gave the best fit to the data^26.

^26 Note that we have quoted an effective b-parameter, which must not be confused with the b from individual kinematical components, which is often quoted by workers carrying out Voigt profile analysis of high-resolution spectra.

At the high-transmission (F ≳ 0.6) end of the model transmission PDFs, we found that modifying the Lyα forest mean-transmission in the simulations, 〈F〉_Lyα, allowed much better agreement with the BOSS data. At 〈z〉 = [2.3, 2.6], the 〈F〉_Lyα that gave the best-fitting model PDFs were within 1σ of the Becker et al. (2013) measurements, but at 〈z〉 = 3 we required a value that was ∼ 2σ larger. We argue that this discrepancy could be due to a color-color selection bias in the 3 ≲ z_qso ≲ 3.5 SDSS quasars used by Becker et al. (2013), which preferentially selected sightlines with intervening LLS, giving rise to additional Lyα absorption (and thus lower 〈F〉_Lyα) at a level comparable to the errors estimated by Becker et al. (2013). Our BOSS spectra, on the other hand, should be comparatively unaffected on account of being the brightest quasars in the survey: they separate more cleanly from the stellar locus in color-space, and were more likely to have been selected with additional criteria (radio, near-IR, variability etc.) beyond color-color information (Ross et al. 2012).

To deal with these uncertainties, we decided to marginalize over the mean-transmission in our χ² analysis. At 〈z〉 = 2.3, the preferred model is for a hot IGM (T_0 = 23000 K) along with γ = 1.3 (P ≈ 45%), although the intermediate-temperature model (T_0 = 18000 K) with γ = 1.6 is nearly as good a fit with P ≈ 42%. The preferred models at 〈z〉 = [2.6, 3.0] are for γ = 1.6 at temperatures at mean-density of T_0 = [21500, 9000] K (P = [46%, 78%], respectively). We find that the isothermal (γ = 1) temperature-density relationship is strongly disfavored at all redshifts regardless of T_0, with discrepancies of √(Δχ²) ∼ 4−6σ compared to the best-fit models.

One might be skeptical of the results given the various assumptions we had to make in modelling astrophysical nuisance parameters. To test the robustness of our results to systematics, we generated 20 iterations of model transmission PDFs sampling all nine of our [T_0, γ] models (i.e. 180 PDFs in total) in the 〈z〉 = 2.6, S/N = 8−10 bin, where each iteration has a random realization of the systematics (LLS, metals, continuum errors etc.) drawn in the same way as our Monte Carlo estimate of systematic uncertainty (§5.3). We then asked how many times each T_0 or γ model gave the lowest χ² when compared with the data. For this test we only evaluated the χ² at the fiducial 〈F〉_Lyα without marginalization.

The results of this test are shown in Figure 25. In the top panel, the T_REF and T_HOT models are favored ∼ 40% of the time but the T_COLD model is favored ∼ 15% of the time, depending on the (random) choice of systematics. In other words, there is significant degeneracy between our systematics model and T_0. We suspect this is driven largely by the choice of the LLS b-parameter, which changes the shape of the transmission PDF in a similar way to T_0 (compare Figure 16 with Figure 17). In contrast, the bottom panel of Figure 25 shows that whatever systematics we choose, γ = 1.6 is always favored, indicating a robust constraint.

Fig. 25. Histogram indicating the fraction of times a given T_0 (top) or γ (bottom) model is favored for the 〈z〉 = 2.3, S/N = 8−10 transmission PDF when the systematics levels in the model are randomly sampled 20 times. While different systematics could lead to different best-fitting models for T_0, the models with γ = 1.6 are always preferred. This indicates some degeneracy in our systematics model with T_0, but our conclusions on γ are robust.

There is however some degeneracy between γ and the Lyα forest mean transmission, 〈F〉_Lyα. While we marginalize over the latter quantity, the choice of prior can, in principle, affect the results. However, at 〈z〉 = [2.3, 2.6], the chi-squared minimum of the γ = 1.0 PDF model as a function of 〈F〉_Lyα is χ² ≈ 100 for 71 d.o.f. (Figure 21), which has a probability of P ≈ 1%. In other words, even if we fine-tuned 〈F〉_Lyα in an attempt to force the isothermal model as the best-fit model at these redshifts, it would still be an unacceptable fit, and the γ = 1.3 model would still be preferred over it. This is less clear-cut at 〈z〉 = 3, where the error bars are large enough to permit a reasonable minimum chi-squared of χ² ≈ 70 for 71 d.o.f. using the γ = 1 model, but this requires a value of 〈F〉_Lyα = 0.71, which is 5σ discrepant from the value reported by Becker et al. (2013). While this 〈F〉_Lyα measurement is dependent on corrections for metals and LLS absorption (and indeed we argue that they have neglected a subtle bias related to SDSS quasar selection), they have attempted to incorporate these uncertainties into their errors and we have no particular reason to believe that they have underestimated this by a factor of > 5. A quick survey of the available measurements of the forest mean-transmission from the past decade yields 〈F〉_Lyα(z = 3) ≈ 0.65−0.69 (Kim et al. 2007; Faucher-Giguere et al. 2008; Dall’Aglio et al. 2008), albeit with larger errors. The use of any of these measurements as priors for our analysis would therefore disfavor an IGM with γ ≤ 1 (which requires 〈F〉_Lyα(z = 3) ≥ 0.71), unless all the available literature in the field has significantly underestimated the mean-transmission.

There are several cosmological and astrophysical effects that we did not model, which could in principle affect our conclusions on γ. Since the Lyα forest transmission PDF essentially measures the contrast between high-absorption and low-absorption regions of the IGM, this can be degenerate with the underlying amplitude of matter fluctuations, which is specified by a combination of σ_8 and n_s, the matter fluctuation variance on 8 h⁻¹ Mpc scales and the slope of the amplitude power spectrum, respectively. While these parameters are increasingly well-constrained (e.g., Planck Collaboration et al. 2013), there is still some uncertainty regarding the level of the fluctuations on the sub-Mpc scales relevant to the Lyα forest, which could be degenerate with our γ measurement. Bolton et al. (2008) explored this degeneracy between σ_8 and γ in the context of transmission PDF measurements from high-resolution spectra, and found that the PDF is less sensitive to plausible changes in σ_8 compared to γ, e.g. modifying σ_8 by Δσ_8 = ±0.1 affected the shape of the PDF less than a modification of Δγ = ±0.25 (Figure 2 in their paper). This degeneracy is in fact further weakened when an MCMC analysis of the full parameter space is considered, as shown by the likelihood contours in Viel et al. (2009).

The astrophysical effects that could be degenerate with γ include galactic winds and inhomogeneities in the background UV ionizing field. The injection of gas into the IGM by strong galaxy outflows could in principle modify Lyα forest statistics at fixed γ; this was studied using hydrodynamical simulations by Viel et al. (2013b), who concluded that the effect on the PDF is small compared to the uncertainties in high-resolution PDF measurements. Our BOSS measurement has roughly the same errors as those from high-resolution spectra once systematic uncertainties are taken into account, therefore it seems unlikely that galactic winds could significantly bias our conclusions on γ. Meanwhile, fluctuations in the UV ionizing background, Γ, that are correlated with the overall density field could also be degenerate with the temperature-density relationship (c.f. Equation 1). This effect was studied by McDonald et al. (2005a) in simulations using an extreme model that considered only UV background contributions from highly-biased AGN, which maximizes the inhomogeneities. They concluded that while these UV fluctuations affected forest transmission statistics at z ∼ 4, the effect was small at z ≲ 3, the redshift range of our measurements.

Various observational and systematic effects could also, in principle, affect our constraints on γ. For example, our modeling of the BOSS spectral resolution assumes a Gaussian smoothing kernel, which might affect our constraints if this were untrue. However, in their analysis of the 1D forest transmission power spectrum, Palanque-Delabrouille et al. (2013) examined the BOSS smoothing kernel and did not find significant deviations from Gaussianity. There are also possible systematics caused by our simplified modeling of LLS and metal contamination in the data, for example in our assumption of a single b-parameter for all LLSs and our neglect of very weak metal absorbers. However, we believe that the test performed in Figure 25 samples larger differences in the transmission PDF than those caused by our model simplifications, e.g. it seems unlikely that going from a single LLS b-parameter to a finite b-distribution could cause greater differences in the flux PDF than varying the single b-parameter by ±50% as was done in Figure 25. As for continuum-estimation, we carry out the exact same continuum-fitting procedure on the mock spectra as on the real quasar spectra, which leads to no overall bias since in both cases the resulting forest transmission field is forced to have the same overall transmission, 〈F〉_cont. The only uncertainty then relates to the distribution of c′/c − 1, i.e. the per-pixel error of the estimated continuum, c′, relative to the true continuum, c. In reality the shape of this distribution could be different between the data and mocks, whereas within our mocks framework we could only explore overall rescalings of the distribution width. Again, we find it unlikely that differences in the transmission PDF caused by the true shape of the c′/c − 1 distribution could be so large as to be comparable to the effect caused by varying the width of the continuum error distribution, which we have examined.

While we do not think that the effects described in the previous few paragraphs qualitatively affect our conclusion that the BOSS data are inconsistent with isothermal or inverted IGM temperature-density relationships (γ ≤ 1), when taken in aggregate these systematic uncertainties do weaken our formal 4−6σ limits against γ ≤ 1 and need to be explicitly considered in future analyses.

7.1. Astrophysical Implications

How does this compare with other results on the thermal state of the IGM? McDonald et al. (2001) analyzed the transmission PDF from 8 high-resolution, high-S/N spectra and compared it with now-obsolete hydrodynamical simulations. They found the data to be consistent with a temperature-density relationship (TDR) with the expected value of γ ≈ 1.5 (Hui & Gnedin 1997). More recently, Bolton et al. (2008) and Viel et al. (2009) carried out analyses of the transmission PDF measured from a larger sample (18 spectra) of Lyα forest sightlines measured by Kim et al. (2007) and found evidence for an inverted TDR (γ < 1). Viel et al. (2009) found that at z ≈ 3.0 the temperature-density relation was highly inverted (γ ≈ 0.5), and remained so down to z ≈ 2.0, although at the lower redshifts the data were marginally consistent with an isothermal IGM. They suggested the difference between their results and those of McDonald et al. (2001) was due to the now-obsolescent cosmological parameters and less-detailed treatment of intervening metals in the earlier study. However, Lee (2012) then pointed out that the values of γ measured from the transmission PDF are sensitive to continuum-fitting. Since continuum-fitting of high-resolution data generally involves manually placing the continuum at Lyα forest transmission peaks which do not necessarily reach the true continuum, it is conceivable that continuum biases combined with underestimated jackknife error bars (e.g., Rollinde et al. 2013) could have led Bolton et al. (2008) and Viel et al. (2009) to erroneously deduce an inverted temperature-density relation (see Bolton et al. 2014 for a detailed discussion on this point). In our analysis we have fitted our continua using an automated process that is free from the same continuum-fitting bias, although it does require an assumption on the underlying Lyα forest transmission, which we have marginalized over in our analysis.

Most recent measurements of the transmission PDF from high-resolution data have continued to favor an isothermal or inverted γ: Calura et al. (2012) analyzed the transmission PDF from a sample of z ≈ 3.3 − 3.8 quasars and also found an isothermal TDR at z = 3, although combining with the Kim et al. (2007) data drove the estimated γ to inverted values at z < 3. However, Rollinde et al. (2013) carried out a re-analysis of the transmission PDF from various high-resolution echelle data sets, which included significant overlap with the Kim et al. (2007) data. They argue that previous analyses have underestimated the error on the transmission PDF, and found the observed transmission PDF to be consistent with simulations that have γ ≈ 1.4 over 2 < z < 3; this discrepancy is probably also driven by a different continuum-estimation from the Kim et al. (2007) measurement.

The use of other statistics on high-resolution spectra has, however, tended to disfavor an isothermal or inverted TDR. Rudie et al. (2012) analyzed the lower end of the b-N_HI cutoff from individual Lyα forest absorbers measured in a set of 15 very high-S/N quasar echelle spectra, and estimated γ ≈ 1.5 at z = 2.4. Bolton et al. (2014) compared the Rudie et al. (2012) measurements to hydrodynamical simulations and corroborated their determination of the TDR slope.

Garzilli et al. (2012) analyzed the Kim et al. (2007) sample and found that while the transmission PDF supports an isothermal or inverted TDR, a wavelet analysis favors γ > 1. Note, however, that the b-N_HI cutoff and the transmission PDF are sensitive to different density ranges, with the PDF probing gas densities predominantly below the mean (e.g., Bolton et al. 2014).

Our result of γ ≈ 1.6 at 〈z〉 = [2.3, 2.6, 3.0] is thus in rough agreement with measurements that do not involve the transmission PDF from high-resolution Lyα forest spectra (with the exception of Rollinde et al. 2013). Our value of γ at 〈z〉 = 3 is somewhat unexpected because one expects a flattening of the TDR close to the He II reionization epoch at z ∼ 3 (Furlanetto & Oh 2008; McQuinn et al. 2009; but see Gleser et al. 2005; Meiksin & Tittley 2012), but γ = 1.3 is not strongly disfavored (√(Δχ²) ∼ 2.6).

Taken at face value, the TDR during He II reionization can be made steeper by a density-independent reionization and/or a lower heating rate in the IGM (Furlanetto & Oh 2008), which could be reconciled with an extended He II event (Shull et al. 2010; Worseck et al. 2011).

Our constraints on γ appear to be in conflict with the predictions of Broderick et al. (2012) and Chang et al. (2012), who elucidated a relativistic pair-beam channel for plasma-instability heating of the IGM from TeV gamma-rays produced by a population of luminous blazars. This mechanism provides a uniform volumetric heating rate, which would cause an inverted TDR in the IGM (Puchwein et al. 2012) since voids would experience a higher specific heating rate compared with heating by He II reionization alone. This picture has been challenged by the recent study of Sironi & Giannios (2014), who dispute the amount of heating this mechanism could provide, since they found that the momentum dispersion of such relativistic pair beams allows ≪ 10% of the beam energies to be deposited into the IGM.

However, in this paper we have assumed relatively simple TDRs in which the bulk of the IGM in the density range 0.1 ≲ Δ ≲ 5 follows a relatively tight power-law. We have therefore not studied more complicated T−Δ relationships, e.g., with a spread of temperatures at fixed density (e.g., Meiksin & Tittley 2012; Compostella et al. 2013) that might be caused by He II reionization or other phenomena. It is therefore possible that such complicated TDRs could result in Lyα forest transmission PDFs that mimic the γ ≈ 1.6 power-law; this is something that needs to be examined in more detail in future work.
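As a rough illustration of what such power-law TDRs imply, the sketch below evaluates T(Δ) = T0 Δ^(γ−1) at the edges of the density range quoted above, for a steep, an isothermal, and an inverted slope. The function name and the value T0 = 10^4 K are placeholders chosen purely for illustration; they are not values taken from this analysis.

```python
def tdr_temperature(delta, t0, gamma):
    """Temperature (K) at overdensity `delta` for T = T0 * delta**(gamma - 1)."""
    if delta <= 0:
        raise ValueError("overdensity must be positive")
    return t0 * delta ** (gamma - 1.0)


# Placeholder temperature at mean density (K); an assumption, not a measurement.
T0_ILLUSTRATIVE = 1.0e4

for gamma in (1.6, 1.0, 0.5):
    t_void = tdr_temperature(0.1, T0_ILLUSTRATIVE, gamma)   # underdense gas
    t_dense = tdr_temperature(5.0, T0_ILLUSTRATIVE, gamma)  # overdense gas
    print(f"gamma={gamma}: T(0.1)={t_void:.0f} K, T(5)={t_dense:.0f} K")
```

For γ > 1 the underdense gas is cooler than the overdense gas, for γ = 1 the two are equal, and for γ < 1 the ordering is reversed, which is the sense in which the relation is called inverted.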

7.2. Future Prospects

Looking forward, the subsequent BOSS data releases will significantly enlarge our sample size, e.g., DR10 (Ahn et al. 2014) is nearly double the size of the DR9 sample used in this paper, while the final BOSS sample (DR12) should be three times as large as DR9. In particular, the newer data sets should be sufficiently large for us to analyze the transmission PDF and constrain γ during the epoch of He II reionization at z > 3. This would be a valuable measurement, since high-resolution spectra are particularly affected by continuum-fitting biases at these redshifts (Faucher-Giguere et al. 2008; Lee 2012).

The analysis of the optically-thin Lyα forest transmission PDF from these expanded data sets will have vanishingly small sample errors, and the errors will be dominated by systematic and astrophysical uncertainties. At the high-transmission end, our uncertainties are dominated by the scatter of the continuum-fitting, which in turn depends on whether our quasar PCA templates, derived from low-luminosity low-redshift quasars (Suzuki et al. 2005) or high-luminosity SDSS quasars (Paris et al. 2011), are an accurate representation of the BOSS quasars. This uncertainty should be eliminated in the near future by PCA templates derived self-consistently from the BOSS data (Nao Suzuki et al. 2014, in prep). The modelling of metal contamination could also be improved in the near future by advances in our understanding of how metals are distributed in the IGM (e.g., Zhu et al. 2014), although metals are a comparatively minor contribution to the uncertainty in our transmission PDF.

We also aim to improve on the rather ad hoc data analysis in this paper, in which we accounted for some uncertainties in our modelling by incorporating them into our error covariances (e.g., LLSs, metals, continuum errors), while 〈F〉Lyα was marginalized over a fixed grid. In future analyses, it would make sense to carry out a full Markov Chain Monte Carlo treatment of all these parameters, which would rigorously account for all the uncertainties and allow straightforward marginalization over nuisance parameters.

Since this paper was initially focused on modelling the BOSS spectra, for the model comparison we used only simulations sampling a very coarse 3 × 3 grid in T0 and γ parameter space, and were unable to take account of uncertainties in other cosmological (σ8, ns, etc.) and astrophysical (e.g., Jeans’ scale, Rorai et al. 2013; or galactic winds, Viel et al. 2013b) parameters in our analysis. However, methods already exist to interpolate Lyα forest statistics from hydrodynamical simulations given a set of IGM and cosmological parameters (e.g., Viel & Haehnelt 2006; Borde et al. 2014; Rorai et al. 2013). In the near future we expect to do joint analyses using other Lyα forest statistics in conjunction with the transmission PDF, such as new measurements of the small-scale (k ≳ 0.2 s km^-1) 1D transmission power spectrum (Walther et al. 2014, in prep.), the moderate-scale (0.002 s km^-1 ≲ k ≲ 0.2 s km^-1) transmission power spectrum in both 1D (e.g., Palanque-Delabrouille et al. 2013) and 3D (from ultra-dense Lyα forest surveys using high-redshift star-forming galaxies, Lee et al. 2014a,b), the phase angle probability distribution function determined from close quasar pair sightlines (Rorai et al. 2013), and others. Such efforts would require a fine grid sampling the full set of cosmological and IGM thermal parameters in order to ensure that the interpolation errors are small compared to the uncertainties in the data (see e.g., Rorai et al. 2013). Efforts are underway to utilize massively-parallel adaptive-mesh refinement codes (Almgren et al. 2013) to generate such parameter grids to study the IGM (Lukic et al. 2014).

However, one of the findings of this paper is the importance of correct modelling of LLS, in particular partial LLS (10^16.5 cm^-2 ≲ N_HI ≲ 10^17.5 cm^-2), in accounting for the shape of the observed Lyα transmission PDF. Since our hydrodynamical simulations did not include radiative transfer and cannot accurately capture optically thick systems, we had to add these in an ad hoc manner based on observational constraints which are currently rather imprecise. In the near future, we would want to use hydrodynamical simulations with radiative transfer (even if only in post-processing, e.g., Altay et al. 2011; McQuinn et al. 2011; Altay et al. 2013; Rahmati et al. 2013) to self-consistently model the optically-thick absorbers in the IGM. With the unprecedented statistical power of the full BOSS Lyα forest sample, this could provide the opportunity to place unique constraints on the column-density distribution function of partial LLS.

8. SUMMARY/CONCLUSIONS

In this paper, we analyzed the probability distribution function (PDF) of the Lyα forest transmitted flux using 3393 BOSS quasar spectra (with 〈S/N〉 ≥ 6) from Data Release 9 of the SDSS-III survey.

To rectify the inaccurate noise estimates in the standard pipeline, we first carried out a custom co-addition of the individual exposures of each spectrum, using a probabilistic procedure that also separates out the signal and CCD contributions, allowing us to later create mock spectra with realistic noise properties. We then estimated the intrinsic quasar continuum using a mean-flux regulated technique that reduces the scatter in the estimated continua by forcing the resultant Lyα forest mean transmission to match the precise estimates of Becker et al. (2013), although we had to make minor corrections on the latter to account for our different assumptions on optically-thick systems in the data. This allows us to measure the transmission PDF in the data, which we do at 〈z〉 = [2.3, 2.6, 3.0] (with bin widths of Δz = 0.3), split into S/N subsamples of S/N = [6-8, 8-10, 10-25] at each redshift bin.

The second part of the paper describes finding a transmission PDF model which describes the data, based on detailed hydrodynamical simulations of the optically-thin Lyα forest that sample different IGM temperature-density relationship slopes, γ, and temperatures at mean-density, T0 (where T(Δ) = T0 Δ^(γ−1)). Using these simulations we generate mock spectra based on the real spectra. These take into account the following instrumental and astrophysical effects:

Lyman-Limit Systems: These are randomly added into our mock spectra based on published incidence rates (Ribaudo et al. 2011) and column-density distributions (Prochaska et al. 2010), including a large population of partial LLS (10^16.5 cm^-2 ≤ N_HI ≤ 10^17.5 cm^-2) with a power-law distribution of roughly f(N_HI) ∝ N_HI^-2. We assumed an effective b = 45 km s^-1 for the velocity width of these absorbers.

Metal Contamination: We measure metal absorption from the 1260 Å ≲ λ ≲ 1390 Å restframe region of lower-redshift quasars at the same observed wavelength, then add these directly into our mock spectra.

Spectral Resolution and Noise: Each mock spectrum is smoothed by the dispersion vector of the corresponding real spectrum (determined by the BOSS pipeline), and we apply corrections which bring the spectral resolution modeling to within ∼1% accuracy. We then introduce pixel noise based on the noise parameters estimated by our probabilistic co-addition procedure on the real data, which also achieves percent-level accuracy on modeling the noise.

Continuum Errors: Since we generate a full mock Lyα forest spectrum including the simulated quasar continuum (based on the continua fitted to the actual data), we can apply our continuum-estimation procedure on each mock to fit a new continuum. The difference between the new continuum and the underlying simulated quasar continuum yields an estimate of the continuum error.

We then compare the model transmission PDFs with the data, using an error covariance that includes both bootstrap errors and systematic uncertainties in the model components described above. At 〈z〉 = 3.0 we find a discrepancy in the assumed Lyα forest mean-transmission, 〈F〉Lyα, between our data and that derived from Becker et al. (2013), which we argue is likely caused by a selection bias in the SDSS quasars used by the latter. We therefore marginalize out these uncertainties in 〈F〉Lyα to obtain our final results.

The models with an IGM temperature-density slope of γ = 1.6 give the best fit to the data at all our redshift bins (〈z〉 = [2.3, 2.6, 3.0]). Models with an isothermal or inverted temperature-density relationship (γ ≤ 1) are disfavored at √(Δχ²) = [3.9, 6.3, 4.0]σ at 〈z〉 = [2.3, 2.6, 3.0], respectively. Due to a degeneracy with our LLS model, we are unable to put robust constraints on T0, but we have checked that our conclusions on γ are robust to such systematics as can be considered within our model framework. There are other possible systematics we did not consider that could in principle affect our measurement, such as cosmological parameters (σ8, ns) and astrophysical effects (galactic winds, inhomogeneous UV ionizing background), but we argue that these are unlikely to qualitatively affect our conclusions.

We thank Michael Strauss, J. Xavier Prochaska, Gabor Worseck, and Joop Schaye for useful comments and discussion. We also thank the members of the ENIGMA group (http://www.mpia-hd.mpg.de/ENIGMA/) at the Max Planck Institute for Astronomy (MPIA) for helpful discussions. JFH acknowledges generous support from the Alexander von Humboldt foundation in the context of the Sofja Kovalevskaja Award. The Humboldt foundation is funded by the German Federal Ministry for Education and Research. The hydrodynamic simulations in this work were performed using the COSMOS Supercomputer in Cambridge (UK), which is sponsored by SGI, Intel, HEFCE and the Darwin Supercomputer of the University of Cambridge High Performance Computing Service (http://www.hpc.cam.ac.uk/), provided by Dell Inc. using Strategic Research Infrastructure Funding from the Higher Education Funding Council for England. COSMOS and DARWIN are part of the DIRAC high performance computing facility funded by STFC. MV is supported by the FP7 ERC grant “cosmoIGM” GA-257670, PRIN-MIUR and INFN/PD51 grants. JSB acknowledges the support of a Royal Society University Research Fellowship. BL acknowledges support from the NSF Astronomy and Astrophysics Fellowship grant AST-1202963.

Funding for SDSS-III has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, and the U.S. Department of Energy Office of Science. The SDSS-III web site is http://www.sdss3.org/.

SDSS-III is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS-III Collaboration including the University of Arizona, the Brazilian Participation Group, Brookhaven National Laboratory, University of Cambridge, Carnegie Mellon University, University of Florida, the French Participation Group, the German Participation Group, Harvard University, the Instituto de Astrofisica de Canarias, the Michigan State/Notre Dame/JINA Participation Group, Johns Hopkins University, Lawrence Berkeley National Laboratory, Max Planck Institute for Astrophysics, Max Planck Institute for Extraterrestrial Physics, New Mexico State University, New York University, Ohio State University, Pennsylvania State University, University of Portsmouth, Princeton University, the Spanish Participation Group, University of Tokyo, University of Utah, Vanderbilt University, University of Virginia, University of Washington, and Yale University.

REFERENCES

Abazajian, K., Adelman-McCarthy, J. K., Agueros, M. A., et al. 2005, AJ, 129, 1755
Abazajian, K. N., Adelman-McCarthy, J. K., Agueros, M. A., et al. 2009, ApJS, 182, 543
Ahn, C. P., Alexandroff, R., Allende Prieto, C., et al. 2012, ApJS, 203, 21
Ahn, C. P., Alexandroff, R., Allende Prieto, C., et al. 2014, ApJS, 211, 17
Almgren, A. S., Bell, J. B., Lijewski, M. J., Lukic, Z., & Van Andel, E. 2013, ApJ, 765, 39
Altay, G., Theuns, T., Schaye, J., Booth, C. M., & Dalla Vecchia, C. 2013, MNRAS, 436, 2689
Altay, G., Theuns, T., Schaye, J., Crighton, N. H. M., & Dalla Vecchia, C. 2011, ApJ, 737, L37
Anderson, L., Aubourg, E., Bailey, S., et al. 2014, MNRAS, 441, 24
Becker, G. D., Bolton, J. S., Haehnelt, M. G., & Sargent, W. L. W. 2011, MNRAS, 410, 1096
Becker, G. D., Hewett, P. C., Worseck, G., & Prochaska, J. X. 2013, MNRAS, 430, 2067
Bernardi, M., Sheth, R. K., SubbaRao, M., et al. 2003, AJ, 125, 32
Bird, S., Vogelsberger, M., Sijacki, D., et al. 2013, MNRAS, 429, 3341
Bolton, A. S., & Schlegel, D. J. 2010, PASP, 122, 248
Bolton, A. S., Schlegel, D. J., Aubourg, E., et al. 2012, AJ, 144, 144
Bolton, J. S., Becker, G. D., Haehnelt, M. G., & Viel, M. 2014, MNRAS, 438, 2499
Bolton, J. S., Oh, S. P., & Furlanetto, S. R. 2009, MNRAS, 395, 736
Bolton, J. S., Viel, M., Kim, T., Haehnelt, M. G., & Carswell, R. F. 2008, MNRAS, 386, 1131
Borde, A., Palanque-Delabrouille, N., Rossi, G., et al. 2014, ArXiv e-prints, arXiv:1401.6472
Bovy, J., Hennawi, J. F., Hogg, D. W., et al. 2011, ApJ, 729, 141
Broderick, A. E., Chang, P., & Pfrommer, C. 2012, ApJ, 752, 22
Busca, N. G., Delubac, T., Rich, J., et al. 2013, A&A, 552, A96
Calura, F., Tescari, E., D’Odorico, V., et al. 2012, MNRAS, 422, 3019
Cen, R., Miralda-Escude, J., Ostriker, J. P., & Rauch, M. 1994, ApJ, 437, L9
Chang, P., Broderick, A. E., & Pfrommer, C. 2012, ApJ, 752, 23
Compostella, M., Cantalupo, S., & Porciani, C. 2013, MNRAS, 435, 3169
Cooksey, K. L., Kao, M. M., Simcoe, R. A., O’Meara, J. M., & Prochaska, J. X. 2013, ApJ, 763, 37
Croft, R. A. C., Weinberg, D. H., Katz, N., & Hernquist, L. 1998, ApJ, 495, 44
Dall’Aglio, A., Wisotzki, L., & Worseck, G. 2008, A&A, 491, 465
Dave, R., Hernquist, L., Katz, N., & Weinberg, D. H. 1999, ApJ, 511, 521
Dawson, K. S., Schlegel, D. J., Ahn, C. P., et al. 2013, AJ, 145, 10
Desjacques, V., Nusser, A., & Sheth, R. K. 2007, MNRAS, 374, 206
Draine, B. T. 2011, Physics of the Interstellar and Intergalactic Medium (Princeton University Press)
Eisenstein, D. J., Weinberg, D. H., Agol, E., et al. 2011, AJ, 142, 72
Faucher-Giguere, C., Prochaska, J. X., Lidz, A., Hernquist, L., & Zaldarriaga, M. 2008, ApJ, 681, 831



Font-Ribera, A., McDonald, P., & Miralda-Escude, J. 2012, JCAP, 1, 1
Font-Ribera, A., & Miralda-Escude, J. 2012, JCAP, 7, 28
Furlanetto, S. R., & Oh, S. P. 2008, ApJ, 682, 14
Garzilli, A., Bolton, J. S., Kim, T.-S., Leach, S., & Viel, M. 2012, MNRAS, 424, 1723
Gleser, L., Nusser, A., Benson, A. J., Ohno, H., & Sugiyama, N. 2005, MNRAS, 361, 1399
Gnedin, N. Y., & Hui, L. 1998, MNRAS, 296, 44
Gunn, J. E., & Peterson, B. A. 1965, ApJ, 142, 1633
Gunn, J. E., Siegmund, W. A., Mannery, E. J., et al. 2006, AJ, 131, 2332
Horne, K. 1986, PASP, 98, 609
Hui, L., & Gnedin, N. Y. 1997, MNRAS, 292, 27
Hui, L., Gnedin, N. Y., & Zhang, Y. 1997, ApJ, 486, 599
Hui, L., & Haiman, Z. 2003, ApJ, 596, 9
Janknecht, E., Reimers, D., Lopez, S., & Tytler, D. 2006, A&A, 458, 427
Jenkins, E. B., & Ostriker, J. P. 1991, ApJ, 376, 33
Kim, T., Bolton, J. S., Viel, M., Haehnelt, M. G., & Carswell, R. F. 2007, MNRAS, 382, 1657
Komatsu, E., Smith, K. M., Dunkley, J., et al. 2011, ApJS, 192, 18
Lee, K.-G. 2012, ApJ, 753, 136
Lee, K.-G., Hennawi, J. F., White, M., Croft, R. A. C., & Ozbek, M. 2014a, ApJ, 788, 49
Lee, K.-G., & Spergel, D. N. 2011, ApJ, 734, 21
Lee, K.-G., Suzuki, N., & Spergel, D. N. 2012, AJ, 143, 51
Lee, K.-G., Bailey, S., Bartsch, L. E., et al. 2013, AJ, 145, 69
Lee, K.-G., Hennawi, J. F., Stark, C., et al. 2014b, ApJ, 795, L12
Lewis, A., Challinor, A., & Lasenby, A. 2000, ApJ, 538, 473
Lidz, A., Faucher-Giguere, C.-A., Dall’Aglio, A., et al. 2010, ApJ, 718, 199
Lukic, Z., Stark, C., Nugent, P., et al. 2014, ArXiv e-prints, arXiv:1406.6361
Lundgren, B. F., Brunner, R. J., York, D. G., et al. 2009, ApJ, 698, 819
Lynds, R. 1971, ApJ, 164, L73
McDonald, P., & Eisenstein, D. J. 2007, Phys. Rev. D, 76, 063009
McDonald, P., Miralda-Escude, J., Rauch, M., et al. 2001, ApJ, 562, 52
McDonald, P., Miralda-Escude, J., Rauch, M., et al. 2000, ApJ, 543, 1
McDonald, P., Seljak, U., Cen, R., Bode, P., & Ostriker, J. P. 2005a, MNRAS, 360, 1471
McDonald, P., Seljak, U., Cen, R., et al. 2005b, ApJ, 635, 761
McDonald, P., Seljak, U., Burles, S., et al. 2006, ApJS, 163, 80
McQuinn, M., Lidz, A., Zaldarriaga, M., et al. 2009, ApJ, 694, 842
McQuinn, M., Oh, S. P., & Faucher-Giguere, C.-A. 2011, ApJ, 743, 82
McQuinn, M., & White, M. 2011, MNRAS, 415, 2257
Meiksin, A., & Tittley, E. R. 2012, MNRAS, 423, 7
Meiksin, A. A. 2009, Reviews of Modern Physics, 81, 1405
Miralda-Escude, J., Cen, R., Ostriker, J. P., & Rauch, M. 1996, ApJ, 471, 582
Miralda-Escude, J., & Rees, M. J. 1994, MNRAS, 266, 343
Noterdaeme, P., Petitjean, P., Carithers, W. C., et al. 2012, A&A, 547, L1
O’Shea, B. W., Bryan, G., Bordner, J., et al. 2004, ArXiv Astrophysics e-prints, arXiv:astro-ph/0403044
Palanque-Delabrouille, N., Yeche, C., Borde, A., et al. 2013, A&A, 559, A85
Paris, I., Petitjean, P., Rollinde, E., et al. 2011, A&A, 530, A50

Paris, I., Petitjean, P., Aubourg, E., et al. 2012, A&A, 548, A66
Penton, S. V., Shull, J. M., & Stocke, J. T. 2000, ApJ, 544, 150
Petitjean, P., Webb, J. K., Rauch, M., Carswell, R. F., & Lanzetta, K. 1993, MNRAS, 262, 499
Pieri, M. M., Frank, S., Weinberg, D. H., Mathur, S., & York, D. G. 2010, ApJ, 724, L69
Pieri, M. M., Mortonson, M. J., Frank, S., et al. 2014, MNRAS, 441, 1718
Planck Collaboration, Ade, P. A. R., Aghanim, N., et al. 2013, ArXiv e-prints, arXiv:1303.5076
Prochaska, J. X., Herbert-Fort, S., & Wolfe, A. M. 2005, ApJ, 635, 123
Prochaska, J. X., O’Meara, J. M., & Worseck, G. 2010, ApJ, 718, 392
Prochaska, J. X., & Wolfe, A. M. 1997, ApJ, 487, 73
Puchwein, E., Pfrommer, C., Springel, V., Broderick, A. E., & Chang, P. 2012, MNRAS, 423, 149
Rahmati, A., Pawlik, A. H., Raicevic, M., & Schaye, J. 2013, MNRAS, 430, 2427
Rauch, M. 1998, ARA&A, 36, 267
Rauch, M., Carswell, R. F., Chaffee, F. H., et al. 1992, ApJ, 390, 387
Ribaudo, J., Lehner, N., & Howk, J. C. 2011, ApJ, 736, 42
Ricotti, M., Gnedin, N. Y., & Shull, J. M. 2000, ApJ, 534, 41
Rollinde, E., Theuns, T., Schaye, J., Paris, I., & Petitjean, P. 2013, MNRAS, 428, 540
Rorai, A., Hennawi, J. F., & White, M. 2013, ApJ, 775, 81
Ross, N. P., Myers, A. D., Sheldon, E. S., et al. 2012, ApJS, 199, 3
Rudie, G. C., Steidel, C. C., & Pettini, M. 2012, ApJ, 757, L30
Rudie, G. C., Steidel, C. C., Shapley, A. E., & Pettini, M. 2013, ApJ, 769, 146
Schaye, J., Aguirre, A., Kim, T.-S., et al. 2003, ApJ, 596, 768
Schaye, J., Theuns, T., Leonard, A., & Efstathiou, G. 1999, MNRAS, 310, 57
Schaye, J., Theuns, T., Rauch, M., Efstathiou, G., & Sargent, W. L. W. 2000, MNRAS, 318, 817
Schmidt, M. 1965, ApJ, 141, 1295
Schneider, D. P., Richards, G. T., Hall, P. B., et al. 2010, AJ, 139, 2360
Seljak, U., Makarov, A., McDonald, P., et al. 2005, Phys. Rev. D, 71, 103515
Shull, J. M., France, K., Danforth, C. W., Smith, B., & Tumlinson, J. 2010, ApJ, 722, 1312
Sironi, L., & Giannios, D. 2014, ApJ, 787, 49
Slosar, A., Font-Ribera, A., Pieri, M. M., et al. 2011, JCAP, 9, 1
Slosar, A., Irsic, V., Kirkby, D., et al. 2013, JCAP, 4, 26
Smee, S. A., Gunn, J. E., Uomoto, A., et al. 2013, AJ, 146, 32
Springel, V., Di Matteo, T., & Hernquist, L. 2005, MNRAS, 361, 776
Stoughton, C., Lupton, R. H., Bernardi, M., et al. 2002, AJ, 123, 485
Suzuki, N. 2006, ApJS, 163, 110
Suzuki, N., Tytler, D., Kirkman, D., O’Meara, J. M., & Lubin, D. 2005, ApJ, 618, 592
Telfer, R. C., Zheng, W., Kriss, G. A., & Davidsen, A. F. 2002, ApJ, 565, 773
Theuns, T., Leonard, A., Efstathiou, G., Pearce, F. R., & Thomas, P. A. 1998, MNRAS, 301, 478
Theuns, T., Schaye, J., Zaroubi, S., et al. 2002, ApJ, 567, L103
Tytler, D., Kirkman, D., O’Meara, J. M., et al. 2004, ApJ, 617, 1
Vanden Berk, D. E., Richards, G. T., Bauer, A., et al. 2001, AJ, 122, 549
Viel, M., Becker, G. D., Bolton, J. S., & Haehnelt, M. G. 2013a, Phys. Rev. D, 88, 043502
Viel, M., Bolton, J. S., & Haehnelt, M. G. 2009, MNRAS, 399, L39
Viel, M., & Haehnelt, M. G. 2006, MNRAS, 365, 231
Viel, M., Haehnelt, M. G., & Springel, V. 2004, MNRAS, 354, 684
Viel, M., Schaye, J., & Booth, C. M. 2013b, MNRAS, 429, 1734
Worseck, G., & Prochaska, J. X. 2011, ApJ, 728, 23
Worseck, G., Prochaska, J. X., McQuinn, M., et al. 2011, ApJ, 733, L24
York, D. G., Adelman, J., Anderson, Jr., J. E., et al. 2000, AJ, 120, 1579
Zhu, G., Menard, B., Bizyaev, D., et al. 2014, MNRAS, 439, 3139

APPENDIX

In this Appendix, we describe our probabilistic procedure for combining the multiple BOSS exposures of each spectrum^27 while simultaneously estimating the noise variance in terms of a parametrized model. We assume the noise in each pixel can be described by

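$$\sigma^2_{\lambda i} = A_1 \hat{S}_{\lambda i}\,(F_\lambda + s_{\lambda i}) + A_2 \hat{S}^2_{\lambda i}\,\sigma^2_{\rm RN,eff}\,\sigma_{\rm disp}(\lambda) \qquad (1)$$

where

$$\hat{S}_{\lambda i} = S_{\lambda i}\,\left(1 - \exp(-A_3 \lambda + A_4)\right). \qquad (2)$$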

The true object flux F_λ and the noise parameters A_j (j = 1−4) are quantities which we will determine given the individual exposure spectra f_λi, sky flux estimates s_λi, and calibration vectors S_λi (which convert between detector counts and photons). σ_RN,eff is the effective read noise, which we fixed to σ_RN,eff = 12; this can be thought of as an effective number of pixels times the true read noise of the CCD squared, which we multiplied by the spectrograph dispersion σ_disp(λ) to approximately account for the change in spot-size as a function of wavelength. Equation 2 parametrizes wavelength-dependent biases in the calibration vector.

We search for the model that best describes the multiple exposure spectra f_λi, where our model parameters are A_j from Eq. (1) and F_λ is the true flux of the object. In what follows, we will outline a method for determining the posterior distribution P(A_j, F_λ | f_λi) using a Markov Chain Monte Carlo (MCMC) method. From this distribution, we can obtain both an accurate model for the noise via Eq. (1), and our final combined spectrum. The estimates for A_j can also be used to self-consistently generate pixel noise in mock Lyα forest spectra.

The probability of the data given the model, or the likelihood, can be written

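$$\mathcal{L}(A_j, F_\lambda) = P(f_{\lambda i} \,|\, A_j, F_\lambda) = \prod_{\lambda i} \frac{1}{\sqrt{2\pi}\,\sigma_{\lambda i}} \exp\left(-\frac{(f_{\lambda i} - F_\lambda)^2}{2\sigma^2_{\lambda i}}\right). \qquad (3)$$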
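A minimal sketch of how Eqs. (1)-(3) could be evaluated is given below, in plain Python. It assumes that the calibration vector entering Eq. (1) is the corrected one defined in Eq. (2). The function names and the per-pixel dictionary layout are hypothetical and purely illustrative; they do not correspond to any actual pipeline code.

```python
import math

SIGMA_RN_EFF = 12.0  # effective read noise, fixed to 12 as stated in the text


def corrected_calibration(S, lam, A3, A4):
    """Eq. (2): calibration vector with a wavelength-dependent correction."""
    return S * (1.0 - math.exp(-A3 * lam + A4))


def noise_variance(A, F, s, S, lam, sigma_disp):
    """Eq. (1): noise variance for one pixel of one exposure."""
    A1, A2, A3, A4 = A
    S_hat = corrected_calibration(S, lam, A3, A4)
    return A1 * S_hat * (F + s) + A2 * S_hat**2 * SIGMA_RN_EFF**2 * sigma_disp


def log_likelihood(A, pixels):
    """Logarithm of Eq. (3), summed over all exposure pixels.

    `pixels` is a list of dicts (hypothetical layout) with keys
    f (observed flux), F (model flux), s (sky), S (calibration),
    lam (wavelength) and sigma_disp (dispersion).
    """
    total = 0.0
    for p in pixels:
        var = noise_variance(A, p["F"], p["s"], p["S"], p["lam"], p["sigma_disp"])
        if var <= 0.0:
            return -math.inf  # unphysical noise model: zero likelihood
        total += -0.5 * math.log(2.0 * math.pi * var)
        total += -((p["f"] - p["F"]) ** 2) / (2.0 * var)
    return total
```

Working with the logarithm of the likelihood avoids numerical underflow when the product in Eq. (3) runs over many thousands of pixels.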

Note that the individual exposure data f_λi are on the native wavelength grid of each CCD exposure, whereas the BOSS pipeline interpolates and then combines these individual spectra into a final co-added spectrum, defined on a wavelength grid with uniform spacing. Furthermore, flexure and other variations in the spectrograph wavelength solution will result in small (typically sub-pixel) shifts between the individual exposure wavelength grids. In Eq. (3) our model F_λ must be computable at every wavelength of the individual exposures f_λi. We are free to choose the wavelengths at which F_λ is represented, but this choice is a subtle issue for several reasons. First, note that we want to avoid interpolating the data, f_λi, onto the model wavelength grid, as this would correlate the data pixels and require that we track covariances in the likelihood in Eq. (3), making it significantly more complicated and challenging to evaluate. Similarly, it is undesirable to interpolate our model F_λ, as this would introduce correlations in the model parameters, making it much more difficult to sample them with our MCMC. Finally, note that F_λ also represents our final co-added spectrum, so we might consider opting for a uniform wavelength grid, similar to what is done by the BOSS pipeline. Our approach is to simply determine the model flux F_λ at each wavelength of the individual exposures f_λi. Shifts among the individual exposure wavelength grids result in a more finely sampled model grid. For the reasons explained above, we use nearest grid point (NGP) interpolation, so that the f_λi are evaluated on the F_λ grid (and vice versa) by assigning the value from the single nearest pixel.

In our MCMC iterations, we use the standard Metropolis-Hastings criterion to sample the parameters A_j, with trials drawn from a uniform prior. For the F_λ, we exploit an analogy with Gibbs sampling, which dramatically simplifies MCMC for likelihood functions with a multivariate Gaussian form. Gibbs sampling exploits the fact that given a multivariate distribution, it is much simpler to sample from conditional distributions than to integrate over a joint distribution. To be more specific, the likelihood in Eq. (3) is proportional to the joint probability distribution of the noise parameters A_j and F_λ, but it is also proportional to the conditional probability distribution of the F_λ at fixed A_j. With A_j fixed the probability of F_λ is then

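$$P(F_\lambda \mid A_j, f_{\lambda i}) \propto \prod_{\lambda i} \frac{1}{\sqrt{2\pi}\,\sigma_{\lambda i}} \exp\left[-\frac{(f_{\lambda i} - F_\lambda)^2}{2\sigma_{\lambda i}^2}\right], \qquad (4)$$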

which is very nearly a multivariate Gaussian distribution for Fλ with a diagonal covariance matrix. The equation above deviates slightly from a Gaussian because the σλi depend on Fλ via Eq. (1). In what follows, we ignore this small deviation, and assume that the conditional PDF of the Fλ (at fixed Aj) is Gaussian.

Given that Eq. (4) is a multivariate Gaussian with diagonal covariance, the Gibbs sampling of the Fλ becomes trivial. Since Eq. (4) can be factored into a product of individual Gaussians, we need not follow the standard Gibbs sampling algorithm, whereby each parameter is updated sequentially holding the others fixed. Instead we need only hold Aj fixed (since the likelihood is not Gaussian in these parameters), and we can sample all of the Fλ simultaneously. This simplification, which dramatically speeds up the algorithm, is possible because the conditional distribution for Fλ can be factored into a product of Gaussians for each pixel Fλ; thus the conditional distribution at any wavelength is completely independent of all the others.

27 Defined as unique combinations of plate number, fiber number and MJD of observation.


Completing the square in Eq. (4) we can then write

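$$P(F_\lambda \mid A_j, f_{\lambda i}) \propto \prod_{\lambda} \exp\left[-\frac{(f_{\mathrm{opt},\lambda} - F_\lambda)^2}{2\sigma_{\mathrm{opt},\lambda}^2}\right] \qquad (5)$$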

where

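$$f_{\mathrm{opt},\lambda} \equiv \sigma_{\mathrm{opt},\lambda}^2 \sum_i \frac{f_{\lambda i}}{\sigma_{\lambda i}^2} \quad \mathrm{and} \quad \frac{1}{\sigma_{\mathrm{opt},\lambda}^2} \equiv \sum_i \frac{1}{\sigma_{\lambda i}^2}. \qquad (6)$$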

The expressions above for fopt,λ and σ²opt,λ simply represent the optimally combined flux estimator and the resulting variance. Thus one can think of our MCMC algorithm as performing an optimal combination of the individual exposure spectra fλi, whereby the noise is simultaneously determined via an iterative procedure.
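As a minimal sketch of Eq. (6), the inverse-variance combination can be written in a few lines of Python. The function name `optimal_combine` and the array layout (exposures along the first axis, already assigned to a common wavelength grid) are illustrative assumptions, not part of the BOSS pipeline or of the code used in this work.

```python
import numpy as np

def optimal_combine(f, sigma):
    """Inverse-variance combination of exposures, following Eq. (6).

    f, sigma : arrays of shape (n_exposures, n_pixels), assumed to be
               already assigned to a common wavelength grid.
    Returns (f_opt, sigma_opt), each of shape (n_pixels,).
    """
    f = np.asarray(f, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2  # weights 1 / sigma_{lambda i}^2
    var_opt = 1.0 / w.sum(axis=0)                  # sigma_opt^2, second part of Eq. (6)
    f_opt = var_opt * (w * f).sum(axis=0)          # weighted mean, first part of Eq. (6)
    return f_opt, np.sqrt(var_opt)

# Two exposures, two pixels, equal noise within each pixel.
f = np.array([[1.0, 2.0], [3.0, 2.0]])
sigma = np.array([[1.0, 0.5], [1.0, 0.5]])
f_opt, sigma_opt = optimal_combine(f, sigma)
```

With equal noise in both exposures the estimator reduces to the plain mean, and the combined noise is smaller than the single-exposure noise by a factor of √2.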

Thus the basic steps of our algorithm can be summarized as follows:

• Initialize by creating a model λ grid from all unique wavelengths in the individual exposures, and use NGP interpolation to assign a fλi to this grid for each exposure.

• Choose a starting guess for the noise parameters Aj. For the starting Fλ use Fλ = fopt,λ from Eq. (6), but with the model σλi replaced by the noise delivered by the pipeline.

• Begin the MCMC loop:

1. Use the current values of Aj and Fλ to compute the variance σ²λi for each exposure via Eq. (1).

2. Compute fopt,λ and σ²opt,λ from Eq. (6).

3. Take a Gibbs step for all wavelengths simultaneously, Fλ = fopt,λ + gλσopt,λ, where gλ is a vector of zero-mean, unit-variance Gaussian deviates.

4. Use NGP to interpolate the model Fλ onto each individual exposure fλi wavelength grid.

5. Compute the likelihood L(Aj, Fλ) according to Eq. (3).

6. Take trial steps in the Aj according to Aj,try = Aj + gj dAj, where dAj is a step size and gj is a Gaussian deviate with zero mean and unit variance, drawn for each individual noise parameter Aj.

7. Compute the likelihood L(Aj,try, Fλ).

8. Apply the Metropolis-Hastings criterion to the likelihood difference. If it is satisfied, accept the trial values Aj,try as part of the Markov chain. If not, retain the previous values. Note that the Fλ are always accepted, because they are Gibbs sampled.

• Use only the second half of the chain for the posterior distributions, as the first half is the burn-in phase.
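The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for this work: Eq. (1) and Eq. (3) are not reproduced in this section, so `noise_var` is a hypothetical two-parameter stand-in for the noise model, `log_like` assumes an independent Gaussian likelihood on the native pixels, and the prior is assumed to be uniform on Aj > 0. All function names and the data layout (a list of `(wave, flux, sigma_pipeline)` tuples, one per exposure) are likewise invented for the example.

```python
import numpy as np

def ngp_index(src, dst):
    """For each wavelength in dst, index of the nearest pixel in sorted src."""
    j = np.clip(np.searchsorted(src, dst), 1, len(src) - 1)
    return np.where(dst - src[j - 1] <= src[j] - dst, j - 1, j)

def noise_var(A, F, sig_pipe):
    """HYPOTHETICAL stand-in for Eq. (1): rescaled pipeline variance plus a flux term."""
    return A[0] * sig_pipe**2 + A[1] * np.abs(F)

def log_like(A, F, exposures, to_exp):
    """Gaussian log-likelihood on the native exposure pixels (assumed form of Eq. 3)."""
    total = 0.0
    for (_, f, sig), idx in zip(exposures, to_exp):
        F_i = F[idx]                                   # step 4: NGP model -> exposure grid
        var = noise_var(A, F_i, sig)
        total += -0.5 * np.sum((f - F_i)**2 / var + np.log(2.0 * np.pi * var))
    return total

def run_chain(exposures, A_start, dA, n_steps, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize: model grid from all unique wavelengths, plus NGP index maps.
    grid = np.unique(np.concatenate([w for w, _, _ in exposures]))
    to_grid = [ngp_index(w, grid) for w, _, _ in exposures]  # exposure pixel nearest each grid point
    to_exp = [ngp_index(grid, w) for w, _, _ in exposures]   # grid point nearest each exposure pixel
    f_g = np.array([f[k] for (_, f, _), k in zip(exposures, to_grid)])
    s_g = np.array([s[k] for (_, _, s), k in zip(exposures, to_grid)])
    A = np.asarray(A_start, dtype=float)
    dA = np.asarray(dA, dtype=float)
    # Starting F: Eq. (6) evaluated with the pipeline noise.
    F = np.sum(f_g / s_g**2, axis=0) / np.sum(1.0 / s_g**2, axis=0)
    chain_A, chain_F = [], []
    for _ in range(n_steps):
        var = noise_var(A, F, s_g)                                     # step 1
        var_opt = 1.0 / np.sum(1.0 / var, axis=0)                      # step 2
        f_opt = var_opt * np.sum(f_g / var, axis=0)
        F = f_opt + rng.standard_normal(grid.size) * np.sqrt(var_opt)  # step 3: Gibbs draw
        logL = log_like(A, F, exposures, to_exp)                       # steps 4-5
        A_try = A + rng.standard_normal(A.size) * dA                   # step 6
        if np.all(A_try > 0):                                          # assumed prior: A > 0
            logL_try = log_like(A_try, F, exposures, to_exp)           # step 7
            if np.log(rng.uniform()) < logL_try - logL:                # step 8: Metropolis-Hastings
                A = A_try
        chain_A.append(A.copy())
        chain_F.append(F.copy())                                       # F is always accepted
    half = n_steps // 2                                                # discard burn-in half
    return grid, np.array(chain_A[half:]), np.array(chain_F[half:])
```

Storing the index maps once at initialization means that both NGP directions reduce to array indexing inside the loop, so no data or model pixel is ever interpolated.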

Our MCMC algorithm directly determines the posterior distribution P(Aj, Fλ|fλi), which provides all the information we need to construct mock spectra using Eq. (1) as described in §4.5.

The distribution P(Fλ|fλi), on the other hand, contains everything we need to know about the combined spectrum. Namely, we can define

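$$\bar{F}_\lambda \equiv \int P(F_\lambda \mid f_{\lambda i})\, F_\lambda \, dF_\lambda \qquad (7)$$

as the combined spectrum, and

$$\sigma_\lambda^2 \equiv \int P(F_\lambda \mid f_{\lambda i})\, (F_\lambda - \bar{F}_\lambda)^2 \, dF_\lambda \qquad (8)$$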

as its variance. If the formal noise returned by the BOSS pipeline were actually the true noise in the data, then our combined spectrum F̄λ from Eq. (7) would be equivalent to the optimally combined flux, and our variance to the optimal variance, i.e. according to Eq. (6). In practice, the BOSS pipeline does not return the true noise, and so our F̄λ is optimal whereas the pipeline flux is sub-optimal, and our σ²λ is an empirical estimate of the actual noise in the data.
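Given post-burn-in samples of Fλ from the chain, the integrals in Eqs. (7) and (8) can be approximated by the sample mean and sample variance at each wavelength. The sketch below assumes the samples are stored as an array of shape (n_samples, n_pixels); the function name `combined_spectrum` is illustrative only.

```python
import numpy as np

def combined_spectrum(chain_F):
    """Monte Carlo estimates of Eqs. (7) and (8) from samples of F.

    chain_F : array of shape (n_samples, n_pixels), burn-in already removed.
    Returns the per-pixel posterior mean and variance.
    """
    chain_F = np.asarray(chain_F, dtype=float)
    F_bar = chain_F.mean(axis=0)   # Eq. (7): posterior mean flux
    var = chain_F.var(axis=0)      # Eq. (8): posterior variance about that mean
    return F_bar, var

F_bar, var = combined_spectrum([[1.0, 2.0], [3.0, 2.0]])
```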
