1) Since I was a wee astronomer I was taught that avoiding "relatively" is
a virtue, as it is a word with hardly any information content. Your paper
is far from virtuous according to that metric. I counted 9 instances and may
have missed some :-) I think it can be removed from all of those sentences with
no loss of information, which would appease the ghosts of Strunk & White.

*** N(relatively)=10!  I chopped out seven of them.

I also have a rather self-serving suggestion for a much-neglected reference
that predates the list you now have in the intro. It reports a submm-mm
excess in a low-metallicity galaxy and makes a reasonably serious attempt
at understanding its cause: Bolatto et al. 2000, ApJ, 532, 909.

*** Thanks.  It's now included.

2) S3.2, "SPIRE observations for six of our galaxies were observed..."
Maybe obtained?

*** Fixed

3) S3.4, apertures are "chosen by eye" to encompass all emission, but then
there is an aperture correction (i.e., there is emission that is missed).
That seems a contradiction. Perhaps a little rephrasing would help.

4) Is the identification of background sources done independently for
each band, or jointly? At 500 um one may be more likely to miss
background galaxies because of the poor resolution, and that may matter for
the dwarfs that you point out have significant contamination. It may also be
important for quantifying the 500 um excess properly (more on that later).

*** Jointly

5) I understand why you use it, but the 3.6 um image seems like a really bad
template for the ISM. You give the caveat, and since the correction is small
that is probably OK.

6) S4.1, reddening curve of Li & Draine (2001). It might be good to
mention out to what wavelength you corrected for Galactic extinction. Also, do
PACS and SPIRE use the IRAS system or the MIPS system? (You say no color corrections
are applied; I just want to know how large they are likely to be.)

*** I added a footnote quantifying the reddening corrections.  The MIPS calibration assumes f_nu \propto nu^2, whereas the PACS, SPIRE, and IRAS calibrations all assume f_nu \propto nu^{-1}.  I accounted for these differences in the comparison.
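For a rough sense of how large the convention difference can be, the sketch below assumes a hypothetical flat (top-hat) bandpass and the simple definition in which the pipeline quotes the f_nu at nu0 of a reference spectrum f_nu \propto nu^alpha_ref that would produce the same in-band signal. The function names and band edges are illustrative assumptions; a real estimate needs the published MIPS, PACS, and SPIRE response curves, and this sketch ignores photon- versus energy-weighting differences between detectors.

```python
import math


def band_integral(alpha, nu_lo, nu_hi, nu0, n=2000):
    """Midpoint-rule integral of (nu/nu0)**alpha over a flat bandpass
    R(nu) = 1 for nu_lo <= nu <= nu_hi (a simplifying assumption)."""
    step = (nu_hi - nu_lo) / n
    return math.fsum(((nu_lo + (i + 0.5) * step) / nu0) ** alpha * step
                     for i in range(n))


def color_correction(alpha_src, alpha_ref, nu_lo, nu_hi, nu0):
    """Quoted flux density divided by the true f_nu(nu0).

    Assumes the pipeline quotes F = int(R f_nu) / int(R (nu/nu0)^alpha_ref),
    i.e. the value at nu0 of a reference spectrum f_nu ~ nu^alpha_ref giving
    the same in-band signal.  For a source with f_nu ~ nu^alpha_src the ratio
    is int(R x^alpha_src) / int(R x^alpha_ref), with x = nu/nu0.
    """
    return (band_integral(alpha_src, nu_lo, nu_hi, nu0)
            / band_integral(alpha_ref, nu_lo, nu_hi, nu0))


# Hypothetical band from 0.7 to 1.4 nu0 (frequencies in units of nu0).
for alpha_src in (2.0, 3.5):
    k_nu_m1 = color_correction(alpha_src, -1.0, 0.7, 1.4, 1.0)  # nu^-1 convention
    k_nu_2 = color_correction(alpha_src, 2.0, 0.7, 1.4, 1.0)    # nu^2 convention
    print(f"alpha_src={alpha_src}: K(nu^-1)={k_nu_m1:.3f}, K(nu^2)={k_nu_2:.3f}")
```

Under this definition the ratio between the nu^-1 and nu^2 conventions depends only on the bandpass, not on the source spectrum, so it could be applied as a single factor per band.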

7) I like the nice, concise explanation of the Draine model. Note, however, an
inconsistency: on p. 12 we say that dust with U>Umin is the PDR component,
but in the Summary (before Eq. 9) we say that fPDR corresponds to U>100.

*** Yeah, Wolfire pointed that out as well.  I was following the definitions laid out in Draine & Li 2007 and Draine et al. 2007, but as you point out, there is an inconsistency.  I've removed "PDR" from the discussion surrounding Equation 9, but I still want to compute the U>100 fraction for consistency with the results published in Draine's SINGS paper.
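A minimal sketch of the U>100 luminosity fraction, assuming the delta-function-plus-power-law distribution of Eq. 4 in its Draine & Li (2007) form (a mass fraction 1-gamma at U=Umin plus a fraction gamma in dM/dU \propto U^-alpha up to Umax) and the alpha=2, Umax=1e6 values we believe Draine et al. (2007) adopted for SINGS; these defaults should be checked against that paper. The helper names are hypothetical.

```python
import math


def _int_pow(a, b, p):
    """Integral of U**p dU from a to b."""
    if p == -1:
        return math.log(b / a)
    return (b ** (p + 1) - a ** (p + 1)) / (p + 1)


def lum_fraction_above(u_cut, umin, gamma, alpha=2.0, umax=1e6):
    """Fraction of dust luminosity emitted by grains with U > u_cut.

    Power absorbed (= emitted) per unit dust mass scales linearly with U,
    so the luminosity of each component is proportional to integral(U dM).
    """
    norm = 1.0 / _int_pow(umin, umax, -alpha)      # power-law mass normalisation
    l_delta = (1.0 - gamma) * umin                 # delta-function component
    l_pow = gamma * norm * _int_pow(umin, umax, 1.0 - alpha)
    l_above = l_delta if umin > u_cut else 0.0
    lo = max(umin, u_cut)
    if lo < umax:
        l_above += gamma * norm * _int_pow(lo, umax, 1.0 - alpha)
    return l_above / (l_delta + l_pow)


# Illustrative parameters only (not fit results from the paper).
print(lum_fraction_above(100.0, umin=1.0, gamma=0.1))   # about 0.40
```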

8) S4.4: In discussing Fig. 4, you seem to miss the most apparent point,
which is that there are clear systematic trends with 70/160 "temperature"
and the largest systematic differences occur for "cooler" galaxies (maybe
that isn't true, but certainly the scatter is much less for cooler 
galaxies).

*** Hmmm ... There is no trend for gamma, and I'm not sure about the others.  But I agree that the dispersion is smaller for cooler galaxies.  Text modified accordingly (in a couple of spots).

9) In the second paragraph we say that masses for metal-rich galaxies get
smaller with Herschel. Then we say that Galametz finds that metal-rich
galaxies have the SED peak beyond 160 um (i.e., they are cool), so
Herschel is necessary. To me this is a bit of a puzzle: it seems to suggest that
without Herschel one tends to think the dust is on average colder
than it really is, and so assigns the galaxies bigger masses. I don't know
why the system would behave that way. But one thing is clear: looking
at Figs. 4 and 5 it is apparent that we are exchanging Umin for mass
(the plots are essentially mirrored). So what the inclusion of Herschel
is doing is driving Umin to larger values for metal-rich galaxies, which
means that we need less dust mass. It is also driving Umin to smaller
values for low-metallicity galaxies, which increases their dust mass.
And I imagine that is all due to the delta(U-Umin) term in Eq. 4, which encompasses
most of the dust in the system (Gamma changes a bit in the fits, but
probably not enough to compensate). I'd have expected Umin to be mostly
set by the peak of the SED, but it seems that it's really set by the
RJ tail that comes from Herschel. I haven't played with the fits,
so I don't really know how orthogonal Umin is to the rest of the free
parameters, but I wouldn't have necessarily predicted this behavior :-)

*** The mirroring of U_min and M_dust is based on the fact that I compute M_dust using Equations 33 & 34 of Draine & Li 2007.  I'll need to check with Bruce and see if he thinks it should be calculated differently.
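One way to see the mirroring: if, as we read the Draine & Li (2007) prescription, the dust mass follows from L_dust = P0 M_dust <U> with a fixed P0, then at fixed L_dust the mass scales as 1/<U>, and <U> is close to Umin when gamma is small. The sketch below (hypothetical helper names, alpha=2 and Umax=1e6 assumed as defaults) computes <U> and the implied mass ratio between two fits that reproduce the same L_dust.

```python
import math


def mean_u(umin, gamma, alpha=2.0, umax=1e6):
    """Mass-weighted mean intensity <U> for a fraction (1 - gamma) of the dust
    at U = Umin plus a fraction gamma in dM/dU ~ U^-alpha on [Umin, Umax]
    (alpha != 1 assumed)."""
    if alpha == 2.0:
        u_pl = umin * math.log(umax / umin) / (1.0 - umin / umax)
    else:
        u_pl = ((1.0 - alpha) / (2.0 - alpha)
                * (umax ** (2.0 - alpha) - umin ** (2.0 - alpha))
                / (umax ** (1.0 - alpha) - umin ** (1.0 - alpha)))
    return (1.0 - gamma) * umin + gamma * u_pl


def mass_ratio_at_fixed_luminosity(fit_a, fit_b):
    """M_dust(a) / M_dust(b) for two fits with the same L_dust, assuming
    L_dust = P0 * M_dust * <U> with the same P0 for both."""
    return mean_u(**fit_b) / mean_u(**fit_a)


# Doubling Umin at fixed L_dust roughly halves the mass (exactly, if gamma = 0).
for gamma in (0.0, 0.05):
    r = mass_ratio_at_fixed_luminosity(dict(umin=2.0, gamma=gamma),
                                       dict(umin=1.0, gamma=gamma))
    print(f"gamma={gamma}: M(Umin=2) / M(Umin=1) = {r:.3f}")
```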

10) S4.5. People don't necessarily use modified BB fits because they are 
"quick and simple". They use them because they don't have the entire SED that is
needed to fit a Draine model!

11) For the BB fits, an important point is how the SED points are
weighted in the fit.

*** They are weighted by their uncertainties.  I added that nugget to the figure caption.
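For reference, a minimal sketch of an uncertainty-weighted single-temperature modified-blackbody fit: chi^2 weights each point by 1/sigma^2, the amplitude (proportional to dust mass for a fixed emissivity normalization) is solved analytically at each trial T, and T comes from a grid search. The 250 um reference frequency, the 5-80 K grid, the synthetic data, and the function names are assumptions for illustration, not the fitting code used in the paper.

```python
import math

H = 6.62607015e-34   # Planck constant [J s]
K_B = 1.380649e-23   # Boltzmann constant [J/K]
C = 2.99792458e8     # speed of light [m/s]
NU0 = C / 250e-6     # assumed reference frequency for kappa_nu ~ (nu/nu0)^beta


def mbb_shape(wave_um, temp_k, beta):
    """(nu/nu0)^beta * B_nu(T) in SI units; the free amplitude carries M_dust."""
    nu = C / (wave_um * 1e-6)
    b_nu = 2.0 * H * nu**3 / C**2 / math.expm1(H * nu / (K_B * temp_k))
    return (nu / NU0) ** beta * b_nu


def fit_mbb(waves_um, fluxes, sigmas, beta):
    """Grid search in T (5-80 K, 0.01 K steps) with the amplitude solved
    analytically at each T; chi^2 weights every point by 1/sigma^2.
    Returns (T, amplitude, chi2) for the best grid point."""
    wts = [1.0 / s**2 for s in sigmas]
    best = None
    for i in range(7501):
        t = 5.0 + 0.01 * i
        shape = [mbb_shape(w, t, beta) for w in waves_um]
        amp = (sum(q * f * m for q, f, m in zip(wts, fluxes, shape))
               / sum(q * m * m for q, m in zip(wts, shape)))
        chi2 = sum(q * (f - amp * m) ** 2 for q, f, m in zip(wts, fluxes, shape))
        if best is None or chi2 < best[2]:
            best = (t, amp, chi2)
    return best


# Noise-free synthetic data from a 20 K, beta = 1.6 source (hypothetical values).
WAVES = [100.0, 160.0, 250.0, 350.0, 500.0]
TRUE_AMP = 1e15
FLUXES = [TRUE_AMP * mbb_shape(w, 20.0, 1.6) for w in WAVES]
SIGMAS = [0.1 * f for f in FLUXES]
t_fit, amp_fit, _ = fit_mbb(WAVES, FLUXES, SIGMAS, beta=1.6)

# Corrupt the 100 um point: with a huge sigma it barely matters; with its
# nominal sigma it drags the fitted temperature upward.
bad = [2.0 * FLUXES[0]] + FLUXES[1:]
t_down, _, _ = fit_mbb(WAVES, bad, [1000.0 * SIGMAS[0]] + SIGMAS[1:], beta=1.6)
t_bad, _, _ = fit_mbb(WAVES, bad, SIGMAS, beta=1.6)
print(f"clean: {t_fit:.2f} K, down-weighted outlier: {t_down:.2f} K, "
      f"outlier: {t_bad:.2f} K")
```

Solving the amplitude analytically keeps the search one-dimensional; a point with a very large sigma contributes almost nothing to chi^2, which is the practical meaning of weighting by the uncertainties.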

12) Isn't it surprising that using the wrong beta (beta=2 instead
of beta=1.6) gives a better result (only 20% off, instead of 70%)?

*** The temperature drops a bit in the beta=2.0, 70<lambda<500 scenario, resulting in colder dust and thus more mass.  So one could argue that the fits are poorer in this case and only artificially produce better agreement.  I now note this in the text.
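A toy check of the beta=2 behavior, using the same kind of weighted grid fit: synthetic data from a single 20 K, beta=1.6 source sampled from 70 to 500 um with 10% errors (all hypothetical values) are fit with beta fixed at 1.6 and at 2.0. The amplitude is read as a mass proxy, which assumes the emissivity is normalized at the same 250 um reference frequency for both beta values.

```python
import math

H = 6.62607015e-34   # Planck constant [J s]
K_B = 1.380649e-23   # Boltzmann constant [J/K]
C = 2.99792458e8     # speed of light [m/s]
NU0 = C / 250e-6     # assumed emissivity reference frequency (250 um)


def mbb_shape(wave_um, temp_k, beta):
    """(nu/nu0)^beta * B_nu(T) in SI units."""
    nu = C / (wave_um * 1e-6)
    b_nu = 2.0 * H * nu**3 / C**2 / math.expm1(H * nu / (K_B * temp_k))
    return (nu / NU0) ** beta * b_nu


def fit_mbb(waves_um, fluxes, sigmas, beta):
    """1/sigma^2-weighted grid fit in T (5-80 K); amplitude solved exactly."""
    wts = [1.0 / s**2 for s in sigmas]
    best = None
    for i in range(7501):
        t = 5.0 + 0.01 * i
        shape = [mbb_shape(w, t, beta) for w in waves_um]
        amp = (sum(q * f * m for q, f, m in zip(wts, fluxes, shape))
               / sum(q * m * m for q, m in zip(wts, shape)))
        chi2 = sum(q * (f - amp * m) ** 2 for q, f, m in zip(wts, fluxes, shape))
        if best is None or chi2 < best[2]:
            best = (t, amp, chi2)
    return best


WAVES = [70.0, 100.0, 160.0, 250.0, 350.0, 500.0]
FLUXES = [1e15 * mbb_shape(w, 20.0, 1.6) for w in WAVES]   # true beta = 1.6
SIGMAS = [0.1 * f for f in FLUXES]

t16, a16, _ = fit_mbb(WAVES, FLUXES, SIGMAS, beta=1.6)
t20, a20, _ = fit_mbb(WAVES, FLUXES, SIGMAS, beta=2.0)
print(f"beta=1.6: T = {t16:.2f} K; beta=2.0: T = {t20:.2f} K, "
      f"mass ratio ~ {a20 / a16:.2f}")
```

The steeper emissivity pulls the fitted temperature down and the amplitude up, consistent with the colder-dust, higher-mass behavior described in the response.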

13) At the end of S4.5, you seem to imply that warm dust accounts
for half of the dust mass (since that's the order of the discrepancy)...
That seems hard to believe.

*** You weren't the only person to pick up on that poor logic.  I've reworded the text with a better explanation:
"Figure~\ref{fig:Draine_v_BB} shows a primary reason for the discrepancy: even when limited to $\lambda \geq 100\micron$ photometry, single-temperature blackbody fits overestimate the dust temperature, thus underestimating the dust mass.  The single-temperature model does not account for the contribution of warm dust emitting at shorter wavelengths, and the temperatures are driven towards higher values in the attempt to fit both the short- and long-wavelength far-infrared emission.  This effect is accentuated for galaxies in which the bulk of the dust is cool, because a single modified blackbody poorly fits both the cool dust peak and the warmer dust emitting at shorter wavelengths."

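The reworded explanation can be illustrated with a toy SED: a cold component plus a small mass of warm dust, fit with a single modified blackbody using only lambda >= 100 um points. The temperatures, masses, beta=1.5, and 10% errors are arbitrary assumptions chosen for illustration, not values from the paper, and the fitting helper is a hypothetical sketch.

```python
import math

H = 6.62607015e-34   # Planck constant [J s]
K_B = 1.380649e-23   # Boltzmann constant [J/K]
C = 2.99792458e8     # speed of light [m/s]
NU0 = C / 250e-6     # assumed emissivity reference frequency (250 um)


def mbb_shape(wave_um, temp_k, beta):
    """(nu/nu0)^beta * B_nu(T) in SI units."""
    nu = C / (wave_um * 1e-6)
    b_nu = 2.0 * H * nu**3 / C**2 / math.expm1(H * nu / (K_B * temp_k))
    return (nu / NU0) ** beta * b_nu


def fit_mbb(waves_um, fluxes, sigmas, beta):
    """1/sigma^2-weighted grid fit in T (5-80 K); amplitude solved exactly."""
    wts = [1.0 / s**2 for s in sigmas]
    best = None
    for i in range(7501):
        t = 5.0 + 0.01 * i
        shape = [mbb_shape(w, t, beta) for w in waves_um]
        amp = (sum(q * f * m for q, f, m in zip(wts, fluxes, shape))
               / sum(q * m * m for q, m in zip(wts, shape)))
        chi2 = sum(q * (f - amp * m) ** 2 for q, f, m in zip(wts, fluxes, shape))
        if best is None or chi2 < best[2]:
            best = (t, amp, chi2)
    return best


WAVES = [100.0, 160.0, 250.0, 350.0, 500.0]   # lambda >= 100 um only
BETA = 1.5
A_COLD, T_COLD = 1e15, 18.0   # cold component (amplitude ~ mass)
A_WARM, T_WARM = 1e13, 40.0   # warm component holding 1% of the mass
FLUXES = [A_COLD * mbb_shape(w, T_COLD, BETA) + A_WARM * mbb_shape(w, T_WARM, BETA)
          for w in WAVES]
SIGMAS = [0.1 * f for f in FLUXES]

t_fit, a_fit, _ = fit_mbb(WAVES, FLUXES, SIGMAS, BETA)
print(f"single-T fit: {t_fit:.2f} K (cold component {T_COLD} K); "
      f"fitted/true mass = {a_fit / (A_COLD + A_WARM):.2f}")
```

Even the 100-500 um points carry enough warm-dust emission to push the single-temperature fit above the cold-component temperature, and the fitted amplitude falls below the true total.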
14) In S4.6, and also in the Summary, you fall back to identifying
a submm excess with cold dust. I know this is Maud's explanation, but
it is only one of the explanations (as you discuss in the intro), and
it may not even be the best explanation (or a physical explanation :-).
So I think it'd be better to rephrase statements such as that low-metallicity
sources "potentially harbor the coldest dust" to include the other
possibilities. I'm also a bit concerned about the magnitude of the
excess, given that the 500 um detections of these galaxies appear
to be only 3-4 sigma.

*** I have modified that discussion.

15) Concerning the submm excess, there are a couple of Planck papers that
are very relevant and should be referenced: "The Planck View of Nearby
Galaxies", and "Origin of the submillimetre excess dust emission in the 
Magellanic Clouds".  I think you referenced the latter in the intro.

*** In fact I referenced both.  I've now added the arXiv numbers, since those two papers have not yet appeared in the journals.

16) Summary: why would a dust grain care whether the radiation
field is hard or not? It should only care about the energy density.
Also, the Bot (2010) reference is nested inside another parenthesis.

*** Fixed

17) "superior ability". Not that I disagree with the Draine fits
being more physical :-) I agree that the key is the fact that they
include a range of radiation fields, and indeed most dust is illuminated
by a low radiation field. I'd just remove the "superior" :-)