Ir para o conteúdo

Software livre Brasil

Tela cheia
 Feed RSS


28 de Maio de 2009, 0:00 , por Desconhecido - | Ninguém está seguindo este artigo ainda.
In this blog I share technical information, write about projects I'm involved with, or about cool/fun/interesting stuff I find.

pristine-tar updates

9 de Outubro de 2017, 15:01, por Antonio Terceiro


pristine-tar is a tool that is present in the workflow of a lot of Debian people. I adopted it last year after it has been orphaned by its creator Joey Hess. A little after that Tomasz Buchert joined me and we are now a functional two-person team.

pristine-tar goals are to import the content of a pristine upstream tarball into a VCS repository, and being able to later reconstruct that exact same tarball, bit by bit, based on the contents in the VCS, so we don’t have to store a full copy of that tarball. This is done by storing a binary delta files which can be used to reconstruct the original tarball from a tarball produced with the contents of the VCS. Ultimately, we want to make sure that the tarball that is uploaded to Debian is exactly the same as the one that has been downloaded from upstream, without having to keep a full copy of it around if all of its contents is already extracted in the VCS anyway.

The current state of the art, and perspectives for the future

pristine-tar solves a wicked problem, because our ability to reconstruct the original tarball is affected by changes in the behavior of tar and of all of the compression tools (gzip, bzip2, xz) and by what exact options were used when creating the original tarballs. Because of this, pristine-tar currently has a few embedded copies of old versions of compressors to be able to reconstruct tarballs produced by them, and also rely on a ever-evolving patch to tar that is been carried in Debian for a while.

So basically keeping pristine-tar working is a game of Whac-A-Mole. Joey provided a good summary of the situation when he orphaned pristine-tar.

Going forward, we may need to rely on other ways of ensuring integrity of upstream source code. That could take the form of signed git tags, signed uncompressed tarballs (so that the compression doesn’t matter), or maybe even a different system for storing actual tarballs. Debian bug #871806 contains an interesting discussion on this topic.

Recent improvements

Even if keeping pristine-tar useful in the long term will be hard, too much of Debian work currently relies on it, so we can’t just abandon it. Instead, we keep figuring out ways to improve. And I have good news: pristine-tar has recently received updates that improve the situation quite a bit.

In order to be able to understand how better we are getting at it, I created a "visualization of the regression test suite results. With the help of data from there, let’s look at the improvements made since pristine-tar 1.38, which was the version included in stretch.

pristine-tar 1.39: xdelta3 by default.

This was the first release made after the stretch release, and made xdelta3 the default delta generator for newly-imported tarballs. Existing tarballs with deltas produced by xdelta are still supported, this only affects new imports.

The support for having multiple delta generator was written by Tomasz, and was already there since 1.35, but we decided to only flip the switch after using xdelta3 was supported in a stable release.

pristine-tar 1.40: improved compression heuristics

pristine-tar uses a few heuristics to produce the smaller delta possible, and this includes trying different compression options. In the release Tomasz included a contribution by Lennart Sorensen to also try the --gnu, which gretly improved the support for rsyncable gzip compressed files. We can see an example of the type of improvement we got in the regression test suite data for delta sizes for faad2_2.6.1.orig.tar.gz:

In 1.40, the delta produced from the test tarball faad2_2.6.1.orig.tar.gz went down from 800KB, almost the same size of tarball itself, to 6.8KB

pristine-tar 1.41: support for signatures

This release saw the addition of support for storage and retrieval of upstream signatures, contributed by Chris Lamb.

pristine-tar 1.42: optionally recompressing tarballs

I had this idea and wanted to try it out: most of our problems reproducing tarballs come from tarballs produced with old compressors, or from changes in compressor behavior, or from uncommon compression options being used. What if we could just recompress the tarballs before importing then? Yes, this kind of breaks the “pristine” bit of the whole business, but on the other hand, 1) the contents of the tarball are not affected, and 2) even if the initial tarball is not bit by bit the same that upstream release, at least future uploads of that same upstream version with Debian revisions can be regenerated just fine.

In some cases, as the case for the test tarball util-linux_2.30.1.orig.tar.xz, recompressing is what makes it possible to reproduce the tarball (and thus import it with pristine-tar) possible at all:

util-linux_2.30.1.orig.tar.xz can only be imported after being recompressed

In other cases, if the current heuristics can’t produce a reasonably small delta, recompressing makes a huge difference. It’s the case for mumble_1.1.8.orig.tar.gz:

with recompression, the delta produced from mumble_1.1.8.orig.tar.gz goes from 1.2MB, or 99% of the size to the original tarball, to 14.6KB, 1% of the size of original tarball

Recompressing is not enabled by default, and can be enabled by passing the --recompress option. If you are using pristine-tar via a wrapper tool like gbp-buildpackage, you can use the $PRISTINE_TAR environment variable to set options that will affect any pristine-tar invocations.

Also, even if you enable recompression, pristine-tar will only try it if the delta generations fails completely, of if the delta produced from the original tarball is too large. You can control what “too large” means by using the --recompress-threshold-bytes and --recompress-threshold-percent options. See the pristine-tar(1) manual page for details.


14 de Agosto de 2017, 15:08, por Antonio Terceiro

I’m back from Debconf17.

I gave a talk entitled “Patterns for Testing Debian Packages”, in which I presented a collection of 7 patterns I documented while pushing the Debian Continuous Integration project, and were published in a 2016 paper. Video recording and a copy of the slides are available.

I also hosted the ci/autopkgtest BoF session, in which we discussed issues around the usage of autopkgtest within Debian, the CI system, etc. Video recording is available.

Kudos for the Debconf video team for making the recordings available so quickly!

Papo Livre #1 - meios de comunicação

6 de Junho de 2017, 12:44, por Antonio Terceiro

Acabamos de lançar mais um episódio do Papo Livre: #1 – meios de comunicação.

Neste episódio eu, Paulo Santana e Thiago Mendonça discutimos os diversos meios de comunicação em comunidades de software livre. A discussão começa pelos meios mais “antigos”, como IRC e listas de discussão e chega aos mais “modernos”, passo pelo meio livre e meio proprietário Telegram, e chega à mais nova promessa nessa área, Matrix (e o seu cliente mais famoso/viável, Riot).

Debian CI: new data retention policy

28 de Maio de 2017, 21:17, por Antonio Terceiro

When I started debci for Debian CI, I went for the simplest thing that could possibly work. One of the design decisions was to use the filesystem directly for file storage. A large part of the Debian CI data is log files and test artifacts (which are just files), and using the filesystem directly for storage makes it a lot easier to handle it. The rest of the data which is structured (test history and status of packages) is stored as JSON files.

Another nice benefit of using the filesystem like this is that I get a sort of REST API for free by just exposing the file storage to the web. For example, getting the latest test status of debci itself on unstable/amd64 is as easy as:

$ curl
  "run_id": "20170528_173652",
  "package": "debci",
  "version": "1.5.1",
  "date": "2017-05-28 17:43:05",
  "status": "pass",
  "blame": [],
  "previous_status": "pass",
  "duration_seconds": "373",
  "duration_human": "0h 6m 13s",
  "message": "Tests passed, but at least one test skipped",
  "last_pass_version": "1.5.1",
  "last_pass_date": "2017-05-28 17:43:05"

Now, nothing in life is without compromises. One big disadvantage of the way debci stored its data is that there were a lot of files, which ends up using a large number of inodes in the filesystem. The current Debian CI master has more than 10 million inodes in its filesystem, and almost all of them were being used. This is clearly unsustainable.

You will notice that I said stored, because as of version 1.6, debci now implements a data retention policy: log files and test artifacts will now only be kept for a configurable amount of days (default: 180).

So there you have it: effective immediately, Debian CI will not provide logs and test artifacts older than 180 days.

If you are reporting bugs based on logs from Debian CI, please don’t hotlink the log files. Instead, make sure you download the logs in question and attach them to the bug report, because in 6 months they will be gone.

Papo Livre Podcast, episodio #0

23 de Maio de 2017, 12:28, por Antonio Terceiro

Podcasts têm sido um dos meus passatempos favoritos a um tempo. Eu acho que é um formato muito interssante, por dois motivos.

Primeiro, existem muitos podcasts com conteúdo de altíssima qualidade. Meu feed atualmente contém os seguintes (em ordem de assinatura):

Parece muito, e é. Ultimamente eu notei que estava ouvindo episódios com várias semanas de atraso, e resolvi priorizar episódios cujo tema me interessam muito e/ou que dizem respeito a temas da atualidade. Além disso desencanei de tentar escutar tudo, e passei a aceitar que vou deletar alguns itens sem escutar.

Segundo, ouvir um podcast não exige que você pare pra dar atenção total. Por exemplo, por conta de uma lesão no joelho que me levou a fazer cirurgia reconstrução de ligamento, eu estou condenado a fazer musculação para o resto da minha vida, o que é um saco. Depois que eu comecei a ouvir podcasts, eu tenho vontade de ir pra academia, porque agora isso representa o meu principal momento de ouvir podcast. Além disso, toda vez que eu preciso me deslocar sozinho pra qualquer lugar, ou fazer alguma tarefa chata mas necessária como lavar louça, eu tenho companhia.

Fazia um tempo que eu tinha vontade de fazer um podcast, e ontem oficialmente esse projeto se tornou realidade. Eu, Paulo Santana e Thiago Mendonça estamos lançando o Podcast Papo Livre onde discutiremos software livre em todos os seus aspectos.

No primeiro episódio, partindo da notícia sobre a vinda de Richard Stallman ao Brasil nas próximas semanas, discutimos as origens e alguns conceitos fundamentais do software livre.